Saturday, September 04, 2004

How to enter Vietnamese inside Blogger (or anywhere else)

Most of the previous postings have had some Vietnamese characters somewhere inside them. How hard is it to do this? It's piss easy. Firstly, let's get rid of one common misconception.

The misconception is that you need special fonts to display Vietnamese. By special fonts, I mean proprietary fonts such as those provided by the VNI Corporation, as opposed to preinstalled fonts like Times New Roman on Windows. This may have been true in the mid-90s, but it isn't true in the 00s. Windows fonts such as Times New Roman, Arial and Courier New (among others) have supported Vietnamese since Win98 and Win2K. I'm happy with them.

(I cannot speak for other platforms such as Linux or MacOS. I assume that MacOS would have better support for languages than Linux, given the centralization of its development.) Still, entering Vietnamese is not as easy as entering English, where pretty much all the characters you want are on the keyboard. It is a little harder than entering German (with umlauted characters such as ü), French (with é and è) or other European languages. But it becomes easy with practice. All you have to understand are the diacritics, which this weird and wonderful language uses in abundance.

Vietnamese Characters

Many languages use diacritics - additional marks above or below letters. In the last paragraph, we saw the umlaut (two dots above the letter), the acute accent (a rising slash above the letter) and the grave accent (a falling slash above the letter). Diacritics are common with vowels in Latin alphabets, and are sometimes used with consonants as well.

English stands at one extreme: it hardly uses diacritics at all, except with the odd loan word. Vietnamese is at the other extreme - many letters use two diacritics at once. For example, the most common family name in Việt Nam is "Nguyễn". This makes it hard for foreigners to read and hard for many to remember. But there is method in all this madness. Each diacritic you encounter has only one purpose:

  • One set indicates the type of the vowel: beet versus bet, cart versus cut, and so on.
  • Another set indicates the tone: is it rising, falling, dipping or flat? Vietnamese is a tonal language, and its writing system (Quốc Ngữ) has tone built into it.
Let's look at the vowels first. There are six vowels without diacritics: "a", "e", "i", "o" and "u" should be familiar to you, and in Vietnamese "y" is also always used as a vowel. Then there are three vowels which use the circumflex "^": "â", "ê" and "ô". Another diacritic is the breve: "ă". Finally, there are two letters with a pseudo-diacritic hook that looks a bit like "'": "ư" and "ơ" (Unicode calls this hook a "horn"). Note that the presence or absence of the hook, the breve and the circumflex says nothing about the tone of the letter. However, these letters are pronounced differently, and are considered separate letters. This is important. Just as important are the tone markers, or "dấu". I will provide the Vietnamese names, as they provide their own examples. You have these 5 tone markers to remember:
  • The acute accent, known in the tongue as "dấu Sắc". This indicates a rising tone.
  • The grave accent or "dấu Huyền". This indicates a shallow drop.
  • The dot below or "dấu Nặng". This is a deep, low drop - the Mariana Trench of tones.
  • Then you have the "question mark" tone, or "dấu Hỏi". Think of a low dip and then a rise.
  • Penultimately, you have the tilde or "dấu Ngã". This is similar to the "dấu Hỏi" except that you make it creaky and tighten your larynx - well, that is if you live in Hà Nội. In Sài Gòn, the Hỏi and the Ngã sound pretty similar. However, you should distinguish them in your writing.

The final tone is its absence: "không dấu" or no tone at all. Here, you keep the vowel flat, and by that I mean flat: no dipping or rising to intonate your emotions! A vowel without any of the 5 tone markers is flat in tone. That doesn't mean it lacks other diacritics, such as the circumflex.
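The same split between vowel diacritics and tone markers also exists inside the computer. In Unicode, a letter like "ễ" can be decomposed into a base letter plus separate combining marks: one for the vowel shape (the circumflex) and one for the tone (the tilde). Here is a minimal Python sketch, assuming Python 3 and its standard unicodedata module, that strips only the five tone marks and leaves the circumflex, breve and hook alone. The function name remove_tones is my own, just for illustration.

```python
import unicodedata

# Combining marks for the five tones: sắc, huyền, hỏi, ngã, nặng.
TONE_MARKS = {"\u0301", "\u0300", "\u0309", "\u0303", "\u0323"}


def remove_tones(text: str) -> str:
    """Drop the tone marks but keep the circumflex, breve and hook (horn)."""
    # NFD splits each letter into a base letter followed by its combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    # NFC glues the remaining marks back onto their letters.
    return unicodedata.normalize("NFC", kept)


print(remove_tones("Nguyễn"))  # Nguyên
print(remove_tones("Đường"))   # Đương
```

Note that "Nguyễn" becomes "Nguyên", not "Nguyen": the flat-tone version of the word still has its circumflex.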

Finally, there is an extra consonant in Vietnamese: Đ (lower case đ). I mention it here just to get it out of the way. Don't confuse it with D and d: they are different letters, with different sounds.

All of this may seem daunting for the Vietnamese beginner. The total range of vowels is 2 (lower case and upper case) by 6 (for the six tones) by 12 (for the 12 vowels in the language) = 144 possible vowels. Then you've got Đ and đ. How do you enter all these characters? There are two methods, as we shall see.
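If you want to check that arithmetic, a short Python sketch (assuming Python 3.9 or later; the names are my own) can build every combination of the 12 base vowels, the 2 cases and the 6 tones (5 marks plus none), compose each one and count the results.

```python
import unicodedata

# The 12 vowels, before any tone marks are added.
BASE_VOWELS = ["a", "ă", "â", "e", "ê", "i", "o", "ô", "ơ", "u", "ư", "y"]
# No mark (flat tone), then sắc, huyền, hỏi, ngã, nặng as combining characters.
TONES = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]


def all_vowel_forms() -> list[str]:
    forms = []
    for vowel in BASE_VOWELS:
        for cased in (vowel, vowel.upper()):
            for tone in TONES:
                # NFC turns letter + combining mark into one precomposed character.
                forms.append(unicodedata.normalize("NFC", cased + tone))
    return forms


forms = all_vowel_forms()
print(len(forms))                       # 144
print(all(len(f) == 1 for f in forms))  # True: each is a single character
```

The second check shows that Unicode has a ready-made character for every one of the 144 vowels, which is one reason the standard fonts can display them.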

Entering Vietnamese

Firstly, there's the character map method. A character map is basically a program that shows you all the characters of a given font in a table. One example is the Character Map (charmap.exe) program in Windows. Microsoft Office also provides a similar utility through the "Insert Symbol" menu command. The idea is that you click on the characters you want, copy them and paste them into the program you are using. Here's a screenshot of Character Map in action:

Character Map on WinXP

You can use this if you only need the odd Vietnamese character in your file. In the long run, I advise against it: it's tedious. After ten rounds of pointing and clicking, you will get tired of the whole activity.

I recommend that you use a Vietnamese keyboard or keyboard driver for the task. Despite their name, they are not hardware: they are small programs that sit in your OS and convert your keystrokes into nice, lovely Vietnamese. And do I have a particular program in mind? Boy howdy, I do: Unikey. I've used it for about a year and a half without complaint. I like it so much that I've shut off rival keyboard drivers running on the same machine. The advantages of it are:

  • It's free. Nice to know, isn't it?
  • It's just a download away: there are versions for NT/2000/XP, for 95/98/ME and for Linux.
  • Installation is simple: just unzip it and it is ready to go.
  • It lacks bloat. It's a small program that does what it does without any unnecessary features.
  • It sits on the taskbar. This makes it easy to switch between "English" mode and "Vietnamese" mode: just click on the icon on the taskbar.
  • The user interface actually caters for English speakers, which makes it easier to understand.

(Of course, if you aren't happy with Unikey, you could look for other utilities; the Vietnamese Unicode FAQs have more information. But rather than comparing all the utilities, I just want one that works for me.)

Setup

When you start up Unikey, you see the following dialogue:

UniKey at Startup in Vietnamese

What does it all mean? Fortunately, you can find out what is happening by clicking on the "Mở rộng" button. "Mở rộng" means expand, and that's what you need to do.

UniKey in Vietnamese - now expanded

See the checkbox with "Vietnamese interface"? Uncheck it. The whole interface will turn into English:

UniKey now in English

That makes it a lot easier to use, doesn't it? Okay, here's what I recommend you do:

  • I recommend you always set the "Character Set" to Unicode - always. A character set is basically how characters like "ư" and "a" are represented as numbers that computers can handle. The Microsoft Office utilities and Blogger are set to handle Unicode by default. Unicode is an international standard, so you can't go much wrong with it. The only exception to this is if you have the misfortune to use one of the old VNI Fonts from years ago. But Unicode - good.
  • The "Input method" determines which keystrokes form a character like "ư". I prefer TELEX, but I will give instructions for using Unikey with VNI and VIQR as well. See the next section for details.
  • Advanced options: uncheck them all. Especially uncheck the "Use oa', uy' (instead of o'a, u'y)". This is an irritating preset that doesn't allow you to write "hoà"; instead it always comes out as "hòa". You don't want that.
  • There's also the "Help" button - which provides you "Help" in Vietnamese. If you understand Vietnamese, it's nice to look at. If you don't, it's not of much assistance. Anyway, that's what this document is here for, isn't it?
  • Finally, there's "Auto-run UniKey at boot time". If it's your machine, I see no problem with it. If it's someone else's, then I advise against it.
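To make the "Character Set" bullet above a bit more concrete, here is a small Python sketch (assuming Python 3.8 or later for bytes.hex with a separator; code_points is my own helper name). It prints the Unicode number, the UTF-8 bytes and the official Unicode name of a few characters. UTF-8 is one common way of storing Unicode as bytes; I'm using it here only as an example.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the Unicode code points of text in U+XXXX form."""
    return [f"U+{ord(c):04X}" for c in text]


for ch in ["a", "ư", "ệ"]:
    # e.g. ư ['U+01B0'] c6 b0 LATIN SMALL LETTER U WITH HORN
    print(ch, code_points(ch), ch.encode("utf-8").hex(" "), unicodedata.name(ch))

# The same letter can also be stored decomposed: base letter plus combining marks.
print(code_points(unicodedata.normalize("NFD", "ệ")))  # ['U+0065', 'U+0323', 'U+0302']
```

Plain "a" is one byte, while "ư" takes two bytes and "ệ" three. A program that only expects one byte per character is where the trouble starts.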

Then click on "Close". The program will now sit on the taskbar - unobtrusive, yet available. If you see a big "V":

Sitting on the task bar - waiting for Vietnamese...

That means that it is set up to enter Vietnamese. But if you want to enter pure English, just click on the "V" and you will see:

Now it just outputs English, as it has done a million times before...

It's easy to toggle between the two modes: left-click on the letter. And if you want to close the program altogether: right-click on the letter and, on the resulting menu, click "exit".

Okay, now that it is running: what do I do? Reading the next section is a good way to start...

Input Methods

The idea of a keyboard driver is that it makes it easy to enter the characters you want using the keyboard you have. UniKey doesn't even require the "Alt" or "Ctrl" keys. Instead, you type a sequence of letters in the following order:

  • If you want characters without diacritics, like "a", "b", or "c", then type them.
  • If you want characters with diacritics but no tone markers, then type the combination. For example "dd" in TELEX will create a "đ", and "ow" will create a "ơ".
  • Always add the tone afterwards.

The following table gives the combinations for all the Vietnamese characters in lower case. If you want upper case, use upper case letters instead. For example, "DD" in TELEX will create "Đ", and so on. Here is the table:

Desired letter      TELEX              VNI         VIQR
â                   Type "aa"          Type "a6"   Type "a^"
ă                   Type "aw"          Type "a8"   Type "a("
đ                   Type "dd"          Type "d9"   Type "dd"
ê                   Type "ee"          Type "e6"   Type "e^"
ô                   Type "oo"          Type "o6"   Type "o^"
ơ                   Type "ow"          Type "o7"   Type "o+"
ư                   Type "w" or "uw"   Type "u7"   Type "u+"
Add a "dấu Sắc"     Type "s"           Type "1"    Type single quote "'"
Add a "dấu Huyền"   Type "f"           Type "2"    Type reverse quote "`"
Add a "dấu Hỏi"     Type "r"           Type "3"    Type "?"
Add a "dấu Ngã"     Type "x"           Type "4"    Type tilde "~"
Add a "dấu Nặng"    Type "j"           Type "5"    Type period "."
Remove tone         Type "z"           Type "0"    Type "0"

To understand this, I will provide some examples:

To type         TELEX                VNI                  VIQR
Hai Bà Trưng    "Hai Baf Trwng"      "Hai Ba2 Tru7ng"     "Hai Ba` Tru+ng"
Tiếng Việt      "Tieesng Vieejt"     "Tie61ng Vie65t"     "Tie^'ng Vie^.t"
ĐƯỜNG           "DDWOWFNG"           "D9U7O72NG"          "DDU+O+`NG"
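All three input methods follow the same pattern: a modifier key reshapes the previous letter, and a tone key adds a mark to the previous vowel. The Python sketch below is a hypothetical toy converter built from the key table above; it is not UniKey's code, and the names METHODS and convert are my own. As far as I know, real keyboard drivers are cleverer (for example, they let you type the tone key later in the syllable and work out where the mark goes). This sketch assumes the tone key comes straight after the vowel, as in the examples, and ignores the fact that VIQR's "." and "?" are also ordinary punctuation.

```python
import unicodedata

# Combining tone marks, attached to a vowel and then composed with NFC.
SAC, HUYEN, HOI, NGA, NANG = "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"

# "modify": (previous letter, key) -> new letter; "tone": key -> mark ("" removes it)
METHODS = {
    "TELEX": {
        "modify": {("a", "a"): "â", ("e", "e"): "ê", ("o", "o"): "ô", ("d", "d"): "đ",
                   ("a", "w"): "ă", ("o", "w"): "ơ", ("u", "w"): "ư"},
        "tone": {"s": SAC, "f": HUYEN, "r": HOI, "x": NGA, "j": NANG, "z": ""},
    },
    "VNI": {
        "modify": {("a", "6"): "â", ("e", "6"): "ê", ("o", "6"): "ô", ("a", "8"): "ă",
                   ("o", "7"): "ơ", ("u", "7"): "ư", ("d", "9"): "đ"},
        "tone": {"1": SAC, "2": HUYEN, "3": HOI, "4": NGA, "5": NANG, "0": ""},
    },
    "VIQR": {
        "modify": {("a", "^"): "â", ("e", "^"): "ê", ("o", "^"): "ô", ("a", "("): "ă",
                   ("o", "+"): "ơ", ("u", "+"): "ư", ("d", "d"): "đ"},
        "tone": {"'": SAC, "`": HUYEN, "?": HOI, "~": NGA, ".": NANG, "0": ""},
    },
}
VOWELS = set("aăâeêioôơuưy")


def convert(text: str, method: str) -> str:
    rules = METHODS[method]
    out: list[list[str]] = []  # each slot is [letter, tone mark]
    for ch in text:
        key = ch.lower()
        prev = out[-1] if out else None
        prev_letter = prev[0].lower() if prev else ""
        if prev and (prev_letter, key) in rules["modify"]:
            # Reshape the previous letter, keeping its case.
            new = rules["modify"][(prev_letter, key)]
            prev[0] = new.upper() if prev[0].isupper() else new
        elif prev and key in rules["tone"] and prev_letter in VOWELS:
            prev[1] = rules["tone"][key]  # set (or clear) the tone of the previous vowel
        elif method == "TELEX" and key == "w":
            out.append(["Ư" if ch.isupper() else "ư", ""])  # a lone "w" gives "ư"
        else:
            out.append([ch, ""])
    return unicodedata.normalize("NFC", "".join(letter + tone for letter, tone in out))


print(convert("Tieesng Vieejt", "TELEX"))  # Tiếng Việt
print(convert("D9U7O72NG", "VNI"))         # ĐƯỜNG
print(convert("Hai Ba` Tru+ng", "VIQR"))   # Hai Bà Trưng
```

Because "2" is the VNI key for dấu Huyền, "Ba2" gives "Bà", while "Ba1" would give "Bá".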

Yes, it all seems a little tedious to learn. So choose one of the methods, and practice. I admit you may need a good motivation to do this. My motivations were (a) learning Vietnamese, and (b) retyping the names of Vietnamese students that had been provided sans diacritics.

Conclusion

What I've tried to do here is set up a tutorial for those unfamiliar with Vietnamese, and also unfamiliar with computers. A lot of this was learnt by consulting the original Vietnamese documentation, and through a lot of practice. Now if you are interested, practice as well. You may still encounter difficulties. For example:

  • You are trying to enter Vietnamese in a font that does not have Vietnamese characters. For example, fonts like "Georgia" and "Garamond" do not support them. That's a shame. For the time being, stick with "Arial", "Times New Roman" and "Courier New". There are others.
  • You are trying to enter Vietnamese in a pre-UNICODE "Vietnamese" font like VNI-Times. The result looks like poo. One way around it is to set the "character set" to "VNI". However, I'd recommend against it unless (a) you are printing the document, or (b) you know the people you are sending it to also have a VNI font installed.
  • There's one problem that I've had with Excel. You enter a Vietnamese word in a cell. You try to enter another word in another cell. Then the "Auto-complete" feature tries to guess what you are entering and makes a mess of it. This has happened to me a few times. I suggest you turn "Auto-complete" off.
  • Finally, the program you are using doesn't support UNICODE at all, and cannot even understand what you are typing. For example, the main interface for the popular editor HTML-Kit cannot handle it.
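To see roughly what "cannot understand" looks like, here is a small Python sketch. It stores "Tiếng Việt" as UTF-8 bytes (one common way of saving Unicode text) and then reads those bytes back as if they were Windows-1252, a Western European legacy character set. Windows-1252 is simply a stand-in I picked for the demonstration; a program assuming a different legacy character set would show different, but equally unreadable, output.

```python
text = "Tiếng Việt"

# A Unicode-aware program saves the text as UTF-8 bytes...
raw = text.encode("utf-8")

# ...and a program that assumes Windows-1252 treats every byte as one character.
garbled = raw.decode("cp1252")
print(garbled)  # Tiáº¿ng Viá»‡t

# Nothing is actually lost: undoing the mistake recovers the original text.
print(garbled.encode("cp1252").decode("utf-8") == text)  # True
```

Each Vietnamese vowel with two diacritics turns into three junk characters, because its UTF-8 form is three bytes long.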

But if you have a reason to learn Vietnamese, and if you are determined: go for it. I wish you the joy of discovery!

All mistakes in this document are mine.