Friday, May 20, 2005

How to enter IPA inside Blogger (or anywhere else)

I use the International Phonetic Alphabet (or IPA) to teach English. It is required, as English spelling is an extremely erratic guide to pronunciation. Take how "gh" in "tough" is said as opposed to "gh" in "though" as compared to "gh" in "ghoul". Then you have the two "th"s, as displayed in "this" and "thin" - they're different in utterance. You can say them, but learners may not be able to hear the difference. So we teachers supplement this by using IPA as a transcription scheme for sounds. IPA has the underlying maxim of "one sound - one symbol". For example, "cat" is represented as "/kæt/" in IPA, "yes" is "/jes/", and "you" is "/ju:/".1 That helps English learners: they should be able to "read off" the pronunciation of a word from its phonemic transcription. Actually articulating the words is not always so easy, but learners should now have an idea of what the words sound like. In theory, IPA can represent almost any sound in any human language, with roughly a hundred symbols for that purpose. In practice, English speakers use about a fifth of them.

So what's the best way of typing IPA on your computer, such as for a nice Word or HTML handout printed for the students? The user requirements are quite different from Vietnamese, where it makes sense to memorize all the keystrokes for diacritics. Native Vietnamese speakers may be using the program all the time. IPA users are different; they will want to use the program occasionally for handouts and submissions, but not in a letter to mum. So you make it easy for them: let them select the characters from a grid and copy them. In fact, you want something like Charmap, Windows's own character map. However, teachers don't want to wade through Cyrillic and Greek to find the symbols. So you cut down the interface so that only the IPA characters are available. Even better, you restrict it to the IPA for English. And there are three things I've seen that do this...
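Before looking at those, here is a rough idea of what such a cut-down picker boils down to. This is only a sketch, not how any of the tools below actually work: the symbol list is an illustrative subset of the IPA commonly used for English (not a complete inventory), and the names `ENGLISH_IPA`, `make_grid`, `render` and `pick` are made up for the example.

```python
# Illustrative subset only: a few IPA symbols commonly used for English.
# A real picker would carry the full set for whichever scheme you teach.
ENGLISH_IPA = "æɑɒɔəɜɪʊʌðθʃʒŋː"


def make_grid(symbols, columns=5):
    """Split the symbols into rows of at most `columns` characters."""
    return [list(symbols[i:i + columns]) for i in range(0, len(symbols), columns)]


def render(grid):
    """Return the grid as plain text, one row per line."""
    return "\n".join(" ".join(row) for row in grid)


def pick(grid, row, column):
    """Return the character the user 'clicked' on."""
    return grid[row][column]


grid = make_grid(ENGLISH_IPA)
print(render(grid))

# Simulate clicking the first cell and pasting it between "k" and "t".
clipboard = "k" + pick(grid, 0, 0) + "t"
print(clipboard)
```

The point is that the user never has to remember a keystroke: the only choices on screen are the ones that matter for English.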

For beginners, you could try PhonMap, a free and simple tool from Jan Mulder. It presents all the IPA for English in a single dialog. You click on the symbols to copy them, and then you paste them somewhere else like Word (using a Paste button for that purpose). It's small, uses little memory, and it is free. For my fellow teachers (who are computer literate, but not overly so), it's a godsend.

I wish I could like it, but I don't, really. First, PhonMap only uses its own, specialized font. Now that's understandable. The program is a couple of years old, dating from the time when IPA fonts cost money. Jan was pissed off, and made his own. That took a lot of work to do, and he deserves the highest praise for doing it gratis. But now PhonMap's font is unnecessary, with Lucida Sans Unicode available on pretty much all Windows machines, and others such as SILDoulosUnicodeIPA downloadable for free. I would let users select their own.

Worse still, PhonMap's font does not do the right thing by Unicode. Basically, Unicode is a convention where each character has a number, and only that number. In making his font, Jan Mulder has assigned characters to the wrong numbers. To see why that is a problem, let's consider a teacher at my school. She's made some nice handouts using PhonMap in Word. They look lovely. They contain IPA for phrases such as "These cats" (/ði:z kæts/). The kids are happy. Then she's got a mate in another school who wants a little bit of help. So she sends the handouts over. The problem is that PhonMap is not installed over there. Word tries to do the right thing, and uses another font in its place, one with better conformance to the Unicode standard. The result is that it comes out looking like "¶iÉz k¾ts". Ugly.
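To make the "wrong numbers" point concrete, here is a small sketch in Python. The mapping is an assumption: it is inferred only from the one garbled example above ("¶iÉz k¾ts" for "These cats"), not from PhonMap's real font table, which I have not seen. I have also assumed that the "É" stands for the proper IPA length mark (U+02D0) rather than the plain colon I typed. The names `LEGACY_TO_UNICODE`, `repair` and `describe` are made up for the example.

```python
import unicodedata

# Hypothetical mapping, inferred only from the single example in this post.
# Left: the number the character was stored under. Right: the Unicode
# character that the glyph was actually meant to be.
LEGACY_TO_UNICODE = {
    "\u00b6": "\u00f0",  # PILCROW SIGN -> LATIN SMALL LETTER ETH
    "\u00c9": "\u02d0",  # LATIN CAPITAL LETTER E WITH ACUTE -> length mark
    "\u00be": "\u00e6",  # VULGAR FRACTION THREE QUARTERS -> LATIN SMALL LETTER AE
}


def repair(text):
    """Replace each wrongly numbered character with its Unicode equivalent."""
    return text.translate(str.maketrans(LEGACY_TO_UNICODE))


def describe(text):
    """List the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


garbled = "\u00b6i\u00c9z k\u00be" + "ts"
print(repair(garbled))
for line in describe("\u00b6\u00f0"):
    print(line)
```

The last two lines print the numbers and names for the pilcrow and for eth: same glyph in the special font, but two different characters as far as every other font is concerned.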

Both the teacher and her mate are intelligent people, with a Bachelor's degree at a minimum. However, they aren't technically minded, and have no idea about Unicode and code points. They see the computer as a tool, not a toy. They know there is a problem, but they may not have the time or the knowledge to fix it.

(Needless to say, you cannot use PhonMap with Blogger. Web developers should make limited assumptions about their readers, such as assuming they have only the commonest fonts available. PhonMap's font would only be installed by English teachers, which would make a small readership indeed.)

The other two applications are web pages. They show the full IPA - not just the English portion. You click on a symbol, and by the magic of JavaScript, the characters appear at the bottom or the top in a text box. You have to paste them somewhere else to make use of them. But they are free as well, and both conform to the Unicode convention, and thus can be used with Blogger. There are two that I've seen:

  • First, there's the Linguiste.org IPA keyboard. It's amazing. It loads the whole IPA chart. It's beautiful. Unfortunately, giant picture files take time to load. You may have to reload it a few times for it to work.
  • Second, there's IPACLICK, which uses little buttons instead of pictures. That takes less bandwidth. Unfortunately, it is using some sort of broken, Internet Explorer-only JavaScript. So the characters come out looking like crap. You have to manually set the character set to "utf-8" every time. I don't know why, but it happens. And since the actual application is loaded in a pop-up window, it is impossible to set the character set. The menu isn't there, as happens with pop-ups. However, if you load the actual IPACLICK application directly, and then set "utf-8", it works. I actually like it, but my fellow teachers would run screaming.
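I don't know what IPACLICK does internally, but the symptom is what you would expect when UTF-8 text is read under the wrong character set. A minimal sketch of that general problem in Python, assuming the wrong character set is Latin-1 (a guess on my part; the names `misread_utf8` and `reread_as_utf8` are made up for the example):

```python
def misread_utf8(text, wrong_charset="latin-1"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset)


def reread_as_utf8(garbled, wrong_charset="latin-1"):
    """Undo the damage: recover the original bytes and decode them as UTF-8."""
    return garbled.encode(wrong_charset).decode("utf-8")


garbled = misread_utf8("kæt")
print(garbled)                  # kÃ¦t
print(reread_as_utf8(garbled))  # kæt
```

In UTF-8, "æ" is stored as two bytes. Read them one byte at a time and you get two junk characters in place of one IPA symbol. Setting the character set to "utf-8" by hand is the browser's version of the second function.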

There is no perfect solution. You can use PhonMap as long as you share it only with fellow PhonMap users. You can use the Linguiste IPA keyboard if you've got high bandwidth, but that's not always the case. Or you can use IPACLICK as long as you remember to reset the character set. But none of these are ideal.

What would be great is if Jan Mulder made PhonMap open source. I guess he's too busy to change it, or uninterested at the moment. So why not give it out to the "community"? Personally, I've got the skills in C and C++ to toy around with it, and it seems a simple matter to correct the Unicode character numbers. His font would have to go: it's done its duty, but now it is redundant. I'd add a field so that the author can select the font most suitable for his or her use.

I think Jan's concern is that he wants the application to remain free. He doesn't want someone else to grab the source and then make money off it. I understand. I sympathize. But wouldn't putting an open source license in there prevent this? (I'll email him about this.) As it stands, PhonMap is close to being, but is not quite, a perfect tool for English teachers. It's just not good enough for my use.


1 Not "/yes/". The IPA is international, and "j" is used for the "y"-sound consonant in such languages as German and Finnish. IPA uses "y" instead for the close front rounded vowel, such as the "ü" in German.