Over the past few years, I’ve participated in localizing quite a few diverse bits of software. This includes desktop applications, content management systems, websites, and even an SMS service.
While gaining that experience, I’ve noticed some patterns and practices that developers often apply without realizing how much they can harm the localization quality of the end product. In fact, I suspect that quite often those practices are thought of as helpful rather than harmful… until it’s a little too late: the code has been written and tested, and rewriting it would cost a lot of both time and money.
That’s why I’m writing this post: I hope that at least those developers who read this before starting a new product will be able to avoid headaches later in the process, when the actual localization begins. Basically, the main rule is simple: do not assume. Don’t assume English sentence structure, don’t assume that every language has only one plural form, don’t assume every culture uses the same date and number formats… However, these phrases alone look quite vague, so let me expand on them a little.
Rule #1: Don’t produce localized strings by concatenating. While you may think that this is a good idea, it’s often not. Here’s an example: a few days ago I came upon a translatable string with only one word:
ago
A few lines above this was another translatable string:
hours
The author of the software in question expected that they could simply concatenate the number of hours, the word „hours” (or „minutes” or „seconds”) and the word „ago” to produce a string. Well, indeed they could, for English. But for Lithuanian it didn’t make much sense. If the English sentence were built following Lithuanian word order, it would become „ago 5 hours”… not something you would expect, is it? By the way, this rule applies to any concatenation whatsoever, even if it’s just a space or a colon.
A much better solution would have been to make a translatable string like this:
%d hours ago
Note: there may be cases when you will have no other choice but to concatenate. Just make sure to make everything as flexible as possible in those cases.
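To make the difference concrete, here is a minimal Python sketch. The catalog dictionary and the helper names (translate, hours_ago_concatenated, hours_ago_formatted) are made up for illustration and are not taken from the software mentioned above. The Lithuanian string uses the abbreviation „val.” so that plural forms don’t get in the way.

```python
# Hypothetical Lithuanian catalog: source string -> translation.
CATALOG_LT = {
    "hours": "val.",
    "ago": "prieš",
    "%d hours ago": "prieš %d val.",
}


def translate(message, catalog):
    """Return the translation, or the source string if there is none."""
    return catalog.get(message, message)


def hours_ago_concatenated(hours, catalog):
    # Fragile: the code, not the translator, decides the word order.
    return "%d %s %s" % (hours, translate("hours", catalog), translate("ago", catalog))


def hours_ago_formatted(hours, catalog):
    # Better: the whole phrase is one string, so the translator
    # can put the placeholder wherever the language needs it.
    return translate("%d hours ago", catalog) % hours


print(hours_ago_concatenated(5, CATALOG_LT))  # 5 val. prieš  (wrong word order)
print(hours_ago_formatted(5, CATALOG_LT))     # prieš 5 val.
print(hours_ago_formatted(5, {}))             # 5 hours ago
```

Both functions give the same result for English; only the second one survives translation.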
Rule #2: Don’t re-use the same translated strings in different contexts. This rule is actually related to the previous one, but a bit broader. What looks good as a menu item may not look as good when used as a window title. Of course, having to translate only one occurrence of a string instead of 20 may look optimal, but only until you actually find yourself having to translate different instances differently. To make things really optimal, you, the developer, may have to learn to balance what to reuse and what not to (in other words, how to decide which contexts are the same and which aren’t). In fact, I just realized I can’t give any definite advice on this, except perhaps to consult your localizers about what may or may not be a problem.
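One common way to keep identical source strings apart is to look them up together with a context label. The sketch below is only an illustration: the catalog, the context names („menu”, „status”) and the pgettext helper are assumptions of mine, modelled loosely on the context lookups that gettext-style frameworks offer.

```python
# Hypothetical catalog keyed by (context, source string).
CATALOG_LT = {
    ("menu", "Open"): "Atidaryti",     # a command: "to open"
    ("status", "Open"): "Atidarytas",  # a state: "is open"
}


def pgettext(context, message, catalog=CATALOG_LT):
    """Look up message within a context; fall back to the source text."""
    return catalog.get((context, message), message)


print(pgettext("menu", "Open"))    # Atidaryti
print(pgettext("status", "Open"))  # Atidarytas
print(pgettext("title", "Open"))   # Open  (no translation for this context)
```

The same English word ends up as two different Lithuanian words, which a single shared string could never do.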
Rule #3: Implement (and use!) the ability to add comments to translatable strings. Not all strings are self-explanatory. Translating is much easier when the localizer knows where and how each particular string is used, especially when it’s non-obvious.
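As a rough illustration, this is what such a comment can look like in Python code that uses the standard gettext module. The function name greeting and the „Translators:” prefix are my own choices; extraction tools such as xgettext can be told to copy comments with a given prefix into the translation file, but check your own toolchain for the exact setup.

```python
import gettext

# With no catalog installed, gettext.gettext returns the source string.
_ = gettext.gettext


def greeting(user_name):
    # Translators: %(name)s is the display name of the logged-in user.
    return _("Hello %(name)s") % {"name": user_name}


print(greeting("Ana"))
```

A named placeholder plus a one-line comment tells the localizer both what gets substituted and where the string shows up.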
Rule #4: Don’t assume there’s only one plural form of a word. While true for English, it’s a wrong assumption for many other languages. Keep that in mind whenever possible.
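Lithuanian, for example, needs three forms after a number. The sketch below hard-codes the selection rule as it is commonly written for Lithuanian in gettext catalogs; the function names and the tuple of forms are made up for illustration, and in a real project the framework would pick the form for you.

```python
def lithuanian_plural_index(n):
    """Return which of the three Lithuanian forms fits the number n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ...
    if n % 10 >= 2 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # 2-9, 22-29, ...
    return 2      # 0, 10-20, 30, ...


# Hypothetical translations of "%d hours ago", one per plural form.
HOURS_AGO_LT = ("prieš %d valandą", "prieš %d valandas", "prieš %d valandų")


def hours_ago_lt(n):
    return HOURS_AGO_LT[lithuanian_plural_index(n)] % n


for n in (1, 2, 11, 21):
    print(hours_ago_lt(n))
```

Note that 11 and 21 take different forms, so a simple „one or many” check in code cannot get this right.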
Rule #5: Don’t assume the same date and number formats and units of measurement as in English. That’s obvious, but still worth mentioning. Probably the easiest way to avoid this problem is to use the formatting features of the underlying platform your product runs on.
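In Python, for instance, the standard locale module can do the number formatting. This is only a sketch: the helper name format_amount is made up, and which locale names are available (for example „lt_LT.UTF-8”) depends on the operating system, so the code falls back to the plain „C” locale when the requested one is missing.

```python
import locale


def format_amount(value, locale_name):
    """Format a number with two decimals using the platform's locale rules."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # remember the current setting
    try:
        try:
            locale.setlocale(locale.LC_NUMERIC, locale_name)
        except locale.Error:
            # The requested locale is not installed on this system.
            locale.setlocale(locale.LC_NUMERIC, "C")
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore it


print(format_amount(1234567.891, "C"))  # 1234567.89
```

Calling format_amount(1234567.891, "lt_LT.UTF-8") on a system with that locale installed should give the separators the platform defines for Lithuanian, with no formatting logic in your own code.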
Rule #6: If your product has images that contain text, these images should also be localizable. While this rule doesn’t apply to product logos, it shouldn’t be forgotten: you do want stuff like that fancy Download button localized. Digging further, it’s sometimes suggested that even the simplest icons may need localization in order to carry their intended meaning.
Rule #7: Whenever possible, leave enough space for the strings to grow. English labels and sentences are often much shorter than their counterparts in other languages. Keep that in mind when designing your user interface. Note: it’s not always hard. Some platforms can take care of this issue themselves, and you just have to check that everything looks fine in the translated interface. Similarly, if your platform allows localizing not just texts but whole dialogs, this issue is probably irrelevant to you.
Rule #8: Use Unicode, wherever non-ASCII characters can occur. In fact, this rule is so obvious I wasn’t even going to mention it.
And finally, the Last Rule: Don’t reinvent the wheel. If there is a good localization framework that you can use in your project, use it. While such frameworks cannot prevent you from making mistakes, they reduce the chances of making them. One example of such a framework is GNU gettext. While not totally perfect, it has a really solid feature base, and the license of its runtime library (LGPL) is quite flexible.
Actually, the last rule could’ve been the first one, but I was afraid you wouldn’t continue reading…
Basically, that’s it! Though I won’t be surprised if I missed something in my list, so feel free to comment and suggest additions. Furthermore, you can find additional inspiration by reading chapter 6 of this book.
I am experiencing these failures on Disq.us translation — thanks for your great Eight Commandments!
Most often I fail to follow rules 1, 2, 3 (the capability is there, but we don’t use it), and basically 4 and 7 (and the localizers curse a lot because of that). Eh... How bad I am.
Now that I’ve written this down, it would be high time to start mending my ways. Don’t you think?
Great post. Sometimes it’s challenging to chop up copy or to decide when to use contexts. I think l10n is an important, but also really challenging aspect of web development.
Great point about comments. Strings should be well documented just like code, so you can document what %1$s means for strings like „Hello %1$s”.
Hey Austin! I’m totally glad and proud that you liked this post. Feel free to share it with everyone who would benefit from reading it.
Not that closely related, but on topic: an idea about how to automate the localization of not only the user interface but also the content: http://www.definitionary.com and http://concept.wikia.com