Issue 10 - Revision 3 / March 25, 2005
Localization of Applications (Part II)

By Milos Prudek | February 13, 2005

Defining localization

In this series of articles on localization, I would like to show how collaboration between software developers, UI designers and translators is essential for successful localization, and how (and why) you should go about organizing such a team. The present (second) article covers pitfalls related to automatically constructed sentences, screen design and sentence length, and we also touch upon the business side of translation.

Calculated sentences

The table at the end of the previous installment introduced a popular programming technique that I call "calculated sentences": these are sentences (or sentence forms) that consist of some fixed, static text and one or more variables. As a software developer, you have been trained to algorithmize everything that seems to lend itself to algorithmization. You do this to save work, both for yourself and your client. It's a noble goal, no question about that.

The problem is that when you construct text sentences, an algorithm that works linguistically in your language will probably fail in most other languages. Even if you check that your procedure works for, say, English, Spanish and German, it may fail for Slavic languages, or some other language group, or a specific language. In my experience (for translation from English or German to Czech or another Slavic language) the likelihood of failure is actually well over 70%.

Some structures look like very enticing, attractive subjects for "calculated sentence" algorithmization. Look at the following example:
You (as the creator of the software) do not know the number of tables; it could be 1, 2, or 2034 tables. That is the advantage of using variables: you don't need to know the number of tables to write the program. Perhaps you know that foreign languages create plurals in different ways than, say, English. Perhaps you have also been told that for the target language there are about 12 ways of creating a plural, depending on the noun that is being counted (i.e. the syntactic form for building the plural of "table" can be different from that for "chair" in the target language, whereas it is the same in English: addition of -s).

Note: It's generally thought that English is simple here. In the overwhelming number of cases the plural is formed by addition of -s, but there are common exceptions. Some animals have irregular plurals: mouse-mice and louse-lice (but not house-hice, so one couldn't form a general rule that a noun ending in -ouse has -ice as a plural). For others the plural is identical with the singular: fish, deer, moose. Then there are words that take Latin or Greek plurals, and words ending in sibilants (s sounds) take -es, not -s. Other languages (German, for example) are far more complicated here.

So you ask the translator to provide a translation of "table" and a translation of "tables" and you get "stul" and "stoly" (that's in Czech). Then you can write the following code:
if x == 1: print(x, "stul")
else: print(x, "stoly")
This results in the following output:
1 stul
Now let's say that, unfortunately, you've never heard that your target language actually has not one, but two different plural forms for any specific noun, such as "table". One plural form is used when there are 2, 3 or 4 items. The other plural form is used when there are 5 or more items. For Czech, the other plural form is "stolu". So now you have a program that works for numbers 1-4, and fails for any other number.
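To make the three-way rule concrete, here is a minimal Python sketch that selects among the forms quoted above. The function and constant names are hypothetical, and the handling of zero is an assumption that the text does not cover. It covers only this one noun and only the count rule, so it illustrates how much language-specific knowledge even a tiny "calculated sentence" needs rather than offering a general fix.

```python
# Hypothetical names; the three Czech forms are the ones quoted in the article
# (written without diacritics, as in the article).
CZECH_TABLE_FORMS = ("stul", "stoly", "stolu")  # 1 item, 2-4 items, 5 or more items


def czech_plural_index(n: int) -> int:
    """Return 0 for exactly one item, 1 for 2-4 items, 2 otherwise."""
    if n == 1:
        return 0
    if 2 <= n <= 4:
        return 1
    return 2  # 5 or more; zero is ASSUMED to take this form as well


def count_tables(n: int) -> str:
    """Build the counted phrase for the single noun 'table'."""
    return f"{n} {CZECH_TABLE_FORMS[czech_plural_index(n)]}"


for n in (1, 2, 4, 5, 2034):
    print(count_tables(n))
```

A different noun, or a sentence in which the noun must be inflected for another grammatical case, would need its own forms and possibly its own rule.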
If you extrapolate this, you can have an unlimited number of plural forms for a single word. Still, you must provide the output "

Calculated sentences described in the previous chapter are intended to save translation costs. I've shown that they can bring disastrous results. Does it mean that you have to pay through the nose if you need a translation of 30 sentences that only differ in one word? Not at all. For more than a decade, Computer Aided Translation (CAT) software has been widely used by the translation community. CAT software is not machine translation; it's merely a database of the whole text of the translation, which is indexed and automatically searched. Let's say you have a sentence like this one:
To open the Properties window, please click the Properties button.
Translated into a foreign language, it would look symbolically like this:
aa bbbb ccc DDDDDD eeeeee, ffffff gggg hhh KKKKKKKK jjjjj.
A few paragraphs later, the translator encounters a very similar sentence:
To open the TASKS window, please click the TASKS button.
The CAT software will automatically step in, find the closest sentence, and present it to the translator as a possible translation:
aa bbbb ccc DDDDDD eeeeee, ffffff gggg hhh KKKKKKKK jjjjj.
The software will even highlight "DDDDDD" and "KKKKKKKK" to tell the translator that these are the only words that need changing. Adapting the sentence offered by the CAT software to the new sentence takes only a fraction of the time required for a normal translation. Consequently, translators offer a discount if there is a significant number of similar sentences (more than 10%) in the whole translation job. Generally, you will be offered a 60%-80% discount for sentences with a similarity rate of 95-99%, a 30-60% discount for sentences with a similarity rate of 50-94%, and no discount for sentences with a similarity rate below 50%.

Discounts lag behind similarity rates because it is not enough to translate the words that differ: the surrounding words may also need to be edited due to inflection and similar issues, and that editing takes quite some time. So much time, in fact, that below 50% similarity it is actually better to translate from scratch, i.e. without using the sentence suggested by the CAT software.

Some sentences in your source may be identical; this is called a 100% match in CAT terminology. You should not expect to get such repeated sentences translated for free. Be ready to pay about 10% of the normal price for the translation of totally identical sentences, because the translator will then check whether the translation should really be identical, which is not always the case. Moreover, suppose you change translators in the middle of the job, a perfectly reasonable thing to do since the CAT translation memory will guarantee vocabulary consistency. If your former translator translated some sentences incorrectly, your new translator will recognize this when they show up as proposals for a 100% match, but he may be tempted not to correct the mistakes if he is not paid for 100% matches.
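The lookup and the discount bands described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-level similarity from the standard library's `difflib` stands in for whatever metric a real CAT tool uses, the translation memory is a plain dictionary, and all function names are hypothetical. The percentages are the ones quoted in the text, with the 100% match treated as a 90% discount (pay about 10%).

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source sentence -> stored translation.
# The translation is the symbolic placeholder used in the article.
TRANSLATION_MEMORY = {
    "To open the Properties window, please click the Properties button.":
        "aa bbbb ccc DDDDDD eeeeee, ffffff gggg hhh KKKKKKKK jjjjj.",
}


def similarity_percent(a: str, b: str) -> float:
    """Word-level similarity in percent (an ASSUMED metric, not a CAT standard)."""
    return 100.0 * SequenceMatcher(None, a.split(), b.split()).ratio()


def best_match(sentence: str):
    """Return (similarity, stored source, stored translation) of the closest entry."""
    return max(
        (similarity_percent(sentence, src), src, tgt)
        for src, tgt in TRANSLATION_MEMORY.items()
    )


def discount_range(similarity: float):
    """Map a similarity rate to the (low, high) discount percentages quoted above."""
    if similarity >= 100:
        return (90, 90)  # identical sentence: pay about 10% of the normal price
    if similarity >= 95:
        return (60, 80)
    if similarity >= 50:
        return (30, 60)
    return (0, 0)  # below 50%: translated from scratch, no discount


new_sentence = "To open the TASKS window, please click the TASKS button."
score, source, suggestion = best_match(new_sentence)
print(f"{score:.0f}% match -> {suggestion}")
print("discount range (%):", discount_range(score))
```

With this particular metric the two example sentences share eight of ten words, so the new sentence falls into the middle band; a real tool may well score the pair differently.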
Vocabulary consistency is another important point, particularly in technical translations. It's not infrequent for a single translator to use different translations for the same term in the course of a translation. For example, the German term "symmetrisches Paar" can be translated either as "symmetric pair" or as "balanced pair" with the same meaning, and if one is familiar with both it's easy to forget which term one has already used. It's better to stay with one term throughout a text, especially since the creators of a text often have a preference for one term over an equivalent one.

CAT software solves this problem by letting the translator create a vocabulary list that is specific to the given source text. Such a vocabulary list contains pairs like "symmetrisches Paar" = "balanced pair". Whenever the translator (and the CAT software, which is silently watching in the background) encounters "symmetrisches Paar" anywhere in a sentence, the CAT software steps in and highlights "symmetrisches Paar" in the source text, and it also politely offers "balanced pair" as a translation. The translator can press a hotkey to paste "balanced pair"; in other words, he does not have to retype it. The intervention is very unobtrusive and the translator can always turn down the suggestion by simply ignoring it; he does not have to click "Cancel".

CAT tools bring the additional benefit that I touched upon in the previous paragraph: if the translator delivers the flat-file translation database (the so-called "translation memory") together with the resulting translation text, you will be able to change translators on the fly, or add more translators to the team, without compromising the consistency of the translation vocabulary. To sum up, CAT provides most of the savings that you would have achieved by using "calculated sentences", without compromising translation quality.
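The vocabulary-list behaviour can be pictured with a short Python sketch. It is an assumption-laden illustration, not how any particular CAT product works: the glossary is a plain dictionary, matching is exact and case-insensitive, the function name is hypothetical, and the German sample sentence is made up for the demonstration.

```python
import re

# Hypothetical glossary for one source text; the entry is the article's example.
GLOSSARY = {"symmetrisches Paar": "balanced pair"}


def suggest_terms(sentence: str, glossary=GLOSSARY):
    """Return (matched source text, preferred translation, position) per glossary hit."""
    hits = []
    for term, translation in glossary.items():
        # re.escape keeps any punctuation in a term from being read as regex syntax.
        for m in re.finditer(re.escape(term), sentence, flags=re.IGNORECASE):
            hits.append((m.group(0), translation, m.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Made-up sample sentence containing the glossary term.
print(suggest_terms("Ein symmetrisches Paar wird hier angeschlossen."))
```

Exact matching like this misses inflected forms of a term (for example "symmetrische Paare"), which is one reason the translator, not the software, makes the final call.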
Allocated space and line feeds

You cannot predict the length of a translated text if you do not know the language yourself, and even if you do know the language a precise prediction is not easy. If translation into several languages is needed, even an informed guess is unlikely to hold, because the length will be different in each language. These differences are sometimes unexpectedly large: two or three times the length of the original text (or correspondingly shorter, depending on the direction of translation). It's surprising how many localization projects are submitted with no regard for this.

The translator can often abbreviate his translation if the length requirements are conveyed to him, which rarely happens, however. Occasionally, abbreviation may be impossible: your two-word expression in a text cell 9 characters long may need to be translated into four words, and no one can abbreviate four words into the space of 9 characters. You should be ready to change your layout if there is no linguistic solution.

The source text sometimes contains line feeds (hard line breaks). This can be accommodated in the translation, but bear in mind that the word order will almost certainly be different; furthermore, each line must provide sufficient space. What you originally wrote like this:
xxx xxxxx xxxxx
xxxxxxxxx xx
xxx xxxxx xxxxxx xxxx xxx.
… might wind up looking like this when translated (and you might not like the new appearance, or the text might run over a figure):
xx xxxxxxxxxxxxxxxxxxxxx
xxx xxxx xxxx xxxxx xxxxx
xxx.
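A simple automated check can flag such problems before a screen ships. The Python sketch below is a hypothetical helper, and it assumes that the space available for each line is known as a character count (here taken to be the length of the original line) and that one character occupies one cell, which does not hold for proportional fonts or wide scripts.

```python
def overflowing_lines(translated: str, max_widths):
    """Return (line number, length, limit) for each line longer than its allotted width.

    Lines beyond the end of max_widths have no allotted space (limit 0).
    """
    problems = []
    for i, line in enumerate(translated.split("\n")):
        limit = max_widths[i] if i < len(max_widths) else 0
        if len(line) > limit:
            problems.append((i + 1, len(line), limit))
    return problems


original = "xxx xxxxx xxxxx\nxxxxxxxxx xx\nxxx xxxxx xxxxxx xxxx xxx."
translated = "xx xxxxxxxxxxxxxxxxxxxxx\nxxx xxxx xxxx xxxxx xxxxx\nxxx."

# ASSUMPTION: each line may be at most as long as the corresponding original line.
widths = [len(line) for line in original.split("\n")]
for number, length, limit in overflowing_lines(translated, widths):
    print(f"line {number}: {length} characters, but only {limit} available")
```

A report like this is most useful when it goes back to the translator together with the limits, so that he can look for a shorter wording before the layout has to change.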
Symbols and proper names

Symbols or pictograms are widely believed to be internationally comprehensible. This is true provided that you think carefully when choosing them, because it is possible to choose a misleading symbol. It's a safe bet to use a question mark symbol for anything related to "question", although I imagine that there are languages which use something other than a question mark to mark interrogative sentences. On the other hand, you can act in good faith and still choose incorrectly. Some pictures could have no meaning abroad, because the thing depicted does not exist there. Want an example? A picture of a piggy bank is clear in a Euro-American context, but do all kids throughout the world save money inside a pig? Might this even be insulting to a Muslim, for instance? Could you use a dollar note to illustrate money? Probably, since the "$" symbol is internationally recognizable, but it could also be interpreted to mean "American Dollars" or even "Foreign Exchange" instead of simply "money".

Try to remove proper names (company brand names) that are used in your source language to represent a class of products. For example, in American English, Band-Aid is used to refer to adhesive patches for covering wounds in general, yet it is a brand name virtually unknown in most countries of the world.

Chained translation

It is often cheaper to translate from English than from any other language, simply because in any language there are more translators who know English than who know any other language. This can tempt you to chain translations, which can lead to disaster. A practical example will clarify this. Suppose your source language is Finnish, and you need translations into English and German, but also into Arabic, Chinese, Japanese etc. Either it is impossible to find a Finnish/Chinese translator, or you can find such a translator but he is prohibitively expensive. Translators from English to Chinese are, on the other hand, plentiful and much cheaper.
Therefore you decide to cut costs by first translating the whole text into English, and then using English as the source text. You have all the background information (contextual remarks) in your source text, in Finnish. Your Finnish-English translator uses these remarks to create a perfect translation into English, but he does not translate the background information. Will you get good results from the translators who then use the perfect English translation? Of course not. This is obvious once you realize that you have just transported the problem we've been discussing here (lack of background information) "one language up". Even if the Finnish-English translator translates all the background information, it is still possible that all the other translations will be of somewhat substandard quality or contain errors, unless you make sure that the source translation (the Finnish-English translation) is of the utmost quality, triple-checked, proofread and based on consultation with the software developers.

Conclusion

The examples above are just that: examples of what can go wrong with translations based on seemingly innocuous input. I tried to list everything that I have ever faced in my work as a translator and localizer. But I only know three languages well and have a passing knowledge of two others, all of which are European languages. What it boils down to is that you should constantly talk to your translators to help them deliver the best possible result, and you should budget both funds and time for that.

One approach to achieving a high-quality localization is to provide detailed context for each sentence. The amount of this context may, and most likely will, exceed the amount of translated text. You should offer to pay translators a premium for studying this context. Furthermore, even if you provide contextual information for all messages, you should also provide all screens and windows that appear in your program or website as a reference.
Sadly, quite a few translation agencies and freelance translators market localization, but when you provide them with insufficient context they do not have the guts to ask you for more material, perhaps fearing that you might consider them incompetent. One fundamental problem here is that the authors of texts are usually simply unaware of the difficulties involved in translation. Or even worse: knowing that the result will come back to haunt you later on, the translators find it more profitable to sell a quick and dirty translation and hope that you will come back months later to have it translated again, properly, for much more money.

And one last, often overlooked, fact: translations tend to uncover errors in the original text, such as typos, terminology inconsistencies, and even text that is confused or difficult (or impossible) to comprehend. Most translators will deal with these kinds of problems without telling you, for fear of hurting your feelings and consequently losing any chance of getting more jobs. This is not good for you or your customers. The translators will correct typos, but instead of correcting terminology inconsistencies and unclear text, they are likely to transport these problems into the translation. Deep down inside they are bothered by having to translate a low-quality text into a low-quality result, and they will be happy to tell you about these issues in your text, probably at no extra charge. If you are smart, you will tell them unambiguously that you want all the feedback they can provide. Opening all communication channels is the key to successful localization.
Reproduction of material from any of ZopeMag's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 ZopeMag