POSTS 19 MARCH, 2013

Multilingual web is more than translation (1/2)

by Manuel Herranz

It is beyond doubt that the web has become multilingual. The work, experiences and cross-pollination with other disciplines, from machine translation to localization and semantics, were shared at the EU-sponsored Multilingual Web event, which took place in Rome on 12-13 March 2013.

Whilst technologies such as machine translation are already well integrated for fast web page translation, it was reassuring to see that even large web actors such as Google consider that there is plenty of work left to do in making the web truly multilingual. The release of ITS 2.0 and the new features and possibilities that HTML5 opens up made the venue a meeting point for practitioners and academics working on the semantic web, translation and applied machine translation, as well as for CMS tool providers.

Google’s experiences were shared by Mark Davis and Vladimir Weinstein, who pinpointed translation and localization issues that are often overlooked. We already assume that a page can easily be translated for gisting, but smaller issues such as plurals and gender (“Alice added 1 people to his circle”) remain unsolved even at the likes of Facebook.
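To make the plural and gender problem concrete, here is a minimal sketch of how a message could be selected by both the count and the gender of the person, instead of being glued together from fragments. The catalogue, the function names and the "unknown" fallback are my own assumptions for illustration; this is not Google's or Facebook's implementation, and the plural rule shown only holds for English.

```python
# Hypothetical message catalogue, keyed by (gender, plural category).
# All names here are illustrative; they do not come from any real library.
MESSAGES = {
    ("female", "one"): "{name} added {count} person to her circle",
    ("female", "other"): "{name} added {count} people to her circle",
    ("male", "one"): "{name} added {count} person to his circle",
    ("male", "other"): "{name} added {count} people to his circle",
    ("unknown", "one"): "{name} added {count} person to their circle",
    ("unknown", "other"): "{name} added {count} people to their circle",
}


def plural_category(count: int) -> str:
    """English-only rule: exactly 1 is 'one', everything else is 'other'."""
    return "one" if count == 1 else "other"


def format_added(name: str, gender: str, count: int) -> str:
    """Pick the full sentence that matches both gender and plural form."""
    if gender not in ("female", "male"):
        gender = "unknown"  # fall back to a neutral wording
    template = MESSAGES[(gender, plural_category(count))]
    return template.format(name=name, count=count)


print(format_added("Alice", "female", 1))
```

The point of the design is that translators receive whole sentences for each combination, which matters because other languages have more plural categories than English and may inflect different words for gender.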

Google got my language settings wrong
Everything’s better with a little sense of humour

Presenting people’s names in a locale is not as easy as quoting them. Patterns differ: English, for example, tends to add nicknames after the given name, whereas other cultures have a second given name, or use both the father’s and the mother’s family names. Richard Ishida dealt with this later in a full presentation.

It was encouraging to see that when localizing, Google faces the same hurdles as most translation companies:

  • Different messages to different translators
  • Most translators are not software engineers
  • Most engineers don’t speak 60 languages
  • The gender of the person being referred to may not be known

Making the web multilingual is not just about translating (that may be fine for the content) but about presenting the web in a format and with an experience that is user-friendly in a given culture. Google has gone a long way towards handling plurals, numbers written as digits, cardinals (1, 2…), ordinals (1st, 2nd…) and currency conversion, and even towards identifying the likely country when you type a phone number like (011) 34345345 (much easier when it is +54 9 98408374). This is used in technologies such as geolocation in Android, and it also covers ways of resolving addresses: detailed validation for many regions, and a layout with basic validation for all regions.
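The phone number case is easy to illustrate. The sketch below, a toy under stated assumptions, guesses the country only when the number is written in international "+" format and gives up on national formats, which is exactly why (011) 34345345 is the harder input. The lookup table is a tiny hand-picked subset of country calling codes, and the function name is hypothetical.

```python
import re

# Tiny illustrative subset of country calling codes; a real table is far larger.
CALLING_CODES = {
    "1": "US/Canada",
    "34": "Spain",
    "44": "United Kingdom",
    "54": "Argentina",
    "81": "Japan",
}


def guess_country(raw: str):
    """Return a country guess for numbers in international '+' format, else None."""
    stripped = raw.strip()
    if not stripped.startswith("+"):
        # National format: ambiguous without knowing where the user is.
        return None
    digits = re.sub(r"\D", "", stripped)  # keep digits only
    # Calling codes are 1 to 3 digits long; try the longest prefix first.
    for length in (3, 2, 1):
        country = CALLING_CODES.get(digits[:length])
        if country is not None:
            return country
    return None


print(guess_country("+54 9 98408374"))
print(guess_country("(011) 34345345"))
```

For the national format, a real system would need extra signals such as the user's locale or location before it could make a sensible guess.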

Studying the language trends of Gmail users, Google knows that a fairly large number of them are multilingual. I would agree with theories stating that about half of the world’s population knows at least one other language, is familiar with it or is bilingual. But, embarrassingly, things can get complicated if you sign up for a service (let’s say Google+) in one language from an IP location where a different language is used. This means that you may get mixed languages if you are a Spanish-speaking user signing up on YouTube whilst in Japan, and that is personal experience…

Questions google cannot answer 🙂

Rendering names local
Richard Ishida gave a nice and funny presentation on the classification of names and on how presenting them and cataloguing them in fields varies greatly (and the issues this involves), from India (where a “caste” tag would be necessary) to Spanish-speaking cultures, where people carry both the father’s and the mother’s surname (although there are exceptions to the order in which they are presented) and do not change their surnames when they get married, just as happens in Chinese. This may look strange if you happen to come from a Northern European culture.

In Russian, where women also adopt the husband’s family name (thus presenting a “search” challenge on the web), things are slightly different because surnames take masculine and feminine inflections. So the wife’s surname is not exactly the same as her husband’s. He will often carry his own father’s name as a middle name ending in -ich, stating that he is the son of [name of father]; her middle name carries a feminine ending such as -ovna.

Борис                        Николаевич             Ельцин
(Given name, Boris)  (Father’s name, masculine, Nikolayevich)  (Family name, masculine, Yeltsin)

Наина                         Иосифовна              Ельцина
(Given name, Naina)  (Father’s name, feminine, Iosifovna)  (Family name, feminine, Yeltsina)
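The “search” challenge can be sketched as a normalization step: map the feminine form of a surname to the masculine one before comparing, so that both spouses share one search key. The suffix table below is a deliberately simplified assumption of mine, not a full treatment of Russian morphology, and it will misfire on masculine surnames that happen to end in the same letters.

```python
# Simplified (feminine -> masculine) suffix pairs. This is an assumption made
# for illustration; real Russian surname morphology has many more cases.
FEMININE_TO_MASCULINE = [
    ("ская", "ский"),
    ("ова", "ов"),
    ("ева", "ев"),
    ("ина", "ин"),
]


def surname_search_key(surname: str) -> str:
    """Return a lowercase key that is shared by both forms of a surname."""
    s = surname.strip().lower()
    for feminine, masculine in FEMININE_TO_MASCULINE:
        if s.endswith(feminine):
            # Swap the feminine ending for its masculine counterpart.
            return s[: -len(feminine)] + masculine
    return s


# Both forms of the example above end up under the same key.
print(surname_search_key("Ельцина") == surname_search_key("Ельцин"))
```

A heuristic like this only helps matching; the name should still be stored and displayed exactly as the person writes it.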

In Arabic, you can add being the “father of” someone to your name later in life, as well as your place of origin and your qualities. This obviously affects forms of address.

One extreme case, though not so different from a classification perspective, is Icelandic, where what we might take for a surname is actually the father’s name plus a suffix (-son or -dóttir) rather than a family name.
Imagine, then, the challenge of finding, identifying and presenting people across different languages in an automated way…

The second part will wrap up the event with use cases, applied machine translation, CMS and Translation Management Systems.
