The Web of Language

showing results for: March, 2007

  • Google translates text untouched by human hands

    Google, the company that became a household word by searching websites without actually reading them, wants to translate text untouched by human hands, or by the human voice.

    It is working on a program that will translate a text instantly into any language.  Unlike translation machines that must be programmed to recognize both grammar and vocabulary in order to translate anything, Google’s e-translator requires no linguistic knowledge at all.  Instead of parsing sentences and defining words, the new Google translator employs a statistical model to compare large numbers of already-translated texts and then applies the patterns that it distills from this database to documents that haven’t been translated yet.  The company claims that the more translations it uploads, the better the Googler’s ability to translate new documents becomes. 
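    As a rough illustration of that idea, and not a description of Google's actual system, here is a toy sketch that learns word pairings purely by counting how often words turn up together in sentences that have already been translated.  The four-sentence English-German corpus, the function names, and the choice of the Dice coefficient as the measure of association are all assumptions made for the example; a real system would work from millions of sentences and a far richer model.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus of already-translated sentence pairs (English, German).
PARALLEL_CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
    ("a book", "ein buch"),
]


def train(pairs):
    """Count how often each word appears, and how often word pairs co-occur."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word is paired with every target word in the same sentence pair.
        pair_counts.update(product(src_words, tgt_words))
    return src_counts, tgt_counts, pair_counts


def best_translation(word, model):
    """Pick the target word most strongly associated with `word` (Dice coefficient)."""
    src_counts, tgt_counts, pair_counts = model
    if word not in src_counts:
        return None  # never seen in the corpus, so there is nothing to go on

    def dice(target):
        return 2 * pair_counts[(word, target)] / (src_counts[word] + tgt_counts[target])

    return max(tgt_counts, key=dice)


def translate(sentence, model):
    """Translate word by word, leaving unknown words untouched."""
    return " ".join(best_translation(w, model) or w for w in sentence.split())


model = train(PARALLEL_CORPUS)
print(translate("a house", model))
```

    Nothing in the sketch knows what a house or a book is, or what an article does; it only notices that "house" and "haus" keep showing up in the same sentence pairs.  That is also why it has no way of telling one sense of a word from another unless the counts happen to favor the right one.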

    The project’s director defended the company’s decision to mechanize the translation process on the grounds that computers make fewer mistakes than humans.  The computer won't be distracted by the meaning of a word.  Instead, it will determine that certain words are statistically more likely to occur with other words.  As a result, computer translations are likely to avoid gaffes like the one committed when Russian premier Vladimir Putin’s translator referred to the German Chancellor as “Fuehrer Gerhard Schroeder.”  Regardless of their past, or in some cases because of it, post-war German leaders have shunned the title “Fuehrer,” at least in public, preferring the less stigmatized “Bundeskanzler,” a term that the computer will see as typically accompanying the names of recent German chancellors.

    On the other hand, a computer is just as likely as President Jack Kennedy was to translate the German word Berliner as ‘a resident of Berlin.’  When Kennedy told a crowd of cheering citizens of West Berlin, “Ich bin ein Berliner,” he thought he was saying that he was just like them.  He was really saying, “I am a jelly doughnut,” because that’s what Berliner means (though for obvious reasons, just as the French don't call french fries french fries, or even liberty fries, Berliners call the jelly doughnut something else).

    It has long been the dream of computer programmers to produce translations that are both instantaneous and people-proof, a goal which clashes with the common belief that while language is natural to humans, it’s not easily mastered by machines.  Preliminary results suggest to developers at Google that while their machine translations may not qualify as literary masterpieces, they should be good enough for documents whose meaning doesn’t turn on nuance and style.

    Of course, few texts don’t turn on nuance and style.  Laws and contracts, formulaic though some of their language may be, are subject to endless disputes over interpretation.  Business documents have to be exact in any language.  Religious texts, often purporting to be the divine word made clear and manifest, spark debates that veer quickly from polite to warlike.  News articles are supposed to be neutral – maybe – but critics are quick to charge slant and bias in all but the most innocuous of publications.

    Even apparently obvious meanings turn out to be subtle and slippery.  The shopping list is a good example.  Simple strings of words like “bread, milk, cereal, some fruit if it looks nice, something for Wednesday?” seem to be totally forthright, but they too turn out to be heavily coded: what kind of bread? wheat or rye? Wonder Bread or 27-grain? and the milk: whole milk, 2%, or skim (and how much of each)?  One person’s “nice fruit” is another’s “you know I don’t eat that.”  And what’s supposed to be happening on Wednesday that will determine an appropriate “something for Wednesday?”?  And why that final question mark after Wednesday?  Is Wednesday likely not to be special after all?  Are we possibly going to skip Wednesday altogether and go straight from Tuesday to Thursday?  Worse yet, is there something ominous about the future that I don't know?

    We will all look forward to the Googler’s debut, but I for one won’t be surprised if the translations that it produces become the fodder for jokes and parodies or lawsuits in the same way that we make fun of or litigate the many mistranslations generated by humans.  At least the machines won’t come away with hurt feelings when we laugh at them.  Wait, machines don’t have feelings yet, do they?  