I recently ran across an article extolling the virtues of Google MT (http://www.independent.co.uk/life-style/gadgets-and-tech/features/how-google-translate-works-2353594.html). While I agree with many of the ideas in the article, a few of its points, and its whole tone, seemed out of line with reality. First, I agree with the idea that MT should focus on statistics more than on extracting meaning, at least for now. But let’s at least concede that this is fundamentally different from what we do as humans. I do believe in statistical theory, and in my linguistic background I have studied the role that statistics plays in human language, but I do NOT believe that word sequences and alignment statistics are the only characteristics that determine whether a sentence is acceptable. I DO look at meaning. Given this fundamental difference in processing, we have to assume that errors will be introduced. So while the article praises the virtues of statistics-based processing, let’s temper our enthusiasm: statistics are only part of the puzzle, and probably not the most important part for real fully automated high-quality machine translation.
Which brings me to my next point. The most inexplicable part of the article is where the author, David Bellos, discusses how human translation errors are usually more dangerous than MT errors. I’m completely lost on the reasoning here. He says that when Google MT makes an error, it’s obvious, but when a human translator makes an error, it’s not, so human translation errors are more dangerous. By that logic, as a business owner, I should prefer the data entry guy who spews junk 10 to 20% of the time, because I know I can ignore his garbage, over the guy who makes an error once every 100 or 200 entries. What? Isn’t the reason I’m paying for data entry (or, analogously, translation) that I WANT to understand, not disregard, the output? I’m baffled. The only reason that human errors would be more dangerous is that the output is actually useful, and that’s kinda the idea.
Finally, the name of the article is “How Google Translate Works”. While I understand that he’s probably trying to write for a broader crowd, the article doesn’t really go into any technical detail beyond “it uses human translations” and “it uses statistics”. No equations, no specifics. And then he assumes that because our desires and needs are the same, the premise of everything-has-been-said-before MT should work. Once again, while I don’t outright disagree, linguistic nuances go a little deeper than that.
Whether or not you like Google, the virtues of a good MT engine (which Google’s system is) are numerous. Many would say that we’re just beginning to tap the MT-as-a-professional-translation-resource well, but let’s stay grounded in reality.