Have you ever come to an intersection in a strange place and found yourself following the bulk of the cars, on the gamble that the most popular direction is the place you are most likely headed? That's a similar premise to statistical machine translation, and the Achilles' heel of today's 🏭 industry leader, Google Translate. In their words: "Typically, when we produce a translation, our system searches through millions of possible translations, selecting the best -- that is, the most statistically likely -- translation."
Google Translate is useful for interpreting the general gist of a 📃 text, and can be quite good in certain circumstances for translations between English and a few lucrative languages. Were it called "Google Approximate", users would know that they are getting a best guess that has a high probability of choosing the wrong vocabulary. The likelihood of error is largely a function of how many senses a polysemous term has and how often that term appears in a parallel corpus alongside the target language. Google passes all or most other-to-other translation tasks through English, thereby multiplying the polysemy error probability and eliminating the mitigating aid of parallel text. This makes non-English Google Translate pairs existential 🚂☠ train wrecks.
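To see how pivoting multiplies the error, consider a toy model (all numbers below are hypothetical, chosen only to make the arithmetic concrete): if a statistical system simply bets on the most frequent sense of an ambiguous word, each hop is a fresh chance to lose that bet, and two hops compound.

```python
# Toy model (all numbers hypothetical) of why pivoting a translation
# through English compounds word-sense errors.

def correct_sense_probability(senses, bias=0.6):
    """Chance the system picks the right sense of a word, assuming
    the dominant sense absorbs `bias` of the probability mass and
    the system always bets on it."""
    return 1.0 if senses == 1 else bias

def direct_accuracy(senses_src):
    # One disambiguation step: source language straight to target.
    return correct_sense_probability(senses_src)

def pivot_accuracy(senses_src, senses_pivot):
    # Two steps: source to English, then English to target.
    # Each hop can pick the wrong sense, so the chances multiply.
    return correct_sense_probability(senses_src) * correct_sense_probability(senses_pivot)

print(direct_accuracy(3))    # dominant-sense bet on a 3-sense word
print(pivot_accuracy(3, 4))  # same word, pivoted through a 4-sense English term
```

Under these invented numbers, a 60% chance of a correct sense per hop falls to 36% once English sits in the middle, which is the multiplication the paragraph above describes.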
We contend that overlaying Kamusi's 🎓 knowledge-based structure and methods will ultimately lead to much more accurate machine translations, among many more language pairs. Certainly Google and the other players in the MT 🏭 industry have made tremendous efforts that should be built upon. Kamusi is open to collaborating with anyone in MT who wishes to benefit from our sense-specific 👅👅👅🔢 multilingual data, including lexicalized 🎉 party terms that are marked for separability. We are currently programming our source-side pre-disambiguation tool, which will be your first opportunity to see our claims in action.
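The tool itself is still in development; as a rough illustration of what source-side pre-disambiguation could look like (the sense inventory, glosses, and concept IDs below are invented for the example, not Kamusi's actual data model):

```python
# Hypothetical sketch: the writer resolves each ambiguous word to a
# specific concept ID before any translation engine runs.

SENSE_INVENTORY = {
    "spring": [
        ("spring#season", "the season after winter"),
        ("spring#coil", "a coiled elastic device"),
        ("spring#water", "a natural flow of groundwater"),
    ],
}

def pre_disambiguate(word, choice=0):
    """Return the concept ID the user selected for `word`;
    words with no listed senses pass through unchanged."""
    senses = SENSE_INVENTORY.get(word)
    if not senses:
        return word
    concept_id, _gloss = senses[choice]
    return concept_id

# Downstream translation then receives "spring#coil" rather than the
# bare string "spring", so no statistical sense-guessing is needed.
print(pre_disambiguate("spring", 1))
print(pre_disambiguate("gate"))
```

The point of the design is that ambiguity is resolved once, by the person who knows what they meant, instead of being guessed at separately for every target language.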
Google makes lots of claims, questioned by experts, about its leaps using neural networks, and there is no doubt their precision numbers can improve as they tweak their methodology. Any Swahili speaker, for example, who has tried to make it to their boarding gate using the embedded Google "translation" service on the Chicago O'Hare Airport website knows they have nowhere to go but up. Machine translation is only as good as the underlying data that lets you know a term in one language has the equivalent meaning of a term in another. Google's method is to draw inferences from texts that they think line up between languages. Kamusi's method is to look at each concept, have people determine the links, and lock down that knowledge for machines to learn from. Of course, manually reviewing translation terms for every word is a very large task, far more labor intensive and time consuming than setting computers to whir through the numbers. In the long run, however, we suggest that the effort to have people determine mappings across languages will lead to translations that get it right, in ways that statistical methods such as Google Translate have not done and cannot ever do.
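A minimal sketch of the concept-keyed approach described above (the table entries are illustrative examples, not Kamusi data): once people have verified which term in each language expresses a given concept, any language pair can be served by direct lookup, with the concept as the hub and no English pivot in the path.

```python
# Illustrative concept-keyed lexicon: humans verify each link once,
# and machines then translate between any pair directly.

CONCEPTS = {
    "spring#season": {"en": "spring", "fr": "printemps", "de": "Frühling"},
    "spring#coil":   {"en": "spring", "fr": "ressort",   "de": "Feder"},
}

def translate(concept_id, target_lang):
    """Direct concept-to-term lookup in the target language."""
    return CONCEPTS[concept_id][target_lang]

# French-to-German never routes through English: the concept is the hub,
# so the ambiguity of English "spring" never enters the pipeline.
print(translate("spring#coil", "de"))
```

Building such a table is exactly the labor-intensive human review the paragraph above concedes, but once an entry is locked down, the lookup is deterministic rather than probabilistic.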