7 myths about auto-translation you should leave behind
Gregory Rosner
Computers may never be able to translate as well as a human can, but in the meantime you should know that teams of data scientists are making strides toward that goal using machine learning and artificial intelligence. One such team works at SDL, and their advancements are solving very real-world business communication problems that were considered impossible even 5 years ago. SDL recently announced an auto-translation breakthrough called XMT, which is essentially the foundation of an auto-translation system that is customizable and learns over time. (That's the artificial intelligence bit. FYI, SDL is offering a free webinar on XMT on July 16, 2015.) Given this advancement and others in the industry, it's time to upgrade our perception of what's possible. Allow me to help dispel some of the most popular misconceptions.
1. All machines are created equal. Not true. If you've tried auto-translation in the past, you were likely using a baseline engine (e.g., Google Translate), which is a general-purpose system designed to handle a very wide but shallow language domain. The language used to describe building power plants is very different from the language used to describe making Planter's Punch, so it's easy to appreciate that machine translation, like other automated systems, can be trained and customized for very specific language domains.
As another example, there are machines that drive cars autonomously and machines that can play the violin, but there isn't one machine that can do both. Similarly, SDL trains machine translation engines for businesses looking to optimize their use of auto-translation in very specific domains with deep, specialized bilingual terminology. SDL has industry-trained engines that can be licensed for travel, IT, automotive and other sectors. Engines can be trained on large volumes of data specific to any industry or "space" that has its own particular terminology.
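To make the point concrete, here is a toy Python sketch of why the domain matters: the same English word needs a different German translation depending on the industry. The glossary entries and the `translate_term` helper are hypothetical illustrations, not SDL's engines; a real trained engine learns these preferences from domain data rather than from a hand-written table.

```python
# Toy illustration only: hypothetical domain glossaries showing that one
# English term can need a different German translation in each industry.
DOMAIN_GLOSSARIES = {
    "it": {"driver": "Treiber"},           # software that controls hardware
    "automotive": {"driver": "Fahrer"},    # person who drives a vehicle
    "energy": {"plant": "Kraftwerk"},      # power plant
    "gardening": {"plant": "Pflanze"},     # living plant
}


def translate_term(term: str, domain: str) -> str:
    """Return the domain-specific translation of a single term.

    Terms missing from the domain glossary are wrapped in brackets
    so a reviewer can spot them.
    """
    glossary = DOMAIN_GLOSSARIES.get(domain, {})
    return glossary.get(term.lower(), f"[{term}]")


print(translate_term("driver", "it"))          # Treiber
print(translate_term("driver", "automotive"))  # Fahrer
print(translate_term("plant", "gardening"))    # Pflanze
```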
2. Businesses can't use auto-translation because of the risks associated with a bad translation. Not true. Businesses use auto-translation all the time. Whether it's appropriate depends on what the business is using it for, also known as the "use case": what readers expect from the translation and what their goals are. And with a simple disclosure such as "For your convenience, this content was translated by a computer; please excuse any errors," the business can protect itself from most of the risks associated with publishing unedited machine-translated content.
Careeronestop.org, a career-oriented website sponsored by the U.S. Department of Labor, uses auto-translation for virtually all its content and job postings.
Auto-translation (machine translation, or MT) likely won't be good enough for marketing content, but it may be good enough for job listings or knowledge base articles. It may be good enough for ratings and reviews, where someone's goal is to choose a hotel or a product, but not for that product's description. Thousands of businesses, law firms and government agencies use MT today to understand content their employees can't read. Some organizations use MT to refine their marketing strategy through data mining and sentiment analysis. Many use private-cloud or on-premises machine translation to help evaluate internal communications and research and development material. Some use MT to publish content that arrives in massive, constantly changing volumes, content that is repetitive, or content with a very limited audience, such as email or chat. Facebook, Yelp and other social sites have already built MT into their interfaces to enable automatic translation.
But how do you know whether the auto-translation output will be useful without proofreading every word before publishing? SDL has patented a useful MT feature called TrustScore, a rating algorithm that scores translated output on a scale of 1 to 5 based on the probability that the translation will be useful. Some businesses publish only output with a score greater than 3 and automatically delete the rest. This gives them some assurance that what they publish is useful.
3. Linguistic quality is the most important thing. It's not. The output of an auto-translation may be understandable enough to be useful to someone in a particular situation, even if its quality is poor and the grammar is broken. Machine translation offers a secure and confidential option that costs about a thousand times less than a professional translator or agency (where perfect quality isn't guaranteed either, by the way), and it delivers the translation in less than a second. That's why it is filling an ever-increasing gap of “I-need-to-know-now” information. This gap can't be filled by a translation that takes 2 weeks to complete and has a relevant life span of perhaps 2 months. It can, however, be filled by sort-of-correct language generated by auto-translation, which just might be more useful than nothing at all.
Another interesting point is that quality is subjective. The source content being translated isn't always well written, and in written language there is no single perfect way of saying anything. For example, if 3 professional translators translate the same text, you will get 3 slightly different versions. And if 5 people judge the quality of a translation, you will get 5 different opinions. So when using auto-translation, the real goal is not a "perfect quality" translation, which is elusive anyway; it's a translation that is useful.
The best way to measure the usefulness of your MT is to measure the actual effect of the content in the real world, using the same key performance indicators you use for your English (or source-language) web content. Here's the formula: create an A/B test, with "A" being the machine-translated content and "B" being professionally translated or post-edited MT content. Then measure your desired outcomes, such as page views, transactions, bookings, revenue, call deflections and the like. If "A" is as effective, or nearly as effective, as "B", then you've found a way to generate your desired outcome at a cost about a thousand times lower than human translation. If not, your MT may need better training, or it may simply not work for that use case.
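The A/B comparison could be sketched in Python as below. The 10% tolerance standing in for "nearly as effective" and the booking counts are assumptions chosen for illustration; `mt_is_good_enough` is a hypothetical helper, not part of any analytics product.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of visitors who completed the desired outcome."""
    return conversions / visitors if visitors else 0.0


def mt_is_good_enough(a_conversions: int, a_visitors: int,
                      b_conversions: int, b_visitors: int,
                      tolerance: float = 0.10) -> bool:
    """True if variant A (raw MT) performs at least (1 - tolerance) as well
    as variant B (professional or post-edited translation)."""
    rate_a = conversion_rate(a_conversions, a_visitors)
    rate_b = conversion_rate(b_conversions, b_visitors)
    return rate_a >= (1 - tolerance) * rate_b


# Hypothetical results: 45 bookings per 1,000 visits for A, 48 for B.
print(mt_is_good_enough(45, 1000, 48, 1000))  # True
print(mt_is_good_enough(30, 1000, 48, 1000))  # False
```

This compares raw rates only; a real test should also check that any difference is statistically significant before drawing conclusions.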
4. Translators should fear machine translation. They shouldn't, and most don't, for 4 reasons:
- There is an explosion of content that global consumers want now and it is simply impossible and unnecessary for all of it to be translated by a very limited number of translators in the world today.
- For some content, post-editing machine translation output can double a translator's efficiency, taking them from about 2,500 words per day to about 5,000 words per day. This productivity boost can increase their profit margins and offer buyers of translation services a lower-cost option for less important content types.
- MT is not as good as professional human translation, and globalization keeps increasing. Given the growth of international business today, there will continue to be a growing need for high-quality translation and transcreation for the highest-value communications.
- Being a translator will continue to be a desirable profession for the foreseeable future. According to CareerBuilder.com, translation and interpreting is forecast to be the #1 hottest job industry over the next five years. Also, by 2022 the Bureau of Labor Statistics projects "46 percent employment growth for interpreters and translators, which is much faster than the average for all occupations" in the U.S.
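Using the 2,500 and 5,000 words-per-day figures above, a quick Python calculation shows what doubling throughput means for a single project. The 100,000-word project size is a hypothetical example.

```python
import math


def days_to_translate(total_words: int, words_per_day: int) -> int:
    """Working days needed at a given daily throughput, rounded up."""
    return math.ceil(total_words / words_per_day)


PROJECT_WORDS = 100_000  # hypothetical project size

print(days_to_translate(PROJECT_WORDS, 2_500))  # 40 (translating from scratch)
print(days_to_translate(PROJECT_WORDS, 5_000))  # 20 (post-editing MT output)
```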
5. MT chokes on #socialLOL:) It's true that many MT engines break when translating text that contains hashtags, slang, emoticons, bad spelling, brand names (e.g., Apple rendered as the fruit) or acronyms. SDL has MT solutions that avoid many of these mistakes because they have been trained to recognize common social keywords and to normalize social text before translating it.
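A minimal Python sketch of the "normalize first, then translate" idea: hashtags, emoticons and protected brand names pass through untouched, and known slang is expanded into plain words before the text would be sent to an engine. The slang table, brand list and `normalize_social` function are hypothetical; this is not SDL's normalizer.

```python
import re
from typing import List, Tuple

# Hypothetical lookup tables; a real system would use far larger, curated lists.
SLANG = {"thx": "thanks", "lol": "laughing out loud", "u": "you"}
PROTECTED_BRANDS = {"Apple", "Yelp"}  # case-sensitive: "apple" the fruit still gets translated

EMOTICON_RE = re.compile(r"[:;]-?[()DP]")
# Try hashtags first, then emoticons, then words, then any other single symbol.
TOKEN_RE = re.compile(r"#\w+|" + EMOTICON_RE.pattern + r"|\w+|[^\w\s]")


def normalize_social(text: str) -> List[Tuple[str, str]]:
    """Tag each token as 'keep' (pass through untranslated) or
    'translate' (send to the MT engine), expanding slang on the way."""
    tagged = []
    for tok in TOKEN_RE.findall(text):
        if tok.startswith("#") or EMOTICON_RE.fullmatch(tok) or tok in PROTECTED_BRANDS:
            tagged.append(("keep", tok))
        elif tok.lower() in SLANG:
            tagged.append(("translate", SLANG[tok.lower()]))
        else:
            tagged.append(("translate", tok))
    return tagged


print(normalize_social("thx Apple #socialLOL:)"))
# [('translate', 'thanks'), ('keep', 'Apple'), ('keep', '#socialLOL'), ('keep', ':)')]
```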
6. MT can't learn. Not true. SDL trains MT engines for businesses every day with impressive results, and over time the systems can be upgraded with more parallel language data and custom language modifications that level up the quality of the translation. And with the release of XMT, an engine can learn over time as users make corrections to its output.
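XMT's learning mechanism isn't described here, so the Python sketch below shows only the simplest possible form of learning from corrections: an exact-match store, similar in spirit to a translation memory, that reuses a user's fix the next time the same sentence appears. `CorrectionMemory` and the deliberately naive base engine are hypothetical.

```python
from typing import Callable, Dict


class CorrectionMemory:
    """Wrap a base MT function and reuse user corrections on exact matches.

    Hypothetical and deliberately simple; not how XMT or any specific engine adapts.
    """

    def __init__(self, base_translate: Callable[[str], str]) -> None:
        self._base = base_translate
        self._corrections: Dict[str, str] = {}

    def translate(self, source: str) -> str:
        # A recorded human correction wins over the engine's raw output.
        if source in self._corrections:
            return self._corrections[source]
        return self._base(source)

    def correct(self, source: str, fixed_target: str) -> None:
        # Store the user's fix so later requests for this sentence reuse it.
        self._corrections[source] = fixed_target


def naive_engine(source: str) -> str:
    # Hypothetical engine that mistranslates "Morgen" as "tomorrow".
    return {"Guten Morgen": "Good tomorrow"}.get(source, source)


mt = CorrectionMemory(naive_engine)
print(mt.translate("Guten Morgen"))  # Good tomorrow
mt.correct("Guten Morgen", "Good morning")
print(mt.translate("Guten Morgen"))  # Good morning
```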
7. Machine translation is word-for-word translation. Not anymore. Most MT systems today use the statistical approach instead of the rules-based approach. The jump to statistical machine translation happened in the 1990s, even though the statistical formula was developed in the 1950s.
Unlike the rules-based approach, which is essentially a dictionary look-up table, the statistical approach runs an algorithm over big data to get a result. (The big data is essentially bilingual parallel corpora of text.) It asks that data 2 questions:
- What is the probability that the target text is a translation of the source text? This is the translation model, P(s|t).
- What is the probability of seeing this target text in a relevant context? This is the language model, P(t).
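The two questions combine into the classic noisy-channel objective of statistical MT. By Bayes' rule, the most probable translation t of a source s maximizes P(t|s) = P(s|t) × P(t) / P(s), and since P(s) is the same for every candidate, the system picks the t that maximizes P(s|t) × P(t). The toy Python below scores a few hand-picked candidates for the German source "das Haus ist klein"; the probabilities are invented for illustration, and a real decoder searches an enormous space of candidates rather than a short list.

```python
import math


def best_translation(candidates, tm_prob, lm_prob):
    """Return the candidate t with the highest log P(s|t) + log P(t).

    Summing logs is equivalent to multiplying the probabilities but avoids
    numerical underflow on long sentences.
    """
    return max(candidates,
               key=lambda t: math.log(tm_prob[t]) + math.log(lm_prob[t]))


# Invented scores for the German source sentence "das Haus ist klein".
candidates = ["the house is small", "the house is little", "small is the house"]
tm = {  # translation model P(s|t): how well t accounts for the German words
    "the house is small": 0.30,
    "the house is little": 0.35,
    "small is the house": 0.30,
}
lm = {  # language model P(t): how natural t sounds as English
    "the house is small": 0.020,
    "the house is little": 0.008,
    "small is the house": 0.001,
}
print(best_translation(candidates, tm, lm))  # the house is small
```

Note that "the house is little" wins on the translation model alone; the language model is what tips the choice toward the more natural English.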
These statistics are combined with other techniques that data scientists use to build custom engines (most of which, I'll admit, I don't completely understand): tokenization, normalization, word alignment, lexicon generation, phrase extraction, feature computation and parameter optimization, all of which shape the translated result.
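Two of those steps can be sketched in a few lines of Python: a very simple normalization and tokenization (lowercasing and splitting on whitespace), followed by lexicon generation, which estimates how often each source word is translated as each target word. The word-alignment links are supplied by hand here as an assumption; in a real pipeline they come from the word alignment step. This is an illustrative toy, not SDL's pipeline.

```python
from collections import Counter, defaultdict


def tokenize(sentence: str):
    """Minimal normalization + tokenization: lowercase, split on whitespace."""
    return sentence.lower().split()


def build_lexicon(aligned_corpus):
    """Estimate p(target_word | source_word) by relative frequency.

    aligned_corpus holds (source_sentence, target_sentence, links) tuples,
    where links are (source_index, target_index) word-alignment pairs.
    """
    counts = defaultdict(Counter)
    for src, tgt, links in aligned_corpus:
        s_toks, t_toks = tokenize(src), tokenize(tgt)
        for i, j in links:
            counts[s_toks[i]][t_toks[j]] += 1
    lexicon = {}
    for s_word, c in counts.items():
        total = sum(c.values())
        lexicon[s_word] = {t_word: n / total for t_word, n in c.items()}
    return lexicon


# Hypothetical German-English corpus with hand-made one-to-one alignments.
corpus = [
    ("das Haus", "the house", [(0, 0), (1, 1)]),
    ("ein Haus", "a house", [(0, 0), (1, 1)]),
    ("das Haus", "the building", [(0, 0), (1, 1)]),
    ("das Haus", "the house", [(0, 0), (1, 1)]),
]
lex = build_lexicon(corpus)
print(lex["haus"])  # {'house': 0.75, 'building': 0.25}
```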
- - -
What will auto-translation look like in 30 years? I know what Captain Kirk would say, but your guess is as good as mine.
- - -
I’m Greg Rosner and I wrote this blog. I’m a web content globalization expert and work with SDL Language Solutions helping companies bring their brand to the world. I believe there is only one language - the language of your customer. Talk to me about how language technology can make your global marketing and customer support easier and more effective.