What is the secret behind MT Quality Estimation?

At the start of this year I predicted that 2025 would be the year of MT Quality Estimation. We all know that 50% (or more) of translation budgets is wasted, because much of the MT output is so good that it doesn’t need human review or post-editing. The trouble is, we don’t know which 50%. MTQE technology can solve that problem.

But how does it work? What is the secret behind this technology?

In January, CSA Research published a report on TAUS EPIC, the new version of the TAUS API that combines Quality Estimation and Automatic Post-Editing. CSA Research highlights some of the unique strengths of EPIC.

EPIC’s unique strengths?

Alison Toon and Arle Lommel (authors of the CSA Research report) attribute the reliability of the TAUS QE scores to the massive volume of language data that TAUS has aggregated over many years. They highlight the independent industry position of TAUS as a unique selling point. They also underline the white-labeling offering of TAUS as especially attractive to LSPs in the translation industry.

All true, but it doesn’t tell the whole story of how QE models (and TAUS EPIC here in particular) can be the key to another innovation jump in the translation industry. The secret, which is not really a secret, lies more in how we perceive and manage this technology.

The replacement trap

The biggest challenge that both trainers and users of QE models are facing is what I would call the replacement trap:

  • Model trainers spend an inordinate amount of time gathering and preparing data and expertise from many years of human LQA work and feeding it into the models, on a mission to replace LQA work.
  • Users (QA specialists, linguists, etc.) expect, and fear, that they will be replaced by QE models, and judge the QE scores with a man-machine bias.

Both can be overcome if we all adopt a different perspective on how QE models work.

Model trainers, don’t overdo it

QE models are trained for the specific task of checking the quality of the output of MT and LLMs. CSA is right that a lot of good-quality language data - both positive and negative examples - is needed for this training. But QE models rely equally on a mathematical universal representation of language, known as embeddings. Using this interlingua we validate the semantic similarity between source and target very effectively, and we keep tuning the models with exactly the missing data until we reach the required accuracy in the scores. The crux is to find the right balance between data preparation and tuning, or - you could say - between translation knowledge and mathematics.

Language professionals are inclined to overestimate the importance of translation knowledge. In this regard, think of the long history of MT: for over half a century developers tried to crack the problem by feeding computers grammar rules and dictionaries. Only after the arrival of statistical and neural models, particularly transformers, did we succeed in building MT engines that generate good translations. Now all this technology is available to us. And yet we see developers of Quality Estimation fall into the same trap: they feel compelled to gather as much human quality-evaluation and post-editing data as possible before they train a QE model. This way it can easily take six months to complete a new language pair or custom model, and the result may not even be as good as hoped.

At TAUS, our approach offers a more efficient alternative: our pre-trained QE models satisfy the majority of industry-specific use cases right out of the box. For specialized needs, we train new models with high-confidence scores within two to four weeks.
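To make the embeddings idea concrete, here is a minimal sketch of how semantic similarity between source and target can be measured as the cosine of the angle between their embedding vectors. The four-dimensional toy vectors below are illustrative stand-ins for what a real multilingual encoder would produce; this is the general technique, not the EPIC implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings": in practice these would be high-dimensional vectors
# from a multilingual encoder, where meaning-preserving translations land
# close to their source sentence in the shared vector space.
source_vec = np.array([0.12, 0.85, -0.33, 0.40])
good_target_vec = np.array([0.10, 0.80, -0.30, 0.45])   # close in meaning
bad_target_vec = np.array([-0.70, 0.05, 0.60, -0.20])   # semantically off

print(cosine_similarity(source_vec, good_target_vec))  # high, close to 1.0
print(cosine_similarity(source_vec, bad_target_vec))   # low, here negative
```

A real QE score combines signals like this with learned weights, but the core intuition is the same: a good translation points in (nearly) the same direction as its source.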

Users, accept your biases

MT engines and LLMs are never going to be 100% perfect. The bias we need to overcome, though, is the idea that only humans can detect errors. In most instances, quite the opposite is true. There is an abundance of psychology literature about cognitive biases, the man-machine bias being only one of them. We humans, for instance, have selective attention: large loads of information confuse our brains. In those cases we substitute complex questions with simple ones that have nothing to do with the actual problem that needs to be solved. We have to accept those biases and embrace the QE models that can plough through massive volumes of translations in just minutes and present us, human linguists and subject matter experts, with only the quality questions that our human skills are best suited for.

We need to be aware also that QE models are never going to be perfect either. It is in the nature of statistical models that errors can slip through. And that is where the humans will catch them. If users understand how QE models work and accept their own biases, they can produce ten to a hundred times more output and focus on the most interesting and intellectually challenging quality issues.
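The division of labour described above can be sketched as a simple gating step: segments whose QE score clears a confidence threshold pass through, and the rest are routed to a human reviewer. The function name, the scores, and the 0.85 threshold below are illustrative assumptions, not EPIC's actual API.

```python
def route_segments(scored_segments, threshold=0.85):
    """Split (segment, qe_score) pairs into auto-accepted vs. human-review lists."""
    accepted, needs_review = [], []
    for segment, qe_score in scored_segments:
        if qe_score >= threshold:
            accepted.append(segment)       # high confidence: publish as-is
        else:
            needs_review.append(segment)   # low confidence: human catches errors here
    return accepted, needs_review

# Hypothetical batch of MT segments with QE confidence scores.
batch = [
    ("Das Gerät ist betriebsbereit.", 0.97),
    ("Drücken Sie die rote Taste.", 0.91),
    ("Der Vertrag erlischt rückwirkend.", 0.62),  # flagged for review
]
ok, review = route_segments(batch)
print(len(ok), "accepted;", len(review), "sent to human review")
```

The point of the sketch is the shape of the workflow: the model handles the volume, and the human sees only the segments where their judgement actually matters.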

Your AI Quality Companion

The secret, which is not really a secret but we tend to forget, is that the QE model and the human LQA specialist are not mutually exclusive. The one does not replace the other. They can learn from each other and yet they have completely different modi operandi. That’s why we call EPIC Your AI Quality Companion.

Ricardo Ivan Vivanco Cohn

Expert in translation quality for patents and documents for the pharmaceutical industry | ISO 9001, ISO 17001, SAE J 2450.

2 days ago

I think that very high expectations are being created which are affecting the market very much. However, for example in patent translations, there is the paradox that "apparently" correct translations are generated which theoretically don't need to be revised, but at this point there are many cases where human revision is a must and is more expensive than direct translation by an experienced patent translator. But who wants to pay fair rates for human revision?

Jakov Milicevic

Snr translator, editor/terminologist/copywriter/SEO | Lang.: EN/FR/IT <> HR | CEO, Founder @ Verbosari ---> legal / marketing / IT / EU / life sciences / finance / games

5 days ago

Let me also say two words about this article which is a b....shit. And why? Find it in two comments as LinkedIn doesn't allow me to write such long comments. Your comment touches on a crucial point: the distinction between human language and machine processing. However, the real issue is not just about avoiding the “replacement trap” but about how the narrative around MT is shaping the industry and devaluing linguistic expertise. Let’s break this down: 1.) Machines do not “sift through words” in a meaningful way. They don’t understand concepts, context, or intent. They merely predict the next most statistically probable word based on their training data. This is not communication; it’s mathematical pattern-matching. 2.) The assumption that MT allows humans to “focus on the delicate issues” is misleading. This implies that the bulk of linguistic work can be automated, and human involvement should be limited to minor refinements. In reality, language is a complex, deeply human construct that requires interpretation, cultural awareness, and creativity at every level. By relying too much on MT, we are not just replacing humans for efficiency, we are degrading the entire quality of communication.

Edgar Almeida

VP Strategic Alliances & Localization at Korn Ferry

1 week ago

The use of embeddings (math) to validate semantic similarity is very smart and super helpful when performing QE for long-tail languages.

Josh Olenslager

Director @ LinkedIn | Content Publishing, Product Development, Cross-functional Lead

2 weeks ago

Great overview, Jaap. Thanks


More articles by Jaap van der Meer

  • 2025 will be the year of MT Quality Estimation

    Market watchers predict the great normalization of AI for 2025. Time to reap the benefits: not just talk the talk, but…

  • Looking Back on 20 Years of TAUS: Where did we go wrong?

    This month TAUS is celebrating its 20th anniversary. Before we launched the Quality Estimation and Human Language…

  • Half the money I spend on translation is wasted…

    The trouble is, I don’t know which half. The man in the picture is John Wanamaker, a nineteenth century American…

  • Taos revisited

    I am back in Taos, fifteen and a half years after our first visit. What a beautiful and mystical place.