The Latest AI Localization Revolution: Quality Estimation Scores Explained
As Large Language Models (LLMs) continue to evolve, machine-generated translations are becoming the default translation method for many industries and use cases. Advanced neural machine translation (NMT) engines allow businesses to save money and translate more content, but evaluating the quality of machine-generated translations can still be costly and cumbersome, which defeats the purpose of using NMT.
Fortunately, quality estimation (QE) can help us measure the accuracy and overall quality of machine translations without the need for a reference translation. Below, we’ll explore the fundamentals of quality estimation and how it differs from quality evaluation. We will also take a closer look at the key metrics involved in calculating QE scores.
Understanding the Basics of Quality Estimation Scores
Quality estimation is a way to predict the quality level of machine-generated translations before investing in human post-editing or taking the risk of publishing raw MT. The beauty of QE is that it allows you to measure quality without taking reference translations into account. Additionally, the process doesn’t depend on the input or oversight of human linguists.
QE takes several factors into account to generate a score that indicates the reliability and accuracy of machine-translated text. Key metrics such as fluency, style, and accuracy are individually assessed, with the translated text compared against a multitude of linguistic models and other language resources.
While QE scores are a fairly reliable benchmark, it’s worth remembering that they’re only an estimate. The value of a QE score can vary depending on the context of the text in question. Different metrics and tailored calculations can be used to make scores more relevant to specific scenarios.
What’s the Difference Between Quality Estimation and Quality Evaluation?
While quality estimation and quality evaluation sound similar, they’re actually quite different. Quality evaluation involves assessing content after the translation process is over. Once delivered, machine-translated text is compared against reference translations written by human linguists to establish things like the relevance, reliability, and readability of the MT.
While quality evaluation takes place after translation, quality estimation is part of the translation process and doesn’t require reference translations. It’s used as a predictive tool to establish the quality level of machine translations without human involvement. Quality estimation scores serve as a useful guide for streamlining translation and localization workflows, helping determine whether MT is appropriate for the source text and whether machine-generated texts will require post-editing by human translators.
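As a rough illustration of how a QE score might guide a workflow, here is a minimal sketch that routes a segment based on its score. It assumes scores on a 0 to 1 scale; the function name `route_segment`, the threshold values, and the route labels are all hypothetical and would need tuning for a real project.

```python
def route_segment(qe_score, publish_threshold=0.9, post_edit_threshold=0.6):
    """Pick a workflow step for one segment from its QE score.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= qe_score <= 1.0:
        raise ValueError("QE score is expected to be between 0 and 1")
    if qe_score >= publish_threshold:
        # High confidence: raw MT may be acceptable as-is.
        return "publish_raw_mt"
    if qe_score >= post_edit_threshold:
        # Medium confidence: send the MT output to a human post-editor.
        return "human_post_edit"
    # Low confidence: MT is probably not appropriate for this segment.
    return "human_translation"


if __name__ == "__main__":
    for score in (0.95, 0.72, 0.30):
        print(score, route_segment(score))
```

Keeping the thresholds as parameters makes it easy to set a stricter bar for sensitive content, such as legal text, than for low-risk content.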
How Quality Estimation Scores Are Generated
Quality estimation scores are calculated by taking several factors into account. Generally speaking, the most important of these factors is semantic similarity: how similar two text segments are in meaning.
Sentence embedding, a natural language processing technique, also plays a part. It captures meaning, context, and how the words in a sentence relate to each other. When working out QE scores, sentence embedding helps determine the similarity between source and target segments.
Each segment is then given its own quality estimation score. In most cases, scores are displayed in numerical form on a scale ranging from 0 to 1. The higher the score, the better the quality level. However, with custom models, there’s plenty of scope for customizing scoring labels.
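To make the idea of embedding-based similarity more concrete, here is a minimal sketch that compares a source vector and a target vector with cosine similarity and clamps the result to the 0 to 1 range. The vectors are made-up toy values standing in for the output of a multilingual sentence embedding model, and `similarity_score` is a hypothetical helper. Production QE models are typically trained systems that weigh more signals than this single measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def similarity_score(source_vec, target_vec):
    """Clamp cosine similarity to the 0-1 range used for display."""
    return max(0.0, min(1.0, cosine_similarity(source_vec, target_vec)))


if __name__ == "__main__":
    # Toy vectors (assumed values), not real model output.
    source = [0.2, 0.7, 0.1]
    close_target = [0.25, 0.65, 0.1]   # meaning preserved
    distant_target = [0.9, 0.05, 0.3]  # meaning drifted
    print(similarity_score(source, close_target))
    print(similarity_score(source, distant_target))
```

The closer the target vector points in the same direction as the source vector, the higher the score, which mirrors the "higher is better" scale described above.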
Similarities and Differences Between QE Scores and Translation Memory (TM) Matches
While QE scores and TM matches have the same goal (making the translation process more efficient), they work differently. A translation memory match compares two source text segments: the source of a previously translated segment and the new segment to be translated. A well-maintained TM of verified past translations is a great tool for speeding up the translation process and reducing costs.
QE scores, however, are calculated based on similarities between the source text string and the target string, without the need for a reference translation.
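The source-to-source nature of a TM match can be sketched as follows. This example uses Python's standard `difflib.SequenceMatcher` as a stand-in for a fuzzy-matching algorithm; commercial TM tools use their own methods, and the `tm_match` helper and sample segments are hypothetical.

```python
from difflib import SequenceMatcher


def tm_match(new_source, translation_memory):
    """Return (ratio, stored_source, stored_target) for the closest TM entry.

    translation_memory maps previously translated source segments to their
    verified translations. Returns None if the memory is empty.
    """
    best = None
    for stored_source, stored_target in translation_memory.items():
        # Compare source against source; no target text is involved.
        ratio = SequenceMatcher(
            None, new_source.lower(), stored_source.lower()
        ).ratio()
        if best is None or ratio > best[0]:
            best = (ratio, stored_source, stored_target)
    return best


if __name__ == "__main__":
    memory = {
        "Save your changes.": "Enregistrez vos modifications.",
        "Delete the file.": "Supprimez le fichier.",
    }
    print(tm_match("Save all your changes.", memory))
```

A TM match only says how close the new source is to something already translated, whereas a QE score looks at a source segment and its freshly generated target together.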