We're excited to see Slator highlighting our latest research! Our research team identified a critical issue in MT evaluation called "metric interference" (MINT): using the same metrics for optimization and evaluation creates artificially inflated performance scores. As our Research Scientist José Maria Pombal told Slator: "This bias in evaluation erodes the trust users build for both the evaluation metric and the MT model." MINTADJUST offers a solution by correcting metric scores to better align with human judgments, which is especially crucial in the era of LLMs, where these biases can be even more pronounced. Read the full article here: https://hubs.ly/Q03cK3q00 #translation #xl8 #t9n #MT #LLMs #AIResearch #Unbabel
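To see why reusing one metric for both optimization and evaluation inflates scores, here is a minimal toy sketch in Python. It is not Unbabel's MINTADJUST implementation, and all quantities (the metrics, the noise levels, the candidate counts) are made-up assumptions purely for illustration: candidates selected with a noisy metric A look better under metric A than under an independent metric B or their hidden true quality.

```python
# Toy illustration of "metric interference": reporting the same metric that
# was used to select candidates gives an optimistically biased estimate.
import random

random.seed(0)

def noisy_metric(true_quality: float) -> float:
    """A proxy metric = hidden true quality plus independent measurement noise."""
    return true_quality + random.gauss(0.0, 0.1)

n_segments, n_candidates = 1000, 8
same_metric_scores, heldout_metric_scores, true_scores = [], [], []

for _ in range(n_segments):
    # Hidden "true" quality of each candidate translation for this segment.
    qualities = [random.gauss(0.7, 0.1) for _ in range(n_candidates)]
    # Score every candidate with metric A and pick the best one (optimization step).
    metric_a = [noisy_metric(q) for q in qualities]
    best = max(range(n_candidates), key=lambda i: metric_a[i])
    # Reporting metric A on the candidate chosen BY metric A is inflated,
    # because selection favours candidates whose metric-A noise happened to be positive.
    same_metric_scores.append(metric_a[best])
    # An independent metric B on the same chosen candidate is not inflated this way.
    heldout_metric_scores.append(noisy_metric(qualities[best]))
    true_scores.append(qualities[best])

avg = lambda xs: sum(xs) / len(xs)
print(f"metric A (also used for selection): {avg(same_metric_scores):.3f}")
print(f"held-out metric B:                  {avg(heldout_metric_scores):.3f}")
print(f"hidden true quality:                {avg(true_scores):.3f}")
```

Running this, the self-metric average comes out noticeably higher than both the held-out metric and the true quality, which is the gap MINTADJUST aims to correct for.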
Unbabel exposes how using the same metrics for both optimization and evaluation can create misleading #machinetranslation performance estimates, and proposes MINTADJUST to solve it. #translation #xl8 #t9n #MT #LLMs