Responsive Machine Translation: The Next Frontier for MT

[Republished from https://csa-research.com/Blogs-Events/Blog/responsive-machine-translation]

CSA Research’s recent survey-based examinations of machine translation deployment at language service providers, enterprises, government agencies, and among freelancers revealed an ever-widening engagement with the technology. Although it didn’t surprise us, we also found widespread skepticism of claims that MT has reached human parity, with numerous calls in open-ended survey comments for “truth in advertising.” Just as significantly, we saw a widespread desire for MT to be more suitable for the use cases in which it finds itself, plus a call for more guidance about when and how to use it. These perceptions of a technology that is at once over- and under-sold are a consequence of the very real improvements it has made in recent years.

In our conversations, we uncovered three trends that will drive the next act for machine translation:

  • Increased adoption of MT as a platform service within other applications. This shift means that machine translation must serve a growing number of use cases servicing ever larger and more varied audiences.
  • The shift to context-driven MT. Although most developers think of context as being about working with larger chunks of the text (such as paragraphs, pages, or whole documents), our analysis shows that the ability to address multiple kinds of context will lead to radical improvements in machine translation.
  • The emergence of metadata-aware MT. Today most machine translation engines consider very little metadata in their training, but in the future, MT will be able to account for everything from the gender, age, or location of speakers or authors to the formality and register of text or the specific product lines it applies to. It will do this without needing domain-trained or product-trained engines, which are crude by comparison.

Taken together, these trends point to a future in which machine translation can respond intelligently to stakeholder requirements at multiple levels and deliver the best possible output for given contexts. The next step forward – we call it “responsive machine translation” – builds on the history of MT, including augmented translation (which CSA Research defined in 2016), but goes beyond to create something that is applicable in many more areas.

What Characterizes Responsive MT?

This new approach uses multiple levels and types of context and metadata to:

  • Automatically adapt to domains and text types at the segment level. Rather than relying on document-level features and the selection of a single engine for a document, every segment can leverage the best and most relevant training data for it. A short legal passage in a marketing text can be machine-translated using legal training data and a technical note can be rendered appropriately even if it appears in an annual report.
  • Consider context beyond the segment. Current development efforts at addressing context have focused on only one kind of context – what occurs before or after a segment. However, responsive MT will use a wide variety of context types encoded in metadata, such as information about who (or what) has created text, what kind of document it occurs in, the formality of the text, and many other features to adjust on the fly and select the most relevant training data and provide the best result.
  • Adjust itself in response to user or consumer feedback. Unlike current one-size-fits-all MT, responsive MT incorporates the capabilities of adaptive neural MT to learn over time. But it goes further, integrating various sources of relevant feedback in order to deliver optimal results.
  • Incorporate user-supplied resources without a full retraining cycle. Similarly, responsive MT is able to incorporate new translation memory or terminology materials without the need for full retraining. Integrating these materials keeps engines up to date and their results relevant without rebuilding them.
  • Meet other stakeholder requirements for applicability and usability. Responsive MT will assess its own usability. In cases where the results do not meet usefulness and serviceability requirements as defined by measures such as MQM or a company’s own guidelines, it will flag that output for attention and cleanup by a professional linguist.
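
To make the segment-level behavior in the list above more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Segment` fields, the keyword lists, the `guess_domain` heuristic, and the 0.7 review threshold are assumptions chosen for illustration and do not describe any existing MT system. A real implementation would presumably rely on learned classifiers and a quality-estimation model rather than keyword matching.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists standing in for a learned domain classifier.
DOMAIN_KEYWORDS = {
    "legal": {"liability", "warranty", "hereby", "indemnify"},
    "technical": {"voltage", "firmware", "torque", "install"},
}

@dataclass
class Segment:
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. {"domain": "technical"}

def guess_domain(segment: Segment, default: str = "marketing") -> str:
    """Choose a domain per segment: explicit metadata wins, then keywords."""
    if "domain" in segment.metadata:
        return segment.metadata["domain"]
    words = {w.strip(".,;:!?").lower() for w in segment.text.split()}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if words & keywords:
            return domain
    return default

def needs_review(quality_score: float, threshold: float = 0.7) -> bool:
    """Flag output for a linguist when an (assumed) quality score is too low."""
    return quality_score < threshold

# A marketing document containing one legal and one technical segment.
doc = [
    Segment("Our new speaker fills every room with sound."),
    Segment("The manufacturer disclaims all liability for misuse."),
    Segment("Update the firmware before first use.", {"domain": "technical"}),
]
domains = [guess_domain(s) for s in doc]
print(domains)  # ['marketing', 'legal', 'technical']
```

The point of the sketch is the granularity: the domain decision is made for each segment rather than once per document, and the review flag is a separate check applied to the output.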

These advances require MT software developers to build in capabilities to ingest and apply metadata within training data, and to analyze incoming content so that metadata can be applied to it as well. They will elevate MT beyond the current generation of domain- or company-trained engines that are fit only for narrow purposes toward general-purpose solutions that can be applied more broadly because they can deliver on the disparate functionality of many engines at once.
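
One way to picture metadata inside training data is to prepend it to each source segment as pseudo-tokens; source-side tags of this kind have been used in neural MT research to control features such as politeness. The sketch below is only an illustration under that assumption: the `tag_source` helper, the `<key:value>` tag format, and the field names are hypothetical and not taken from any particular toolkit.

```python
def tag_source(text: str, metadata: dict) -> str:
    """Prefix a source segment with sorted <key:value> pseudo-tokens."""
    tags = [f"<{key}:{value}>" for key, value in sorted(metadata.items())]
    return " ".join(tags + [text])

example = tag_source(
    "Please restart the device.",
    {"register": "formal", "product": "router"},
)
print(example)  # <product:router> <register:formal> Please restart the device.
```

Applying the same tagging to incoming content at translation time is what would let a single engine condition its output on the metadata instead of requiring a separate engine per domain or product.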

The advantages of these approaches will be MT that is both more fit-for-purpose and suitable for more applications. For LSPs and linguists, it will mean better input for augmented translation workflows. That improvement will make work simpler for professional translators and free them up to focus on the more interesting and challenging aspects of their jobs.

Although no systems yet meet the requirements for responsive MT, many of the components are available in individual systems or are under active development in research institutions. Taken together, they will deliver better and more useful output and lead MT into its next frontier.

A great vision, especially since most of the technology pieces of the puzzle are already there. What's missing "in the field" are the legal, organizational, and time-to-market pieces for easily and widely sharing data back from MT users to MT providers: project metadata, post-edit results, sometimes even translation memories. How do you envision the formal advancement in this area?

Lifeng Han

Postdoctoral in Computer Applications (NLP)

3 years ago

great keynote!

I also like the part of your keynote presentation where you explained how other "assistive NLP" applications can contribute to making MT more responsive (an approach that was already applied to SMT to overcome certain limitations, and which also brought back some of the existing rule-based components). And Lucia Specia's keynote on "Multimodal Simultaneous Machine Translation" has, in my opinion, taken this up yet another level by including other AI techniques to "compensate for the missing source context" by using "different multimodal approaches and visual features on state-of-the-art SiMT frameworks, including fixed and dynamic policy approaches using reinforcement". This underlines your statement "the ability to address multiple kinds of context will lead to radical improvements in machine translation."

Gina Goodson Fevrier, M.Ed., M.A.

Retired 5/30/2024 (Tech writer, certified localization / translation program mgr., beta program mgr.) Mentoring students in STEM high school. Translating French > English. Former French/Art teacher and corporate trainer.

3 years ago

It was a great presentation!

Nathan Rasmussen, PhD

Interdisciplinary Computational Linguist

3 years ago

I was just reading the slides (available online) for Chris Potts's talk "Improving NLP systems with Questions Under Discussion" at UnImplicit Workshop, ACL-IJCNLP 2021. QUD, like genre and formality, might have to be inferred from segmental co-text sometimes, and other times might be explicit in metadata (i.e., in a localization job, "where is this text displayed?" may amount to "what is it meant to inform about?"). But either way, I suspect it has a lot to contribute to responsive MT.

More articles by Arle Lommel
