The Devil’s Dictionary – Language Services Edition
The Augmented Translator, prompt-engineered with DALL-E but first conceptualized by CSA Research in “How AI Will Augment Human Translation” (2017)

When friends and family hear what I’m working on these days, they typically ask: 1) won’t AI eliminate the need for translators and interpreters? and 2) won’t that eradication of the language sector do away with your job, too? The first question has dominated a lot of discussions over the last couple of years, especially given media coverage that reduces humans in the loop to little more than janitors cleaning up after bad MT output.

CSA Research has long had a different take on this question. We have positioned humans at the core of global communications, both at the point of adaptation for other languages and channels and most importantly as the target of those communications. The human ingesting information requires it to be on topic, relevant, information-rich, and current. Machines help with the timeliness of those communications, but humans keep the machines on-brand and intelligible through a variety of processes, technologies, and algorithms that have evolved over the years.

At this point in the conversation, I usually drop a few of the buzzwords we’ve coined at CSA Research over the last 10 years or so – lights-out project management, small AI, responsive and responsible MT, human at the core, augmented translation and interpreting, localization and globalization maturity, text-to-sign, and the Post-localization Era. And a few that we popularized – quality estimation, automated content enrichment, and intelligent content, among others.

We have also been relentless in discouraging the oft-expressed “we deliver great translation and interpreting” as the language industry’s value proposition. Instead, we characterize these deliverables as just two of the many transformations enabled by the massive amounts of digital content that need to be processed into a wide array of other forms and languages to meet the imperative of communicating with humans. Our preliminary sizing of the market for 2024 – as much as US$54.72 billion – factors in both accelerators and challenges (“Growth Resumes in Nominal Terms, but 2024 Will Be a Hard Year”).

What I’ll do in the rest of this post is put some of these concepts into a broader context. I’ll start at the pre-high-tech beginning, outline the impact of silicon and gallium on the language sector, and look back three centuries to the dawn of the concepts that are driving today’s language sector into the post-localization era. Big ideas or concepts are bolded.

In the Beginning – Carbon-Based Language Professionals

When two or more people speaking different languages first gathered, interpreters and translators came into being – hence, they are two of the oldest professions. Here is an all-too-brief, very incomplete, radically abbreviated history of their evolution over millennia:

  • The archetypal linguists of olden times often worked alone, powered mostly by their bilingual brains, with no tools other than a quill and papyrus to translate ancient scrolls, or maybe a stool for an interpreter to sit between people speaking different languages. Over time, they were assisted by resources such as bidirectional dictionaries, domain-specific glossaries, and other written text that they could consult on demand. Because they were translating spoken language in real time, however, interpreters had less time to consult such aids.
  • Production of translated documents became easier over time as manual handwriting yielded to typewriters along with Corrasable bond paper that was easily erased and Wite-Out (aka the “Fluid of Expiation” as we called it in my pre-PC grad school days) to paint over errors. Finally, the Selectric golf-ball typewriter enabled both in-place corrections and typing in multiple languages. Over time, these mechanical devices were replaced by word processors, PCs, and Macs, each allowing you to print on demand and iteratively edit errors out of translated texts. Spoken-language interpreters benefited from improved communications systems that enabled more remote interpreting by phone and video, but they still largely labored in real time, without a safety net.

Silicon and Gallium Enhance Carbon-Based Professionals

Widening use of computer technology combined with digitized content to increase translator and interpreter productivity, reduce errors, optimize workflow, and enhance quality. Think of these as the elements that can make translators and interpreters into bionic humans à la the American 1970s television series, “The Six Million Dollar Man,” with its tagline, “We have the technology. We can make him better than he was. Better, stronger, faster.” In 2016, we discussed how artificial intelligence would integrate various technologies to enhance linguists’ abilities, making them more efficient and helping them produce better results. Thankfully, the evolution of augmented translation has not required cyborg implants to be successful.

  • With the burden of producing translations progressively lessened by mechanical and computing aids, language-focused software vendors innovated with digitized productivity aids such as translation memories, terminology databases or glossaries, and other computer-assisted or -aided translation (CAT) tools to ease even more of the cognitive load of translators. Standards such as TMX, XLIFF, TBX, and SRX, which codified their formats in the late 1990s and early 2000s, essentially ossified in an era of comparatively low computing power and limited storage. This freezing has left them – with the arguable exception of TBX – increasingly unable to handle today’s evolving content requirements.
  • Interpreters benefited from their translator counterparts’ efforts in reducing their workload with CAT tools by pushing for their own computer-aided interpreting (CAI) tools like glossary management, real-time terminology assistance, and post-event linguistic asset management. They benefit from faster computing devices that support more on-demand capabilities, but fundamental and unsolved ergonomic challenges have limited adoption of these tools (“Perceptions on Automated Interpreting”).
  • Mission-critical simultaneous interpreting was introduced in 1945 at the Nuremberg trials in the form of the IBM Hushaphone Filene-Finlay System (aka the International Translator System). Interpreters simultaneously translated the proceedings into multiple languages (English, French, German, and Russian) so judges, lawyers, defendants, witnesses, and other attendees could follow the proceedings in their language. The Hushaphone system ultimately led to the widespread use of simultaneous interpretation – and later remote interpreting as well.
  • Machine translation (MT) followed a similar trajectory, starting from a practical application in international geopolitics – the Georgetown-IBM Experiment in 1954 translated 250 Russian words used in 60 sentences into English. This rule-based machine translation (RBMT) was succeeded in the early 2000s by data-driven statistical MT (SMT) and more recently by neural MT (NMT). Both fall into the category of artificial intelligence (AI). NMT is a bridge to the democratized large language models (LLMs) such as ChatGPT that we all now use – and in fact, the LLM transformer model was originally built for machine translation and only later repurposed for the generative magic we see today. For a complete and accurate history of the NMT-LLM connection, read “8 Google Employees Invented Modern AI. Here’s the Inside Story” in WIRED.

Supporting these various transformational language technologies are infrastructural components such as translation management systems (TMSes), middleware, and connectors, and emerging categories such as AI-driven workflow automation. These solutions interoperate and provide multilingual support for a variety of enterprise content and document management systems, content creation software pumping out a wide range of content types, and legions of platforms and ecosystems for every business, engineering, entertainment, and other function you could imagine.

And of course, there are the organizations such as localization teams, emerging LangOps groups, and LSPs and other specialty service providers working with other corporate groups such as development, marketing, and customer care.

Why Now? 3 Centuries of Digitization Enable Post-Localization

Last year CSA Research heralded the emergence of what we named the “post-localization era,” not because localization was kaput but because an array of enabling technologies and evolving practices had changed the nature of the discussion. Enormous volumes of content digitized and available for transformation plus learning by powerful computing platforms combined to offer new ways of processing textual content – translation being just one of many transformations that could be performed on written content. Spoken language, too, could be morphed into numerous other forms and formats.

Figure: Some Enhancements, Optimizations, and Innovations Enabled by Digitization

It’s always enlightening to dig deeper into today’s hot technologies, especially when we see that the seeds of today’s digital transformation were planted at the beginning of the 18th century. In his 1703 Explication de l’Arithmétique Binaire, Gottfried Wilhelm Leibnitz laid out the foundation for how zeroes and ones could represent numbers. In A Dictionary of the English Language (1755), Samuel Johnson popularized Leibnitz’s “binary arithmetick” with a lexical entry. And a century later, in 1854 George Boole gifted us his Boolean algebra in An Investigation of the Laws of Thought. The rest is 01101000 01101001 01110011 01110100 01101111 01110010 01111001 – except for how these concepts translated into the digital transformation we see today:

  • Despite its early roots in these works, digitization became a major technology driver only in the late 1970s with the adoption of large-scale digital computing driven by companies like IBM and Siemens and the concomitant shift to digital representation. That became the norm for content generation beginning in the 1980s with PC democratization. Today, content is digital – or it might as well not exist.
  • The concept of digitalization builds on the pervasiveness of binary content to drive efficiency and innovation in commercial, engineering, scientific, government, and other organizational transformations. The management consultancy McKinsey marketed the business proposition of digital transformation to big businesses and governments over the last decade or two, but the concept itself dates back to a 1949 foundational book on information theory by Claude Shannon, The Mathematical Theory of Communication, which was my introduction to computational linguistics in graduate school.
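
The run of zeroes and ones that closes the paragraph before this list can be read as text. Here is a minimal sketch in Python, assuming each 8-bit group stands for one ASCII character; the post does not state the encoding, and the function names are illustrative only:

```python
# The space-separated bit groups quoted in the paragraph above.
BITS = "01101000 01101001 01110011 01110100 01101111 01110010 01111001"


def decode_binary_ascii(bits: str) -> str:
    """Decode space-separated 8-bit groups into text.

    Assumption: each group is one ASCII code point written in base 2.
    """
    return "".join(chr(int(group, 2)) for group in bits.split())


def encode_binary_ascii(text: str) -> str:
    """Inverse helper: render each ASCII character as an 8-bit group."""
    return " ".join(format(ord(char), "08b") for char in text)


if __name__ == "__main__":
    word = decode_binary_ascii(BITS)
    print(word)
    # Round trip: re-encoding the decoded word reproduces the original bits.
    print(encode_binary_ascii(word) == BITS)
```

The same positional idea underlies Leibnitz's binary arithmetic: `int("101", 2)` evaluates to 5 because the digits weight 4, 2, and 1.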

What This Means to the Language Sector

What’s happened as a result of digit[al]ization and digital transformation is the fundamental realignment of the language sector, which over the last few decades has benefited from the collision of business data informing process, the growing leverage of content for both information purposes and machine learning, the massive processing power of GPUs, and data farms to process it all. The post-localization era builds on a foundation of digital transformation, with the ability to perform many language-related and supporting functions with increasing automation.

In its simplest form, post-localization is the digitalization of the language sector, with enormous opportunities growing from the transparency of digital data – or as Leibnitz himself marveled, “All these operations are so easy that there would never be any need to guess or try out anything.” The way to seize these opportunities is to embrace LangOps, with its message of making language a core component of everything the organization does.

What human participants in the language trade must do is roll with this New World Order – their role will shift from that of word jockeys, shepherds, and MT janitors to that of ensuring that communications in any language, in any form, meet the requirements of the human consumers of that information at the other end of the communications link. As the language sector and in-house localization teams review their options, they’ll see opportunities in a broader array of linguistics preceded by qualifiers such as socio-, cultural-, accessibility-, and inclusive-, and delivered in audio, visual, and text modes in an ever-increasing selection of personalized forms.

Figure: Humans in the loop are unfortunately all too often cleaning up at the end of the loop

Note: This post originally appeared in much the same form at CSA Research on 23 March 2024. Apologies to Ambrose Bierce for the appropriation of his title.

