Neural Machine Translation goes mainstream

Those of us with several decades of experience in the translation and localization industry have lived with announcements of the impending rise of the machines since the earliest days.

In the 1970s and 1980s (yes, that early), rumors abounded that machine translation would arrive within the next five or so years and put everybody out of a job.

As usual, it took a little longer. Translation memory technology (CAT) did arrive and did well. Statistical machine translation (SMT) followed with decent results. But progress took time, and if you look at the user interface of the “great survivor” of the first wave of CAT tools, TRADOS, you could be forgiven for believing that progress stopped in the 1980s. If anything, the TRADOS UI shows how much pain translators accepted in those heady days.

In recent years, and especially in recent months since Google announced its brand-new neural translation engine, the industry has gotten really busy.

New players like Lilt out of Silicon Valley are trying to make a name for themselves, and better-known statistical machine translation companies like KantanMT out of Ireland promise intelligent translation backed by “neural machines”. KantanMT just announced that neural engines for a few European languages are available at no cost to existing customers.

The technological shift from CAT via SMT to neural MT has raised the technological bar to entry substantially. In the early days of CAT, developers fought with seemingly small but far-reaching problems, such as how to handle double-byte encodings correctly, or how to establish simple standards for exchanging translations between competing systems. Over time, development became much more math-heavy and obscure.

CAT development was basically driven from within the translation industry, with some big customers like IBM going into tools development, too, before abandoning the effort. Neural computing, by contrast, came largely out of academia and out of other fields of science that needed to process large amounts of often ill-defined data. So it is no surprise that computer vision and other sectors played a defining role in the technology we now see coming to translation as neural MT. Of course, neural MT brings many challenges, concerning not only the underlying tech, but also how to perform and measure QA at the expected scale of many billions of words of output, and what role linguists will play in the process.

For those who cannot match the R&D budgets of the Googles or the Microsofts of the world, the open-source neural machine translation toolkit OpenNMT out of Harvard is a good way to familiarize yourself with the neural way of doing translation.

OpenNMT is available on GitHub and runs on top of the scientific computing package Torch.
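To give a feel for the workflow, the quickstart in the OpenNMT README boils down to three Torch commands: preprocess the parallel data into a binary package, train a sequence-to-sequence model on it, then translate new text with a saved checkpoint. Roughly (the data paths are the demo files that ship with the repository; check the README for the exact flags of your version):

# Step 1: build vocabularies and pack the tokenized training/validation data
th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo

# Step 2: train a model on the packaged data (this is the slow part)
th train.lua -data data/demo-train.t7 -save_model demo-model

# Step 3: translate a test file with a trained checkpoint
# (substitute the actual checkpoint file name that training wrote out)
th translate.lua -model demo-model_epochX_PPL.t7 -src data/src-test.txt -output pred.txt

Even the demo run makes the cost structure obvious: preprocessing and translation are quick, while training is where all the compute goes.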

If you would like to try out OpenNMT without installing everything yourself, Vigoursoft offers an Oracle VirtualBox 5.0.32 image with OpenNMT on Ubuntu 16.04 LTS at no cost, as in free.

The VirtualBox image is several gigabytes in size, so you will need to provide an FTP connection. Please send an email to [email protected] to request a copy.

This screenshot shows OpenNMT training happily, with “Perplexity” declining over the iterations.
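For readers new to the metric: perplexity is the exponentiated average negative log-likelihood that the model assigns to the target words of the validation set, so lower is better. With N target words $w_1, \dots, w_N$:

\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_{<i}) \right)

Intuitively, a perplexity of k means the model is, on average, as unsure about the next word as if it were choosing uniformly among k candidates; watching it fall during training is the quickest sanity check that the network is learning.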

[Update 9/8/17] The VM image is no more. If you plan to try OpenNMT, you should set it up with GPU ("CUDA") functionality for much shorter training times.
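In the Lua/Torch version, switching training to the GPU amounts to installing the CUDA backends for Torch and pointing the trainer at a GPU; a minimal sketch, assuming the CUDA toolkit is already installed:

# install the CUDA backends for Torch
luarocks install cutorch
luarocks install cunn

# train on the first GPU instead of the CPU
th train.lua -data data/demo-train.t7 -save_model demo-model -gpuid 1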
