Can ChatGPT translate a Vietnamese novel?
Nick Angiers 安仁良
NAATI-Certified Chinese to English Translator | Localization Specialist | Linguistic Consultant | If your paying customers are human, A.I. translation on its own is not enough.
I've been learning Vietnamese for about two years now, and being a professional translator (and total language nerd), I'm ready to try my hand at some translation from Vietnamese into English. While I'm at it, I thought I'd compare my work to a machine.
Another reason for this experiment is that, as we all know, machine translation (MT) has gotten much better in the past seven or eight years, as a result of the rapid development of neural networks.
But is MT good enough to translate fiction? How about fiction from an Asian language?
MT on its own is still flawed, and its flaws are shown to varying degrees depending on the subject matter.
For example, it can handle news quite well, for two reasons: one, news is written in a very straightforward style; and two, the world produces a lot of news each day. Why in the world we need so much news every day is beyond me, but that's beyond the scope of this article.
With so much news being released, a lot of it is being translated. This results in a very large parallel corpus, a collection of source texts paired with their translations, which is what AI is trained on to provide translations. (For more about how a parallel corpus works, check out The Future of Translation, by Jane Kim.)
But when it comes to Asian novels, while many are translated into English every year, the volume is nowhere near the scope of daily news. With a smaller parallel corpus, MT has less to draw on, and yields a translation that is of poorer quality, often riddled with awkward sentences, with a dash of blatant errors.
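To make the idea of a parallel corpus concrete, here is a minimal Python sketch. It is only an illustration, not how any real MT system is built: the two sentence pairs are taken from the ChatGPT excerpt quoted later in this article, and the names `parallel_corpus` and `source_word_counts` are made up for this example.

```python
from collections import Counter

# A parallel corpus is a list of aligned (source, target) sentence pairs.
# These two pairs come from the excerpt quoted in this article; a real
# training corpus would hold millions of them.
parallel_corpus = [
    ("Bà tôi thì lại khác.", "My grandmother, on the other hand, was different."),
    ("Điều đó thật may mắn đối với tôi.", "That was lucky for me."),
]


def source_word_counts(corpus):
    """Count how often each source-language word appears in the corpus."""
    counts = Counter()
    for source, _target in corpus:
        for word in source.lower().replace(".", "").split():
            counts[word] += 1
    return counts


counts = source_word_counts(parallel_corpus)
print(len(parallel_corpus), "sentence pairs")
print(counts["tôi"], "occurrences of 'tôi'")
```

The fewer pairs there are, the fewer examples the machine sees of any given word or turn of phrase, which is the problem with translated fiction compared to translated news.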
Novels are written in a less straightforward style than news or academic writing, focusing on imagery and dialogue unique to different characters, rather than facts and figures. People read fiction for pleasure. So any fiction translated by a machine will usually need much more editing than other kinds of text.
The second problem with MT for an Asian novel concerns linguistic differences. Asian languages are simply very different from European ones. This means that the AI is more likely to misinterpret or mistranslate the source text.
To illustrate this point, here is Phrase's translation of the first paragraph of Dreamy Eyes, by Nguyễn Nhật Ánh:
...About all it got right was "Chapter 1." The narrator is a man, not a tiny girl. While Phrase has no way of knowing this, it's obvious to anyone who continues to read on.
More importantly, the translation of bạn gái (girlfriend) is also wrong; while "girlfriend" may just be the only translation anyone has ever used in translating a Vietnamese novel to English, here the sentence means "I didn't have any female friends."
Here's another excerpt from the same chapter:
This is just a total mess. It sounds like the narrator had five girlfriends, two of whom were his mother and grandmother, and worse yet, the other three were his sisters and uncles.
There's also the issue of tense. Vietnamese has no tense, so the MT often gets confused about which one to use in English and ends up mixing them.
Pronouns were another huge problem. Vietnamese has many more of them than English, and often the same word serves as both a first- and second-person pronoun; the choice of word depends instead on the relative age of the two speakers. To confuse the AI even further, people often refer to themselves by their own name, in the third person.
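A rough way to picture the pronoun problem is as a lookup table where one Vietnamese word has several possible English renderings. The Python sketch below is a hypothetical, heavily simplified gloss table of my own (the name `POSSIBLE_RENDERINGS` and the short lists in it are assumptions for illustration, and real usage has many more terms and senses); it only shows that a word-by-word lookup cannot pick the right pronoun without knowing who is speaking to whom.

```python
# Hypothetical, heavily simplified gloss table: each Vietnamese kinship term
# maps to several possible English renderings, depending on the speakers.
POSSIBLE_RENDERINGS = {
    "chị": ["older sister", "I", "you"],
    "em": ["younger sibling", "I", "you"],
    "bà": ["grandmother", "I", "you", "she"],
}


def candidate_translations(term):
    """Return every English rendering a context-free lookup would allow."""
    return POSSIBLE_RENDERINGS.get(term, [term])


for term in POSSIBLE_RENDERINGS:
    options = candidate_translations(term)
    print(f"{term}: {len(options)} candidates -> {', '.join(options)}")
```

A human reader settles on one candidate from the scene itself; a system working sentence by sentence has to guess.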
The result is that the translation is beyond the level of just needing to be checked and edited, and more or less has to be scrapped and redone by a human.
That's Phrase; now let's see how ChatGPT compares. Here's the first paragraph:
HỒI CÒN NHỎ, NHỎ XÍU, TÔI KHÔNG CÓ bạn gái. Suốt ngày tôi chỉ chơi với... mẹ tôi và bà nội tôi. Mẹ tôi rất thương tôi nhưng vì mẹ sợ ba nên ít khi mẹ che chở được tôi trước những trận đòn của ba tôi. Bà tôi thì lại khác. Bà sinh ra ba nên ba phải sợ bà. Điều đó thật may mắn đối với tôi.
When I was young, very young, I didn't have a girlfriend. I only played with my mother and grandmother all day. My mother loved me very much, but because she was afraid of my father, she could not protect me from his beatings. My grandmother, on the other hand, was different. She gave birth to my father, so he had to be afraid of her. That was lucky for me.
This is actually much better than Phrase. Still needs some work, and sounds too much like a direct translation of everything. Some sentences need to be reworked ("She had given birth to my father..."). But the meaning of everything is intact, with the exception of the girlfriend mistranslation, which I don't blame it for. Let's try that second excerpt:
LỚN LÊN MỘT CHÚT, NGOÀI MẸ TÔI VÀ bà tôi, tôi có thêm ba người bạn gái. Đó là hai người chị con bác tôi.
Growing up a little, besides my mother and grandmother, I had three more girlfriends. They were two sisters who were my aunt's daughters.
A little bit better, but still quite wrong. In many Asian languages it's okay to call your cousins "sisters," but in English they are cousins. This shows that MT still relies heavily on direct translation, while a human has a better idea of what's actually going on.
What this means
MT is an incredible technology that saves us much time and energy, so we can do other things. However, it's still flawed. It mostly translates exactly what it sees on the page, and does its best when encountering idioms and colloquialisms.
But for things like choosing a tense when there is none in the source language, or wrapping its head around what certain pronouns mean within context, its flaws immediately become starkly evident. It's just not very good at fiction, which is a very "human" style of writing.
In a nutshell, if you're just translating a piece of text to get the gist of it, then MT is adequate, fast, and free; but if you need to translate something that is going to be presented to the world, such as a book or a website, please hire a human translator.
This is also good news for translators: our job hasn't been completely taken over, at least, not yet. While MT has replaced much of the demand for our work, and singlehandedly eliminated the need for less skilled translators, there is still much work that requires high-quality translation. And we can use technology like CAT tools to take out much of the redundancy of our work, making it more efficient and enjoyable.
There is also much exploration to be done in some fields, such as novels from East Asian countries. Once there is a larger parallel corpus for such areas, MT will provide better quality translations, but it's up to humans to create the corpus.
What do you think about machine translations of novels? Let me know in the comments!