Mastering Text Alignment

Aligning multilingual documents is one of the most widely used procedures in our industry. Alignment lets us create translation memories from documents that have already been translated, which is very useful for leveraging previous work and improving consistency and productivity. However, as we will see in this article, alignment has other uses too.

This procedure is often a double-edged sword: it helps us recover old translations that never went through a CAT tool, and thus reuse that content, but if it is done incorrectly, obtaining quality, reusable alignments can take so much time that the whole exercise becomes completely counterproductive.

When I worked at Montero Language Services, besides translating I also handled other tasks such as proofreading and editing. Over time, our workflow was computerized and updated, and almost everything went through a CAT tool. Even so, we still sometimes received DOCX files to be reviewed against an original in PDF, or previously translated DOCX documents whose source content the client had since modified, so that we had to introduce those changes into the translation using a new original PDF as the reference. Working directly in Word was never an option, for several reasons: I was not willing to give up automated linguistic QA or to strain my eyes, and above all I wanted that work to be available for future projects. That’s how my love-hate story with alignment began.

Tips to keep in mind before performing an alignment

  1. The two documents must be in the same format. The reason becomes clear once we understand how the process works. What would happen if we tried to align an original document in PDF with a translated document in DOCX? The aligner tries to understand the structure of the text, segments it according to segmentation and formatting rules, and uses patterns to match that text across both languages. If the file formats differ, segmentation and formatting can be badly compromised when those patterns are built, and the result would be either unusable or in need of heavy editing before it is ready for production.
  2. The segmentation rules should be as uniform as possible across both languages. For example, if we segment after colons (:) in language A, we must also segment after colons in language B; otherwise we would create inconsistent translation units and make the alignment pattern harder to find.
  3. Similarly, although I am a big fan of segmenting at commas in my translation fields, aligning at commas is usually a disaster because comma placement rarely matches between languages. Paragraph segmentation, on the other hand, is unproductive if we are not later going to translate text segmented by paragraph. The ideal is to segment by sentence.
  4. If we want to make the most of our alignments in our projects, it makes no sense to align text without tags, or with different tag handling, when the text we will later translate contains tags or handles them differently. Tag handling therefore matters a lot: when it matches, the aligned units need less editing when we use them.
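To see why tips 2 and 3 matter, here is a deliberately naive sentence segmenter in Python. It is only a sketch, not how OmegaT or any other CAT tool segments text: real tools use richer SRX-style rule sets with exceptions for abbreviations, numbers and so on. The function name `segment` and its `break_on_colon` switch are hypothetical. The point is that one shared rule set has to be applied to both languages; if the colon rule is switched on for one side only, the segment counts stop matching and the aligner starts from inconsistent units.

```python
import re


def segment(text, break_on_colon=False):
    """Split text into sentence-level segments with one shared rule set.

    Naive on purpose: it breaks after ., ! or ? (and optionally :) whenever
    whitespace follows, so an abbreviation such as "e.g." is split wrongly.
    """
    enders = "[.!?:]" if break_on_colon else "[.!?]"
    # Split on the whitespace that follows a sentence-ending character.
    pattern = re.compile(rf"(?<={enders})\s+")
    return [s.strip() for s in pattern.split(text) if s.strip()]


source = "Requirements: the file must be a DOCX. Send it today."
target = "Requisitos: el archivo debe ser un DOCX. Envíelo hoy."

# Same rules on both sides: 3 segments each, so units can pair up 1:1.
print(len(segment(source, break_on_colon=True)), len(segment(target, break_on_colon=True)))
# Colon rule on one side only: 3 vs 2 segments, an inconsistent starting point.
print(len(segment(source, break_on_colon=True)), len(segment(target)))
```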

How to align bilingual texts using OmegaT

Now that we have the previous points clear, let’s get to work.

1. The first thing is to get both documents into the same format and, in addition, into a format that is CAT-tool friendly. If we have PDFs, we use optical character recognition (OCR); if we have CSV files, we convert them to XLSX; and so on for other file formats.

ABBYY FineReader 16


2. Next, we must format the documents so that they are as similar as possible. This means that, if we use OCR to convert two documents to DOCX, we have to edit and format both texts in Word so that those texts have exactly the same appearance.

Text A and Text B side-by-side
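Before aligning, a rough check that the two formatted documents really share the same structure can save manual editing later. The Python sketch below uses only the standard library: a DOCX file is a ZIP archive whose main text lives in `word/document.xml`, so it collects the non-empty paragraphs (`w:p` elements) of each file. The function names are hypothetical, and equal counts do not prove the layouts are identical; they only catch obvious mismatches, such as a heading that OCR merged into the following paragraph.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of every non-empty paragraph in a DOCX file."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W}p"):  # also finds paragraphs inside tables
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def compare_layout(path_a, path_b):
    """Return the paragraph counts of Text A and Text B as a tuple."""
    return len(docx_paragraphs(path_a)), len(docx_paragraphs(path_b))
```

For example, `compare_layout("text_a.docx", "text_b.docx")` (hypothetical file names) returns the two counts; if they differ, comparing the two `docx_paragraphs` lists side by side shows where the structures diverge.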


3. The next step is optional, but I highly recommend it because it gives us full control over segmentation and tag handling. Use your CAT tool to create two projects, one for Text A and another for Text B, and obtain the corresponding XLIFF files from them. This lets us detect any discrepancy in the segmentation and mimic our CAT tool's tag handling.

Both XLIFF files side-by-side
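Once both XLIFF files are exported, a short script can show whether their segmentation and tag handling line up before you open the aligner. This Python sketch assumes XLIFF 1.2 with the text in `<trans-unit>/<source>`; some CAT tools add their own namespaces or store segments in `<seg-source>`/`<mrk>`, so the paths may need adjusting. It reports a different number of units, and any unit whose count of inline tags (`<g>`, `<x>`, `<ph>` and similar) differs between Text A and Text B. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
# XLIFF 1.2 inline elements that represent formatting tags.
INLINE = {"g", "x", "bx", "ex", "ph", "bpt", "ept", "it"}


def xliff_units(source):
    """Return (text, inline_tag_count) for each trans-unit <source>."""
    root = ET.parse(source).getroot()
    units = []
    for src in root.iterfind(".//x:trans-unit/x:source", NS):
        text = "".join(src.itertext()).strip()
        # Drop the namespace from each tag name before checking it.
        tags = sum(1 for el in src.iter() if el.tag.split("}")[-1] in INLINE)
        units.append((text, tags))
    return units


def compare_xliff(file_a, file_b):
    """List segmentation and tag-handling discrepancies between two XLIFF files."""
    a, b = xliff_units(file_a), xliff_units(file_b)
    issues = []
    if len(a) != len(b):
        issues.append(f"segment count differs: {len(a)} vs {len(b)}")
    # Units are compared by position; after a segmentation mismatch this is only a hint.
    for i, ((_, tags_a), (_, tags_b)) in enumerate(zip(a, b), start=1):
        if tags_a != tags_b:
            issues.append(f"unit {i}: {tags_a} vs {tags_b} inline tags")
    return issues
```

An empty list from `compare_xliff` is a good sign that the two files are ready for the aligner; anything else points to the segmentation rules or tag settings to revisit in the CAT tool projects.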


4. Now, we open OmegaT and go to the Tools > Align Files... menu.

OmegaT Menu


The OmegaT aligner will appear and we will only have to select the languages and the XLIFF files that we want to align.

OmegaT Aligner


When you click OK, the automatic aligner runs. Here you have several options that you can adjust until you get a more than decent automated result. Keep in mind that our XLIFF files were already segmented, so we uncheck the segmentation option to stop OmegaT from resegmenting them and thus preserve the original segmentation of the XLIFF files.

OmegaT Autoalignment
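To build some intuition for what an automatic aligner does, here is a simplified length-based aligner in the spirit of the classic Gale-Church method: translations of long sentences tend to be long, so dynamic programming chooses the sequence of 1:1, 1:0, 0:1, 2:1 and 1:2 groupings whose character lengths agree best. This illustrates the general technique only; it is not OmegaT's implementation, and the penalties, ratio and variance below are illustrative values rather than tuned parameters.

```python
import math

# Allowed groupings (source segments, target segments) with a hypothetical
# penalty for each; anything other than a 1:1 match is discouraged.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.5, (1, 2): 1.5}


def match_cost(src_len, tgt_len, ratio=1.0, variance=6.8):
    """Squared, length-normalised difference between expected and actual length."""
    delta = (tgt_len - src_len * ratio) / math.sqrt((src_len + 1) * variance)
    return delta * delta / 2


def align(src, tgt, ratio=1.0):
    """Align two lists of segments; return (source_text, target_text) pairs."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Every grouping moves forward, so row-by-row order reaches each cell
    # only after all of its possible predecessors have been finalised.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = sum(len(x) for x in src[i:ni])
                t = sum(len(y) for y in tgt[j:nj])
                new = cost[i][j] + penalty + match_cost(s, t, ratio)
                if new < cost[ni][nj]:
                    cost[ni][nj] = new
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the aligned units.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    return pairs[::-1]


source = ["Open the file and check every heading carefully.", "Save it."]
target = ["Abra el archivo.", "Revise cada encabezado con cuidado.", "Guárdelo."]
for s, t in align(source, target):
    print(f"{s!r} <-> {t!r}")
```

Here the long English sentence is grouped with the two Spanish sentences (a 1:2 unit) because their combined lengths match far better than any 1:1 pairing would; this is the kind of decision you later confirm or correct in the manual editing step.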


In the second step of the alignment, OmegaT lets us manually fix misalignment errors and finally create a TMX file ready to be used in our CAT tool and projects.

Manual Editing of the Alignment


It is very important to point out that the alignment may still contain errors, so it is advisable to review the TMX in a translation memory editor such as Heartsome TMX Editor, or to apply a penalty to the translation memory when using it, so that erroneous translation units are not auto-populated.


Resulting TMX
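As a complement to reviewing the TMX in an editor, a short script can shortlist the units most likely to be misaligned. This Python sketch reads a TMX file with the standard library and flags translation units where one side is missing or where the target/source length ratio falls outside a range. The function names and the 0.5 to 2.0 thresholds are assumptions to adjust for your language pair, and the check cannot catch units of similar length with the wrong content, so it supports the human review rather than replacing it.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute used by TMX 1.4.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(tu, lang):
    """Return the segment text for one language in a <tu>, or '' if absent."""
    for tuv in tu.iter("tuv"):
        # Some older TMX files use a plain "lang" attribute instead of xml:lang.
        code = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if code.lower() == lang.lower():
            seg = tuv.find("seg")
            return "".join(seg.itertext()).strip() if seg is not None else ""
    return ""


def suspicious_units(tmx_source, src_lang, tgt_lang, low=0.5, high=2.0):
    """Return (position, source, target) for units that deserve a second look."""
    root = ET.parse(tmx_source).getroot()
    flagged = []
    for pos, tu in enumerate(root.iter("tu"), start=1):
        src, tgt = seg_text(tu, src_lang), seg_text(tu, tgt_lang)
        if not src or not tgt:
            flagged.append((pos, src, tgt))  # one side is empty or missing
            continue
        if not low <= len(tgt) / len(src) <= high:
            flagged.append((pos, src, tgt))  # lengths differ suspiciously
    return flagged
```

Pass the language codes exactly as they appear in the file, for example `suspicious_units("aligned.tmx", "en-US", "es-ES")` with a hypothetical file name, and review the flagged positions first.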



