Creating Functional TMX Files from Bilingual Documents with Tag Preservation

Let's imagine that our client provides us with an Excel file with Column A in English and Column B containing the Spanish translation. They ask us to use that alignment for our translation (this procedure is also valid when they provide us with translation memories with untagged code, as explained later in the article):

Bilingual Table


As we can see, the segmentation is based on sentences, with each segment corresponding to a cell.

Here, we have several options. The first one would be to transform this bilingual table into a translation memory in TMX format. However, one thing is apparent at first glance and is crucial to keep in mind: the text contains code that should be tagged in the translation memory.

We open Heartsome TMX Editor and go to the following path in the menu:

Convert to TMX options in Heartsome TMX Editor


Next, a menu appears where we select:


  • The Excel file with our bilingual table.
  • The path where we want the generated TMX file to be located.
  • We check the option "Open TMX after conversion" so that we can take a look and verify that our memory has been generated as desired.

Convert to TMX options in Heartsome TMX Editor


It is important to note that, for our translation memory to be generated properly, the first row of the bilingual table must contain the language and country codes defined by the ISO standards (ISO 639 for languages and ISO 3166 for countries):

Locales
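
To make the role of that first row clearer, here is a minimal Python sketch of the same idea: rows whose first row holds the locale codes are turned into TMX translation units. It is not what Heartsome TMX Editor does internally. The function name `rows_to_tmx`, the sample sentences, the `en-US`/`es-ES` codes, and the header values are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rows_to_tmx(rows):
    """Build a TMX element from rows whose first row holds the locale codes."""
    src_lang, tgt_lang = rows[0]
    tmx = ET.Element("tmx", version="1.4")
    # Header values below are placeholders chosen for this sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xlsx",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in rows[1:]:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


# Hypothetical bilingual table: first row = locale codes, then one row per segment.
rows = [
    ("en-US", "es-ES"),
    ("Press <b>Save</b>.", "Pulse <b>Guardar</b>."),
]
tmx_text = ET.tostring(rows_to_tmx(rows), encoding="unicode")
print(tmx_text)
```

Note that the markup inside the cells ends up escaped as plain text inside each segment, which is exactly the situation the rest of the article deals with.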


Now, we execute the action in Heartsome TMX Editor, and we obtain the memory, already opened in the program itself:

TMX in Heartsome TMX Editor


As we can observe, the XML code content is still in plain text, and this can cause issues when using it in our CAT tool:

Untagged code
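
As a quick way to check whether a memory has this problem, the following Python sketch lists the segments whose text still contains markup stored as plain text. The pattern `INLINE_CODE`, the function name, and the sample TMX are assumptions for illustration; the pattern simply treats anything shaped like an XML/HTML tag as code, and the sketch only inspects segments that have no inline elements yet.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: anything that looks like an opening or closing tag.
INLINE_CODE = re.compile(r"</?[A-Za-z][^<>]*>")


def untagged_segments(tmx_text):
    """Yield (segment text, codes found) for segments with plain-text markup."""
    root = ET.fromstring(tmx_text)
    for seg in root.iter("seg"):
        text = seg.text or ""  # the parser has already unescaped &lt; and &gt;
        codes = INLINE_CODE.findall(text)
        if codes:
            yield text, codes


# Made-up sample: the <b> tags are escaped, so they are text, not TMX inline tags.
sample = (
    '<tmx version="1.4"><body><tu>'
    '<tuv xml:lang="en-US"><seg>Press &lt;b&gt;Save&lt;/b&gt;.</seg></tuv>'
    '<tuv xml:lang="es-ES"><seg>Pulse &lt;b&gt;Guardar&lt;/b&gt;.</seg></tuv>'
    '</tu></body></tmx>'
)
found = list(untagged_segments(sample))
for text, codes in found:
    print(text, codes)
```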


This is where the Rainbow application from the Okapi Framework suite comes into play. It is worth noting that this method also applies when the client provides us with a TMX translation memory with untagged code that we would like to tag.

Open Rainbow and load our newly created translation memory into the Input 1 tab. Double-click on the filter name, select "Regex Filter>Create...", and give the new filter a name. In this case, since we are creating a regex filter to tag inline content in TMX files, I named it "tmx_tagger", but the name you choose is irrelevant. Accept the action:

Rainbow main window


Next, the filter configuration window opens:

Regex filter configuration


And we start by adding our first rule by clicking on "Add...":

Rules


Give a name to the rule (in my case, "Scope" to better identify the content, but again, the name is irrelevant) and accept it. Now, the window to configure the rule appears.

Here, the most important step is to open our newly created translation memory in a plain-text editor, such as Notepad++, and copy a complete translation unit, which we will use to preview our regex in real time. Select the content that will be the Source and the content that will be the Target:

Translation units


Actually, at this step, it doesn't matter which translation unit we copy. The important thing is that it is a complete translation unit, meaning from the "<tu>" tag to the closing "</tu>" tag. When we paste the text, we do it in the rule configuration window that opened in Rainbow.

After editing the rule, the previous window would look like this (don't worry, I will explain it step by step):

Parser configuration


The first step is to copy the complete translation unit, from the "<tu>" tag to the "</tu>" tag (Number 2).

Now, I define a regex that encompasses the entire sample text (Number 1) by enclosing the Source text in a group in parentheses and doing the same with the Target text. This is crucial because it tells the parser which content should be in the source segment and which content should be in the target segment:

Regex grouping
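
The grouping idea can be tried outside Rainbow as well. The Python sketch below is not the pattern from the screenshot; it is an assumed, simplified regex that spans one whole translation unit and puts the Source text in group 1 and the Target text in group 2. Rainbow is a Java application, so its regex dialect may differ in details, but the constructs used here are common to both. The sample translation unit is made up.

```python
import re

# Assumed pattern: one capture group per <seg>, source first, then target.
TU_PATTERN = re.compile(
    r"<tu\b[^>]*>.*?"        # opening <tu> (the \b keeps it from matching <tuv>)
    r"<seg>(.*?)</seg>.*?"   # group 1: source segment
    r"<seg>(.*?)</seg>.*?"   # group 2: target segment
    r"</tu>",
    re.DOTALL,               # let . cross line breaks inside the unit
)

sample_tu = """<tu>
<tuv xml:lang="en-US"><seg>Press &lt;b&gt;Save&lt;/b&gt;.</seg></tuv>
<tuv xml:lang="es-ES"><seg>Pulse &lt;b&gt;Guardar&lt;/b&gt;.</seg></tuv>
</tu>"""

match = TU_PATTERN.search(sample_tu)
source, target = match.group(1), match.group(2)
print("Source:", source)
print("Target:", target)
```

The group numbers are what we later select in the rule configuration to tell the parser which part is the Source and which is the Target.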


Now, select this option (Number 4) to indicate that we will extract both Source and Target.

Next, choose the group to which the Source text belongs (Number 5) and the Target text (Number 6), which we can preview in the preview window (Number 3). Click on OK.

The next step is to define the text that we want to convert into inline tags:

General options


Adding inline tags


Click on "Add":

Preview inline tags


And fill in the fields according to your needs. Once you have the filter configured, accept all the open windows, and you will return to the main Rainbow window. Go to this option:

File Format Conversion...


And in the window that appears, execute the action with the following options:

TMX conversion


The result is the same translation memory you had before, but with the code enclosed in standard <ph> tags of the TMX format:


TMX correctly tagged


TMX correctly tagged as seen in Heartsome TMX Editor
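
To illustrate what the conversion achieves, here is a rough Python sketch of the same transformation on a single segment: every piece of plain-text markup is moved into a <ph> element. This is only an approximation of the result, not Rainbow's implementation. The pattern `INLINE_CODE` and the function `tag_segment` are assumptions, the sketch expects a segment without existing inline elements, and it leaves out optional <ph> attributes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: anything that looks like an opening or closing tag.
INLINE_CODE = re.compile(r"</?[A-Za-z][^<>]*>")


def tag_segment(seg):
    """Move plain-text markup in a <seg> into <ph> child elements."""
    text = seg.text or ""
    seg.text = None
    last = None  # most recent <ph>; the text that follows it goes in its tail
    pos = 0
    for m in INLINE_CODE.finditer(text):
        chunk = text[pos:m.start()]
        if last is None:
            seg.text = chunk
        else:
            last.tail = chunk
        last = ET.SubElement(seg, "ph")
        last.text = m.group(0)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        seg.text = rest
    else:
        last.tail = rest


# Made-up segment with markup stored as escaped plain text.
seg = ET.fromstring("<seg>Press &lt;b&gt;Save&lt;/b&gt;.</seg>")
tag_segment(seg)
tagged = ET.tostring(seg, encoding="unicode")
print(tagged)
# <seg>Press <ph>&lt;b&gt;</ph>Save<ph>&lt;/b&gt;</ph>.</seg>
```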


I would like to emphasize that the created filters/parsers can be saved and used in future tasks, as well as adapted to new needs.

As you may have noticed, mastery of regex syntax is not just important but essential in our profession. One of my favorite websites for testing regex is the following. It breaks patterns down and explains them, group by group, and even lets you test against different regex dialects:

regex101: build, test, and debug regex

Regex tester preview


Regex explanation and tips






Govind PS

Expert Trados Trainer/Consultant since 2014 | Strong Expertise in Translation software: CAT TOOL, TMX/TBX editors & Aligners

1 year ago

As usual, it is a great and knowledgeful article! Just to add: there is a Trados related app: https://appstore.rws.com/Plugin/198 OR It is also an independent app: Glossary converter: https://cerebus.de/glossaryconverter/index.html This app can convert TMX into Excel, TMX into SDLTB, SDLTM and many more formats! Just give it a try! Keep up the good work!

Maria Virgínia B.

Multilingual Translator | Subtitler | Interpreter English, French, Spanish > European Portuguese | Member of SUBTLE — the Subtitlers’ Association

1 year ago

Víctor Parra, this is a masterclass. Awesome and insightful content here. Thanks!
