Creating Functional TMX Files from Bilingual Documents with Tag Preservation
Let's imagine that a client sends us an Excel file with English in Column A and the Spanish translation in Column B, and asks us to use that alignment for our translation. (This procedure is also valid when the client provides translation memories with untagged code, as explained later in the article.)
As we can see, the segmentation is based on sentences, with each segment corresponding to a cell.
Here, we have several options. The first one would be to transform this bilingual table into a TMX translation memory.
We open Heartsome TMX Editor and go to the following path in the menu:
Next, a menu appears where we select:
It is important to note that, for our translation memory to be generated properly, the first row of the bilingual table should contain the language and country code for each column.
Now we run the action in Heartsome TMX Editor and obtain the memory, already open in the program itself.
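To make the conversion less of a black box, here is a minimal Python sketch of what turning a bilingual table into a TMX file amounts to. It is not how Heartsome TMX Editor works internally. It assumes the Excel sheet has been exported to CSV, that the first row holds the language codes (the `en-US` and `es-ES` values are only sample data), and `rows_to_tmx` is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rows_to_tmx(rows):
    """Build a TMX 1.4 document; the first row must hold the language codes."""
    rows = list(rows)
    langs, body_rows = rows[0], rows[1:]
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "csv", "adminlang": "en-US",
        "srclang": langs[0], "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for row in body_rows:
        tu = ET.SubElement(body, "tu")  # one translation unit per table row
        for lang, text in zip(langs, row):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Sample data standing in for the client's table (assumed content).
sample = io.StringIO(
    "en-US,es-ES\n"
    "Click <b>Save</b>.,Haga clic en <b>Guardar</b>.\n"
)
tmx_text = rows_to_tmx(csv.reader(sample))
print(tmx_text)
```

Note that the serializer escapes the `<b>` markup in the cells as `&lt;b&gt;`: inside the TMX it is ordinary text, not inline tags.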
As we can observe, the XML code content is still in plain text, and this can cause issues when using it in our CAT tool:
This is where the Rainbow application from the Okapi Framework suite comes into play. It is worth noting that this method also applies when the client provides us with a TMX translation memory with untagged code that we would like to tag.
Open Rainbow and load our newly created translation memory into the Input 1 tab. Double-click on the filter name, select "Regex Filter > Create..." and give it a name of your choice. In this case, since we are creating a regex filter
Next, the filter configuration window opens:
And we start by adding our first rule by clicking on "Add...":
Give a name to the rule (in my case, "Scope" to better identify the content, but again, the name is irrelevant) and accept it. Now, the window to configure the rule appears.
Here, the most important step is to open our newly created translation memory in a plain-text editor, such as Notepad++, and copy a complete translation unit that we will use to preview our regex in real time. Select the content that will be the Source and the content that will be the Target:
At this step, it doesn't matter which translation unit we copy. The important thing is that it is a complete translation unit, that is, from the "<tu>" tag to the closing "</tu>" tag. We then paste the text into the rule configuration window that opened in Rainbow.
After editing the rule, the previous window would look like this (don't worry, I will explain it step by step):
The first step is to copy the complete translation unit, from the "<tu>" tag to the "</tu>" tag (Number 2).
Now, I define a regex that encompasses the entire sample text (Number 1) by enclosing the Source text in a group in parentheses and doing the same with the Target text. This is crucial because it tells the parser which content should be in the source segment and which content should be in the target segment:
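As an illustration of the two-group idea, here is a small Python sketch. The pattern below is an assumption, not the exact expression used in the Rainbow rule, and the sample translation unit is invented. Rainbow uses Java regex, but the constructs shown here (lazy quantifiers, capturing groups, dot-matches-newline) behave the same way in both dialects.

```python
import re

# A complete translation unit, from <tu> to </tu> (invented sample).
sample_tu = """<tu>
<tuv xml:lang="en-US"><seg>Click &lt;b&gt;Save&lt;/b&gt;.</seg></tuv>
<tuv xml:lang="es-ES"><seg>Haga clic en &lt;b&gt;Guardar&lt;/b&gt;.</seg></tuv>
</tu>"""

# Group 1 captures the Source text, group 2 the Target text.
# DOTALL lets "." match line breaks so the pattern spans the whole unit.
# Simplification: assumes a bare <tu> tag without attributes.
tu_pattern = re.compile(
    r"<tu>.*?<seg>(.*?)</seg>.*?<seg>(.*?)</seg>.*?</tu>",
    re.DOTALL,
)

match = tu_pattern.search(sample_tu)
source_text, target_text = match.group(1), match.group(2)
print(source_text)
print(target_text)
```

The group numbers are what you then pick in the rule dialog: 1 for the Source and 2 for the Target.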
Now, select this option (Number 4) to indicate that we will extract both Source and Target.
Next, choose the group to which the Source text belongs (Number 5) and the Target text (Number 6), which we can preview in the preview window (Number 3). Click on OK.
The next step is to define the text that we want to convert into inline tags.
Click on "Add":
And fill in the fields according to your needs. Once you have the filter configured, accept all the open windows, and you will return to the main Rainbow window. Go to this option:
And in the window that appears, execute the action with the following options:
The result is the same translation memory you had before, but with the code enclosed in standard <ph> tags of the TMX format:
TMX correctly tagged:
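For readers who want to see the transformation itself, the following Python sketch shows the general idea of wrapping escaped markup in `<ph>` elements. It is not Rainbow's implementation, and its output may differ from what Rainbow writes. The inline-code pattern is an assumption that only covers simple escaped tags such as `&lt;b&gt;`, and `tag_inline_codes` is a hypothetical helper name.

```python
import re

# Assumed inline-code rule: escaped opening or closing tags such as
# &lt;b&gt; or &lt;/b&gt;. Tags whose attributes contain escaped quotes
# would need a more elaborate pattern.
INLINE_CODE = re.compile(r"&lt;/?\w+[^&]*?&gt;")


def tag_inline_codes(seg_text):
    """Wrap each escaped tag found in a segment in a numbered <ph> element."""
    counter = 0

    def wrap(m):
        nonlocal counter
        counter += 1
        return f'<ph x="{counter}">{m.group(0)}</ph>'

    return INLINE_CODE.sub(wrap, seg_text)


raw_segment = "Click &lt;b&gt;Save&lt;/b&gt;."
tagged_segment = tag_inline_codes(raw_segment)
print(tagged_segment)
# Click <ph x="1">&lt;b&gt;</ph>Save<ph x="2">&lt;/b&gt;</ph>.
```

The translatable text stays outside the `<ph>` elements, which is what lets a CAT tool treat the code as protected placeholders.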
I would like to emphasize that the created filters/parsers can be saved and used in future tasks, as well as adapted to new needs.
As you may have noticed, mastery of regex syntax is not just important but essential in our profession. One of my favorite websites for testing regex is the following. It shows developed and explained regex patterns, groups, etc., and even allows the use of different regex dialects for testing purposes:
1 year ago: As usual, a great and knowledgeable article! Just to add: there is a Trados-related app, https://appstore.rws.com/Plugin/198, which is also available as an independent app, Glossary Converter: https://cerebus.de/glossaryconverter/index.html. This app can convert TMX into Excel, TMX into SDLTB, SDLTM and many more formats! Just give it a try! Keep up the good work!
1 year ago: Víctor Parra, this is a masterclass. Awesome and insightful content here. Thanks!