Creating Functional TMX Files from Bilingual Documents with Tag Preservation

Víctor Parra

Head of Localization Engineering @ LanguageWire

Published: November 7, 2023

Let's imagine that our client provides us with an Excel file with Column A in English and Column B containing the Spanish translation. They ask us to use that alignment for our translation (this procedure is also valid when they provide us with translation memories with untagged code, as explained later in the article):

As we can see, the segmentation is based on sentences, with each segment corresponding to a cell.

We have several options here. The first is to convert this bilingual table into a translation memory in TMX format. One thing stands out immediately, though: the text contains markup that should be tagged in the translation memory.

We open Heartsome TMX Editor and go to the following path in the menu:

Convert to TMX options in Heartsome TMX Editor

Next, a menu appears where we select:

The Excel file with our bilingual table.
The path where we want the generated TMX file to be located.
We check the option "Open TMX after conversion" so that we can take a look and verify that our memory has been generated as desired.

It is important to note that for our translation memory to be generated properly, the first row of the bilingual table should contain the language and country code according to ISO standards:

Now, we execute the action in Heartsome TMX Editor, and the generated memory opens in the program itself.
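For readers who want to see what such a conversion amounts to, here is a minimal Python sketch. It is not how Heartsome TMX Editor works internally. The rows, the language codes "en-US" and "es-ES", the header attribute values, and the function name `rows_to_tmx` are all assumptions made for illustration. It only shows the general idea: the first row supplies the language codes and every following row becomes one translation unit.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def rows_to_tmx(rows):
    """Build a TMX tree from rows whose first row holds the language codes."""
    src_lang, tgt_lang = rows[0]
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xlsx",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in rows[1:]:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx

# Made-up sample table: header row with language codes, then one segment pair.
rows = [
    ("en-US", "es-ES"),
    ("Click <b>Save</b>.", "Haga clic en <b>Guardar</b>."),
]
tmx_xml = ET.tostring(rows_to_tmx(rows), encoding="unicode")
print(tmx_xml)
```

Note that the serializer escapes the inline markup (`<b>` becomes `&lt;b&gt;`), so it ends up as plain text inside each `<seg>` element.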

As we can observe, the XML code content is still in plain text, and this can cause issues when using it in our CAT tool:
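To make the problem concrete, the following small Python sketch parses a made-up `<seg>` element (the sample text is an assumption, not taken from the client file). It shows that escaped markup is just part of the segment text: the segment has no child elements, so a CAT tool has nothing to treat as an inline tag.

```python
import xml.etree.ElementTree as ET

# A segment as it might look after a plain conversion: the markup is escaped.
seg_xml = "<seg>Click &lt;b&gt;Save&lt;/b&gt;.</seg>"
seg = ET.fromstring(seg_xml)

child_count = len(seg)   # number of inline elements such as <ph> or <bpt>
seg_text = seg.text      # the parser unescapes the entities back to text

print(child_count)
print(seg_text)
```

The child count is zero and the whole string, markup included, comes back as translatable text.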

This is where the Rainbow application from the Okapi Framework suite comes into play. It is worth noting that this method also applies when the client provides us with a TMX translation memory with untagged code that we would like to tag.

Open Rainbow and load our newly created translation memory into the Input 1 tab. Double-click on the filter name, select "Regex Filter>Create..." and give it a desired name. In this case, since we are creating a regex filter to tag inline content in TMX files, I named it "tmx_tagger," but the actual name you choose is irrelevant. Accept the action:

Next, the filter configuration window opens:

And we start by adding our first rule by clicking on "Add...":

Give a name to the rule (in my case, "Scope" to better identify the content, but again, the name is irrelevant) and accept it. Now, the window to configure the rule appears.

Here, the most important step is to open our newly created translation memory in a plain-text editor, such as Notepad++, and copy a complete translation unit that we will use to preview our regex in real time. Select the content that will be the Source and the content that will be the Target:

At this step, it doesn't matter which translation unit we copy. The important thing is that it is a complete translation unit, meaning from the "<tu>" tag to the closing "</tu>" tag. We paste the text into the rule configuration window that opened in Rainbow.

After editing the rule, the previous window would look like this (don't worry, I will explain it step by step):

The first step is to copy the complete translation unit, from the "<tu>" tag to the "</tu>" tag (Number 2).

Now, I define a regex that encompasses the entire sample text (Number 1) by enclosing the Source text in a group in parentheses and doing the same with the Target text. This is crucial because it tells the parser which content should be in the source segment and which content should be in the target segment:

Now, select this option (Number 4) to indicate that we will extract both Source and Target.

Next, choose the group to which the Source text belongs (Number 5) and the Target text (Number 6), which we can preview in the preview window (Number 3). Click on OK.
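The idea of one capture group for the Source and one for the Target can be tried outside Rainbow as well. The sketch below uses Python's `re` module with a made-up translation unit. The pattern is an illustrative assumption, not the exact expression used in the filter, and Rainbow relies on Java regex syntax, which differs from Python's in some details.

```python
import re

# Made-up sample translation unit with escaped inline markup.
sample_tu = """<tu>
<tuv xml:lang="en-US"><seg>Click &lt;b&gt;Save&lt;/b&gt;.</seg></tuv>
<tuv xml:lang="es-ES"><seg>Haga clic en &lt;b&gt;Guardar&lt;/b&gt;.</seg></tuv>
</tu>"""

# Group 1 captures the source segment, group 2 the target segment.
# DOTALL lets "." match the line breaks between the elements.
tu_pattern = re.compile(
    r"<tu>.*?<seg>(.*?)</seg>.*?<seg>(.*?)</seg>.*?</tu>",
    re.DOTALL,
)

match = tu_pattern.search(sample_tu)
source_text = match.group(1)
target_text = match.group(2)

print(source_text)
print(target_text)
```

The group numbers play the same role as the Source and Target group selections in the rule configuration window.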

The next step is to define the text that we want to convert into inline tags:

Click on "Add":

And fill in the fields according to your needs. Once you have the filter configured, accept all the open windows, and you will return to the main Rainbow window. Go to this option:

And in the window that appears, execute the action with the following options:

The result is the same translation memory you had before, but with the code enclosed in standard <ph> tags of the TMX format:

TMX correctly tagged:
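As a rough illustration of what the tagging step achieves, here is a Python sketch that wraps escaped markup in `<ph>` elements. The pattern, the function name `tag_inline_codes`, and the use of the `x` attribute as a running counter are assumptions for this example. The attributes Rainbow actually writes may differ, and the simple pattern does not handle tags whose attributes contain escaped quotes.

```python
import re

# Matches an escaped opening or closing tag such as &lt;b&gt; or &lt;/b&gt;.
ESCAPED_TAG = re.compile(r"&lt;/?[A-Za-z][^&]*?&gt;")

def tag_inline_codes(seg_content):
    """Wrap each escaped tag in the segment content in a numbered <ph> element."""
    counter = 0

    def wrap(match):
        nonlocal counter
        counter += 1
        return f'<ph x="{counter}">{match.group(0)}</ph>'

    return ESCAPED_TAG.sub(wrap, seg_content)

tagged = tag_inline_codes("Click &lt;b&gt;Save&lt;/b&gt;.")
print(tagged)
# Click <ph x="1">&lt;b&gt;</ph>Save<ph x="2">&lt;/b&gt;</ph>.
```

The function works on the raw content between `<seg>` and `</seg>`, so the translatable text stays untouched and only the code is enclosed.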

I would like to emphasize that the created filters/parsers can be saved and used in future tasks, as well as adapted to new needs.

As you may have noticed, mastery of regex syntax is not just important but essential in our profession. One of my favorite websites for testing regex is the following. It breaks down and explains regex patterns, groups, etc., and even lets you test against different regex dialects:

regex101: build, test, and debug regex

Govind PS

Expert Trados Trainer/Consultant since 2014 | Strong Expertise in Translation software: CAT TOOL, TMX/TBX editors & Aligners

1 year ago

As usual, it is a great and knowledgeful article! Just to add: there is a Trados related app: https://appstore.rws.com/Plugin/198 OR It is also an independent app: Glossary converter: https://cerebus.de/glossaryconverter/index.html This app can convert TMX into Excel, TMX into SDLTB, SDLTM and many more formats! Just give it a try! Keep up the good work!

Maria Virgínia B.

Multilingual Translator | Subtitler | Interpreter English, French, Spanish > European Portuguese | Member of SUBTLE — the Subtitlers’ Association

1 year ago

Víctor Parra, this is a masterclass. Awesome and insightful content here. Thanks
