Batch Files for Localization Engineers: Part I - File Preparation

NOTE: All the software used in this article is Open Source, meaning it is free and can be modified according to your needs.

As localization engineers, we leverage batch (.bat) files to automate repetitive tasks and optimize workflows. These text-based scripts containing pre-defined command sequences offer significant advantages:

  • Reduced Manual Effort: Batch files automate repetitive tasks like file copying, format conversion, and pseudotranslation, freeing up time for higher-level activities.
  • Minimized Human Error: Standardized execution of tasks minimizes the risk of errors introduced during manual processing.
  • Improved Efficiency: Streamlined workflows through automation lead to faster completion times for localization projects.
  • Enhanced Consistency: Batch files ensure consistent task execution across different project iterations, guaranteeing uniform results.

Specific Applications

  • File Preparation: Automate tasks like copying files in specific formats and locations, ensuring a consistent starting point for translators.
  • Pseudotranslation: Batch scripts can iterate through file sets, invoking dedicated pseudotranslation tools to create preliminary translated versions for linguistic validation.
  • Quality Assurance: Automating repetitive QA checks within batch files can expedite the process and improve overall workflow efficiency.

Specialized Considerations

  • Integration with Localization Tools: Batch files can be integrated with existing translation memory (TM) tools and computer-assisted translation (CAT) software to create a more robust automation pipeline.
  • Customization for Project Needs: Batch scripts can be tailored to specific project requirements, automating tasks unique to particular file formats or workflows.

Batch files are text files containing a sequence of commands executed by the operating system's command interpreter. This seemingly simple functionality makes them a powerful tool for automation in the localization engineering field.

Effective utilization of batch files requires understanding their syntax and structure. Common to most batch files is the @echo off command at the beginning, which suppresses the display of executed commands, resulting in cleaner output. Each subsequent line typically consists of a command followed by any necessary arguments or parameters.

Example

@echo off
echo Hello, World!
pause        

This example demonstrates basic syntax:

  • @echo off: Disables command echoing.
  • echo Hello, World!: Displays the text "Hello, World!" on the console.
  • pause: Pauses execution until a key is pressed.

Importance in Localization Engineering

Localization projects often involve repetitive tasks like file copying, format conversion, and pseudotranslation. Manual execution of these tasks is time-consuming and prone to human error.

Batch files streamline these workflows by encapsulating command sequences. Localization engineers can automate entire processes by executing a single script. This saves time and ensures consistent task execution, minimizing potential discrepancies across project iterations.

Example

Consider a scenario requiring pseudotranslation of multiple XLF files for linguistic validation. Instead of manual processing, a batch file automates the task:

@echo off
for %%i in (*.xlf) do ( "pseudotranslator.exe" "%%i" "%%~ni_pseudo.xlf" )        

This example iterates through all XLF files in the current directory, utilizing pseudotranslator.exe to perform pseudotranslation, and saving the output with a "_pseudo" suffix.
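
The %%~ni part of the loop above expands to the file name without its extension. As a minimal sketch (the echo labels are purely illustrative), the related modifiers can be inspected like this before wiring in a real tool:

```batch
@echo off
REM Print the parts of each XLF file name in the current directory.
for %%i in (*.xlf) do (
    echo Full path     : %%~fi
    echo Drive and path: %%~dpi
    echo Name only     : %%~ni
    echo Extension     : %%~xi
)
pause
```

Running this in a folder with a few XLF files is a quick way to confirm which modifier gives the output name you want.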

Batch files also ensure consistency in file preparation and processing. Standardized task execution facilitates uniformity across projects and iterations. This consistency is crucial for maintaining the quality and integrity of localized content, especially in multilingual projects with strict linguistic and cultural requirements.

Real Use Case

My Production team asks me to prepare a batch of JSON files for translation that meet both the project scope and the client’s specific segmentation needs.

This process can be quite time-consuming without automation. Here's a breakdown of the manual steps:

1. Setting Up Folders:

  • Create separate folders for:

- 00_src: Original JSON files (backup)

- 01_prep\aa_master: Master preparation (where I'll convert JSONs to translation format ready for linguists)

- 01_prep\bb_pseudo: Pseudotranslated files (containing gibberish versions for reference and testing)

- 01_prep\cc_configs: Configuration files (holding settings for translation tools like parsers and segmentation rules)

Basic Preparation Folder Structure


2. Copying Files:

  • Manually copy all the JSON files my Prod Team wants translated to the 00_src folder, and copy all configuration files to 01_prep\cc_configs so my ENG team can check them if something goes wrong.

3. Using Okapi Framework Tools (Manual):

  • Open Rainbow, load my JSON files, set the project locales, select the parser and segmentation rules, and run the text extraction to create the XLF files for translation.

Rainbow Project Creation


4. Manually Specify Translation Details:

  • I'll need to know the source and target languages for the translation.
  • Locate and specify the configuration files used by Rainbow: .fprm files for the parser and .srx files for the segmentation rules. These files define how the JSON files are converted and segmented for translation.

JSON Filter Configuration


5. Pseudotranslation (Manual):

  • I'll need to copy the prepped XLF files into the 01_prep\bb_pseudo folder, load them in Rainbow again, set up the pseudotranslation pipeline, and run it.

Pseudotranslation Pre-Defined Pipeline


6. Merging Pseudo-translated Files (Manual):

  • I'll also need to generate final JSON files with the pseudotranslated content. This is a crucial step for checking that the segmentation has been applied correctly, that all the text in scope per the client requirements is covered, and that code and out-of-scope text have been protected with inline tags. It means loading the pseudotranslated XLF files in Rainbow again and generating the final JSON files.

7. Using OmegaT for Checks:

  • Open OmegaT (or whatever CAT tool you use) and manually create a new project with the pseudotranslated files.

Pseudotranslated XLF from JSON in OmegaT


As you can see, the process is lengthy, tedious, highly manual, and can lead to human errors.

Instead of doing all this manual work, we can invest time in creating a batch script so that all the tasks mentioned earlier can be performed with a simple keystroke and (almost) no intervention from the engineer.

It’s worth noting that you need to learn batch syntax to write these scripts, and to spend time testing that everything works correctly in different scenarios.

The good news is that the syntax is extremely basic, with thousands of web pages where you can learn it. It’s an effort that will bring great satisfaction later, believe me.

Let's take a closer look at the script I've just written to automate all the tasks above:

@echo off
title JSON File Preparation
setlocal
color a
cls

REM Set paths
set "original_path=%cd%"
set "sourcePath=%original_path%\01_prep\aa_master"
set "targetPath=%original_path%\01_prep\bb_pseudo"
set "okapiPath=C:\Users\vicpa\Work\software\okapi-apps_win32-x86_64_1.46.0"
set "omegatPath=C:\Users\vicpa\Work\software\OmegaT_6.1.0_Beta_Without_JRE"

REM Display initial information
echo.
echo -------------------------------------------------
echo JSON File Preparation
echo victor.parra ^| Last update 2024-05-20
echo Java
echo Okapi Framework
echo OmegaT
echo -------------------------------------------------
echo.

REM Get source and target languages
echo.
echo    What is the source language?
echo.
set /p SL=
echo.
echo    What is the target language?
echo.
set /p TL=

REM Create folder structure
echo.
echo    Creating folder structure...
echo.
mkdir 00_src 01_prep\aa_master 01_prep\bb_pseudo 01_prep\cc_configs

REM Copy required files
echo.
echo    Copying required files...
echo.
robocopy . 00_src *.json
robocopy 00_src 01_prep\aa_master *.json
robocopy C:\Users\vicpa\Work\parsers 01_prep\cc_configs [email protected]
robocopy C:\Users\vicpa\Work\segmentation 01_prep\cc_configs defaultSegmentation.srx
robocopy C:\Users\vicpa\Work\pipelines 01_prep\cc_configs pseudotranslator.pln
xcopy /e /v C:\Users\vicpa\Work\project_templates .

REM Create XLF files
echo.
echo    Creating XLF files...
echo.
call "%okapiPath%\tikal.bat" -x 01_prep\aa_master\*.json -fc 01_prep\cc_configs\*.fprm -sl %SL% -tl %TL% -seg 01_prep\cc_configs\*.srx

REM Pseudotranslate prepped files
echo.
echo    Pseudotranslating prepped files...
echo.

cd /d "%okapiPath%"

for /R "%sourcePath%" %%i in (*.xlf) do (
    "%okapiPath%\rainbow.exe" -pln "%original_path%\01_prep\cc_configs\pseudotranslator.pln" -sl %SL% -tl %TL% "%%i" -o "%targetPath%\%%~ni.xlf" -np
)
timeout /t 4 /nobreak >nul

cd /d "%original_path%"

call "%okapiPath%\tikal.bat" -m 01_prep\bb_pseudo\*.xlf -fc 01_prep\cc_configs\*.fprm -sl %SL% -tl %TL% -sd 01_prep\aa_master -od 01_prep\bb_pseudo

robocopy 01_prep\bb_pseudo pseudo_project\source *.xlf

call java -jar -Xmx1024M "%omegatPath%\OmegaT.jar" %* %~dp0 pseudo_project >nul 2>&1

endlocal        

This batch file automates preparing JSON files for translation:

1. Setting Up:

  • It hides the commands being executed for a cleaner screen (@echo off) and keeps the variables it sets local to the script (setlocal).
  • It sets the console text color to light green (color a).
  • It clears the screen (cls).
  • It remembers the current folder location (original_path).
  • It defines paths for the master preparation folder, the pseudotranslation folder, the Okapi Framework tools, and OmegaT (various "set" commands).
  • It displays a title ("JSON File Preparation") and some information about the script's purpose, author, and tools used.
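
Because the tool paths are hard-coded, the setup stage is a natural place to fail early if something is missing. The following is only a sketch of one possible check, not part of the script above; the error messages are my own wording and the path is the one from my machine:

```batch
@echo off
setlocal
REM Install location taken from the script above; adjust to your machine.
set "okapiPath=C:\Users\vicpa\Work\software\okapi-apps_win32-x86_64_1.46.0"

REM Stop early if Tikal is not where the script expects it.
if not exist "%okapiPath%\tikal.bat" (
    echo    Tikal not found in "%okapiPath%".
    exit /b 1
)

REM The script launches OmegaT with the java command, so check the PATH.
where java >nul 2>&1
if errorlevel 1 (
    echo    Java was not found on the PATH.
    exit /b 1
)
echo    All required tools were found.
endlocal
```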

2. Getting User Input:

  • It asks the user for the source language (SL) and the target language (TL). This is the only human input the script needs, because it varies from project to project.

Script Basic Information
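
If the user presses Enter without typing anything, set /p leaves the variable unchanged, so empty locales could reach Tikal and Rainbow. A hedged sketch of a simple guard follows (the prompts and messages are illustrative, and it does not validate that the locale code itself is correct):

```batch
@echo off
setlocal
REM Clear any inherited value so an empty answer is detectable.
set "SL="
set /p "SL=   What is the source language? "
if not defined SL (
    echo    No source language entered.
    exit /b 1
)
set "TL="
set /p "TL=   What is the target language? "
if not defined TL (
    echo    No target language entered.
    exit /b 1
)
echo    Preparing files for %SL% to %TL%...
endlocal
```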


3. Creating Folders:

  • It creates several folders for different purposes: source files, master preparation, pseudo-translated files, and configuration files.
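
When the script is run a second time in the same folder, mkdir reports that each folder already exists. One way to keep re-runs quiet, sketched here as an optional variation on the mkdir line above, is to create each folder only when it is missing:

```batch
@echo off
REM Create each folder only if it is missing, so re-runs stay quiet.
for %%d in ("00_src" "01_prep\aa_master" "01_prep\bb_pseudo" "01_prep\cc_configs") do (
    if not exist "%%~d\" mkdir "%%~d"
)
```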

4. Copying Files:

  • It copies all JSON files from the current folder to the source folder (robocopy).
  • It then copies those same JSON files from the source folder to the master preparation folder (another robocopy).
  • It copies specific configuration files from user-defined locations (C:\Users\vicpa\Work) to the configuration folder (multiple robocopy commands).
  • Finally, it copies all files (including subfolders) from a project template folder to the current folder (xcopy).
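
Robocopy reports its result through the exit code, and values below 8 indicate success (1, for example, means files were copied). A minimal sketch of checking one of the copy steps, with illustrative messages, could look like this:

```batch
@echo off
REM Copy the JSON files, hiding robocopy's own report.
robocopy . 00_src *.json >nul
REM Robocopy returns 8 or higher only when something failed.
if errorlevel 8 (
    echo    Copy to 00_src failed.
    exit /b 1
)
echo    JSON files copied to 00_src.
```

Note that the usual "if errorlevel 1" test would wrongly treat a successful copy as a failure here.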

5. Creating XLF Files (Translation Files):

  • It uses a tool called Tikal from the Okapi Framework that works with the command line to convert the JSON files in the master preparation folder to XLF files (used for translation). This involves specifying the source and target languages, configuration files, and a segmentation rule file.

Tikal Extraction Feature


6. Pseudotranslating Files:

  • It changes the directory to the Okapi Framework folder.
  • It loops through all XLF files in the master preparation folder.
  • For each XLF file, it uses Rainbow from the command line to perform a pseudotranslation. This creates a pseudotranslated version (with gibberish mimicking the source language) in the target pseudotranslated folder.

Pseudotranslated XLF


  • It changes the directory back to the original folder.
  • It uses Tikal again, but this time to merge the pseudotranslated XLF files with the original JSON files. This creates a final set of JSON files with pseudotranslated content for reference.

Tikal Merge Feature


Target Pseudotranslated JSON file
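
Since the merge depends on Rainbow having written its output, it can be useful to confirm that pseudotranslated XLF files exist before calling Tikal. This is just a sketch of such a check, assuming the same folder layout as above; the messages are illustrative:

```batch
@echo off
setlocal
set "targetPath=%cd%\01_prep\bb_pseudo"

REM Count the XLF files Rainbow wrote to the pseudo folder.
set /a count=0
for %%f in ("%targetPath%\*.xlf") do set /a count+=1

if %count% EQU 0 (
    echo    No pseudotranslated XLF files found in "%targetPath%".
    exit /b 1
)
echo    Found %count% pseudotranslated XLF file(s).
endlocal
```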


7. Copying Final Files:

  • It copies all XLF files from the pseudotranslated folder into the source folder (pseudo_project\source) of an OmegaT project template that I use just to check pseudotranslations.

OmegaT Project Template



8. Running OmegaT (Translation Tool):

  • Finally, it uses the "java" command to run OmegaT with the pseudotranslated project already loaded, hiding any log messages from OmegaT itself (>nul 2>&1).

OmegaT Pseudo Project


Overall, this batch file automates a complex process of preparing JSON files for translation using various tools. It simplifies the steps for the user by automating file handling, configuration, and tool execution.

As you can see, all I had to do was copy my source JSON files to the “prep” folder and paste a simple batch file (.bat) into that same folder. When I click on it, I’ll enter the locales, and I won’t have to do anything else. Pretty cool, right?

It is important to consider that as we become more proficient in batch syntax, we can create much more complex batch files. For instance, we can store the file extension in a variable and handle preparations for several different file formats with the same script. Additionally, we can manage translation memories by calling Python scripts from the batch file. The possibilities are endless!
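
As a rough sketch of the file extension idea, the extension could be read from the first command-line argument and fall back to json. The variable name ext and the script name prep.bat used below are hypothetical, and the filter configuration would of course need to match the chosen format:

```batch
@echo off
setlocal
REM Use the first argument as the extension, defaulting to json.
set "ext=%~1"
if not defined ext set "ext=json"

echo    Preparing *.%ext% files...
robocopy . 00_src *.%ext% >nul
robocopy 00_src 01_prep\aa_master *.%ext% >nul
endlocal
```

Calling "prep.bat yaml" would then back up and stage the YAML files, while double-clicking the script would keep the JSON behavior.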

In conclusion, batch files have cemented themselves as an indispensable tool for localization engineers. Their ability to automate repetitive tasks transforms workflows, unlocking significant efficiency gains and tangible benefits for projects. This automation frees engineers from mundane activities like file manipulation and pseudotranslation, allowing them to focus on higher-level tasks that leverage their expertise. Standardized execution through batch files minimizes human error and ensures consistency across projects, ultimately enhancing the quality and integrity of localized content.

Batch files offer a versatile solution, seamlessly integrating with existing localization tools and allowing for customization to meet specific project needs. This adaptability fosters a streamlined and efficient localization process, empowering engineers to deliver projects faster and with greater consistency.

Ultimately, mastering batch file syntax empowers localization engineers to leverage the full potential of automation, driving efficiency, consistency, and success in their endeavors.

I am always in awe of people who can make Windows .bat files behave, this is great work.

Balraj Ponnuswamy

Automation using Python with Selenium (RPA with Python Developer) with 6+ years

6 months

Great work and I am loving your approach and detailed presentation about your automation. Great sharing.

Christine B.

Sprache ?? IT

6 months

Thank you very much for sharing, and keeping promoting the great features of the Okapi framework! I have been a huge fan of Okapi and its value for localization engineering tasks for more than a decade (see also my 2020 article in the MultiLingual magazine: https://multilingual.com/issues/july-aug-2020/localization-engineering-for-translators/ ). Regarding your use cases for performing pseudo-translation and converting .json files to .xliff - Okapi helped me to master a similar challenge last summer: I needed to process MarkDown (.md) files for/in a CAT tool that did not have a MarkDown filter. Thanks to Okapi Rainbow, its MarkDown filter and its batch functionality, I managed to get the .md<-> xliff conversion up and running. Pseudo-translation (Rainbow's "Text Modification") helped a lot to find out how to customize the MarkDown filter for the specific set of files :-)

Govind PS

Expert Trados Trainer/Consultant since 2014 | Strong Expertise in Translation software: CAT TOOL, TMX/TBX editors & Aligners

6 months

great work! I am a fan of yours or inspired by your localization engineering techniques! Will follow and learn always!
