Batch Files for Localization Engineers: Part I - File Preparation
NOTE: All the software used in this article is Open Source, meaning it is free and can be modified according to your needs.
As localization engineers, we leverage batch (.bat) files to automate repetitive tasks and optimize workflows. These text-based scripts containing pre-defined command sequences offer significant advantages:
Specific Applications
Specialized Considerations
Batch files are text files containing a sequence of commands executed by the operating system's command interpreter. This seemingly simple functionality makes them a powerful tool for automation in the localization engineering field.
Effective utilization of batch files requires understanding their syntax and structure. Common to most batch files is the @echo off command at the beginning, which suppresses the display of executed commands, resulting in cleaner output. Each subsequent line typically consists of a command followed by any necessary arguments or parameters.
Example
@echo off
echo Hello, World!
pause
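This example demonstrates basic syntax: @echo off hides the commands themselves so only their output is shown, echo prints a line of text, and pause waits for a key press before the window closes.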
Importance in Localization Engineering
Localization projects often involve repetitive tasks like file copying, format conversion, and pseudotranslation. Manual execution of these tasks is time-consuming and prone to human error.
Batch files streamline these workflows by encapsulating command sequences. Localization engineers can automate entire processes by executing a single script. This saves time and ensures consistent task execution, minimizing potential discrepancies across project iterations.
Example
Consider a scenario requiring pseudotranslation of multiple XLF files for linguistic validation. Instead of manual processing, a batch file automates the task:
@echo off
for %%i in (*.xlf) do ( "pseudotranslator.exe" "%%i" "%%~ni_pseudo.xlf" )
This example iterates through all XLF files in the current directory, utilizing pseudotranslator.exe to perform pseudotranslation, and saving the output with a "_pseudo" suffix.
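Before pointing a loop like this at a real tool, it can help to print what each iteration would do. The following is a minimal dry-run sketch, not part of the original workflow: it only echoes the names derived from each file, using the standard for-variable modifiers (%%~ni for the name without extension, %%~xi for the extension, %%~dpi for the drive and folder).

```bat
@echo off
REM Dry run: show the names derived from each XLF file without running any tool.
for %%i in (*.xlf) do (
    echo Input:  "%%i"
    echo Name:   "%%~ni"
    echo Ext:    "%%~xi"
    echo Folder: "%%~dpi"
    echo Output: "%%~ni_pseudo%%~xi"
)
pause
```

Because the output names also end in .xlf, a second run in the same folder would pick up the "_pseudo" files as input too. Writing the output to a separate folder, as the real use case later in this article does, avoids that.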
Batch files also ensure consistency in file preparation and processing. Standardized task execution facilitates uniformity across projects and iterations. This consistency is crucial for maintaining the quality and integrity of localized content, especially in multilingual projects with strict linguistic and cultural requirements.
Real Use Case
My Production team asks me to prepare a batch of JSON files for translation, and the prepared files must meet the project scope and the client’s specific segmentation needs.
This process can be quite time-consuming without automation. Here's a breakdown of the manual steps:
1. Setting Up Folders:
- 00_src: Original JSON files (backup)
- 01_prep\aa_master: Master preparation (where I'll convert JSONs to translation format ready for linguists)
- 01_prep\bb_pseudo: Pseudotranslated files (containing gibberish versions for reference and testing)
- 01_prep\cc_configs: Configuration files (holding settings for translation tools like parsers and segmentation rules)
2. Copying Files:
3. Using Okapi Framework Tools (Manual):
4. Manually Specify Translation Details:
5. Pseudotranslation (Manual):
6. Merging Pseudo-translated Files (Manual):
7. Using OmegaT for Checks:
As you can see, the process is lengthy, tedious, highly manual, and can lead to human errors.
Instead of doing all this manual work, we can invest time in creating a batch script so that all the tasks mentioned earlier can be performed with a simple keystroke and (almost) no intervention from the engineer.
It’s worth noting that you need to learn batch syntax to write these scripts, and to spend time testing them to ensure that everything works correctly in different scenarios.
The good news is that the syntax is extremely basic, with thousands of web pages where you can learn it. It’s an effort that will bring great satisfaction later, believe me.
Let's take a closer look at the script I've just written to automate all the tasks above:
@echo off
title JSON File Preparation
setlocal
color a
cls
REM Set paths
set "original_path=%cd%"
set "sourcePath=%original_path%\01_prep\aa_master"
set "targetPath=%original_path%\01_prep\bb_pseudo"
set "okapiPath=C:\Users\vicpa\Work\software\okapi-apps_win32-x86_64_1.46.0"
set "omegatPath=C:\Users\vicpa\Work\software\OmegaT_6.1.0_Beta_Without_JRE"
REM Display initial information
echo.
echo -------------------------------------------------
echo JSON File Preparation
echo victor.parra ^| Last update 2024-20-05
echo Java
echo Okapi Framework
echo OmegaT
echo -------------------------------------------------
echo.
REM Get source and target languages
echo.
echo What is the source language?
echo.
set /p SL=
echo.
echo What is the target language?
echo.
set /p TL=
REM Create folder structure
echo.
echo Creating folder structure...
echo.
mkdir 00_src 01_prep\aa_master 01_prep\bb_pseudo 01_prep\cc_configs
REM Copy required files
echo.
echo Copying required files...
echo.
robocopy . 00_src *.json
robocopy 00_src 01_prep\aa_master *.json
robocopy C:\Users\vicpa\Work\parsers 01_prep\cc_configs [email protected]
robocopy C:\Users\vicpa\Work\segmentation 01_prep\cc_configs defaultSegmentation.srx
robocopy C:\Users\vicpa\Work\pipelines 01_prep\cc_configs pseudotranslator.pln
xcopy /e /v C:\Users\vicpa\Work\project_templates .
REM Create XLF files
echo.
echo Creating XLF files...
echo.
call "%okapiPath%\tikal.bat" -x 01_prep\aa_master\*.json -fc 01_prep\cc_configs\*.fprm -sl %SL% -tl %TL% -seg 01_prep\cc_configs\*.srx
REM Pseudotranslate prepped files
echo.
echo Pseudotranslating prepped files...
echo.
cd /d "%okapiPath%"
for /R "%sourcePath%" %%i in (*.xlf) do (
"%okapiPath%\rainbow.exe" -pln "%original_path%\01_prep\cc_configs\pseudotranslator.pln" -sl %SL% -tl %TL% "%%i" -o "%targetPath%\%%~ni.xlf" -np
)
timeout /t 4 /nobreak >nul
cd /d "%original_path%"
call "%okapiPath%\tikal.bat" -m 01_prep\bb_pseudo\*.xlf -fc 01_prep\cc_configs\*.fprm -sl %SL% -tl %TL% -sd 01_prep\aa_master -od 01_prep\bb_pseudo
robocopy 01_prep\bb_pseudo pseudo_project\source *.xlf
call java -jar -Xmx1024M "%omegatPath%\OmegaT.jar" %* %~dp0 pseudo_project >nul 2>&1
endlocal
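This batch file automates preparing JSON files for translation:
1. Setting Up: turns off command echoing, sets the window title and text color, and stores the paths to the working folders, Okapi, and OmegaT in variables.
2. Getting User Input: asks for the source and target languages and stores them in SL and TL.
3. Creating Folders: creates 00_src and the three subfolders under 01_prep.
4. Copying Files: backs up the JSON files to 00_src, copies them to 01_prep\aa_master, and brings in the filter configuration, the segmentation rules, the pseudotranslation pipeline, and the project templates.
5. Creating XLF Files (Translation Files): calls Tikal to extract the JSON files to XLF, using the filter configuration and the segmentation rules.
6. Pseudotranslating Files: runs Rainbow with the pseudotranslation pipeline on every XLF file, writes the results to 01_prep\bb_pseudo, and then calls Tikal to merge the pseudotranslated XLF files back into JSON.
7. Copying Final Files: copies the pseudotranslated XLF files to pseudo_project\source.
8. Running OmegaT (Translation Tool): launches OmegaT on pseudo_project for the final checks.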
Overall, this batch file automates a complex process of preparing JSON files for translation using various tools. It simplifies the steps for the user by automating file handling, configuration, and tool execution.
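One thing the script above does not do is stop when something is wrong, for example when a locale is left empty or there are no JSON files to prepare. The following is a minimal sketch of how such checks could look, not a change I have made to the production script. It reuses the SL and TL variable names and the 00_src backup step from the script; the label names and messages are illustrative.

```bat
@echo off
setlocal

REM Clear the variables first: set /p keeps the old value on an empty answer.
set "SL="
set "TL="
set /p "SL=What is the source language? "
set /p "TL=What is the target language? "

REM Stop early if either locale is missing.
if not defined SL goto :missing_locale
if not defined TL goto :missing_locale

REM Stop early if the current folder holds no JSON files.
if not exist "*.json" goto :no_json

REM Back up the originals. Robocopy exit codes of 8 or higher signal a failed copy.
robocopy . 00_src *.json >nul
if errorlevel 8 goto :copy_failed

echo Ready to prepare files for %SL% to %TL%.
exit /b 0

:missing_locale
echo Both a source and a target language are required.
exit /b 1

:no_json
echo No JSON files found in "%cd%".
exit /b 1

:copy_failed
echo Robocopy reported an error while backing up the JSON files.
exit /b 1
```

Note that robocopy does not follow the usual "0 means success" convention: values below 8 mean the copy went through, which is why the check uses "if errorlevel 8" rather than testing for a non-zero code.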
As you can see, all I had to do was copy my source JSON files to the “prep” folder and paste a simple batch file (.bat) into that same folder. When I click on it, I’ll enter the locales, and I won’t have to do anything else. Pretty cool, right?
It is important to consider that as we become more proficient in batch syntax, we can create much more complex batch files. For instance, we can store the file extension as a variable and handle preparations for multiple file formats in a single run. Additionally, we can manage translation memories by calling Python helper scripts from the batch file. The possibilities are endless!
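As a rough idea of what handling several formats could look like, here is a minimal sketch that loops over a list of extensions and repeats the backup and copy steps for each one. The list of extensions is a hypothetical example, and only the folder names come from the script above. The extraction step is left out on purpose, since each format would need its own filter configuration.

```bat
@echo off
setlocal

REM Hypothetical list of source formats; adjust it to the project.
set "extensions=json xml properties"

for %%e in (%extensions%) do (
    if exist "*.%%e" (
        echo Backing up %%e files...
        robocopy . 00_src "*.%%e" >nul
        robocopy 00_src 01_prep\aa_master "*.%%e" >nul
    ) else (
        echo No %%e files found, skipping.
    )
)
endlocal
```

Keeping the formats in one variable means that adding a new file type is a one-word change at the top of the script instead of a new set of copy commands.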
In conclusion, batch files have cemented themselves as an indispensable tool for localization engineers. Their ability to automate repetitive tasks transforms workflows, unlocking significant efficiency gains and tangible benefits for projects. This automation frees engineers from mundane activities like file manipulation and pseudotranslation, allowing them to focus on higher-level tasks that leverage their expertise. Standardized execution through batch files minimizes human error and ensures consistency across projects, ultimately enhancing the quality and integrity of localized content.
Batch files offer a versatile solution, seamlessly integrating with existing localization tools and allowing for customization to meet specific project needs. This adaptability fosters a streamlined and efficient localization process, empowering engineers to deliver projects faster and with greater consistency.
Ultimately, mastering batch file syntax empowers localization engineers to leverage the full potential of automation, driving efficiency, consistency, and success in their endeavors.