One of the scripts recently published here allowed OmegaT users who wanted their project to be worked on in a different CAT tool to export the whole OmegaT project to an XLIFF file. To get the completed work back into OmegaT, one had to run Okapi Rainbow to convert the XLIFF to TMX, possibly using the Rainbow settings file created by that script.
In this post I'll share how to convert those OmegaT-created XLIFF files, finished (or partly finished) in Trados/MemoQ/Deja Vu/WhatNotCAT, back to a TMX that can be used in OmegaT (with all tags preserved, of course; that was the whole point), right from within OmegaT and without running Rainbow manually.
The conversion is still done by Rainbow, so it must be installed on your computer. However, the script runs Rainbow in command-line mode, so the user simply starts the script, selects the XLIFF file when prompted, and chooses where and under what name to save the TMX.
The script should be added to the OmegaT scripts and invoked from OmegaT. Each user must edit two lines so that it points to Java and Rainbow (see right under the listing), and a few defaults in the script's behavior can be changed (details under the listing as well).
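The script itself is Groovy, but launching Rainbow without its GUI is plain JVM process handling. A minimal Java sketch of the same call might look like this; `RainbowRunner`, `buildCommand` and `run` are hypothetical names, and only the `-p`, `-pln`, `-log` and `-np` options are taken from the command line the script builds (`run_line`).

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper: launches Okapi Rainbow headless, mirroring the
// argument list the Groovy script builds in run_line.
class RainbowRunner {

    // <java> -jar <rainbow.jar> -p <project.rnb> -pln <pipeline.pln> -log <log.txt> -np
    static List<String> buildCommand(String java, String rainbowJar,
                                     String project, String pipeline, String log) {
        return List.of(java, "-jar", rainbowJar,
                "-p", project, "-pln", pipeline, "-log", log, "-np");
    }

    // Starts the process in workDir, lets its output go to our console,
    // and blocks until it exits; returns the process exit code.
    static int run(List<String> command, File workDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir);
        pb.redirectErrorStream(true);                       // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT); // print it instead of buffering it
        return pb.start().waitFor();
    }
}
```

Inheriting the child's output, rather than waiting on a process whose output pipe nobody reads, avoids the child stalling on a full pipe buffer if Rainbow happens to print a lot.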
Below is the listing of the script; the heading is the link.
- xliff2tmx.groovy
```groovy
/*
 * @Purpose: Convert translated XLF to TMX using Okapi Rainbow
 * @author: Kos Ivantsov
 * @date: 2014-02-12
 * @version: 0.4
 */

/* set path to JAVA executable
 * Something like
 * /C:\Program Files\Java\jre7\bin\java.exe/ on Windows or
 * '/usr/bin/java' on GNU/Linux
 */
def java = '/usr/bin/java'

/* specify complete path to rainbow.jar
 * Something like
 * /C:\Program Files\Okapi\lib\rainbow.jar/ on Windows or
 * '/opt/okapi/lib/rainbow.jar' on GNU/Linux
 */
def rainbow = '/opt/okapi/lib/rainbow.jar'

/* the script can either ask to specify where to save TMX. Otherwise
 * (if "show_save_dialog" isn't set to "true"),
 * it will be saved as "/script_output/xlf2tmx/project_save.tmx" in the
 * current project folder
 */

def show_save_dialog = true
def guessname = true
def tmxname = 'project_save.tmx' //if 'show_save_dialog' or 'guessname' are set to false
def fixstudio = true

import java.io.IOException
import java.nio.file.CopyOption
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.nio.file.StandardCopyOption
import javax.swing.filechooser.FileFilter
import javax.swing.JFileChooser
import static javax.swing.JOptionPane.*
import static org.omegat.util.Platform.*

def title = 'Converting XLIFF to TMX'
def showMessage = { msg -> showMessageDialog null, msg, title, INFORMATION_MESSAGE }

//popup a message if java or rainbow aren't found at the above set paths
if (! (new File(java)).exists()) {
    final def msg = 'Java can\'t be found at the specified path.'
    console.print msg
    showMessage msg
    return
}
if (! (new File(rainbow)).exists()) {
    final def msg = 'rainbow.jar can\'t be found at the specified path.'
    console.print msg
    showMessage msg
    return
}

//set up variables to be used in the script
def prop = project.projectProperties
if (!prop) {
    final def msg = 'This script requires an open project to run'
    showMessage msg
    return
}
def outfolder = prop.projectRoot+'script_output'
def tmpfolder = outfolder+'/tmp'
def xlfmemfolder = outfolder+'/xlf2tmx'
def rnbproj = new File(tmpfolder+'/xlf2tmx.rnb')
def rnbpln = new File(tmpfolder+'/xlf2tmx.pln')
def log = tmpfolder+'/log.txt'
def startroot = new File(prop.getProjectRoot())
def sourceLocale = prop.getSourceLanguage().getLanguageCode()
def targetLocale = prop.getTargetLanguage().getLanguageCode()

//run filechooser to select the xliff file
JFileChooser selfiles = new JFileChooser(
    currentDirectory: startroot,
    dialogTitle: "Choose XLIFF file to convert",
    fileSelectionMode: JFileChooser.FILES_ONLY,
    multiSelectionEnabled: false,
    //the file filter must also show directories, in order to be able to look into them
    fileFilter: [getDescription: {-> "*.xlf; *.xliff"},
                 accept: {file -> file ==~ /.*?\.[Xx][Ll][Ii]?[Ff]{1,2}/ || file.isDirectory()}] as FileFilter)
if (selfiles.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
    console.println "Canceled"
    return
}

// create folders if they don't exist
def outcreate
def tmpcreate
def xlfmemfoldercreate
if (! (new File(outfolder)).exists()) {
    (new File(outfolder)).mkdir()
    outcreate = '1'
}
if (! (new File(tmpfolder)).exists()) {
    (new File(tmpfolder)).mkdir()
    tmpcreate = '1'
}
if (! (new File(xlfmemfolder)).exists()) {
    (new File(xlfmemfolder)).mkdir()
    xlfmemfoldercreate = '1'
}

//check show_save_dialog and either ask for filename or just go on
def xlfmem
if (show_save_dialog == true) {
    //check if guessname is set to true and if so
    //get xlf filename and propose it as a name for tmx
    if (guessname == true) {
        def xlfname = selfiles.selectedFile.name
        def dot = xlfname.lastIndexOf('.')
        //replace only the last extension, e.g. "xlf.part1.xlf" -> "xlf.part1.tmx"
        tmxname = (dot >= 0 ? xlfname.substring(0, dot) : xlfname) + '.tmx'
    }
    //run filechooser to select filename for converted tmx
    JFileChooser convtmx = new JFileChooser(
        currentDirectory: new File(xlfmemfolder),
        dialogTitle: "Save TMX as...",
        approveButtonText: "Save",
        multiSelectionEnabled: false,
        fileFilter: [getDescription: {-> "*.tmx"},
                     accept: {file -> file ==~ /.*?\.[Tt][Mm][Xx]/ || file.isDirectory()}] as FileFilter)
    convtmx.setSelectedFile(new File(tmxname))
    if (convtmx.showSaveDialog(null) != JFileChooser.APPROVE_OPTION) {
        console.println "Canceled"
        //remove only the folders created by this run
        if (xlfmemfoldercreate == '1') (new File(xlfmemfolder)).deleteDir()
        if (tmpcreate == '1') (new File(tmpfolder)).deleteDir()
        if (outcreate == '1') (new File(outfolder)).deleteDir()
        return
    }
    xlfmem = convtmx.selectedFile
} else {
    xlfmem = new File(xlfmemfolder+'/'+tmxname)
}

//collect strings with basedir of original xliff and xliff filenames
def xlffolder = selfiles.selectedFile.parent
def files = """\
<fi fs="okf_xliff" fo="" se="" te="">$selfiles.selectedFile.name</fi>
"""

//write settings files
rnbpln.write("""\
<?xml version="1.0" encoding="UTF-8"?>
<rainbowPipeline version="1">
<step class="net.sf.okapi.steps.searchandreplace.SearchAndReplaceStep">#v1
regEx.b=true
dotAll.b=false
ignoreCase.b=true
multiLine.b=false
target.b=false
source.b=true
replaceALL.b=true
replacementsPath=
logPath=$tmpfolder/replacementsLog.txt
saveLog.b=false
count.i=2
use0=true
search0=xml:lang=\\"$sourceLocale-[a-z]{2}\\"
replace0=xml:lang=\\"$sourceLocale\\"
use1=true
search1=source-language=\\"$sourceLocale-[a-z]{2}\\"
replace1=source-language=\\"$sourceLocale\\"
</step>
<step class="net.sf.okapi.steps.common.RawDocumentToFilterEventsStep">
</step>
<step class="net.sf.okapi.steps.codesremoval.CodesRemovalStep">#v1
stripSource.b=true
stripTarget.b=true
mode.i=0
includeNonTranslatable.b=true
replaceWithSpace.b=false
</step>
<step class="net.sf.okapi.steps.formatconversion.FormatConversionStep">#v1
singleOutput.b=true
autoExtensions.b=true
targetStyle.i=0
outputPath=${xlfmem.toString()}
outputFormat=tmx
useGenericCodes.b=false
skipEntriesWithoutText.b=true
approvedEntriesOnly.b=false
overwriteSameSource.b=false
</step>
<step class="net.sf.okapi.steps.common.FilterEventsToRawDocumentStep">
</step>
</rainbowPipeline>
""", 'UTF-8')

rnbproj.write("""\
<?xml version="1.0" encoding="UTF-8"?>
<rainbowProject version="4">
<fileSet id="1">
<root useCustom="1">$xlffolder</root>
$files
</fileSet>
<output>
<root use="1">$outfolder</root>
<subFolder use="1">tmp</subFolder>
<extension use="1" style="0">.out</extension>
<replace use="0" oldText="" newText=""></replace>
<prefix use="0"></prefix>
<suffix use="1"></suffix>
</output>
<options sourceLanguage="${sourceLocale.toLowerCase()}" sourceEncoding="UTF-8" targetLanguage="${targetLocale.toLowerCase()}" targetEncoding="UTF-8">
</options>
<parametersFolder useCustom="0"></parametersFolder>
<utilities xml:spaces="preserve">
</utilities>
</rainbowProject>
""", 'UTF-8')

//backup existing xliff file, make replacements to fix Studio issues
if (fixstudio == true) {
    new File(selfiles.selectedFile.path+".bak").bytes = new File(selfiles.selectedFile.path).bytes
    String contents = selfiles.selectedFile.getText('UTF-8')
    contents = contents.replaceAll(/(\<\/mrk\>)(.*)(\<\/target\>)/, /$2$1$3/)
    contents = contents.replaceAll(/(\<mrk.*\/\>)(.*)(\<\/target\>)/, /$1$2\<\/mrk\>$3/)
    contents = contents.replaceAll(/(\<mrk.*)(\/\>)/, /$1\>/)
    selfiles.selectedFile.write(contents, 'UTF-8')
}

//run rainbow with just created settings files
def run_line = ["$java", "-jar", "$rainbow", "-p", "$rnbproj", "-pln", "$rnbpln", "-log", "$log", "-np"]
run_line.execute().waitFor()

//print Rainbow log
console.println(new File(log).text.toString())

//remove temp folder
new File(tmpfolder).deleteDir()

//restore original xlf if it was backed up
if (new File(selfiles.selectedFile.path+".bak").exists()) {
    new File(selfiles.selectedFile.path).bytes = new File(selfiles.selectedFile.path+".bak").bytes
    new File(selfiles.selectedFile.path+".bak").delete()
}
return
```
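To make the `fixstudio` step in the listing easier to follow, here is a plain-Java sketch of the same three replacements; it is not part of the script, and the class name is hypothetical. It assumes each `<target>` sits on a single line, because `.` does not match line breaks by default (the same behavior the Groovy `replaceAll` calls rely on). It also narrows the script's `<mrk.*/>` to `<mrk[^>]*/>`; that narrowing is an assumption, not the script's own pattern, meant to leave a self-closing inline tag such as `<x id="1"/>` later in the same line untouched.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the "fixstudio" clean-up, applied line by line.
class StudioFix {
    // 1. A </mrk> that closes too early is moved to just before </target>.
    private static final Pattern EARLY_CLOSE = Pattern.compile("(</mrk>)(.*)(</target>)");
    // 2. A self-closing <mrk .../> followed by text gets a </mrk> before </target>.
    private static final Pattern EMPTY_MRK = Pattern.compile("(<mrk[^>]*/>)(.*)(</target>)");
    // 3. That self-closing <mrk .../> is then turned into an opening <mrk ...>.
    private static final Pattern SELF_CLOSE = Pattern.compile("(<mrk[^>]*)(/>)");

    static String fix(String xliff) {
        String s = EARLY_CLOSE.matcher(xliff).replaceAll("$2$1$3");
        s = EMPTY_MRK.matcher(s).replaceAll("$1$2</mrk>$3");
        return SELF_CLOSE.matcher(s).replaceAll("$1>");
    }
}
```

For example, `<target><mrk mtype="seg" mid="2"/>Text</target>` comes out as `<target><mrk mtype="seg" mid="2">Text</mrk></target>`, while an already well-formed target is returned unchanged.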
The user has to specify two paths in the script so that it can find the Java executable and rainbow.jar and run them. These are set in lines 13 and 20, and this needs to be done only once, not every time the script is run. In the comments preceding those lines I tried my best to explain how to specify the paths, but if there are any problems, you can always ask here in the comments.
Besides that, there are a few things the user can tweak in the script's behavior.
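The "can't be found at the specified path" messages come from a simple existence check that runs before anything else. A Java sketch of it, with the hypothetical names `PathCheck` and `check`, could look like this; it uses `Files.isRegularFile` (which also rejects a directory) where the script uses `File.exists()`, and unlike the script it appends the offending path to the message, which shows at a glance which value was actually read from the saved script.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of the start-up check for the two user-edited paths.
class PathCheck {
    // Returns null when both files exist, otherwise the message to show the user.
    static String check(String java, String rainbowJar) {
        if (!Files.isRegularFile(Paths.get(java))) {
            return "Java can't be found at the specified path: " + java;
        }
        if (!Files.isRegularFile(Paths.get(rainbowJar))) {
            return "rainbow.jar can't be found at the specified path: " + rainbowJar;
        }
        return null; // both paths point to existing files
    }
}
```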
- By default, the user is prompted where to save the TMX. That can be disabled (line 28, set `show_save_dialog` to `false`), and then the TMX will always be saved as `/script_output/xlf2tmx/project_save.tmx` (the actual filename can be changed, read on) in the current OmegaT project folder.
- If the user is asked for the filename and location of the TMX, the script suggests a filename based on the actual filename of the XLIFF file. That can be disabled (line 29, set `guessname` to `false`), and then the suggested filename will be `project_save.tmx` (read on to learn how to change it).
- The filename for the resultant TMX that is suggested or used automatically is set in line 30 (set `tmxname` to anything you like, and don't forget the quotes).
- Lastly, it looks like files containing translations done in Studio have a few issues that hinder proper conversion to TMX. To work around them, line 31 sets `fixstudio` to `true`, which can be changed if needed. With it enabled, the script backs up the XLIFF, rearranges misplaced `<mrk>` markup in the targets, runs the conversion, and then restores the original file.
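The naming rules above come down to two small decisions. Here is a Java sketch with the hypothetical names `TmxName`, `guess` and `defaultLocation`, assuming the extension is whatever follows the last dot of the XLIFF filename:

```java
import java.nio.file.Path;

// Hypothetical sketch of how the TMX name and location are chosen.
class TmxName {
    // guessname == true: keep the XLIFF base name, swap the last extension for .tmx
    static String guess(String xliffName) {
        int dot = xliffName.lastIndexOf('.');
        String base = dot >= 0 ? xliffName.substring(0, dot) : xliffName;
        return base + ".tmx";
    }

    // show_save_dialog == false: <project root>/script_output/xlf2tmx/<tmxname>
    static Path defaultLocation(Path projectRoot, String tmxName) {
        return projectRoot.resolve("script_output").resolve("xlf2tmx").resolve(tmxName);
    }
}
```

So `project.xlf` is offered as `project.tmx`, and with the dialog disabled the default `project_save.tmx` lands in `script_output/xlf2tmx` under the project root.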
I'd be happy to get comments on how this script works or fails to work for you. Also, my testing possibilities are rather limited, so if you happen to use these scripts and run into a problem, don't hesitate to report bugs or make suggestions.
Meanwhile I want to thank Roman Mironov and the translation agency Velior for using OmegaT as a cornerstone of their productivity (if I understand it correctly), coming up with ideas for OmegaT scripts, and being willing to support their development. And for growing to be a personal friend, too. Thank you, Roma!
But for now, good luck!
I have pasted xliff2tmx.groovy into the C:\Program Files (x86)\OmegaT\scripts folder, and I have assigned a shortcut to the script in OmegaT. Lines 13 and 20 look like this, respectively:
def java = /C:\Program Files\Java\jre1.8.0_51\bin\java.exe/
def rainbow = /C:\Apps\okapi-apps_win32-x86_64_0.27\lib\rainbow.jar/
However, when I run the script I get the message “Java can’t be found at the specified path”.
Any suggestion?
Make sure you edit the script file and save it before running it with a shortcut. Also make sure you do have java.exe at the exact location you specified. Other than that I don’t know what can be wrong.
Thanks, Kos. It seems I hadn’t saved the script. It works now!
If you accept a suggestion for enhancement, it would be very nice to be able to select several XLIFF files rather than just one.
I have another question. Is it possible to tweak the script so that context is retained and each translation in the TMX file is leveraged only for the segment it was originally extracted from? (so that no auto-propagation is possible for repetitions). Cheers 🙂