XLIFF to TMX

One of the scripts recently published here lets OmegaT users export a whole OmegaT project to an XLIFF file, so that the project can be worked on in a different CAT tool. To get the completed work back into OmegaT, one had to run Okapi Rainbow to convert the XLIFF to TMX, possibly using the Rainbow settings file created by that script.

In this post I’ll share how to convert those OmegaT-created XLIFF files, finished (or partly finished) in Trados/MemoQ/Deja Vu/WhatNotCAT, back to a TMX that can be used in OmegaT (all tags preserved, of course; that was the whole point), right from within OmegaT, without running Rainbow manually.

The conversion is still done by Rainbow, so it must be installed on your computer. But Rainbow is run in command-line mode, so the user simply has to start the script, select the XLIFF file when prompted, and choose where and under what name to save the TMX.
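For readers curious what "command-line mode" amounts to, here is a minimal sketch of assembling and launching such a call from Groovy. The flags (-p, -pln, -log, -np) are the ones the script in this post passes to Rainbow. The helper names rainbowCommand and runQuietly, as well as the sample paths, are made up for illustration and are not part of the script.

```groovy
// Hypothetical helper: build the argument list for a Rainbow run without its GUI.
List<String> rainbowCommand(String java, String rainbowJar, File project, File pipeline, String log) {
    [java, '-jar', rainbowJar,
     '-p', project.path,      // Rainbow project file (.rnb)
     '-pln', pipeline.path,   // Rainbow pipeline file (.pln)
     '-log', log,             // where Rainbow should write its log
     '-np']                   // passed exactly as in the script's listing
}

// Hypothetical helper: run a command and wait for it to finish.
// waitForProcessOutput() drains stdout/stderr, so the child process
// cannot stall on a full output buffer.
int runQuietly(List<String> command) {
    def process = command.execute()
    process.waitForProcessOutput()
    process.exitValue()
}

// Sample paths only; nothing is executed here.
def cmd = rainbowCommand('/usr/bin/java', '/opt/okapi/lib/rainbow.jar',
        new File('/tmp/xlf2tmx.rnb'), new File('/tmp/xlf2tmx.pln'), '/tmp/log.txt')
println cmd.join(' ')
```

Passing the command as a list rather than one long string means paths with spaces (such as C:\Program Files) need no extra quoting.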

The script should be added to OmegaT scripts and invoked from OmegaT. When it’s run, it will ask the user for the XLIFF file and where to save the resultant TMX. There are two lines each user must edit to point to Java and Rainbow (see right under the listing), and a few defaults in the script’s behavior can be changed (details under the listing as well).

Below is the listing of the script; its heading is the link.

  • xliff2tmx.groovy
    /* 
     * @Purpose:    Convert translated XLF to TMX using Okapi Rainbow
     * @author: Kos Ivantsov
     * @date:   2014-02-12
     * @version:    0.4
     */
    
    /* set path to JAVA executable
     * Something like
     * /C:\Program Files\Java\jre7\bin\java.exe/ on Windows or
     * '/usr/bin/java' on GNU/Linux
     */
    def java = '/usr/bin/java'
    
    /* specify complete path to rainbow.jar
     * Something like 
     * /C:\Program Files\Okapi\lib\rainbow.jar/ on Windows or 
     * '/opt/okapi/lib/rainbow.jar' on GNU/Linux
     */
    def rainbow = '/opt/okapi/lib/rainbow.jar'
    
    /* the script can ask where to save the TMX. Otherwise
     * (if "show_save_dialog" isn't set to "true"),
     * it will be saved as "/script_output/xlf2tmx/project_save.tmx" in the
     * current project folder
     */
    
    def show_save_dialog = true
    def guessname = true
    def tmxname = 'project_save.tmx' //used if 'show_save_dialog' or 'guessname' is set to false
    def fixstudio = true
    
    import java.io.IOException
    import java.nio.file.CopyOption
    import java.nio.file.Files
    import java.nio.file.Path
    import java.nio.file.Paths
    import java.nio.file.StandardCopyOption
    import javax.swing.filechooser.FileFilter
    import javax.swing.JFileChooser
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    def title = 'Converting XLIFF to TMX'
    def showMessage = { msg -> showMessageDialog null, msg, title, INFORMATION_MESSAGE }
    
    //popup a message if java or rainbow aren't found at the above set paths
    if (! (new File(java)).exists()) {
        final def msg   = 'Java can\'t be found at the specified path.'
        console.println msg
        showMessage msg
        return
    }
    if (! (new File(rainbow)).exists()) {
        final def msg   = 'rainbow.jar can\'t be found at the specified path.'
        console.println msg
        showMessage msg
        return
    }
    
    //set up variables to be used in the script
    def prop = project.projectProperties
    if (!prop) {
        final def msg = 'This script requires an open project to run'
        showMessage msg
        return
    }
    def outfolder = prop.projectRoot+'script_output'
    def tmpfolder = outfolder+'/tmp'
    def xlfmemfolder = outfolder+'/xlf2tmx'
    
    def rnbproj = new File(tmpfolder+'/xlf2tmx.rnb')
    def rnbpln = new File(tmpfolder+'/xlf2tmx.pln')
    def log = tmpfolder+'/log.txt'
    
    def startroot = new File(prop.getProjectRoot())
    def sourceLocale = prop.getSourceLanguage().getLanguageCode()
    def targetLocale = prop.getTargetLanguage().getLanguageCode()
    
    //run filechooser to select xliff file(s)
    JFileChooser selfiles = new JFileChooser(
        currentDirectory: startroot,
        dialogTitle: "Choose XLIFF file to convert",
        fileSelectionMode: JFileChooser.FILES_ONLY,
        multiSelectionEnabled: false,
        //the file filter must also accept directories, so that the user can look into them
        fileFilter: [getDescription: {-> "*.xlf; *.xliff"}, accept:{file-> file ==~ /.*?\.[Xx][Ll][Ii]?[Ff]{1,2}/ || file.isDirectory() }] as FileFilter)
    if (selfiles.showOpenDialog() != JFileChooser.APPROVE_OPTION) {
        console.println "Canceled"
        return
    }
    
    // create folders if they don't exist
    def outcreate
    def tmpcreate
    def xlfmemfoldercreate
    if (! (new File(outfolder)).exists()) {
        (new File(outfolder)).mkdir()
        outcreate = '1'
        }
    if (! (new File(tmpfolder)).exists()) {
        (new File(tmpfolder)).mkdir()
        tmpcreate = '1'
        }
    if (! (new File(xlfmemfolder)).exists()) {
        (new File(xlfmemfolder)).mkdir()
        xlfmemfoldercreate = '1'
        }
    
    //check show_save_dialog and either ask for filename or just go on
    def xlfmem
    if (show_save_dialog == true) {
        //check if guessname is set to true and if so
        //get xlf filename and propose it as a name for tmx
        if (guessname == true) {
            //replace the last extension of the XLIFF filename with 'tmx'
            tmxname = selfiles.selectedFile.name.replaceFirst('\\.[^.]+$', '') + '.tmx'
        }
        //run filechooser to select filename for converted tmx
        JFileChooser convtmx = new JFileChooser(
            currentDirectory: new File(xlfmemfolder),
            dialogTitle: "Save TMX as...",
            approveButtonText: "Save",
            multiSelectionEnabled: false,
            fileFilter: [getDescription: {-> "*.tmx"}, accept:{file-> file ==~ /.*?\.[Tt][Mm][Xx]/ || file.isDirectory() }] as FileFilter)
        convtmx.setSelectedFile(new File(tmxname))
        if (convtmx.showSaveDialog() != JFileChooser.APPROVE_OPTION) {
            console.println "Canceled"
            //remove only the folders created during this run, innermost first
            if (tmpcreate == '1')
                (new File(tmpfolder)).deleteDir()
            if (xlfmemfoldercreate == '1')
                (new File(xlfmemfolder)).deleteDir()
            if (outcreate == '1')
                (new File(outfolder)).deleteDir()
            return
        }
        xlfmem = convtmx.selectedFile
    } else {
        xlfmem = new File(xlfmemfolder+'/'+tmxname)
    }
    
    //collect strings with basedir of original xliff and xliff filenames
    def xlffolder = selfiles.selectedFile.parent
    def files = """\
            <fi fs="okf_xliff" fo="" se="" te="">$selfiles.selectedFile.name</fi>
    """
    
    //write settings files
    rnbpln.write("""\
    <?xml version="1.0" encoding="UTF-8"?>
    <rainbowPipeline version="1">
        <step class="net.sf.okapi.steps.searchandreplace.SearchAndReplaceStep">#v1
            regEx.b=true
            dotAll.b=false
            ignoreCase.b=true
            multiLine.b=false
            target.b=false
            source.b=true
            replaceALL.b=true
            replacementsPath=
            logPath=$tmpfolder/replacementsLog.txt
            saveLog.b=false
            count.i=2
            use0=true
            search0=xml:lang=\\"$sourceLocale-[a-z]{2}\\"
            replace0=xml:lang=\\"$sourceLocale\\"
            use1=true
            search1=source-language=\\"$sourceLocale-[a-z]{2}\\"
            replace1=source-language=\\"$sourceLocale\\"
        </step>
        <step class="net.sf.okapi.steps.common.RawDocumentToFilterEventsStep">
        </step>
        <step class="net.sf.okapi.steps.codesremoval.CodesRemovalStep">#v1
            stripSource.b=true
            stripTarget.b=true
            mode.i=0
            includeNonTranslatable.b=true
            replaceWithSpace.b=false
        </step>
        <step class="net.sf.okapi.steps.formatconversion.FormatConversionStep">#v1
            singleOutput.b=true
            autoExtensions.b=true
            targetStyle.i=0
            outputPath=${xlfmem.toString()}
            outputFormat=tmx
            useGenericCodes.b=false
            skipEntriesWithoutText.b=true
            approvedEntriesOnly.b=false
            overwriteSameSource.b=false
        </step>
        <step class="net.sf.okapi.steps.common.FilterEventsToRawDocumentStep">
        </step>
    </rainbowPipeline>
    """, 'UTF-8')
    
    rnbproj.write("""\
    <?xml version="1.0" encoding="UTF-8"?>
    <rainbowProject version="4">
        <fileSet id="1">
            <root useCustom="1">$xlffolder</root>
    $files  </fileSet>
        <output>
            <root use="1">$outfolder</root>
            <subFolder use="1">tmp</subFolder>
            <extension use="1" style="0">.out</extension>
            <replace use="0" oldText="" newText=""></replace>
            <prefix use="0"></prefix>
            <suffix use="1"></suffix>
        </output>
        <options sourceLanguage="${sourceLocale.toLowerCase()}" sourceEncoding="UTF-8" targetLanguage="${targetLocale.toLowerCase()}" targetEncoding="UTF-8">
        </options>
        <parametersFolder useCustom="0"></parametersFolder>
        <utilities xml:spaces="preserve">
        </utilities>
    </rainbowProject>
    """, 'UTF-8')
    
    //backup existing xliff file, make replacements to fix Studio issues
    if (fixstudio == true) {
        new File(selfiles.selectedFile.path+".bak").bytes = new File(selfiles.selectedFile.path).bytes
        String contents = selfiles.selectedFile.getText('UTF-8') 
        contents = contents.replaceAll(/(\<\/mrk\>)(.*)(\<\/target\>)/, /$2$1$3/)
        contents = contents.replaceAll(/(\<mrk.*\/\>)(.*)(\<\/target\>)/, /$1$2\<\/mrk\>$3/)
        contents = contents.replaceAll(/(\<mrk.*)(\/\>)/, /$1\>/)
        selfiles.selectedFile.write(contents, 'UTF-8')
        }
    
    //run rainbow with just created settings files
    run_line = ["$java", "-jar", "$rainbow", "-p", "$rnbproj", "-pln", "$rnbpln", "-log", "$log", "-np"]
    run_line.execute().waitFor()
    //print Rainbow log
    console.println(new File(log).text.toString())
    //remove temp folder
    new File(tmpfolder).deleteDir()
    //restore original xlf if it was backed up
    if (new File(selfiles.selectedFile.path+".bak").exists()) {
        new File(selfiles.selectedFile.path).bytes = new File(selfiles.selectedFile.path+".bak").bytes
        new File(selfiles.selectedFile.path+".bak").delete()
        }
    return
    

    There are two things the user has to specify in the script so that it can find and run the Java and Rainbow executables. Those are set in lines 13 and 20. That needs to be done only once, not every time the script is run. In the comments preceding those lines I tried my best to explain how to specify the paths, but if there are any problems, you can always ask here in the comments.
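As a hedged illustration of those two path settings, the sketch below shows three equivalent ways to write a Windows path in Groovy, plus one possible way to derive the Java path from the JVM that is running the script. The Windows path is the example from the script's comments, not a path known to exist on your machine, and the java.home lookup is my assumption about a convenient alternative, not something the script does.

```groovy
// Three spellings of the same example Windows path
def slashy  = /C:\Program Files\Java\jre7\bin\java.exe/       // slashy string: backslashes are literal
def escaped = 'C:\\Program Files\\Java\\jre7\\bin\\java.exe'  // quoted string: backslashes are doubled
def forward = 'C:/Program Files/Java/jre7/bin/java.exe'       // forward slashes, which Java accepts on Windows

assert slashy == escaped
assert forward.replace('/', '\\') == escaped

// Assumption: reuse the JVM running this script instead of hard-coding a path.
def isWindows = System.getProperty('os.name').toLowerCase().contains('windows')
def javaFromHome = new File(System.getProperty('java.home'),
        isWindows ? 'bin/java.exe' : 'bin/java')

// The same kind of check the script performs before going on
println "${javaFromHome.path} exists: ${javaFromHome.exists()}"
```

If the script reports that Java or rainbow.jar can't be found, printing the path this way is a quick check of what the script actually sees.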

    Besides that, there are a few things the user can tweak in the script’s behavior.

    • By default, the user is prompted where to save the TMX. That can be disabled (line 28, set show_save_dialog to false), and then the TMX will always be saved as /script_output/xlf2tmx/project_save.tmx (the actual filename can be changed, read on) in the current OmegaT project folder.
    • If the user is asked for the filename and location of the TMX, the script suggests a filename based on the actual filename of the XLIFF file. That can be disabled (line 29, set guessname to false), and then the filename will be project_save.tmx (still read on to learn how to change it).
    • The filename for the resultant TMX that is going to be suggested or used automatically is set in line 30 (set tmxname to anything you like, and don’t forget the quotes).
    • Lastly, it looks like files translated in Studio have a few issues that hinder proper conversion to TMX. To work around them, line 31 sets fixstudio to true, which can be changed if needed.
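To make the last option more concrete, here is a small sketch that applies the same three replacements the script performs when fixstudio is true to two made-up target elements. The regex escapes are dropped where Java doesn't need them, but the patterns are otherwise the same. The helper name fixStudioMarkup and the sample strings are my own illustration of the kind of markup the patterns act on, not output captured from Studio.

```groovy
// Hypothetical helper wrapping the three replacements from the script
String fixStudioMarkup(String xml) {
    // 1. text that trails </mrk> is pulled inside the mrk element
    xml = xml.replaceAll('(</mrk>)(.*)(</target>)', '$2$1$3')
    // 2. a self-closed <mrk .../> gets a closing </mrk> after the text that follows it
    xml = xml.replaceAll('(<mrk.*/>)(.*)(</target>)', '$1$2</mrk>$3')
    // 3. the self-closed tag itself becomes an ordinary opening tag
    xml = xml.replaceAll('(<mrk.*)(/>)', '$1>')
    xml
}

// Made-up samples: text outside the mrk, and a self-closed mrk
def trailing   = '<target><mrk mtype="seg" mid="1">Hello</mrk> world</target>'
def selfClosed = '<target><mrk mtype="seg" mid="2"/>Bye</target>'

println fixStudioMarkup(trailing)
println fixStudioMarkup(selfClosed)
```

Since the dot in these patterns doesn't match line breaks, each replacement only acts on markup that sits on a single line.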

I’d be happy to get comments on how this script works or fails to work for you. Also, I have rather limited testing possibilities, so if you happen to use these scripts and run into a problem, don’t hesitate to report bugs or make suggestions.
Meanwhile I want to thank Roman Mironov and translation agency Velior for using OmegaT as a cornerstone of their productivity (if I understand it correctly), coming up with ideas for OmegaT scripts and being willing to support their development. And for growing to be a personal friend, too. Thank you, Roma!


But as of now,
Good luck!

6 thoughts on “XLIFF to TMX”

  3. I have pasted the xliff2tmx.groovy in the C:\Program Files (x86)\OmegaT\scripts folder, and I have assigned a shortcut to the script in OmegaT. Lines 13 and 20 look like this, respectively:
    def java = /C:\Program Files\Java\jre1.8.0_51\bin\java.exe/
    def rainbow = /C:\Apps\okapi-apps_win32-x86_64_0.27\lib\rainbow.jar/
    However, when I run the script I get the message “Java can’t be found at the specified path”.
    Any suggestion?

  4. I have another question. Is it possible to tweak the script so that context is retained and each translation in the TMX file is leveraged only for the segment it was originally extracted from? (so that no auto-propagation is possible for repetitions). Cheers 🙂
