Convert OmegaT project to XLIFF for other CAT tools

I’m back with another little script that might be pretty handy for those who need to work on the same material in different CAT tools, or for translation agencies who use OmegaT as their main CAT application but farm out the work to translators using their CAT tools of choice. As a matter of fact, the script was requested by translation agency Velior for this very reason.
When the script is invoked, it writes out a file named PROJECTNAME.xlf (PROJECTNAME is the actual name of the project, not this loudly yelled word, of course) to the script_output subfolder of the current project. It exports both translated and untranslated segments. Translated segments get the “final” state in the resulting XLF file; for untranslated segments the source is copied to the target, and they get the “needs-translation” state. OmegaT segmentation and tags are preserved. Tags get enveloped in <ph id="x"> and </ph>, so that they are treated as tags in other CAT tools.
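To make the description above more concrete, here is a minimal standalone sketch of the two ideas: choosing the state and target for a segment, and wrapping OmegaT tags in numbered <ph> elements. It is not the script's actual code. The helper names escapeXml, wrapTags and transUnit are made up for illustration, and it runs in plain Groovy without OmegaT.

```groovy
// Hypothetical helpers for illustration only; not taken from the script.

// Escape the XML special characters in plain text.
String escapeXml(String s) {
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
}

// Wrap every OmegaT-style tag (<b0>, </b0>, <x1/>) in a numbered <ph> element
// and escape everything else.
String wrapTags(String text) {
    def tag = ~/<\/?[a-z]+[0-9]* ?\/?>/
    def out = new StringBuilder()
    int last = 0
    int id = 0
    def m = tag.matcher(text)
    while (m.find()) {
        id++
        out << escapeXml(text.substring(last, m.start()))
        out << "<ph id=\"${id}\">" << escapeXml(m.group()) << '</ph>'
        last = m.end()
    }
    out << escapeXml(text.substring(last))
    out.toString()
}

// Build one trans-unit: translated segments are "final"; untranslated ones
// get the source copied to the target and are marked "needs-translation".
String transUnit(int id, String source, String translation) {
    boolean translated = translation != null
    String state = translated ? 'final' : 'needs-translation'
    String target = translated ? translation : source
    """<trans-unit id="$id">
  <source>${wrapTags(source)}</source>
  <target state="$state">${wrapTags(target)}</target>
</trans-unit>"""
}

println transUnit(1, 'Click <b0>Save</b0>.', null)
```

For the untranslated segment in the last line, both source and target come out as `Click <ph id="1">&lt;b0&gt;</ph>Save<ph id="2">&lt;/b0&gt;</ph>.`, and the target carries state="needs-translation".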
Here is the link where you can download the ready-to-use (albeit still BETA) version:

To get the translation back to OmegaT once the file has been processed in another CAT tool, it’s advised to use Okapi Framework (Rainbow for GUI/Tikal for command line). To get 100% transferability the pipeline in Okapi should include TMX export and Inline codes removal (remove marker, keep content). The script can write out a .rnb file (enabled by default) that can be opened in Rainbow.
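Why the codes removal step matters: in the XLF file every OmegaT tag sits inside a <ph> element, while OmegaT expects the tag as plain text in the TMX segment. The sketch below imitates, on a plain string, roughly what “remove marker, keep content” achieves. It is only a conceptual stand-in: Okapi works on parsed inline codes, not on regular expressions, and the function name unwrapPh is hypothetical.

```groovy
// Conceptual stand-in for "remove marker, keep content"; this is not Okapi code.
String unwrapPh(String xliffText) {
    // Drop the <ph ...> and </ph> markers but keep what they enclose.
    String inner = xliffText.replaceAll(/<ph\b[^>]*>(.*?)<\/ph>/, '$1')
    // Undo the XML escaping; &amp; is restored last so that text such as
    // "&amp;lt;" is not unescaped twice.
    inner.replace('&lt;', '<').replace('&gt;', '>').replace('&amp;', '&')
}

println unwrapPh('Click <ph id="1">&lt;b0&gt;</ph>Save<ph id="2">&lt;/b0&gt;</ph>.')
```

The result is the segment text with the OmegaT tags back in their plain form, which is what lets the TMX entry match the OmegaT segment exactly.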
Here’s how conversion to TMX is done in Rainbow:

  1. Start Rainbow.
  2. Open the .rnb settings file created by the script (located in the script_output subfolder of the project).
    [Screenshot: Open settings file]
  3. Drag PROJECTNAME.xlf into the first tab of the Rainbow window.
  4. Go to Utilities → Edit/Execute Pipeline and press the Execute button in the window. Several settings might need to be tweaked for the TMX conversion step (see screenshot).
    [Screenshots: Edit / Execute Pipeline; Pipeline TMX step]
  5. The TMX file will be created in the same folder as the XLF file.

The script has been tested with Virtaal, Transolution Xliff Editor, SDL Trados Studio 2011, Kilgray MemoQ 2013, and ATRIL Déjà Vu X2. These programs can create TMX files containing a translation that is supposedly the same as in the XLF file, but when those TMX files are used back in OmegaT, there are always issues with tags. To get “perfect” matches, the XLF itself has to be converted as described above.

The script is in BETA stage, which means that whatever happens to your data, hardware or mental state, I didn’t do it! More tests are always appreciated. Bug reports and feature requests can be left here as comments or filed in the SourceForge bug tracker (make sure you’re filing them in my project, not in the OmegaT project, as I don’t want to be hated by OmegaT developers).


UPDATE

Converting XLF to TMX for use back in OmegaT can now be automated. See this post for details.

But as of now,
GOOD LUCK

16 thoughts on “Convert OmegaT project to XLIFF for other CAT tools”

  6. Hi there, Kos. Thank you for this wonderful script, it comes in very useful for outsourcing part of a project in OmegaT to linguists who use OLT. However, I found a small glitch. The original XLIFF files have something like: <b>sometext</b> and in your merged XLIFF I get: <bzzzsmall>/bzzz. This is probably due to the specific nature of my XLIFF files, not your script, but not knowing for sure I wanted to share it with you, in case you have any tips. Thanks!

  8. Hi, I made small fixes as described in the Yahoo Group to make it run in the latest versions of OmegaT (tested on 3.).

    xliff_export.groovy:

    /*
     * @author:  Kos Ivantsov
     * @date:    2014-01-16
     * @version: 0.6
     * Original source: https://libretraduko.wordpress.com/2014/01/16/convert-omegat-project-to-xliff-for-other-cat-tools/
     * Changed by: Nilo Menezes based on comments found at:
     * https://groups.yahoo.com/neo/groups/OmegaT/conversations/messages/36979?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cucHJvei5jb20vZm9ydW0vb21lZ2F0X3N1cHBvcnQvMjk1Njk3LXNjcmlwdGluZ19lcnJvcl93cml0ZV9ub3Rlc2dyb292eS5odG1sP3ByaW50PTE&guce_referrer_sig=AQAAACDZGXME9vTwjAuMHaM7j3K0FEPwQrvNY_LsWUBToTf6SFPwBzx9MSTH7iqrgsLmAtpPdtImfYEwM3fBuxiPBQcHakGJ87H1xbk_9UWYO0KrQeQXjQD0IsaR-0t6vXk6VPJJS2d6Ln_fgRptS2qozN9VM34PzhQZM51-jYSsdbrM
     */

    /* set to true to write a settings file for Okapi Rainbow that can be
     * used to convert the XLF file produced by this script, to TMX
     * otherwise set to false */
    def rainbow = true

    /* set to true to output only approved entries from XLF to TMX during
     * conversion in Rainbow */
    def get_only_approved = true

    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    import org.omegat.util.StringUtil

    def prop = project.projectProperties
    if (!prop) {
        final def title = 'Export project to XLIFF file(s)'
        final def msg = 'Please try again after you open a project.'
        showMessageDialog null, msg, title, INFORMATION_MESSAGE
        return
    }

    def folder = prop.projectRoot+'script_output/'
    projname = new File(prop.getProjectRoot()).getName()
    xliff_file = new File(folder + projname +'.xlf')
    // create folder if it doesn't exist
    if (! (new File (folder)).exists()) {
        (new File(folder)).mkdir()
    }

    count = 0
    ignorecount = 0
    transcount = 0
    writecount = 0

    def sourceLocale = prop.getSourceLanguage().toString().toLowerCase()
    def targetLocale = prop.getTargetLanguage().toString().toLowerCase()

    xliff_file.write("""<?xml version="1.0" encoding="UTF-8"?>
    <xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
    """, 'UTF-8')

    files = project.projectFiles
    for (i in 0 ..< files.size()) {
        fi = files[i]
        // every file starts with a pseudo-segment that carries the file name
        xliff_file.append("""  <file original="$fi.filePath" source-language="$sourceLocale" target-language="$targetLocale" datatype="x-application/x-tmx+xml">
        <body>
          <trans-unit id="0" approved="yes">
            <source xml:lang="$sourceLocale"><ph id="filename">==FILENAME: "$fi.filePath"==</ph> </source>
            <target xml:lang="$targetLocale" state="final"><ph id="filename">==FILENAME: "$fi.filePath"==</ph> </target>
          </trans-unit>
    """, 'UTF-8')
        for (j in 0 ..< fi.entries.size()) {
            def state
            def approved = ''
            def unitnote = ''
            def ignore = ''
            ste = fi.entries[j]
            seg_num = ste.entryNum()
            source = ste.getSrcText()
            info = project.getTranslationInfo(ste)
            target = info ? info.translation : null
            if (target == null){
                // untranslated: copy source to target
                state = 'state="needs-translation"'
                target = "$source"
            }else{
                approved = ' approved="yes"'
                state = 'state="final" state-qualifier="exact-match"'
                transcount++
            }
            if (target.size() == 0 ){
                target = "<EMPTY>"
            }
            if (info?.hasNote()) {
                unitnote = "\n        <note>${StringUtil.makeValidXML(info.note)}</note>"
            }
            // segments that consist of tags only are not written out
            if (source ==~ /(<\/?[a-z]+[0-9]* ?\/?>){1,5}/ ){
                ignoresource = source
                ignore = 'yes'
            }
            // protect literal < and > so that only real tags survive as <...>;
            // the reluctant [a-z]+? in the last step keeps adjacent placeholders
            // (zzzbzzz...zzz/bzzz) from being merged into a single tag
            source = source.replaceAll(/(<)(\/?[a-z]+[0-9]* ?\/?)(>)/, /zzz$2zzz/).replaceAll(/</, /zzz#LESSTHEN#zzz/).replaceAll(/>/, /zzz#GREATERTHEN#zzz/).replaceAll(/(zzz)(\/?[a-z]+?[0-9]* ?\/?)(zzz)/, /<$2>/)
            target = target.replaceAll(/(<)(\/?[a-z]+[0-9]* ?\/?)(>)/, /zzz$2zzz/).replaceAll(/</, /zzz#LESSTHEN#zzz/).replaceAll(/>/, /zzz#GREATERTHEN#zzz/).replaceAll(/(zzz)(\/?[a-z]+?[0-9]* ?\/?)(zzz)/, /<$2>/)
            // escape for XML, wrap tags in <ph>, restore the protected characters
            source = StringUtil.makeValidXML(source).replaceAll(/&lt;/, /<ph>&lt;/).replaceAll(/&gt;/, /&gt;<\/ph>/).replaceAll(/zzz#LESSTHEN#zzz/, /&lt;/).replaceAll(/zzz#GREATERTHEN#zzz/, /&gt;/)
            target = StringUtil.makeValidXML(target).replaceAll(/&lt;/, /<ph>&lt;/).replaceAll(/&gt;/, /&gt;<\/ph>/).replaceAll(/zzz#LESSTHEN#zzz/, /&lt;/).replaceAll(/zzz#GREATERTHEN#zzz/, /&gt;/)
            // number the <ph> elements
            tagnumber = source.findAll(/<ph>/).size()
            if (tagnumber > 0) {
                tgnum = 0
                while (tgnum++ <= tagnumber) {
                    source = source.replaceFirst(/<ph>/, "<ph id=\"$tgnum\">")
                    target = target.replaceFirst(/<ph>/, "<ph id=\"$tgnum\">")
                    //console.println "count: "+tagnumber+"\n"+source
                }
            }
            if (source =~ '<ph>') source = source.replaceAll('<ph>', '<ph id="orph">')
            if (target =~ '<ph>') target = target.replaceAll('<ph>', '<ph id="orph">')
            if (ignore != 'yes'){
                xliff_file.append("""      <trans-unit id="$seg_num"$approved>
            <source xml:lang="$sourceLocale">$source</source>
            <seg-source><mrk mid="0" mtype="seg">$source</mrk></seg-source>
            <target $state xml:lang="$targetLocale"><mrk mid="0" mtype="seg">$target</mrk></target>$unitnote
          </trans-unit>
    """, 'UTF-8')
                writecount++
            }else{
                ignorecount++
            }
            count++
        }
        xliff_file.append("    </body>\n  </file>\n", 'UTF-8')
    }
    xliff_file.append("</xliff>", 'UTF-8')

    console.println """
    ${'*'*(xliff_file.toString().size()+12)}
    Output file: $xliff_file
    ${'*'*(xliff_file.toString().size()+12)}
    Segments processed: $count
    Segments written: $writecount
    Segments not written: $ignorecount
    Translated segments written: $transcount
    Untranslated segments written: ${writecount-transcount}
    """

    if (rainbow == true) {
        def approved
        if (get_only_approved == true){
            approved = 'true'
        }else{
            approved = 'false'
        }
        rainbowfile = new File(folder + projname +'.xlf2tmx.rnb')
        rainbowfile.write("""<?xml version="1.0" encoding="UTF-8"?>
    <rainbowProject version="4">
    <fileSet id="1">
    <root useCustom="0"></root>
    </fileSet>
    <fileSet id="2">
    <root useCustom="0"></root>
    </fileSet>
    <fileSet id="3">
    <root useCustom="0"></root>
    </fileSet>
    <output>
    <root use="0"></root>
    <subFolder use="0"></subFolder>
    <extension use="1" style="0">.out</extension>
    <replace use="0" oldText="" newText=""></replace>
    <prefix use="0"></prefix>
    <suffix use="0"></suffix>
    </output>
    <options sourceLanguage="$sourceLocale" sourceEncoding="UTF-8" targetLanguage="$targetLocale" targetEncoding="UTF-8"></options>
    <parametersFolder useCustom="0"></parametersFolder>
    <utilities xml:spaces="preserve"><params id="currentProjectPipeline">&lt;?xml version="1.0" encoding="UTF-8"?>
    &lt;rainbowPipeline version="1">&lt;step class="net.sf.okapi.steps.common.RawDocumentToFilterEventsStep">&lt;/step>
    &lt;step class="net.sf.okapi.steps.codesremoval.CodesRemovalStep">#v1
    stripSource.b=true
    stripTarget.b=true
    mode.i=0
    includeNonTranslatable.b=true
    replaceWithSpace.b=false&lt;/step>
    &lt;step class="net.sf.okapi.steps.formatconversion.FormatConversionStep">#v1
    singleOutput.b=false
    autoExtensions.b=true
    targetStyle.i=0
    outputPath=
    outputFormat=tmx
    useGenericCodes.b=false
    skipEntriesWithoutText.b=true
    approvedEntriesOnly.b=$approved
    overwriteSameSource.b=false&lt;/step>
    &lt;step class="net.sf.okapi.steps.common.FilterEventsToRawDocumentStep">&lt;/step>
    &lt;/rainbowPipeline>
    </params></utilities>
    </rainbowProject>
    """, 'UTF-8')
    }
    return

  9. Hi Kos. Some more feedback: I tried your script today again to export a one-file OmegaT project as XLIFF. I got this error in the scripting console:

    The script “C:\Users\souto\AppData\Roaming\OmegaT\scripts\write_xliff.groovy” is running…
    An error occurred
    javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingMethodException: No signature of method: static org.omegat.util.StaticUtils.makeValidXML() is applicable for argument types: (String) values: [<x1/>]

    Indeed, my first segment has `<x1/>`. I am using the OpenXML filter.
