Posted on

January 16, 2014

Convert OmegaT project to XLIFF for other CAT tools

I’m back with another little script that might be pretty handy for those who need to work on the same material in different CAT tools, or for translation agencies who use OmegaT as their main CAT application but farm out the work to translators using their CAT tools of choice. As a matter of fact, the script was requested by translation agency Velior for this very reason.
When invoked, the script writes a file named PROJECTNAME.xlf (PROJECTNAME is the actual name of the project, not this loudly yelled word, of course) to the script_output subfolder of the current project. It exports both translated and untranslated segments: translated segments get the “final” state in the resulting XLF file, while untranslated segments have the source copied to the target and get the “needs-translation” state. OmegaT segmentation and tags are preserved. Tags are wrapped in <ph id="x"> and </ph>, so that other CAT tools treat them as tags.
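To illustrate the tag handling described above, here is a minimal Groovy sketch. It is not the script's own code: `wrapTags` and `escapeXml` are hypothetical helpers written for this example, and the tag pattern assumes OmegaT-style tags such as <b0>, </b0> or <x1/>. Each tag is escaped and wrapped in a numbered <ph> element, while any other angle bracket is simply escaped as text.

```groovy
// Hypothetical helper (not part of write_xliff.groovy): escapes XML special characters.
def escapeXml(String s) {
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
}

// Hypothetical helper: wraps every OmegaT-style tag in <ph id="n">...</ph>,
// numbering the placeholders from 1, and escapes everything else as plain text.
def wrapTags(String text) {
    def out = new StringBuilder()
    def matcher = text =~ /<\/?[a-z]+[0-9]* ?\/?>/
    int id = 0
    int last = 0
    while (matcher.find()) {
        // text between the previous tag and this one
        out << escapeXml(text.substring(last, matcher.start()))
        id++
        out << '<ph id="' << id << '">' << escapeXml(matcher.group()) << '</ph>'
        last = matcher.end()
    }
    // trailing text after the last tag
    out << escapeXml(text.substring(last))
    out.toString()
}

println wrapTags('Press <b0>OK</b0> if a < b')
// prints: Press <ph id="1">&lt;b0&gt;</ph>OK<ph id="2">&lt;/b0&gt;</ph> if a &lt; b
```

Walking the string once with a matcher, rather than substituting temporary markers, means tag text and ordinary text can never be confused with each other.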
Here is the link where you can download the ready-to-use (albeit still BETA) version:

write_xliff.groovy

To get the translation back into OmegaT once the file has been processed in another CAT tool, it’s advisable to use the Okapi Framework (Rainbow for the GUI, Tikal for the command line). For 100% transferability, the Okapi pipeline should include TMX export and inline codes removal (remove marker, keep content). The script can write out a .rnb settings file (enabled by default) that can be opened in Rainbow.
Here’s how conversion to TMX is done in Rainbow:

Start Rainbow.
Open the .rnb settings file created by the script (located in the script_output subfolder of the project).
Drag PROJECTNAME.xlf into the first tab of the Rainbow window.
Go to Utilities → Edit/Execute Pipeline and press the Execute button in the window. Several settings might need to be tweaked for the TMX conversion step (see screenshot).
The TMX file will be created in the same folder as the XLF file.

The exported XLF has been tested with Virtaal, Transolution Xliff Editor, SDL Trados Studio 2011, Kilgray MemoQ 2013, and ATRIL Déjà Vu X2. These programs can create TMX files that supposedly contain the same translation as the XLF file, but when those TMX files are used back in OmegaT, there are always issues with tags. To get “perfect” matches, the XLF itself has to be converted as described above.
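Before converting an XLF file that has come back from another tool, it may help to check which state each segment ended up in. The sketch below is only an illustration and not part of the script: `countStates` is a hypothetical helper that parses XLIFF text with the JDK's DOM parser and tallies the state attribute of every <target> element.

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper (not part of write_xliff.groovy): returns a map of
// state value -> number of <target> elements carrying that state.
def countStates(String xliff) {
    def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xliff.getBytes('UTF-8')))
    def targets = doc.getElementsByTagName('target')
    def counts = [:]
    for (i in 0..<targets.length) {
        // getAttribute returns an empty string when the attribute is missing
        def state = targets.item(i).getAttribute('state') ?: '(none)'
        counts[state] = (counts[state] ?: 0) + 1
    }
    counts
}
```

In an actual check, the argument would be the text of the returned file, for example `new File('PROJECTNAME.xlf').getText('UTF-8')`.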

The script is in BETA stage. It means that whatever happens to your data, hardware or mental state, I didn’t do it! More tests are always appreciated. Bug reports and feature requests can be left here as comments or filed at SourceForge bug tracker (make sure you’re filing them in my project, not in the project for OmegaT, as I don’t want to be hated by OmegaT developers).

UPDATE

Converting XLF to TMX for use back in OmegaT can now be automated. See this post for details.

But as of now,
GOOD LUCK

16 thoughts on “Convert OmegaT project to XLIFF for other CAT tools”

Pingback: (CAT) - Convert OmegaT project to XLIFF for oth...
Pingback: (CAT) – Convert OmegaT project to XLIFF for other CAT tools | Translator’s Recipes | Glossarissimo!
Pingback: Convert OmegaT project to XLIFF for other CAT t...
Skybridge Translation says:

January 28, 2014 at 12:06

Great info. Thanks!

Manuel Souto Pico says:

February 8, 2016 at 14:03

Hi there, Kos. Thank you for this wonderful script, it comes in very useful to outsource part of a project in OmegaT to linguists who use OLT. However, I found a small glitch. The original XLIFF files have something like: <b>sometext</b> and in your merged XLIFF I get: <bzzzsmall>/bzzz. This is probably due to the specific nature of my XLIFF files, not your script, but not knowing for sure I wanted to share it with you, in case you have any tips. Thanks!

- Kos Ivantsov (VerdaKáfo) says:
  
  February 11, 2016 at 10:38
  
  No, that has very little to do with the particularities of your files. It has everything to do with me being a lousy coder. I’ll see if I can do anything, I just don’t know when.
  
  - Manuel Souto Pico (@msoutopico) says:
    
    February 14, 2016 at 21:44
    
    Thanks for your reply.
    
Manuel Souto Pico says:

February 8, 2016 at 14:08

Sorry, HTML tags got interpreted. Let me try again: I have <b>sometext</b> and in your merged XLIFF I get: <ph id="1"><bzzzsometext></ph>/bzzz

Pingback: (CAT) – Convert OmegaT project to XLIFF for other CAT tools | Translator’s Recipes – Glossarissimo!

Hi, I made small fixes as described in the Yahoo Group to make it run in the latest versions of OmegaT (tested on 3.).

	/*
	* @author: Kos Ivantsov
	* @date: 2014-01-16
	* @version: 0.6
	* Original source: https://libretraduko.wordpress.com/2014/01/16/convert-omegat-project-to-xliff-for-other-cat-tools/
	* Changed by: Nilo Menezes based on comments found at: https://groups.yahoo.com/neo/groups/OmegaT/conversations/messages/36979?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cucHJvei5jb20vZm9ydW0vb21lZ2F0X3N1cHBvcnQvMjk1Njk3LXNjcmlwdGluZ19lcnJvcl93cml0ZV9ub3Rlc2dyb292eS5odG1sP3ByaW50PTE&guce_referrer_sig=AQAAACDZGXME9vTwjAuMHaM7j3K0FEPwQrvNY_LsWUBToTf6SFPwBzx9MSTH7iqrgsLmAtpPdtImfYEwM3fBuxiPBQcHakGJ87H1xbk_9UWYO0KrQeQXjQD0IsaR-0t6vXk6VPJJS2d6Ln_fgRptS2qozN9VM34PzhQZM51-jYSsdbrM
	*/

	/* set to true to write a settings file for Okapi Rainbow that can be
	* used to convert the XLF file produced by this script, to TMX
	* otherwise set to false */
	def rainbow = true

	/* set to true to output only approved entries from XLF to TMX during
	* conversion in Rainbow */
	def get_only_approved = true

	import static javax.swing.JOptionPane.*
	import static org.omegat.util.Platform.*
	import org.omegat.util.StringUtil

	def prop = project.projectProperties
	if (!prop) {
	final def title = 'Export project to XLIFF file(s)'
	final def msg = 'Please try again after you open a project.'
	showMessageDialog null, msg, title, INFORMATION_MESSAGE
	return
	}
	def folder = prop.projectRoot+'script_output/'
	projname = new File(prop.getProjectRoot()).getName()
	xliff_file = new File(folder + projname +'.xlf')
	// create folder if it doesn't exist
	if (! (new File (folder)).exists()) {
	(new File(folder)).mkdir()
	}
	count = 0
	ignorecount = 0
	transcount = 0
	writecount = 0

	def sourceLocale = prop.getSourceLanguage().toString().toLowerCase()
	def targetLocale = prop.getTargetLanguage().toString().toLowerCase()
	xliff_file.write("""<?xml version="1.0" encoding="UTF-8"?>
	<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
	""", 'UTF-8')

	files = project.projectFiles
	for (i in 0 ..< files.size())
	{
	fi = files[i]
	xliff_file.append(""" <file original="$fi.filePath" source-language="$sourceLocale" target-language="$targetLocale" datatype="x-application/x-tmx+xml">
	<body>
	<trans-unit id="0" approved="yes">
	<source xml:lang="$sourceLocale"><ph id="filename">==FILENAME: "$fi.filePath"==</ph>
	</source>
	<target xml:lang="$targetLocale" state="final"><ph id="filename">==FILENAME: "$fi.filePath"==</ph>
	</target>
	</trans-unit>
	""", 'UTF-8')
	for (j in 0 ..< fi.entries.size())
	{
	def state
	def approved = ''
	def unitnote = ''
	def ignore = ''
	ste = fi.entries[j]
	seg_num = ste.entryNum()
	source = ste.getSrcText()
	info = project.getTranslationInfo(ste)
	target = info ? info.translation : null
	if (target == null){
	state = 'state="needs-translation"'
	target = "$source"
	}else{
	approved = ' approved="yes"'
	state = 'state="final" state-qualifier="exact-match"'
	transcount++
	}
	if (target.size() == 0 ){
	target = "<EMPTY>"
	}
	if (info?.hasNote()) { // safe navigation, since info is treated as possibly null above
	unitnote = "\n <note>${StringUtil.makeValidXML(info.note)}</note>"
	}
	if (source ==~ /(<\/?[a-z]+[0-9]* ?\/?>){1,5}/ ){
	ignoresource = source
	ignore = 'yes'
	}
	source = source.replaceAll(/(<)(\/?[a-z]+[0-9]* ?\/?)(>)/, /zzz$2zzz/).replaceAll(/</, /zzz#LESSTHEN#zzz/).replaceAll(/>/, /zzz#GREATERTHEN#zzz/).replaceAll(/(zzz)(\/?[a-z]+[0-9]* ?\/?)(zzz)/, /<$2>/)

	target = target.replaceAll(/(<)(\/?[a-z]+[0-9]* ?\/?)(>)/, /zzz$2zzz/).replaceAll(/</, /zzz#LESSTHEN#zzz/).replaceAll(/>/, /zzz#GREATERTHEN#zzz/).replaceAll(/(zzz)(\/?[a-z]+[0-9]* ?\/?)(zzz)/, /<$2>/)

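	// makeValidXML escapes the tags' angle brackets; wrap each escaped tag in <ph>...</ph>,
	// then restore the protected literal < and > as entities so the output stays valid XML
	source = StringUtil.makeValidXML(source).replaceAll(/&lt;/, /<ph>&lt;/).replaceAll(/&gt;/, /&gt;<\/ph>/).replaceAll(/zzz#LESSTHEN#zzz/, /&lt;/).replaceAll(/zzz#GREATERTHEN#zzz/, /&gt;/)

	target = StringUtil.makeValidXML(target).replaceAll(/&lt;/, /<ph>&lt;/).replaceAll(/&gt;/, /&gt;<\/ph>/).replaceAll(/zzz#LESSTHEN#zzz/, /&lt;/).replaceAll(/zzz#GREATERTHEN#zzz/, /&gt;/)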

	tagnumber = source.findAll(/<ph>/).size()
	if (tagnumber > 0) {
	tgnum = 0
	while (tgnum++ <= tagnumber) {
	source = source.replaceFirst(/<ph>/, "<ph id=\"$tgnum\">")
	target = target.replaceFirst(/<ph>/, "<ph id=\"$tgnum\">")
	//console.println "count: "+tagnumber+"\n"+source
	}
	}
	if (source =~ '<ph>')
	source = source.replaceAll('<ph>', '<ph id="orph">')

	if (target =~ '<ph>')
	target = target.replaceAll('<ph>', '<ph id="orph">')

	if (ignore != 'yes'){
	xliff_file.append("""\
	<trans-unit id="$seg_num"$approved>
	<source xml:lang="$sourceLocale">$source</source>
	<seg-source><mrk mid="0" mtype="seg">$source</mrk></seg-source>
	<target $state xml:lang="$targetLocale"><mrk mid="0" mtype="seg">$target</mrk></target>$unitnote
	</trans-unit>
	""", 'UTF-8')
	writecount++
	}else{
	ignorecount++
	}
	count++
	}
	xliff_file.append(" </body>\n </file>\n", 'UTF-8')
	}
	xliff_file.append("</xliff>", 'UTF-8')
	console.println """
	${'*'*(xliff_file.toString().size()+12)}
	Output file: $xliff_file
	${'*'*(xliff_file.toString().size()+12)}
	Segments processed: $count
	Segments written: $writecount
	Segments not written: $ignorecount
	Translated segments written: $transcount
	Untranslated segments written: ${writecount-transcount}
	"""
	if (rainbow == true) {
	def approved
	if (get_only_approved == true){
	approved = 'true'
	}else{
	approved = 'false'
	}
	rainbowfile = new File(folder + projname +'.xlf2tmx.rnb')
	rainbowfile.write("""\
	<?xml version="1.0" encoding="UTF-8"?>
	<rainbowProject version="4">
	<fileSet id="1">
	<root useCustom="0"></root>
	</fileSet>
	<fileSet id="2">
	<root useCustom="0"></root>
	</fileSet>
	<fileSet id="3">
	<root useCustom="0"></root>
	</fileSet>
	<output>
	<root use="0"></root>
	<subFolder use="0"></subFolder>
	<extension use="1" style="0">.out</extension>
	<replace use="0" oldText="" newText=""></replace>
	<prefix use="0"></prefix>
	<suffix use="0"></suffix>
	</output>
	<options sourceLanguage="$sourceLocale" sourceEncoding="UTF-8" targetLanguage="$targetLocale" targetEncoding="UTF-8"></options>
	<parametersFolder useCustom="0"></parametersFolder>
	<utilities xml:spaces="preserve"><params id="currentProjectPipeline">&lt;?xml version="1.0" encoding="UTF-8"?>
	&lt;rainbowPipeline version="1">&lt;step class="net.sf.okapi.steps.common.RawDocumentToFilterEventsStep">&lt;/step>
	&lt;step class="net.sf.okapi.steps.codesremoval.CodesRemovalStep">#v1
	stripSource.b=true
	stripTarget.b=true
	mode.i=0
	includeNonTranslatable.b=true
	replaceWithSpace.b=false&lt;/step>
	&lt;step class="net.sf.okapi.steps.formatconversion.FormatConversionStep">#v1
	singleOutput.b=false
	autoExtensions.b=true
	targetStyle.i=0
	outputPath=
	outputFormat=tmx
	useGenericCodes.b=false
	skipEntriesWithoutText.b=true
	approvedEntriesOnly.b=$approved
	overwriteSameSource.b=false&lt;/step>
	&lt;step class="net.sf.okapi.steps.common.FilterEventsToRawDocumentStep">&lt;/step>
	&lt;/rainbowPipeline>
	</params></utilities>
	</rainbowProject>
	""", 'UTF-8')
	}
	return

xliff_export.groovy

msoutopico says:

January 17, 2022 at 20:50

Hi Kos. Some more feedback: I tried your script today again to export a one-file OmegaT project as XLIFF. I got this error in the scripting console:

The script “C:\Users\souto\AppData\Roaming\OmegaT\scripts\write_xliff.groovy” is running…
An error occurred
javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingMethodException: No signature of method: static org.omegat.util.StaticUtils.makeValidXML() is applicable for argument types: (String) values: [<x1/>]

Indeed, my first segment has `<x1/>`. I am using the OpenXML filter.

- Kos Ivantsov (VerdaKáfo) says:
  
  January 17, 2022 at 21:58
  
  Thanks for the reply. I’ll try to make sure this script works in the latest version of OmegaT, and will report back here.
  
icylave (@icylave) says:

March 14, 2022 at 05:12

Thanks a lot for sharing this amazing script! I am totally new to OmegaT, so my question is: how do I invoke write_xliff.groovy?

- icylave (@icylave) says:
  
  March 14, 2022 at 05:37
  
  I’ve figured it out. Open a project in OmegaT and then select Script under the Tools menu.
  

True Translation

A daily life between languages
