OmegaT exports TMX files for the files currently in the project every time translated documents are created. It writes three TMX files to the project root. But what if you need a translation memory that contains translation units from only one or several files, not all those currently present in the project? One solution is to temporarily move the unneeded files out of the source folder, reload the project, and then create translated documents. But that is somewhat awkward and time-consuming.
Here’s a Groovy script that lets you select one or several files located in the same subfolder of the current project’s /source. Once they are selected, the script writes selected_files_[date_time].tmx to the project root. This TMX file contains TUs only for the selected files.
- write_sel_files2TMX.groovy
/*
 * Purpose: Export source and translation segments of user selected
 *          files into TMX-file
 * #Files: Writes 'selected_files_<date_time>.tmx' in the current project's root
 * #File format: TMX v.1.4
 * #Details: http://wp.me/p3fHEs-6g
 *
 * @author Kos Ivantsov
 * @date 2013-08-12
 * @version 0.3
 */

import javax.swing.JFileChooser
import org.omegat.util.StaticUtils
import org.omegat.util.TMXReader
import static javax.swing.JOptionPane.*
import static org.omegat.util.Platform.*

def prop = project.projectProperties
if (!prop) {
    final def title = 'Export TMX from selected files'
    final def msg = 'Please try again after you open a project.'
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
}

def curtime = new Date().format("MMM-dd-yyyy_HH.mm")
srcroot = new File(prop.getSourceRoot())
def fileloc = prop.projectRoot+'selected_files_'+curtime+'.tmx'
exportfile = new File(fileloc)
def sourceroot = prop.getSourceRoot().toString() as String

JFileChooser fc = new JFileChooser(
    currentDirectory: srcroot,
    dialogTitle: "Choose files to export",
    fileSelectionMode: JFileChooser.FILES_ONLY,
    //the file filter must show also directories, in order to be able to look into them
    multiSelectionEnabled: true)

if(fc.showOpenDialog() != JFileChooser.APPROVE_OPTION) {
    console.println "Canceled"
    return
}

if (!(fc.selectedFiles =~ sourceroot.replaceAll(/\\+/, '\\\\\\\\'))) {
    console.println "Selection outside of ${prop.getSourceRoot()} folder"
    final def title = 'Wrong file(s) selected'
    final def msg = "Files must be in ${prop.getSourceRoot()} folder."
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
}

if (prop.isSentenceSegmentingEnabled())
    segmenting = TMXReader.SEG_SENTENCE
else
    segmenting = TMXReader.SEG_PARAGRAPH

def sourceLocale = prop.getSourceLanguage().toString()
def targetLocale = prop.getTargetLanguage().toString()

exportfile.write("", 'UTF-8')
exportfile.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n", 'UTF-8')
exportfile.append("<!DOCTYPE tmx SYSTEM \"tmx11.dtd\">\n", 'UTF-8')
exportfile.append("<tmx version=\"1.4\">\n", 'UTF-8')
exportfile.append(" <header\n", 'UTF-8')
exportfile.append(" creationtool=\"OmegaTScripting\"\n", 'UTF-8')
exportfile.append(" segtype=\"" + segmenting + "\"\n", 'UTF-8')
exportfile.append(" o-tmf=\"OmegaT TMX\"\n", 'UTF-8')
exportfile.append(" adminlang=\"EN-US\"\n", 'UTF-8')
exportfile.append(" srclang=\"" + sourceLocale + "\"\n", 'UTF-8')
exportfile.append(" datatype=\"plaintext\"\n", 'UTF-8')
exportfile.append(" >\n", 'UTF-8')
fc.selectedFiles.each{
    fl = "${it.toString()}" - "$sourceroot"
    exportfile.append(" <prop type=\"Filename\">" + fl + "</prop>\n", 'UTF-8')
}
exportfile.append(" </header>\n", 'UTF-8')
exportfile.append(" <body>\n", 'UTF-8')

def count = 0
fc.selectedFiles.each{
    fl = "${it.toString()}" - "$sourceroot"
    files = project.projectFiles
    files.each{
        if ( "${it.filePath}" != "$fl" ) {
            println "Skipping to the next file"
        }else{
            it.entries.each {
                def info = project.getTranslationInfo(it)
                def changeId = info.changer
                def changeDate = info.changeDate
                def creationId = info.creator
                def creationDate = info.creationDate
                def alt = 'unknown'
                if (info.isTranslated()) {
                    source = StaticUtils.makeValidXML(it.srcText)
                    target = StaticUtils.makeValidXML(info.translation)
                    exportfile.append(" <tu>\n", 'UTF-8')
                    exportfile.append(" <tuv xml:lang=\"" + sourceLocale + "\">\n", 'UTF-8')
                    exportfile.append(" <seg>" + "$source" + "</seg>\n", 'UTF-8')
                    exportfile.append(" </tuv>\n", 'UTF-8')
                    exportfile.append(" <tuv xml:lang=\"" + targetLocale + "\"", 'UTF-8')
                    exportfile.append(" changeid=\"${changeId ?: alt }\"", 'UTF-8')
                    exportfile.append(" changedate=\"${ changeDate > 0 ? new Date(changeDate).format("yyyyMMdd'T'HHmmss'Z'") : alt }\"", 'UTF-8')
                    exportfile.append(" creationid=\"${creationId ?: alt }\"", 'UTF-8')
                    exportfile.append(" creationdate=\"${ creationDate > 0 ? new Date(creationDate).format("yyyyMMdd'T'HHmmss'Z'") : alt }\"", 'UTF-8')
                    exportfile.append(">\n", 'UTF-8')
                    exportfile.append(" <seg>" + "$target" + "</seg>\n", 'UTF-8')
                    exportfile.append(" </tuv>\n", 'UTF-8')
                    exportfile.append(" </tu>\n", 'UTF-8')
                    count++;
                }
            }
        }
    }
}
exportfile.append(" </body>\n", 'UTF-8')
exportfile.append("</tmx>", 'UTF-8')

final def title = 'TMX file written'
final def msg = "$count TU's written to " + exportfile.toString()
console.println msg
showMessageDialog null, msg, title, INFORMATION_MESSAGE
return
The TMX file is rewritten each time the script is invoked.
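The matching step at the heart of the script (strip the source root from each selected path, then compare the result with each project file's path) can be tried outside OmegaT. The following is only a sketch: relativeToSource is a hypothetical helper, and the path lists are made-up stand-ins for fc.selectedFiles and for the filePath values of project.projectFiles.

```groovy
// Hypothetical helper: turn an absolute path into a path relative to the source root,
// the same way the script does with the String minus operator.
def relativeToSource(String absolutePath, String sourceRoot) {
    absolutePath - sourceRoot   // removes the first occurrence of sourceRoot
}

// Made-up sample data (assumptions, not real OmegaT output)
def sourceRoot   = '/proj/source/'
def selected     = ['/proj/source/sub/chapter1.txt']
def projectFiles = ['intro.txt', 'sub/chapter1.txt', 'sub/chapter2.txt']

// Collect the wanted relative paths once, then filter the project files against them
def wanted   = selected.collect { relativeToSource(it, sourceRoot) } as Set
def matching = projectFiles.findAll { it in wanted }

println matching
```

Building the set of wanted paths first avoids looping over every project file once per selected file, though for a handful of files the difference is unlikely to matter.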
A big thank you goes to Roman Mironov and Velior Translation Agency for the idea and comprehensive support.
Suggestions and comments are always welcome.
But for now, good luck!
Hi Kos,
I had a look at your script, which is very useful by the way, and noticed a minor glitch and also managed to add a few contributions.
Minor glitch:
The tmx file the script writes has a doctype declaration that is inconsistent with the version attribute in the root element:
exportfile.append("<!DOCTYPE tmx SYSTEM \"tmx11.dtd\">\n", 'UTF-8')
exportfile.append("<tmx version=\"1.4\">\n", 'UTF-8')
For the xml to be valid, the doctype declaration should specify “tmx14.dtd”. You may want to change that to prevent validation errors when the tmx files are loaded by stricter applications.
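A minimal sketch of one way to keep the two values in step is to derive the DTD name from the version string. The helper name tmxProlog is hypothetical and not part of the original script; it only builds the first three lines of the file.

```groovy
// Hypothetical helper: build the XML prolog so that the DOCTYPE
// always matches the version attribute of the <tmx> root element.
def tmxProlog(String version = '1.4') {
    def dtd = 'tmx' + version.replace('.', '') + '.dtd'   // '1.4' -> 'tmx14.dtd'
    return '<?xml version="1.0" encoding="UTF-8"?>\n' +
           "<!DOCTYPE tmx SYSTEM \"${dtd}\">\n" +
           "<tmx version=\"${version}\">\n"
}

print tmxProlog()
```

In the script itself, the equivalent change would simply be to write "tmx14.dtd" in the existing DOCTYPE line.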
A few contributions:
I modified the script to write a tmx file for each selected file. As an alternative, I modified the script to insert a prop element with the filename attribute for each tu processed.
The reason behind this is that I wanted to know exactly which segments belong to which file. That information was important in my workflow.
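The second variant, a prop element carrying the file name inside every TU, might look roughly like the sketch below. This is not the code from the modified script: tuWithFilename and escapeXml are hypothetical helpers (the original script uses StaticUtils.makeValidXML for escaping), and the prop type "Filename" is borrowed from the header the original script already writes.

```groovy
// Hypothetical stand-in for StaticUtils.makeValidXML, so the sketch runs without OmegaT
def escapeXml(String s) {
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')
}

// Hypothetical helper: build one <tu> that records which source file it came from.
// The <prop> is placed before the <tuv> elements.
def tuWithFilename(String filePath, String srcLang, String src, String tgtLang, String tgt) {
    def sb = new StringBuilder()
    sb << '    <tu>\n'
    sb << "      <prop type=\"Filename\">${escapeXml(filePath)}</prop>\n"
    sb << "      <tuv xml:lang=\"${srcLang}\">\n"
    sb << "        <seg>${escapeXml(src)}</seg>\n"
    sb << '      </tuv>\n'
    sb << "      <tuv xml:lang=\"${tgtLang}\">\n"
    sb << "        <seg>${escapeXml(tgt)}</seg>\n"
    sb << '      </tuv>\n'
    sb << '    </tu>\n'
    return sb.toString()
}

print tuWithFilename('sub/chapter1.txt', 'en', 'Hello', 'fr', 'Bonjour')
```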
I will be happy to share my first-ever contribution to OmegaT scripts and I hope you don’t mind me messing around with your code!
Hector, thank you very much. Would you post your entire version of the script and put the link in a comment here?
I would be happy to post it, but I don’t know how. Maybe I can send it to you in an email and you do the honors? After all, it is your script. I don’t know your email, btw.
As a side topic, I found that the script is not writing a newline after each exportfile.append instruction. Maybe it is that the \n is not being processed correctly, but I couldn’t find why.
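The thread does not settle why the line breaks seemed to be missing. One possible explanation, and it is only an assumption, is that the file was opened in an editor that expects Windows line endings (CRLF), while "\n" writes a bare LF. Under that assumption, a minimal sketch that appends the platform's own line separator instead would be:

```groovy
// Assumption: the "missing" newlines are bare LF characters shown by an editor expecting CRLF.
// System.lineSeparator() returns the separator of the platform the script runs on.
def nl = System.lineSeparator()

// A temporary file stands in for the real export file
def tmp = File.createTempFile('newline_demo', '.tmx')
tmp.deleteOnExit()

tmp.write('', 'UTF-8')
tmp.append('<tmx version="1.4">' + nl, 'UTF-8')
tmp.append('</tmx>' + nl, 'UTF-8')

// Read the file back line by line to confirm the breaks are there
def lines = tmp.readLines('UTF-8')
println lines.size()
```

If the cause turns out to be something else, this change is harmless but will not help.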
OK, I managed to post it, here:
http://pastebin.com/As8CSSJV
This is the version that writes one TMX per file.
exportfile.append(“<!DOCTYPE tmx SYSTEM \”tmx11.dtd\”<\n”, ‘UTF-8’)
exportfile.append(“<tmx version=\”1.4\”<\n”, ‘UTF-8’)
There it is.
kind of, I mean, the glitch…glitched