Write all source segments to a file

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Since we started to make OmegaT write stuff to files, let’s try to dump all source segments to one file. I’m pretty sure one can find some use for it.

  • write_source2file.groovy
    /*
     * #Purpose:	Write all source segments to a file
     * #Files:	Writes 'allsource.txt' in the 'script_output' subfolder of the project's root
     * 
     * @author:	Kos Ivantsov
     * @date:	2013-07-16
     * @version:	0.2
     */
    
    /* set "includefilenames" to 'yes' (with quotes) to have filenames
     * written to the file; any other value leaves them out */
    
    def includefilenames = 'no'
    def includerepetitions = 'no'
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
    	final def title = 'Source to File'
    	final def msg   = 'Please try again after you open a project.'
    	showMessageDialog null, msg, title, INFORMATION_MESSAGE
    	return
    }
    
    def folder = prop.projectRoot+'/script_output'
    def fileloc = folder+'/allsource.txt'
    writefile = new File(fileloc)
    if (! (new File(folder)).exists()) {
    	(new File(folder)).mkdir()
    	}
    
    writefile.write("", 'UTF-8')
    def count = 0
    def uniqline
    
    if (includefilenames == 'yes') {
    	files = project.projectFiles;
    	for (i in 0 ..< files.size())
    	{
    		fi = files[i];
    		marker = "+${'='*fi.filePath.size()}+\n"
    		writefile.append("$marker|$fi.filePath|\n$marker", 'UTF-8')
    		for (j in 0 ..< fi.entries.size())
    		{
    		ste = fi.entries[j];
    		source = ste.getSrcText();
    		writefile.append source +"\n", 'UTF-8'
    		count++;
    		}
    	}
    } else {
    	project.allEntries.each { ste ->
    	source = ste.getSrcText();
    	writefile.append source+"\n",'UTF-8'
    	count++
    		}
    	console.println "$count segments found in all files"
    	if (includerepetitions != 'yes') {
    		count = 0
    		uniqline = writefile.readLines('UTF-8').unique()
    		//console.println uniqline
    		writefile.write("",'UTF-8')
    		uniqline.each {
    		writefile.append "$it\n\n",'UTF-8';
    		count++
    				}
    			}
    	}
    
    console.println count +" segments written to "+ writefile
    final def title = 'Source to File'
    final def msg   = count +" segments"+"\n"+"written to \n"+ writefile
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
    

    Once the script is invoked, it creates a file named “allsource.txt” in the “script_output” subfolder of the current project’s root folder, with each segment on a new line. The file contains all source segments, including the ones that are already translated. The script can either just dump the segments in the order they appear in OmegaT, or write out each filename in a box like this
    +====+
    |file|
    +====+

    followed by all the segments that belong to this file, then the next filename with its segments, and so on. This behavior is controlled by line 13. When it says def includefilenames = 'yes', filenames are written to allsource.txt. If you don’t want the filenames, change ‘yes’ to anything else or even leave it empty, making sure you keep the quotes: def includefilenames = 'no, thanks' and def includefilenames = '' both work, but def includefilenames = no (without quotes) does not.
    The way the filenames get marked is defined in lines 44, 45.
    If filenames are not included, you can choose whether to include repetitions (line 14): 'yes' means “yes”, anything else, even 'yep', means “no”. With repetitions left out, each unique segment is written once, followed by an empty line.
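
The filename box can be tried out on its own. Below is a minimal sketch of the same idea; the helper name boxedName is made up for illustration and is not part of the script.

```groovy
// Hypothetical helper, not part of write_source2file.groovy:
// wraps a file path in the +===+ box used as a filename marker.
def boxedName(String path) {
    // the '=' character is repeated once per character of the path
    def marker = "+${'=' * path.size()}+\n"
    return "${marker}|${path}|\n${marker}".toString()
}

print boxedName('file')
```

Since the marker is computed from the length of the path, the box always fits the filename exactly.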

Suggestions, enhancements, bug reports, donations, postcards, invitations to a cup of coffee, feature requests, interesting translation projects with a good pay etc. are always welcome. Criticism isn’t, but will be accepted too.



But as of now,
Good luck

SVN status

Here’s a little script that checks the current SVN status of various files in an OmegaT team project. It may not be awfully useful, but sometimes it can help you prevent or solve SVN sync issues. It doesn’t do anything special, just shows you the status of the project’s project_save.tmx, the main writable glossary, the current file and the whole project folder.

Batch Search and Replace and Selective Pretranslation in OmegaT

Update: Most of the post remains true, but make sure you download these scripts from the SF.net repository.

In this post I want to share three scripts that can do an extended search and replace in an OmegaT project. Search and replace templates for each script are specified in external plain text files located in the project’s root folder, so the same scripts can be used without any modifications for different projects with different sets of search and replace patterns; the user only needs to edit those plain text files. On top of text modification, the scripts can do simple math on what they find, which gives the user a per-project unit converter.
Each script should be accompanied by its own external file located in a subfolder named .ini in the project’s root (details under each script further on). The format of these files is the same for all three:


  • Only one empty line in the file — the very last one
  • Each line consists of three blocks:
    1. Search pattern (regex aware)
    2. Tab
    3. Replace pattern

So, if you need to replace “Владимир Владимирович” (taking into consideration different cases of Russian nouns) with “the President of Russian Federation“, here’s what you need to specify in the substitution file:
Владимир\p{L}?+\sВладимирович\p{L}?+ the President of Russian Federation
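
To make the file format more tangible, here is a rough sketch of how such lines could be read and applied. It is not the code of the actual scripts; the helper names parseRules and applyRules and the sample rule are assumptions made for illustration.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: turns "search pattern<TAB>replace pattern" lines
// into a list of rules. Lines without a tab are skipped.
def parseRules(String text) {
    text.readLines()
        .findAll { it.contains('\t') }
        .collect { line ->
            def parts = line.split('\t', 2)   // split on the first tab only
            [pattern: Pattern.compile(parts[0]), replacement: parts[1]]
        }
}

// Hypothetical helper: applies every rule, in file order, to one segment.
def applyRules(String segment, List rules) {
    rules.inject(segment) { acc, rule ->
        rule.pattern.matcher(acc).replaceAll(rule.replacement)
    }
}

// Sample rule (an assumption, not taken from a real project)
def sampleRules = parseRules('gr[ae]y\tgrey\n')
println applyRules('a gray cat and a grey dog', sampleRules)
```

Because the search pattern is compiled as a regular expression, characters such as a dot or a parenthesis need to be escaped in the file if they are meant literally.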

Writing Auxiliary Text Files from OmegaT

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Here I’d like to share two Groovy scripts that don’t help with anything at hand in OmegaT, but write out external text files that can often be helpful in producing better quality translation.

The first script writes selected text to a file along with some context information. This can be helpful if you need to produce a list of unknown/unclear terms that need to be discussed with the client, or things to be double-checked, studied, rewritten etc.

  • write_selection2list.groovy
    /*
     * #Purpose: Write selection to a file to create a list of terms
     * #Files:   Writes 'terms_list.txt' in the current project's root
     *     the file contains selection text, segment number, segment text
     *     and filename of the selection, if selection is in the current segment,
     *     or just the text of selection and the filename, if selection
     *     is outside the current segment.
     * #Note:    When invoked without selection, it opens the file
     *     in the default text editor
     * #Details: http://wp.me/p3fHEs-4L
     *
     * @author   Kos Ivantsov
     * @based on scripts by Yu Tang
     * @date     2013-06-25
     * @version  0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Selection to List'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    // get segment #, source filename and the whole current segment
    def srcfile = editor.currentFile
    def ste = editor.currentEntry
    cur_text = ste.getSrcText()
    cur_seg = ste.entryNum()
    
    // define list file
    
    def folder = prop.projectRoot
    def fileloc = folder+'/terms_list.txt'
    list_file = new File(fileloc)
    
    // create file if it doesn't exist
    if (! list_file.exists()) {
    	list_file.write("",'UTF-8')
    	}
    
    /* 
     * command to open the file if there's no active selection
     * if a custom (not OS default) text editor should be used,
     * it needs to be defined in the next line (edit as needed and uncomment)
     */
    
    // def textEditor = /path to your editor/
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = "cmd /c start \"\" \"$list_file\""  // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : "\"$textEditor\" \"$list_file\"" } catch (ignore) {}
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', list_file]  // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : ['open', '-a', textEditor, list_file] } catch (ignore) {}
        break
      default:  // for Linux or others
        command = ['xdg-open', list_file] // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : [textEditor, list_file] } catch (ignore) {}
        break
    }
    
    def sel_txt = editor.selectedText
    if (sel_txt) {
    	list_file.append "${'='*10}\n $sel_txt\n",'UTF-8'
    	if (cur_text.contains(sel_txt)) { // literal check; the selection is not treated as a regex
    		list_file.append "${'-'*5}\n\
    filename: $srcfile\n\
    segment: $cur_seg\n\
    segment text: $cur_text \n\n",'UTF-8'
    	}else{
    		list_file.append "${'-'*5}\n\
    filename: $srcfile\n\
    ***Selection outside of current segment***\n",'UTF-8'
    	}
    	console.println "\"$sel_txt\" written to $list_file"	
    } else {
    console.println "[No selection]"
    console.println "***Opening the file in text editor***"
    console.println "Command: $command"
    command.execute()
    return // exit
    }
    

    The list is created in the current OmegaT project folder, in a file named terms_list.txt. When the script is invoked with no selection, this file is opened in the default text editor, so that you can easily view or edit it. When it’s invoked with some text selected in the Editor pane, the selection gets written to the file along with some context info, depending on whether the selection is inside or outside of the current segment.
    I’d like to write wider context, but I don’t know how to get text from previous and next segment without actually going there. Any help is welcome and appreciated, as usual.
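
For reference, the layout of one entry in terms_list.txt can be reproduced outside OmegaT. This is only a sketch: formatEntry is a made-up helper, and the sample filename and segment are invented. It uses a literal contains() check to decide whether the selection belongs to the current segment.

```groovy
// Hypothetical helper mirroring the layout of a terms_list.txt entry.
// contains() compares literally, so characters such as '(' or '?'
// in the selection are not interpreted as regex syntax.
def formatEntry(String selection, String fileName, int segNum, String segText) {
    def entry = new StringBuilder()
    entry << "${'=' * 10}\n $selection\n${'-' * 5}\nfilename: $fileName\n"
    if (segText.contains(selection)) {
        entry << "segment: $segNum\nsegment text: $segText \n\n"
    } else {
        entry << "***Selection outside of current segment***\n"
    }
    return entry.toString()
}

// Invented sample data
print formatEntry('price (net)', 'manual.odt', 12, 'The price (net) is listed below.')
```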

The second script writes unique untranslated segments from the complete project into a text file named untranslated.txt. This file is located in the project’s root folder, and is rewritten each time the script is invoked. Such a file can be used for a number of purposes, including producing TMX with MT.

  • write_untranslated2file.groovy
    /*
     * #Purpose: Write all unique untranslated segments to a file
     * #Files:   Writes 'untranslated.txt' in the current project's root
     * #Details: http://wp.me/p3fHEs-4L
     *
     * @author   Kos Ivantsov
     * @based on scripts by Yu Tang
     * @date     2013-06-25
     * @version  0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Untranslated to File'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def folder = prop.projectRoot
    def fileloc = folder+'/untranslated.txt'
    writefile = new File(fileloc)
    
    writefile.write("", 'UTF-8')
    def count = 0
    project.projectFiles
    .each {
    //console.println "\n${it.filePath}"
    it.entries
    .findAll {!project.getTranslationInfo(it).isTranslated()}
    .each {count++; writefile.append "${it.srcText}\n",'UTF-8'}
    }
    
    console.println "\nUntranslated segments found: $count"
    count = 0 
    def lines = writefile.readLines('UTF-8')
    uniqline = lines.unique()
    writefile.write("",'UTF-8')
    uniqline.each {
    writefile.append "$it\n",'UTF-8';
    }
    console.println "Unique untranslated segments written to file:  ${uniqline.size()}"
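
The de-duplication step can also be done in memory, before anything is written to disk. Here is a small sketch with invented sample data; in a real run the list would be filled from the project’s untranslated entries.

```groovy
// Sample data standing in for the source text of untranslated segments
def untranslated = ['Open the lid.', 'Press Start.', 'Open the lid.',
                    'Close the lid.', 'Press Start.']

// unique(false) returns a new list without duplicates, keeps the order
// of first occurrence, and leaves the original list untouched
def uniqueSegments = untranslated.unique(false)

println "Untranslated segments found: ${untranslated.size()}"
println "Unique untranslated segments: ${uniqueSegments.size()}"
```

With this approach the file would need to be written only once, after the unique list is ready.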
    

If you have ideas how to improve these, feel free to share.


UPDATE:

Here’s another script that writes all source segments to a file


But as of now,
Good luck!

Substitute Template For Each Project

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Here I have a script that reads a tab-separated file (any number of tabs between items), each line of which contains the pattern to be found in the first position, and what it should be replaced with in the second. This file MUST be named subst_template.txt (well, it can be changed in the script, so maybe such a loud “must” isn’t really needed). The first pair should start on the first line, with no empty lines between the pairs, and after the final pair there should be exactly one empty line. Below you’ll find an example of such a file.
The file ought to be placed in the OmegaT project’s root. That is done intentionally, so that one can have a unique set of substitute patterns for each project. For example, I had an English to Ukrainian Christian project where names of the Bible books needed to be translated using one particular Ukrainian Bible version (Khomenko Bible), while for another project they needed to be taken from another version (Ohiyenko Bible). While English abbreviations remained the same, Ukrainian ones needed to be quite different (for instance, “Jn.” was “Йо.” in one, and “Ів.” in the other). So, having a separate substitute pattern file in each project, I could use just one script to get Bible references with proper abbreviations in each of them.

Simple QA with OmegaT GUI scripts

Here I want to share several groovy scripts that bring up a window showing a clickable segment # button to skip to the respective segment, and source and target segments that meet certain criteria. I’m not a scripting master, so I share them in hope that they can be helpful to some and improved by many (reverse the pronouns if needed).
Here’s but one screenshot to give you an idea what this is all about.
[Screenshot: window showing a simple QA check]

Following is a list of scripts with links (click on titles) to download them, and short descriptions of what each one does.

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

  • Check QA rules script
  • This one is a simple QA check. Originally it was included in the OmegaT scripts bundle, but I expanded it to show a window with clickable buttons. Currently it checks for leading and trailing spaces, double words, and disproportionately longer or shorter target segments. If anyone knows how to add more rules to check, please share.
    Update: No need to download this script from here, it’s now included in the OmegaT bundle.

  • Show Source = Target
  • This one was originally included in the bundle as well, and I just added the GUI part. It brings up a window that lets you navigate through segments where target is the same as source.

  • Show Untranslated
  • This script brings up a window with all the untranslated segments, so you get only segment # buttons and source segments. Besides that, in the scripting console (lower part of the right side of the Scripting window) you get the count of all and unique untranslated segments. I don’t see a big practical value in this script, but a friend of mine and a fellow OmegaT user felt really blessed to get it. This one originally isn’t mine either, I used Yu Tang’s idea that he shared on the OmegaT Yahoo! Group.
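
To give an idea of what the rules in the QA check look like, here is a stand-alone sketch of the three kinds of checks named above. It is not the code of the bundled script: the function names, the regular expression and the ratio threshold are all assumptions.

```groovy
// Hypothetical check: text starts or ends with whitespace
def hasEdgeSpaces(String text) {
    text != text.trim()
}

// Hypothetical check: the same word appears twice in a row ("the the")
def hasDoubledWord(String text) {
    (text =~ /(?i)\b(\p{L}+)\s+\1\b/).find()
}

// Hypothetical check: target is much longer or much shorter than source.
// The default ratio of 3 is an arbitrary assumption.
def lengthOutOfProportion(String source, String target, double maxRatio = 3.0d) {
    if (!source || !target) return false
    def ratio = target.length() / (double) source.length()
    ratio > maxRatio || ratio < 1 / maxRatio
}

println hasEdgeSpaces(' Press Start.')
println hasDoubledWord('Open the the lid.')
println lengthOutOfProportion('Hi', 'Hello there, how are you today?')
```

Adding a new rule then comes down to writing one more small function that takes the source and/or target text and returns true when the segment should be listed.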

Testing scripts

You know, of course, about the Scripting Plugin for OmegaT. If you’re a member of the OmegaT Yahoo user group, you may know Yu Tang, who provided a bunch of nice scripts to open various files and folders of the currently active OmegaT project. His scripts have been included in the most recent version of the Scripting Plugin. The only problem with them was that one needed to edit them if they were used on OSes other than MS Windows. Generally not a problem, as many OmegaT users are quite computer savvy, but it would be nicer if they could work everywhere “out of the box”.

Yu Tang has updated the scripts so now they check which OS they are running on. If you’re willing to test them out and report any possible problems, please do.

The scripts are listed here and posted on Pastebin.com, with the link to each individual script right before the respective listing.

  1. open_current_file.groovy
    /*
     * Open current file
     *
     * @author Yu Tang
     * @date 2013-05-23
     * @version 0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'open current file'
      final def msg = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    // get command to open a file
    def file = prop.sourceRoot + editor.currentFile
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = "cmd /c start \"\" \"$file\"" // for WinNT
        // command = "command /c start \"\" \"$file\"" // for Win9x or WinME
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', file] // list form, so a path with spaces stays one argument
        break
      default: // for Linux or others
        command = ['xdg-open', file]
        break
    }
    
    // open it
    command.execute()
    
  2. open_folder.groovy
    /*
     *  Open project folder
     *
     * @author  Yu Tang
     * @date    2013-05-23
     * @version 0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'open project folder'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    // get command GString to open a folder
    def folder = prop.projectRoot
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = "explorer.exe \"$folder\""
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', folder]  // list form, so a path with spaces stays one argument
        break
      default:  // for Linux or others
        command = ['xdg-open', folder]
        break
    }
    
    // open it
    command.execute()
    
  3. open_glossary.groovy
    /*
     *  Open the writeable glossary in an editor
     *
     * @author  Yu Tang
     * @date    2013-05-23
     * @version 0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    /**
     * Uncomment the next line if you want to set a default text editor
     * that will open glossary file
     */
    // def textEditor = /path to your editor/
    // E.g., /TextMate/
    // /C:\Program Files (x86)\editor\editor.exe/
    // ['x-terminal-emulator', '-e', 'vi']
    
    // make a Closure to show message dialog
    def showMessage = { msg -> showMessageDialog null, msg, 'Open glossary', INFORMATION_MESSAGE }
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      showMessage 'Please try again after you open a project.'
      return
    }
    
    // exit if file not found
    def file = prop.writeableGlossary
    if (! new File(file).exists()) {
      showMessage 'Glossary file not found.'
      return
    }
    
    // get command GString list to open a file
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = "cmd /c start \"\" \"$file\""  // default
        try { command = textEditor instanceof List ? [*textEditor, file] : "\"$textEditor\" \"$file\"" } catch (ignore) {}
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', file]  // default; list form keeps a path with spaces in one argument
        try { command = textEditor instanceof List ? [*textEditor, file] : ['open', '-a', textEditor, file] } catch (ignore) {}
        break
      default:  // for Linux or others
        command = ['xdg-open', file] // default
        try { command = textEditor instanceof List ? [*textEditor, file] : [textEditor, file] } catch (ignore) {}
        break
    }
    
    // open it
    console.println "command: $command"
    command.execute()
    
  4. open_project_save.groovy
    /*
     *  Open project_save.tmx in an editor
     *
     * @author  Yu Tang
     * @date    2013-05-23
     * @version 0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    /**
     * Uncomment the next line if you want to set a default text editor
     * that will open project_save.tmx
     */
    // def textEditor = /path to your editor/
    // E.g., /TextMate/
    // /C:\Program Files (x86)\editor\editor.exe/
    // ['x-terminal-emulator', '-e', 'vi']
    // "x-terminal-emulator -e vi".split()
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'open project_save.tmx'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    // get command GString list to open a file
    def file = "${prop.projectInternal}project_save.tmx" 
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = "cmd /c start \"\" \"$file\""  // default
        try { command = textEditor instanceof List ? [*textEditor, file] : "\"$textEditor\" \"$file\"" } catch (ignore) {}
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', file]  // default; list form keeps a path with spaces in one argument
        try { command = textEditor instanceof List ? [*textEditor, file] : ['open', '-a', textEditor, file] } catch (ignore) {}
        break
      default:  // for Linux or others
        command = ['xdg-open', file] // default
        try { command = textEditor instanceof List ? [*textEditor, file] : [textEditor, file] } catch (ignore) {}
        break
    }
    
    // open it
    console.println "command: $command"
    command.execute()
    
  5. open_tm_folder.groovy
    /*
     *  Open the /tm folder
     *
     * @author  Yu Tang
     * @date    2013-05-23
     * @version 0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'open TM folder'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    // get command GString to open a folder
    def folder = prop.TMRoot
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = "explorer.exe \"$folder\""
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', folder]  // list form, so a path with spaces stays one argument
        break
      default:  // for Linux or others
        command = ['xdg-open', folder]
        break
    }
    
    // open it
    command.execute()
    

To test, put the scripts alongside the other scripts that come with the Scripting Plugin and run them on real or test OmegaT projects. Among the things to check are: the ability to run at all, and the ability to run when the filenames or the path to the project contain spaces and “special” characters.
You can post your findings in comments, or, if you’re so inclined, here (clickable)
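
Paths with spaces deserve special attention because of how a command given as a single string is executed: Groovy’s String.execute() hands the string to Runtime.exec(String), which splits it on whitespace and does not interpret quotes. The sketch below only illustrates that splitting and the list-based alternative; tokenize and openCommand are made-up helper names, the sample path is invented, and Windows is deliberately left out.

```groovy
// Shows how a single command string is broken into arguments:
// Runtime.exec(String) uses a plain StringTokenizer, so quotes do not group words.
def tokenize(String command) {
    Collections.list(new StringTokenizer(command)).collect { it.toString() }
}

// Hypothetical helper: builds the command as a list, so the path
// stays a single argument even if it contains spaces.
def openCommand(String osName, String path) {
    osName.toLowerCase().contains('mac') ? ['open', path] : ['xdg-open', path]
}

def samplePath = '/home/user/My Project/source/file one.txt'
println tokenize("open \"$samplePath\"")
println openCommand('Mac OS X', samplePath)
println openCommand('Linux', samplePath)
```

The first line of output shows the path torn into several pieces with the quote characters still attached, while the list form keeps it whole.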


UPDATE

Thanks to all who participated in testing, the shiny new scripts are now included in a new release of the Scripting Plugin for OmegaT 2.6, and are an integral part of the new OmegaT 3.0. Pretty cool, yeah?

File Renamer (Bash from within OmegaT)

Situation

You have a client who loves to give his files very descriptive names. That’s understandable as mostly the files you get to translate from him are lessons, lectures, howtos, manuals and so on. It makes sense to distribute them digitally with localized filenames.

Problem

What you need is a way to translate filenames in OmegaT thus keeping consistency with the contents of the translated files and past/future files from the same client.

OmegaT match insert/replace without tags

Situation

After having translated a complete user manual that you converted from PDF to ODT to be able to work on it in OmegaT, you receive another manual from the same client, but this time it’s a DOCX file. Great! You can start right away, without converting anything. That should be a piece of cake, since half of the manual looks almost the same as the one you have just done.

Problem

After starting to work with it you find out that getting a lot of 95-97% matches would be really awesome, if it weren’t for all those nasty tags that are very different in the source and in the match. And there is no “Insert match without tags” menu item in OmegaT (yet).


Bash (Perl, Python, Tcl/Tk and what not) Scripting from within OmegaT

Situation

So, right now you’re using quite a few scripts while working in OmegaT. Some of them are the ones included in the Scripting Plugin, others are taken from the Internet, several of them were written on your own, and a couple are still cooking in your head, promising to be something that will save you a couple of hours of work every day in the future and now hindering you from concentrating on what is at hand. To run them with a key shortcut you had to assign global key combinations, as from within OmegaT you can run only 5 custom scripts with a key combo, and those are not just any scripts but the ones that the Scripting Plugin can run.

Problem

Now, with many other scripts and actions used elsewhere for your work/leisure you’re running out of available key combinations, plus you get more and more questions like, “Dad, what you just did doesn’t work on my computer. Do I press it wrong or what’s the matter?” from your elementary-school-aged son.
What you want is an ability to run any script from within OmegaT, not just the ones that the Scripting Plugin can run, as you don’t want to be limited to Java-like languages, but you look for a way to use anything that you’re comfortable with. Besides, these custom scripts should be aware of your current OmegaT project’s variables and settings (like project folders, language pairs etc.) Then at least you’ll be able to say to your son, “Boy, you don’t use OmegaT yet. Let me better show you this combo that you can use on your computer.”