How to use msggrep to apply new terminology to previous translations

Recently the Brazilian GNOME translation team, together with other Brazilian free software translation teams, improved several items in our terminology (which we call Standard Vocabulary), probably motivated by our meeting at the 9th FISL and the GNOME terminology revamp. When we started preparing to translate GNOME 2.24, I wanted to begin by fixing the existing translations to match the current terminology. This post collects the tricks I had to learn to make this happen.

The documentation for msggrep (part of GNU Gettext) isn’t very clear, so let’s start from the beginning. Find a message catalog (for example, the one from GTK+), and choose something you want to search for in the msgids (for example, “image”):

msggrep -Kie image gtk+.HEAD.pt_BR.po -o gtk+.HEAD.pt_BR.po.terminology

The resulting message catalog should contain all the relevant messages, with their comments and translations. By default all messages can be grepped: fuzzy, translated, untranslated, even obsolete. The output includes the complete original header with its comments. All you have to do is edit the abridged message catalog with your favorite text editor (don’t forget the obsolete messages!) and merge the new translations back:

mv gtk+.HEAD.pt_BR.po gtk+.HEAD.pt_BR.po.old
msgmerge gtk+.HEAD.pt_BR.po.terminology gtk+.HEAD.pt_BR.po.old \
    -C gtk+.HEAD.pt_BR.po.old -o gtk+.HEAD.pt_BR.po

I know msgmerge was supposed to be used with POT files, but gettext 0.17 doesn’t have this limitation anymore.
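Before committing the merged file, it may be worth validating it. The following is a minimal sketch, assuming msgfmt from the same gettext installation is on your PATH; `check_catalog` is a hypothetical helper name, not something gettext provides.

```shell
# Hypothetical helper: validate a PO file without keeping the compiled output.
check_catalog() {
    # --check runs msgfmt's validity checks, --statistics prints the
    # translated/fuzzy/untranslated counts, and -o /dev/null discards the
    # compiled catalog.
    msgfmt --check --statistics -o /dev/null "$1"
}
```

Calling `check_catalog gtk+.HEAD.pt_BR.po` after the merge should report any errors msgfmt finds and show whether the message counts still look plausible.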

The example above is purely academic. If you are going to fix the translation yourself, you could simply open the message catalog in a text editor and find the words. Using msggrep becomes useful when you want to fix several message catalogs, and other translators are going to help. If you had all of the relevant PO files in the same directory, you could do this:

expression="image"

for catalog in *.pt_BR.po; do
    msggrep -Kie "$expression" "$catalog" -o "${catalog}.terminology"
done

Or, if your files are spread in a module/po/LL.po fashion:

expression="image"
mkdir -p terminology-fix

for path in */po{,-*}/pt_BR.po; do
    # Skip glob patterns that matched nothing
    [ -e "$path" ] || continue
    # Extra care so that gtk+/po-properties/pt_BR.po becomes gtk+-properties
    module="$(dirname "$(dirname "$path")")"
    subdir="$(basename "$(dirname "$path")")"
    domain="${module}${subdir#po}"
    target="terminology-fix/${domain}.po"

    msggrep -Kie "$expression" "$path" -o "$target"
done

The merge command can be customized in a similar way.
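As a sketch of what that could look like for the module/po/LL.po layout, the functions below walk the same paths, recompute the domain name, and merge each edited file from terminology-fix/ back over its original. The names `domain_for` and `merge_back` and the `.old` backup suffix are assumptions made for illustration; the msgmerge invocation mirrors the one shown earlier.

```shell
# Hypothetical helpers, not part of gettext.

# Derive the domain from a module/po[-suffix]/LL.po path,
# e.g. gtk+/po-properties/pt_BR.po -> gtk+-properties
domain_for() {
    printf '%s\n' "$(dirname "$(dirname "$1")")$(basename "$(dirname "$1")" | sed 's/^po//')"
}

# Merge every edited catalog found in terminology-fix/ back into its original.
merge_back() {
    for path in */po/pt_BR.po */po-*/pt_BR.po; do
        [ -e "$path" ] || continue      # pattern matched nothing
        fixed="terminology-fix/$(domain_for "$path").po"
        [ -e "$fixed" ] || continue     # no edited file for this catalog
        mv "$path" "${path}.old"
        msgmerge "$fixed" "${path}.old" -C "${path}.old" -o "$path"
    done
}
```

Run `merge_back` from the directory that contains the modules. Catalogs without a matching file in terminology-fix/ are left untouched, so translators can send back only the files they actually changed.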

In the previous examples we searched for a very simple expression, “image”. But you probably need more flexibility, and msggrep can search for fixed strings, basic regular expressions (the default), extended regular expressions, and expressions read from a file. For example, if you are looking for “print”, “prints”, “printer” or “printing”, or “image”, “imaging” or “images”:

msggrep -Kie "\bprint\(er\|ing\|s\)\?\b" -Kie "\bimag\(es\?\|ing\)\b" gtk+.HEAD.pt_BR.po

If you are searching for many expressions, you should save them to a text file and ask msggrep to read it. One regular expression per line, without the quotes, and without any blank line.

msggrep -Kif regex.txt gtk+.HEAD.pt_BR.po
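For instance, a regex.txt holding the “print” and “image” expressions could be written with a here-document and then checked for stray blank lines before use. This is only a sketch: the file name follows the text above, while `has_no_blank_lines` is a hypothetical helper added as a precaution, not something msggrep asks you to run.

```shell
# Write one basic regular expression per line, unquoted.
# The quoted 'EOF' keeps the shell from touching the backslashes.
cat > regex.txt <<'EOF'
\bprint\(er\|ing\|s\)\?\b
\bimag\(es\?\|ing\)\b
EOF

# Hypothetical helper: succeed only if the file contains no blank lines.
has_no_blank_lines() {
    ! grep -q '^[[:space:]]*$' "$1"
}

has_no_blank_lines regex.txt || echo "regex.txt contains a blank line" >&2
```

Keeping the expressions in a file also means they can be shared with the other translators alongside the extracted catalogs.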

Now we have to fix those regular expressions so that they also match words containing underscores (access keys). For example, if we are simply looking for “page” or “pages”:

msggrep -Kie "\b_\?p_\?a_\?g_\?e\(_\?s\)\?\b" gtk+.HEAD.pt_BR.po

This is ugly, and it gets worse as you search for more, or more flexible, regular expressions. Using extended regular expressions cuts down on some backslashes, but saving the regular expressions to a file can make things even easier to read. Again looking for “page” or “pages”, I would have a pre-regex.txt with a “pages\?” line (without the quotes), and then have it processed like this:

sed -e 's/[[:alpha:]]/_\\?&/g' -e 's/.*/\\b&\\b/' pre-regex.txt > regex.txt

The backslashes are doubled so that sed writes a literal \? and \b into regex.txt. This way you can edit the easy-to-read pre-regex.txt, and automatically generate the usable regex.txt.
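To see what the generated expression looks like, and to sanity-check it before running msggrep, you could try it with grep on a few sample strings. The sketch below assumes GNU grep, whose basic regular expressions accept `\?` and `\b`, and writes the sed replacement with doubled backslashes so that they reach the output literally. The sample strings are made up, and grep only approximates how msggrep will match the msgids.

```shell
# One easy-to-read expression per line (basic regex syntax).
printf '%s\n' 'pages\?' > pre-regex.txt

# Allow an optional underscore before every letter, then anchor on word boundaries.
sed -e 's/[[:alpha:]]/_\\?&/g' -e 's/.*/\\b&\\b/' pre-regex.txt > regex.txt

cat regex.txt
# prints: \b_\?p_\?a_\?g_\?e_\?s\?\b

# Hypothetical sample msgids: the first two should match, the last two should not.
printf '%s\n' 'Pa_ge Setup' '_Pages' 'Homepage' 'Pager' > samples.txt
grep -i -f regex.txt samples.txt
```

If the grep output contains entries you did not expect, it is cheaper to adjust pre-regex.txt now than after the extracted catalogs have been handed to other translators.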

Corrections, suggestions and enhancements are welcome, as usual. We could have used pogrep instead of msggrep, but gettext is more widely available than translate-toolkit. Maybe Dwayne Bailey or Friedel would like to write similar instructions for pogrep?

3 responses to “How to use msggrep to apply new terminology on previous translation”

  1. If you are doing cleanups of terminology I would also suggest the following approaches from the Translate Toolkit:

    You can use pogrep to traverse one or multiple files; regex queries are accepted (--regex), accelerator characters can be ignored (--accelerators), and various sections are searchable:

    pogrep image po/ po-grepped/
    # Edit the files
    pomerge -t po po-grepped po # merges all of your changes back again

    If you want to check terminology or create terminology then you can use the poterminology tool. This won’t fix the terminology for you but will identify problem areas. If there are conflicts then it will display both conflicting entries. You could use this then to execute a pogrep search. The advantage of this approach is that it will highlight words of terminology that you also might not yet have defined. But you probably want to use the next tool for a good cleanup.

    Lastly, if you are checking consistency you want to use poconflicts. This will find areas in your translations that are using different translations for the same English source text (note that not all occurrences are incorrect as your language might use different words for English words that have multiple meanings):

    poconflicts po/ po-conflicts/

    You now have a directory of PO files, one for each word or phrase in conflict. The advantage of this is that if the word in conflict is ‘image’ then we will extract every entry that contains the word image into a file called image.po. While you might lose context you will see the context of the word across the whole project. Try ignoring case to identify more conflicts.

    Once you have cleaned up the conflicts we use porestructure and pomerge to get your corrections back into the original files:

    porestructure po-conflicts/ po-restructured/ # This converts the PO files back into the original layout of the original PO files
    pomerge -t po/ po-restructured/ po/

  2. I personally am not happy with eyeballing/editing matched messages on their own, out of their PO files; for proper context, I frequently want to look at messages just above or below, or elsewhere in the same PO. So instead of extracting them, I prefer that matched messages get flagged in place, in their POs; then I can search for those flags in the editor, check the message, possibly edit, remove flag.


    $ alias posieve-fmm='posieve.py find-messages -smark -saccel:_ -m /tmp/fmm.out'
    $
    $ posieve-fmm -smsgid:'\bprint(er|s|ing)?|\bimag(e|ing|es)' po_files_root/
    ! po_files_root/alpha.po
    ! po_files_root/subdir/bravo.po
    $
    $ cat /tmp/fmm.out
    po_files_root/alpha.po
    po_files_root/subdir/bravo.po
    $
    $ kate `cat /tmp/fmm.out`

    where posieve.py is a part of Pology, a collection of PO tools that we are brewing in KDE's repository.
