Translation quality control with pofilter

For more than a year now, the Brazilian GNOME translation team has been using translate-toolkit, especially its pofilter tool, to check translations before committing them. In recent versions, pofilter’s behavior can be tailored to the target language (e.g., pt_BR), but currently only 15 languages benefit from this. In Brazil we are preparing a list of useful and useless tests, and I think other translation teams should do the same. These lists can then be attached to bug reports, together with language-specific information (such as the quotation style).

translate-toolkit is very useful. It is used in Pootle, a translation web app; Open-Tran.eu, a descriptive translation terminology database; and Wordforge (formerly Pootling), a PO- and XLIFF-based translation application. Whether you translate free software or develop localization tools, check these projects out!

pofilter can check dozens of error types. For the sake of usability, there is a standard set of tests for each of these projects: OpenOffice.org, Mozilla, GNOME, KDE and wxWidgets. Many tests detect common errors that gettext doesn’t look for. My favorite tests are doublewords, endpunc, endwhitespace, startspace and xmltags. On the other hand, pofilter often gives us “false positives”, i.e., it reports errors where there are none. As a workaround, the user can exclude individual tests from the run with the -x (--excludefilter) command-line option. Providing language-specific information might also help reduce the number of false positives.
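To make the favorite tests above more concrete, here is a minimal Python sketch of what four of them look for. These are simplified re-implementations written for illustration, not translate-toolkit’s code; the function names mirror the pofilter test names, while `run_checks` and its `exclude` parameter (standing in for excluding tests on the command line) are hypothetical.

```python
import re

# Simplified, illustrative versions of four pofilter tests (not the real code).
# Each returns True when it suspects a problem in the translation.

END_PUNCT = set(".,:;!?")


def doublewords(source: str, target: str) -> bool:
    """The same word appears twice in a row, e.g. 'o o arquivo'."""
    return re.search(r"\b(\w+)\s+\1\b", target, re.IGNORECASE) is not None


def _leading_ws(text: str) -> str:
    return text[: len(text) - len(text.lstrip())]


def _trailing_ws(text: str) -> str:
    return text[len(text.rstrip()):]


def startspace(source: str, target: str) -> bool:
    """Leading whitespace differs between source and translation."""
    return _leading_ws(source) != _leading_ws(target)


def endwhitespace(source: str, target: str) -> bool:
    """Trailing whitespace differs between source and translation."""
    return _trailing_ws(source) != _trailing_ws(target)


def endpunc(source: str, target: str) -> bool:
    """Final punctuation mark differs (trailing whitespace ignored)."""
    s, t = source.rstrip()[-1:], target.rstrip()[-1:]
    return (s in END_PUNCT or t in END_PUNCT) and s != t


CHECKS = {
    "doublewords": doublewords,
    "endpunc": endpunc,
    "endwhitespace": endwhitespace,
    "startspace": startspace,
}


def run_checks(source, target, exclude=()):
    """Names of the checks that fail, skipping any listed in `exclude`."""
    return sorted(
        name for name, check in CHECKS.items()
        if name not in exclude and check(source, target)
    )


print(run_checks("Open the file.", "Abra o o arquivo"))  # ['doublewords', 'endpunc']
```

Each check only compares the source and target strings, which is also why such tests produce false positives: a legitimate difference in punctuation or spacing looks exactly like an error.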

The Portuguese version of this article lists the tests I skip in my current translation routine. For other locales, I think the most relevant tests to exclude are:

  • doublespacing: because of the English convention of two spaces after a full stop;
  • isfuzzy, untranslated: if, like me, you check for them before running pofilter;
  • untranslated, acronyms: if your translators are skilled and dedicated, I guess you’ll get only “false positives” from these tests.
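As a sketch of how the exclusions suggested above could be applied, the Python helper below assembles a pofilter command line. It uses only the --gnome option shown in this post, the -x option for excluding a test, and input/output files as positional arguments; the function name and default list are hypothetical, and passing one -x per test is an assumption worth confirming with `pofilter --help` for your version.

```python
import shlex

# Tests suggested above for exclusion ("untranslated" appears in two bullets).
SUGGESTED_EXCLUSIONS = ("doublespacing", "isfuzzy", "untranslated", "acronyms")


def pofilter_command(infile, outfile, project="gnome", exclude=SUGGESTED_EXCLUSIONS):
    """Build a pofilter argument list: project test set, exclusions, then files."""
    cmd = ["pofilter", f"--{project}"]
    for test in exclude:
        cmd += ["-x", test]  # one -x per excluded test (assumed repeatable)
    return cmd + [infile, outfile]


print(shlex.join(pofilter_command("pt_BR.po", "errors.po")))
```

Building the list rather than a single string keeps file names with spaces safe; on a machine with translate-toolkit installed, the same list can be passed to `subprocess.run`.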

Does any other GNOME translation team regularly use pofilter or another tool from translate-toolkit? If by any chance you translate another free software project, I’m interested too. How do you use translate-toolkit, especially pofilter?

7 replies to “Translation quality control with pofilter”

  1. Glad to see that pofilter is making other people’s lives easier! Current trunk has 46 tests, including what I call the extraction ‘tests’ that LF mentioned above, e.g. isfuzzy and untranslated.

    I find the variable and accelerator tests the most useful, as that is where our translators make the most errors. Once translators are familiar with these things you should see fewer errors. I made a change to the accelerators test which now allows a language to have a list of valid accelerator characters, so you can define them as the characters that appear on your keyboard (some keyboards have keys for precomposed accented characters).
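A minimal Python sketch of the idea described in this comment, an accelerator check with a per-language list of valid characters, follows. It assumes GNOME’s `_` marker and a hypothetical `PT_BR_KEYS` set; it is not pofilter’s implementation and ignores details such as escaped double underscores.

```python
# Hypothetical set of characters typed with a single key on a Brazilian
# keyboard (letters, digits and ç); accented vowels need dead keys.
PT_BR_KEYS = set("abcdefghijklmnopqrstuvwxyzç0123456789")
ACCEL_MARKER = "_"  # GNOME marks the accelerator with an underscore


def accelerator(text, marker=ACCEL_MARKER):
    """Character following the first marker, or None if there is none."""
    i = text.find(marker)
    if i == -1 or i + 1 >= len(text):
        return None
    return text[i + 1]


def check_accelerator(source, target, valid_keys=None, marker=ACCEL_MARKER):
    """Return a description of the problem, or None if the translation looks fine."""
    src, tgt = accelerator(source, marker), accelerator(target, marker)
    if src is None:
        return None if tgt is None else "unexpected accelerator in translation"
    if tgt is None:
        return "accelerator missing in translation"
    if valid_keys is not None and tgt.lower() not in valid_keys:
        return f"accelerator {tgt!r} is not a valid key for this language"
    return None


print(check_accelerator("_Quit", "Sa_ír", PT_BR_KEYS))
```

Keeping the valid set as data rather than code is what lets each language supply its own keyboard layout without touching the check itself.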

    Please do report any language specific entries that you want added. It will help you on pofilter but also help all the other users of the toolkit.

  2. Glad to know you are still benefiting from pofilter! Other features that already exist: the ability to give lists containing words that must be translated (perhaps your team has a policy for “OK” or certain program names), or that shouldn’t be translated (brand names, command line programs). In fact, these features are currently only available in pofilter, not yet in Pootle.
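The word-list feature mentioned here can be illustrated with a short Python sketch: one list of protected words that must appear unchanged, and one list of words the team always translates. The function names and example lists are hypothetical, and the code is a simplified illustration rather than the toolkit’s; see `pofilter --help` for how the real lists are supplied.

```python
import re


def _words(text):
    return set(re.findall(r"\w+", text))


def missing_protected_words(source, target, protected):
    """Protected words (brand names, commands) in the source missing from the target."""
    src, tgt = _words(source), _words(target)
    return sorted(w for w in protected if w in src and w not in tgt)


def untranslated_required_words(source, target, required):
    """Words the team always translates (e.g. 'OK') left untouched in the target."""
    src, tgt = _words(source), _words(target)
    return sorted(w for w in required if w in src and w in tgt)


print(missing_protected_words("Run gedit", "Execute o Gedit", {"gedit", "GNOME"}))  # ['gedit']
print(untranslated_required_words("Click OK", "Clique em OK", {"OK"}))  # ['OK']
```

The comparison is case-sensitive on purpose, so a command name such as `gedit` that was capitalized in the translation is reported.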

    There is now also infrastructure to program tests specific to a single language. This means we could even include some very specific tests for pt_BR! In the future, we hope to be able to mark false positives as reviewed – this will allow pofilter and Pootle to skip them without complaining again. This should make it much easier to review translations again and again over the lifetime of several product versions. But of course, we should try to limit the false positives as well. Let us know if you find a pattern of false positives that we can eliminate.

    Of course I use the translate toolkit in “special” ways. But something I often use is pocount – such a simple program, but it really helps me to prioritise my work and to plan. Together with pogrep and pomerge, it becomes easier to isolate important work and to plan available time. When working on GNOME I often use these and a few other tools as well (I prefer pot2po over msgmerge, for example). If you like pofilter, you might also like poconflicts – a way of checking consistency between different files.
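To show the kind of summary pocount gives, here is a rough Python sketch that counts translated, fuzzy and untranslated entries, plus their source words, in a small PO file. It is a deliberately minimal, hypothetical parser that ignores plurals, msgctxt and escape sequences; the real tool, run as `pocount pt_BR.po`, handles much more.

```python
import re
from collections import Counter


def _unquote(s):
    return s[1:-1]  # naive: does not decode escape sequences such as \"


def parse_po(text):
    """Return (msgid, msgstr, fuzzy) tuples, skipping the header entry."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {"msgid": "", "msgstr": ""}
        fuzzy, current = False, None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
            elif line.startswith(("msgid ", "msgstr ")):
                current, _, rest = line.partition(" ")
                fields[current] = _unquote(rest)
            elif line.startswith('"') and current:
                fields[current] += _unquote(line)  # continuation line
        if fields["msgid"]:
            entries.append((fields["msgid"], fields["msgstr"], fuzzy))
    return entries


def count(text):
    """Entries and source words per state, roughly what pocount summarizes."""
    stats = Counter()
    for msgid, msgstr, fuzzy in parse_po(text):
        state = "fuzzy" if fuzzy else "translated" if msgstr else "untranslated"
        stats[state] += 1
        stats[state + "_words"] += len(msgid.split())
    return stats


SAMPLE = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: main.c:10
msgid "Open a file"
msgstr "Abrir um arquivo"

#, fuzzy
msgid "Save all files"
msgstr "Salvar arquivo"

msgid "Quit"
msgstr ""
'''

print(dict(count(SAMPLE)))
```

Counting source words, not just messages, is what makes such a summary useful for planning: one long help paragraph can outweigh dozens of short menu labels.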

  3. I’m not using translate-toolkit (kbabel can do most of pofilter’s checks, if I don’t forget to click on all the menus :P), but we have some small and very useful scripts for spell checking PO files.
    These are huspell, a convenience wrapper for hunspell that simplifies invocation, sorts results and adds some exceptions; and huspell-po, which extracts translations from PO files (without the _ accelerator character) and feeds the strings to huspell. That way I can quickly minimize the number of typos and such in my translations. Experience with Launchpad and newbies showed that this is a must-have for serious work, although no other tools I’ve seen support spell checking 😦 (not counting kbabel’s solution, which plainly sucks).
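The extraction step that huspell-po performs can be sketched in Python as follows. This is not the actual script, whose code is not shown here; the function names are hypothetical. It removes the `_` accelerator marker before splitting translations into words, so that “Sa_ir” is checked as “Sair” rather than as two fragments.

```python
import re


def strip_accelerator(text, marker="_"):
    """Remove an accelerator marker placed before a letter: 'Sa_ir' -> 'Sair'."""
    return re.sub(re.escape(marker) + r"(?=[^\W\d_])", "", text)


def spellcheck_words(translations, marker="_"):
    """Unique words from the translations, accelerators removed, sorted."""
    words = set()
    for text in translations:
        words.update(re.findall(r"[^\W\d_]+", strip_accelerator(text, marker)))
    return sorted(words)


print(spellcheck_words(["_Abrir arquivo", "Sa_ir", "Abrir"]))  # ['Abrir', 'Sair', 'arquivo']
```

The resulting words could then be given to a spell checker, for example `hunspell -d pt_BR -l`, which lists the words it does not recognize.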

    Also, Wordforge looks cool, thanks for pointing to it🙂. Why is it not part of the GNOME project?

    Dwayne: When I was finishing this article, I browsed pofilter’s source code (to see how many languages have extra features in pofilter), and I was amazed by the variety of tools in translate-toolkit. When I receive enough feedback I’ll get in touch.

    F Wolff: Fortunately our locale is already well localized, so we mostly translate the next GNOME release’s GUI and then whatever we feel like, but I’m sure your comment will be more than helpful for many translation teams. I made extensive use of pocount in GNOME 2.20, or maybe 2.22, to make our translation more homogeneous.

    kelemeng, thank you for your suggestion. Personally I use Vim, whose spell checker is very useful when coupled with syntax highlighting. But many translators use other tools, so huspell-po might make spell checking much easier for them. Poedit has spell checking too, but you must set the language and country for the spell checker to work. Poedit uses GtkSpell, which uses Aspell. GtkSpell is usually patched to use Enchant, which also uses Aspell by default but can be configured to use Myspell/Hunspell instead.

  5. kelemeng – You can do spell checking with pofilter as follows:
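    # Write only the entries that fail the spellcheck test to to.po
    pofilter -t spellcheck --lang=xx from.po to.po
    # After correcting to.po, merge the fixes back into from.po
    pomerge -t from.po to.po from.po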

    It might not be ideal for your situation, but it will correctly remove accelerators before checking the spelling, and it does some other nice things, like ignoring variables and words that you have defined as not to be translated.
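The filtering described in this comment, skipping variables and protected words before spell checking, can be sketched in Python. The pattern and function name are hypothetical simplifications, not the toolkit’s code; real variable syntax varies between projects, and this pattern only covers common printf-style forms.

```python
import re

# Simplified printf-style variables: %s, %d, %1$s, %.2f (not %% or %(name)s).
PRINTF_VAR = re.compile(r"%(?:\d+\$)?[-+#0]*\d*(?:\.\d+)?[a-zA-Z]")


def spellcheck_candidates(target, protected=frozenset(), marker="_"):
    """Words worth spell checking: accelerators, variables and protected words removed."""
    text = target.replace(marker, "")
    text = PRINTF_VAR.sub(" ", text)
    return [w for w in re.findall(r"[^\W\d_]+", text) if w not in protected]


print(spellcheck_candidates("_Salvar %s no GNOME", protected={"GNOME"}))  # ['Salvar', 'no']
```

Removing variables first matters because a placeholder such as `%s` glued to a word would otherwise turn a correctly spelled word into a reported typo.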

  6. Hey Leonardo, do you think we can improve our pofilter usage with other options? I was reading Dwayne Bailey’s comment and thinking about the best command line.
    I took your suggestion a long time ago, and my current pofilter usage is: pofilter --gnome pt_BR.po | less. Sorry about the glChess translation; I would have avoided some mistakes if I had checked the module with pofilter. By the way, gtali is almost finished.
