Parsing damned-lies' releases.xml.in in the command line

Published on April 28, 2008 by Leonardo Fontenelle

Almost two months ago, Simos requested a command line tool to help translators manage their message catalogs and get them uploaded. I agree it would be very, very useful, so I started to think about how to do that. I have almost no programming skills, so after a lot of research and trial and error, this is as far as I could get.

I guess this command line will make releases.xml.in much easier to parse with standard UNIX tools:
echo "xpath //releases/release" | xmllint releases.xml.in --shell
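For anyone who prefers the Python route, the same query can be sketched with the standard library alone. This is only a minimal sketch: the sample document is made up, and the `id` attribute is an assumption. The only thing taken from the command above is that the file contains `release` elements inside `releases`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real releases.xml.in may use different
# attributes and nesting. Only <releases>/<release> comes from the post.
SAMPLE = """\
<releases>
  <release id="gnome-2-22"/>
  <release id="gnome-2-24"/>
</releases>
"""


def release_ids(xml_text):
    """Return the (assumed) id attribute of every <release> element."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so this works whether or not
    # <releases> is the root element.
    return [release.get("id") for release in root.iter("release")]


print(release_ids(SAMPLE))
```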

There’s a lot to do from here, but hey, I’m just a translator. Maybe it would be a lot easier to use an advanced scripting language, like Python. Vertimus already integrates nicely with Damned Lies, so maybe some of its code could be used for the command line Simos asked for.

I guess all of this is pretty obvious, or even silly, for a lot of people in Planet GNOME. In case you are one of them, wouldn’t you please create such a translation project manager?

(PS: integrating with vertimus would be even greater for the pt_BR team!)

Update: I just remembered Djihed Afifi was thinking about a translation project manager for gtranslator. Djihed, what do you think of this command line tool?

24 responses to “Parsing damned-lies' releases.xml.in in the command line”

Thomas Thurman on April 28, 2008 at 9:02 am said:

Has nobody done this yet? It doesn’t sound like it would take a lot of time. If you’d like I could try and put something together, as long as you’d give feedback on what was or wasn’t good about it if I did 🙂

Thomas Thurman on April 28, 2008 at 9:35 am said:

So, my questions:

1) Simos says “Put them in a local folder.” Is there to be only one such local folder per user per system? Or is that an optional extra command-line switch?

2) “also commit to HEAD if required.” — How do we tell whether committing to HEAD is required?

Simos on April 28, 2008 at 10:27 am said:

@Thomas: The general idea of the command line tool is to download the full set of translation files locally, then translate/edit/etc with your preferred tools, and finally upload (with the tool) any changed files. Once the upload takes place, the local folder can be erased. Per 1), the local folder is just a “work” folder, that the translation coordinator manages. This tool makes sense to be used by the translation coordinator or a translation member that has an SVN account (so they can also upload).

Regarding (2). Algorithmically, if a package has a branch, then you also commit the translation at the trunk. A translator always works on the version of the PO files listed in the releases XML file. As there is no automated way to sync the branch and trunk versions of a PO file for a package, if there is a branch we work on it and when committing we dump a copy of the PO file on trunk as well so any updates will be moved there as well.

Packages start getting branched about a couple of weeks before translation deadline (near release of a new stable GNOME). For example, for GNOME 2.22, we currently translate the branch, if it exists. At some point in the summer we will shift focus to GNOME 2.24.

Shaun McCance on April 28, 2008 at 10:31 am said:

For processing XML from the command line, check out XML Starlet:

http://xmlstar.sourceforge.net/

It’s the grep, sed, and awk of XML.

Thomas Thurman on April 28, 2008 at 10:48 am said:

@Simos: Okay, interesting. Does everything that gets translated in the branch also get dumped into trunk/po/*.po, or only some subset?

Simos on April 28, 2008 at 10:55 am said:

@Thomas: For a possible first version of such a tool, it would make sense to focus on the UI translations (no documentation yet); therefore, it’s generally a single file per package (po/LL.po, where LL is language code for the specific translator).

Thomas Thurman on April 28, 2008 at 11:04 am said:

@Simos: Okay, that makes things simpler. But every string which is translated in a branch will get committed both there and in trunk?

Simos on April 28, 2008 at 12:36 pm said:

@Thomas: The way I would phrase it is “if a package has a branch to work on, then we fully update & commit the PO file for the branch, and finally we just add the same PO file in trunk as well, in order to keep it there until we switch working on the next version of GNOME”.

In other words, I would phrase your last comment as “every string^WPO file that is updated/translated for a branch is committed for the branch (obviously) and also in trunk”.

The reason for “also commit in trunk” is that when we switch to the next version of GNOME, we would not have to think about whether there are useful translations in some branch that we need to pull into the trunk. This last task (figuring out if there are useful translations in some branch) would have been really cumbersome and a PITA, and it’s what we are trying to avoid.

Thomas Thurman on April 28, 2008 at 12:45 pm said:

@Simos: Hm. It worries me to just “add the same PO file in trunk as well”. What happens when the source code in trunk has a new string which doesn’t exist in the branch? It has to be an intelligent merge, I’d have thought, not just a copy of the same file. (Fortunately that’s not difficult, but it’s worth noting.)

Simos on April 28, 2008 at 6:58 pm said:

@Thomas: For the purposes of describing this task, it’s simpler to say “add the same PO file in trunk as well”. This is good, because the typical workflow of a translator coordinator is to focus on a single GNOME release at a time. If a translator coordinator would like to focus on both branch and trunk, it is also feasible to perform the merge without having to checkout all the source (damned-lies makes available .HEAD.po files for each package). Such an enhancement could come once the basic script is in place.

Thomas Thurman on April 28, 2008 at 7:08 pm said:

@Simos: I apologise for my persistent confusion on this point, but I have got the rest of the problem figured out in my head and intend to write a first draft tonight.

Where can these .HEAD.po files be found on damned-lies?

I am fairly certain that it is not actually possible to check in a file unless you have checked out the whole directory. This does not mean that the translator is required to see the whole directory, but checking out the entire po directory in both head and the relevant branch can’t be avoided behind the scenes. (Of course, you can still just make patches and mail them to people.)

Simos on April 28, 2008 at 7:27 pm said:

@Thomas: I am really cool with this discussion. I am not sure if Leonardo is happy we took over his blog post.

For each package, you can get the POT file (.HEAD.pot) from the package page. For anjuta, see http://l10n.gnome.org/module/anjuta and the two POT files for documentation and UI strings respectively. This information can also be deduced from the XML file so a script can grab the POT file. Then, the script would “msgmerge -o POFileForTrunk.po FreshPOFile.po POTFileFromTrunk.pot” and commit POFileForTrunk.po in the trunk.

Simos on April 28, 2008 at 8:14 pm said:

@Thomas: I am really cool with this discussion. What I am not sure is if Leonardo is happy we hijacked his blog post.

Each package comes with POT files (.HEAD.pot). For example, for Anjuta, see http://l10n.gnome.org/module/anjuta
The information of the location of the POT file can be deduced easily from the releases XML file.

You can create an updated PO file for trunk with the command “msgmerge -o UpdatedPOForTrunk.po NewBranchPOFile.po Package.HEAD.pot”, thus, you do not have to checkout the full source code of the package.
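A script could wrap that msgmerge call roughly as follows. This is a sketch, not a tested tool: the function names are invented for illustration, and the file names are the placeholders from the command above. It only relies on msgmerge taking the existing PO file first, the POT file second, and `-o` for the output file.

```python
import shutil
import subprocess


def msgmerge_command(branch_po, trunk_pot, output_po):
    """Build the msgmerge call quoted above as an argument list."""
    return ["msgmerge", "-o", output_po, branch_po, trunk_pot]


def merge_for_trunk(branch_po, trunk_pot, output_po):
    """Run msgmerge if it is installed; return True when it succeeded."""
    if shutil.which("msgmerge") is None:
        return False  # gettext tools are not available on this machine
    result = subprocess.run(msgmerge_command(branch_po, trunk_pot, output_po))
    return result.returncode == 0


# Show the command that would be run for the placeholder file names.
print(" ".join(msgmerge_command(
    "NewBranchPOFile.po", "Package.HEAD.pot", "UpdatedPOForTrunk.po")))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.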

Now, regarding the checkout of files, I think there is a way (=hack) to checkout specific files only for Subversion (was mentioned on the gnome-i18n list?), thus you need to checkout “po/LL.po” and “ChangeLog”. For simplicity, you may assume for now that the translator coordinator has performed an initial translation commit, and also edited the LINGUAS file.

Leonardo Fontenelle on April 28, 2008 at 10:48 pm said:

Hi, Thomas, good to see you around!

Most GUI translations will be located at po/ab_CD.po. GTK+ has both po/ and po-properties, and libgweather has both po/ and po-locations. Documentation translations will be found at help/ab_CD/ab_CD.po, subdir/doc/ab_CD/ab_CD.po and such. Damned-lies already knows how to find each message catalog, and which branch (or trunk) to use for each module in each release. This information is actively maintained by the GNOME Translation Team, mostly Claude Paroz, and lags very little behind jhbuild.

Most of the time the translator will want to commit to, say, gnome-2-22 as well as trunk, but that must be optional.
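The branch-plus-trunk rule discussed in this thread, with the opt-out asked for here, fits in a few lines. This is a sketch only; the function name and the `also_trunk` flag are made up for illustration.

```python
def commit_targets(branch, also_trunk=True):
    """List where a PO file should be committed.

    A file worked on in a branch also goes to trunk unless the
    translator opts out; a file worked on in trunk goes only there.
    """
    targets = [branch]
    if also_trunk and branch != "trunk":
        targets.append("trunk")
    return targets


print(commit_targets("gnome-2-22"))
print(commit_targets("gnome-2-22", also_trunk=False))
print(commit_targets("trunk"))
```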

It will be possible to check out an empty directory and then check out a single file in it, when SVN 1.5 is released. The “How to use GNOME SVN as a translator” guide has more information on this.

Thomas Thurman on April 28, 2008 at 10:59 pm said:

@Leonardo: Thanks. So what’s needed is basically a CLI frontend to damned-lies? Is retrieving the XML in http://svn.gnome.org/svn/damned-lies/trunk/releases.xml.in and then checking out the PO files as appropriate to that enough, or is there more to it?

I was going to ask whether I should wait to implement this until the next version of svn, but I see there’s a workaround given there.

Leonardo Fontenelle on April 28, 2008 at 11:21 pm said:

About the “there has to be an intelligent merge”: there is one merge feature we have, and there is another one which we lack.

Take a look at this short message catalog, for instance. It hasn’t been touched in years, and two of the translated messages are not part of the source code anymore. If I check out the module and run intltool-update pt_BR, I’ll get a new message catalog, refreshed to mirror the translatable content of the source code. That’s what the autotools do when the module is being installed, so the installed MO file has only the other two, current, messages.

Now, let’s suppose there is a new message to be translated. Again, the message catalog in the repository will remain untouched unless I commit something new. If I just check out the message catalog, I’ll get a PO file with 4 translated messages (100%), even if two are obsolete, and even if now there’s a hypothetical new message in the source code. To get this untranslated new message in the message catalog, I have to check out the entire module and run intltool-update pt_BR. That’s a lot of useless downloading! So, Damned Lies does that for us. It checks out everything, runs intltool-update, and provides the conveniently updated translations for download. When I say “updated”, I mean the PO file has the same messages as the source code; the message catalogs Damned Lies provides have the same translations as those in SVN. As an example, this is the same message catalog as in the previous example, but updated by Damned Lies. Of course, it’s completely translated, because the “new message” was hypothetical.
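To make the effect of that refresh concrete, here is a toy model in Python. It is not how intltool-update or msgmerge work internally: real msgmerge also does fuzzy matching and keeps obsolete entries as comments. The messages are invented, and a catalog is reduced to a plain msgid-to-msgstr dictionary.

```python
def refresh(catalog, source_msgids):
    """Keep only messages still in the source; add new ones untranslated."""
    return {msgid: catalog.get(msgid, "") for msgid in source_msgids}


def percent_translated(catalog):
    """Share of messages with a non-empty translation, as a whole percent."""
    if not catalog:
        return 100
    done = sum(1 for msgstr in catalog.values() if msgstr)
    return round(100 * done / len(catalog))


# Invented example: four translated messages, two of them obsolete,
# plus one new message ("Print") that exists only in the source code.
in_repository = {"Open": "Abrir", "Close": "Fechar",
                 "Save": "Salvar", "Quit": "Sair"}
in_source = ["Open", "Close", "Print"]

refreshed = refresh(in_repository, in_source)
print(percent_translated(in_repository))  # the stale file looks complete
print(refreshed)                           # "Print" shows up untranslated
print(percent_translated(refreshed))
```

The stale catalog reports itself as fully translated, while the refreshed one exposes the new message. That gap is what downloading the catalog from Damned Lies closes.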

This is why we can commit translations to trunk and to branch at the same time. When we do that, we worry about completely translating branch, and about not losing anything when we start working in trunk. When I focus on a branch (say, gnome-2-22) I don’t care if trunk is completely translated, because nobody is using it anyway. When we start focusing on trunk (say, near GNOME 2.24) we will use the intltool-update’d version of the same translation we committed to branch.

Suppose I committed some fixes in a branch, and other improvements in another branch (or trunk, whatever). It’s really hard to merge the two translations. intltool-update (and msgmerge, for which intltool-update is a wrapper) changes almost everything in the PO file except for the translated messages. Translators are aware of that; it’s sort of a fact of life. You are welcome to circumvent this at any time, but it’s not a priority.

Leonardo Fontenelle on April 28, 2008 at 11:29 pm said:

When we really need to compare or merge two translations, we may do this:
msgmerge catalog_version1.po anyversion.pot > new_catalog1.po
msgmerge catalog_version2.po anyversion.pot > new_catalog2.po
Please note that the POT file must be the same for both commands. Another option:
msgmerge catalog_version1.po catalog_version2.po > new_catalog_version1.po
The last command line requires GNU Gettext 0.17; older versions of msgmerge will give a funny output if the second argument is a regular PO file instead of a POT file.
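Once both catalogs have been refreshed against the same POT file, they contain the same messages, so comparing them comes down to a per-message check. A toy sketch, with catalogs reduced to msgid-to-msgstr dictionaries and invented sample data (no real PO parsing is attempted):

```python
def differing_translations(first, second):
    """Return the msgids present in both catalogs whose translations differ."""
    shared = first.keys() & second.keys()
    return sorted(msgid for msgid in shared if first[msgid] != second[msgid])


# Invented sample data standing in for new_catalog1.po and new_catalog2.po.
new_catalog1 = {"Open": "Abrir", "Save": "Salvar", "Quit": "Sair"}
new_catalog2 = {"Open": "Abrir", "Save": "Gravar", "Quit": "Sair"}

print(differing_translations(new_catalog1, new_catalog2))
```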

Thomas Thurman on April 28, 2008 at 11:44 pm said:

@Leonardo: Thank you! That makes perfect sense.

So let’s see what we have as a series of operations:

A. The translator calls tsfx (or whatever it’s called) and supplies:
   - A project ID, such as “gnome-2-22”
   - A language code, such as “pt”
B. tsfx goes away and:
   - Retrieves releases.xml.in
   - Finds the list of modules and their branch IDs for the given project ID (e.g. module atk branch gnome-2-14, module gail branch gnome-2-18…)
   - For each of these modules, fetches http://l10n.gnome.org/POT/module.branch/module.branch.language.po where branch is HEAD if it’s trunk and the branch id otherwise; it fetches this file with the svn metadata intact so that it may be checked back in. (This may be accomplished if no other way is available by checking it out to a directory on its own under ~/.cache and hard linking the .po to the destination directory.)

C. The translator does funky stuff as appropriate to the .po files.

D. Either tsfx generates a patchset, for people who don’t have svn access…

E. Or it
   - Goes through and checks in each changed .po file with an appropriate comment in po/ChangeLog in the same svn revision.
   - Does the same again, only for trunk, if we weren’t using trunk.

Is that a fair summary?
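The URL rule in step B can be written down directly. A minimal sketch: the pattern is copied from the summary above, but whether the server really serves files under it has not been checked here, and the function name is invented.

```python
def po_url(module, branch, language, base="http://l10n.gnome.org/POT"):
    """Build the PO file URL following the pattern described in step B.

    Trunk is spelled HEAD in the URL; any other branch keeps its own id.
    """
    name = "HEAD" if branch == "trunk" else branch
    stem = f"{module}.{name}"
    return f"{base}/{stem}/{stem}.{language}.po"


print(po_url("atk", "gnome-2-14", "pt"))
print(po_url("gail", "trunk", "pt"))
```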

Danilo on April 29, 2008 at 4:26 am said:

Actually, I’d make use of online data available from Damned-Lies in such a script.

Eg. there is actually http://l10n.gnome.org/po/sr/releases.xml which is a post-processed variant of releases.xml.in with Serbian translations (you can also find other language translations, or use the generic http://l10n.gnome.org/po/C/releases.xml).

Also, Damned-Lies provides separate per-language XML files which contain direct links to translated PO files (or empty POT files when there is no translation), which can be fetched from eg. http://l10n.gnome.org/languages/sr/gnome-2-22.xml (and respectively for other languages). This is where you can get an always updated PO file from the <path> element (so no need to bother yourself with merging steps above—that’s what damned-lies does for you), and a path where a PO file should be uploaded (<svnpath> element).
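Reading those two elements out of a per-language file could look roughly like this. Only the `path` and `svnpath` element names come from the description above. The wrapper elements, the `id` attribute and both sample values are invented for illustration, so the real file may well be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: everything except the <path> and <svnpath>
# element names is an assumption.
SAMPLE = """\
<stats>
  <module id="epiphany">
    <path>/POT/epiphany.gnome-2-22/epiphany.gnome-2-22.sr.po</path>
    <svnpath>epiphany/branches/gnome-2-22/po/sr.po</svnpath>
  </module>
</stats>
"""


def po_locations(xml_text):
    """Return (download path, upload path) pairs found in the document."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Look at every element and keep those that have both children,
    # so the code does not depend on the names of the wrapper elements.
    for element in root.iter():
        path = element.find("path")
        svnpath = element.find("svnpath")
        if path is not None and svnpath is not None:
            pairs.append(((path.text or "").strip(),
                          (svnpath.text or "").strip()))
    return pairs


print(po_locations(SAMPLE))
```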

I imagine a command-line script would work like:

$ get-pofile --lang sr --module=all # both could be defaults, i.e. lang from LANG environment variable
$ get-pofile --lang sr --module=epiphany --branch=gnome-2-22 # and we could have an env-var for default branch too (fallback to trunk if there is no such branch)
$ commit-pofile --update-trunk epiphany.gnome-2-22.sr.po
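The option handling for such a get-pofile command might be sketched with argparse as follows. The option names follow the examples above. The `POFILE_BRANCH` environment variable is a made-up name for the suggested default-branch variable, and mapping a locale such as sr_RS to the team code sr is left out.

```python
import argparse
import os


def default_language(environ=os.environ):
    """Derive a language code from LANG, e.g. 'sr_RS.UTF-8' -> 'sr_RS'."""
    lang = environ.get("LANG", "")
    # Drop the encoding (".UTF-8") and any modifier ("@latin").
    return lang.split(".")[0].split("@")[0] or None


def build_parser(environ=os.environ):
    """Create the parser; defaults come from the given environment."""
    parser = argparse.ArgumentParser(prog="get-pofile")
    parser.add_argument("--lang", default=default_language(environ))
    parser.add_argument("--module", default="all")
    # POFILE_BRANCH is a hypothetical variable name, not an existing one.
    parser.add_argument("--branch",
                        default=environ.get("POFILE_BRANCH", "trunk"))
    return parser


args = build_parser({"LANG": "sr_RS.UTF-8"}).parse_args(
    ["--module=epiphany", "--branch=gnome-2-22"])
print(args.lang, args.module, args.branch)
```

The fallback to trunk for a module that lacks the requested branch would have to happen later, once the list of modules is known; this sketch only covers the defaults.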

If you really want to parse releases.xml (though I believe it’s not necessary), you can make use of data.py in damned-lies itself.

Danilo on April 29, 2008 at 4:34 am said:

Btw Thomas, please forget about patches and PO files: nobody wants to resolve conflicts in PO files, especially not just with patches.

Leonardo Fontenelle on April 29, 2008 at 6:29 am said:

Bad, Bad Akismet. No donuts for you 😀

Thomas Thurman on April 29, 2008 at 9:24 am said:

@Danilo: Okay– thanks, I didn’t know about those. I don’t see where I included a merge step, though it doesn’t matter since your solution seems to cover all cases. Given these changes, then, do we seem to have a workable algorithm?

I don’t really see why I would take code out of damned-lies for parsing what’s really a rather simple XML document.

What happens when a translator doesn’t have svn access, then? It should just produce the changed version and send it to someone? (I am finding it hard to see how resolving conflicts in PO files can be reduced for people using only anonymous access just by not using patches, but maybe I’m just thinking like a programmer.)

Leonardo Fontenelle on April 29, 2008 at 7:55 pm said:

@Thomas: When the translator doesn’t have SVN access, he’ll download the file from damned-lies and send it somehow to another member of the same team. Each team has its own way to manage this workflow. We hope someday damned-lies will be improved to help manage the workflow, maybe by incorporating vertimus.

That’s not a matter of using Damned Lies source code, it’s a matter of using its XML files with information about release sets, modules, languages and teams. Damned Lies’ source code includes both Python code and those XML files. When a developer branches their module, they announce it on the gnome-i18n mailing list, and Claude Paroz updates the XML file in the Damned Lies source code, which quickly gets deployed to the server l10n.gnome.org. (Edit: now I found the part where Danilo proposed using data.py. I’ll leave it for you guys to decide 🙂 )

The releases file could be used to check whether the release name is valid, or to provide a list of available releases. But most of the time the translator will choose the release without it. And there is a file for each release/language pair, providing a lot of useful information: which is the correct branch for each module, where the files are located, whether there are any errors, and so on.

Leonardo Fontenelle on May 6, 2008 at 12:18 am said:

Hello?
