Wichert Akkerman

Software design? Strange hacks? All of the above please!

Since the last time I wrote about Lingua a
lot has happened: Most significantly support for message contxts has been
added, which can be used today in Pyramid applications, more information is
provided for translators, and a new API for creating extraction plugins has
been added.

Message context in ZPT templates

When making an application translatable you may find that you use the exact
same text in multiple places, but the meaning is slightly different. For
example: To disambiguate these messages you can do two things: use a different
message identifier, or define the message context.

Until very recently specifying the message identifier was the only option. THis
is done by specifying an identifier in the i18n:translate attribute.

<button i18n:translate="button_save">Save</button>

This approach works, but has one downside: in the PO file it puts the canonical
text in a comment and shows the identifier as canonical text.

#. Default: Save
msgid "button_save"
msgstr ""

Not all translations tools can handle this, and the result may be confusing for
translators. Chameleon and Lingua now also support
the message contexts to handle this more cleanly. The context can be defined
with a new i18n:context attribute.

<button i18n:context="button" i18n:translate="">Save</button>

The context is inherited (similar to i18n:domain), so you can use it on
larger scopes such as fieldsets or forms as well. The resulting PO entry now
looks like this:

msgctxt "button"
msgid "Save"
msgstr ""

To use this you will need to use Chameleon 2.18 and translationstring 1.3 or
later. At this moment message contexts are not supported by zope.i18n, so you
can not use this in Zope or Plone.

Nested texts in ZPT templates

A common problem when translating text in HTML documents is that text can
be split over multiple elements. Take this simple example:

<p>See what <em>looks best on you</em> in just a few steps</p>

Part of the sentence is emphasized by wrapping it in an <em> element. When
making this translatable you get the following HTML:

<p i18n:translate="">See what
  <em i18:name="looks-best" i18n:translate="">looks best on you</em>
  in just a few steps</p>

which results in these PO file entries:

msgid "See what ${looks-best} in just a few steps"
msgtr ""

msgid "looks best on you"
msgstr ""

When a translator works with this PO file it is impossible to see how the
“looks best on you” text is used. This will quickly lead to incorrect
translations. For example the naive translation to German is “steht Ihnen am
besten”. This is incorrect here; the right translation is “Ihnen am besten
steht”.

To help translators Lingua will now automatically add a comment to messages
which explains the context of the message. For the same input Lingua will
now generate this:

#. Canonical text for ${looks-best} is: "looks best on you"
msgid "See what ${looks-best} in just a few steps"
msgtr ""

#. Used in sentence: "See what ${looks-best} in just a few steps"
msgid "looks best on you"
msgstr ""

This guarantees that translators have all information necessary to create
the right translations.

Extraction API

Lingua 3.2 introduced a new stable API for creating extraction plugins for
Lingua. This is an alternative to the support for Babel plugins, allowing
plugins to use Lingua’s internal API for improved handling of included
Python code and to provide more data for messages.

Writing a plugin is easy to do. The first thing needed is to
write your extractor as a class which implements the
lingua.extractor.Extractor
ABC. Here is a trivial example that will always return a single message:

from lingua.extractors import Extractor
from lingua.extractors import Message

class MyExtractor(Extractor):
    '''One-line description for --list-extractors'''
    extensions = ['.txt']

    def __call__(self, filename, options):
            return [Message(None, 'msgid', None, [], u'', u'', (filename, 1))]

The next step is to register your extractor with lingua using a setuptools
entry point:

setup(name='mypackage',
      ...
      install_requires=['lingua'],
      ...
      entry_points='''
      [lingua.extractors]
      my_extractor = mypackage.extractor:MyExtractor
      '''
      ...)

As soon as your package is install lingua will detect it and use it to process
all txt files.

There is an pull request for Mako to add a lingua
plugin

that will hopefully be merged soon. Support for other languages is always
welcome.