Lingua 3.6: improved template handling and a new extraction API

11 November 2014

Since the last time I wrote about Lingua a lot has happened: Most significantly support for message contxts has been added, which can be used today in Pyramid applications, more information is provided for translators, and a new API for creating extraction plugins has been added.

Message context in ZPT templates

When making an application translatable you may find that you use the exact same text in multiple places, but the meaning is slightly different. For example: To disambiguate these messages you can do two things: use a different message identifier, or define the message context.

Until very recently specifying the message identifier was the only option. THis is done by specifying an identifier in the i18n:translate attribute.

<button i18n:translate="button_save">Save</button>

This approach works, but has one downside: in the PO file it puts the canonical text in a comment and shows the identifier as canonical text.

#. Default: Save
msgid "button_save"
msgstr ""

Not all translations tools can handle this, and the result may be confusing for translators. Chameleon and Lingua now also support the message contexts to handle this more cleanly. The context can be defined with a new i18n:context attribute.

<button i18n:context="button" i18n:translate="">Save</button>

The context is inherited (similar to i18n:domain), so you can use it on larger scopes such as fieldsets or forms as well. The resulting PO entry now looks like this:

msgctxt "button"
msgid "Save"
msgstr ""

To use this you will need to use Chameleon 2.18 and translationstring 1.3 or later. At this moment message contexts are not supported by zope.i18n, so you can not use this in Zope or Plone.

Nested texts in ZPT templates

A common problem when translating text in HTML documents is that text can be split over multiple elements. Take this simple example:

<p>See what <em>looks best on you</em> in just a few steps</p>

Part of the sentence is emphasized by wrapping it in an <em> element. When making this translatable you get the following HTML:

<p i18n:translate="">
  See what
  <em i18:name="looks-best" i18n:translate="">looks best on you</em>
  in just a few steps
</p>

which results in these PO file entries:

msgid "See what ${looks-best} in just a few steps"
msgtr ""

msgid "looks best on you"
msgstr ""

When a translator works with this PO file it is impossible to see how the “looks best on you” text is used. This will quickly lead to incorrect translations. For example the naive translation to German is “steht Ihnen am besten”. This is incorrect here; the right translation is “Ihnen am besten steht”.

To help translators Lingua will now automatically add a comment to messages which explains the context of the message. For the same input Lingua will now generate this:

#. Canonical text for ${looks-best} is: "looks best on you"
msgid "See what ${looks-best} in just a few steps"
msgtr ""

#. Used in sentence: "See what ${looks-best} in just a few steps"
msgid "looks best on you"
msgstr ""

This guarantees that translators have all information necessary to create the right translations.

Extraction API

Lingua 3.2 introduced a new stable API for creating extraction plugins for Lingua. This is an alternative to the support for Babel plugins, allowing plugins to use Lingua’s internal API for improved handling of included Python code and to provide more data for messages.

Writing a plugin is easy to do. The first thing needed is to write your extractor as a class which implements the lingua.extractor.Extractor ABC. Here is a trivial example that will always return a single message:

from lingua.extractors import Extractor
from lingua.extractors import Message

class MyExtractor(Extractor):
    '''One-line description for --list-extractors'''
    extensions = ['.txt']

    def __call__(self, filename, options):
            return [Message(None, 'msgid', None, [], u'', u'', (filename, 1))]

The next step is to register your extractor with lingua using a setuptools entry point:

setup(name='mypackage',
      ...
      install_requires=['lingua'],
      ...
      entry_points='''
      [lingua.extractors]
      my_extractor = mypackage.extractor:MyExtractor
      '''
      ...)

As soon as your package is install lingua will detect it and use it to process all txt files.

There is an pull request for Mako to add a lingua plugin that will hopefully be merged soon. Support for other languages is always welcome.