PHP Internationalization and Localization Explained

Ever since web applications first appeared, and even before, there has been a need to translate output according to the user's preferred language. How is a language defined in terms of a standard, how is the user's language detected, and how is the latter then used to translate content in PHP applications?

How to define a language in a standard way?

This necessity surfaced long before web applications first appeared. As early as 1967, an official ISO 639 standard for language codes and names was put forward and used until 2002, when it was amended with new languages that had gained international recognition (ISO 639-1).

Combined with the ISO 3166-1 standard for country codes and a character set, it is used by all operating systems to detect the user's LOCALE, which identifies the language content will be translated to. Operating systems define LOCALE as a combination of language code (ISO 639-1), country code (ISO 3166-1) and character set, for example en_US.UTF-8.
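
In PHP, this is the same LOCALE string that can be requested from the operating system with setlocale(). The snippet below is a minimal sketch; the locale names passed in are assumptions and only work if the corresponding locales are installed on the machine:

    <?php
    // Ask the OS for a French locale; several spellings are tried in order
    // because availability differs between operating systems.
    $locale = setlocale(LC_ALL, 'fr_FR.UTF-8', 'fr_FR', 'french');
    if ($locale === false) {
        // none of the requested locales is installed: fall back to the default
        $locale = setlocale(LC_ALL, 'C');
    }
    echo $locale; // e.g. "fr_FR.UTF-8" on most Linux distributions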

How to detect user LOCALE in a web application?

Unfortunately, LOCALE detection on a user request has never been standardized. One of the reasons is a gradual loss of interest in this field: more and more people today are proficient in English, which has become the de facto LOCALE standard for web sites around the world. Sites targeting a non-English speaking market have options such as reading the Accept-Language request header sent by the browser, encoding the LOCALE in the URL (path prefix, subdomain or query parameter), or letting users pick a language explicitly and remembering their choice.
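
For the header-based option, PHP's intl extension offers Locale::acceptFromHttp(), which picks the best match from the Accept-Language header. The sketch below assumes the intl extension is available; the en_US fallback is an assumption rather than any standard default:

    <?php
    // Detect the user's preferred LOCALE from the Accept-Language request header.
    $header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';
    $locale = ($header !== '') ? Locale::acceptFromHttp($header) : false;
    if ($locale === false) {
        $locale = 'en_US'; // no usable header: fall back to a site-wide default
    }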

Once detected, the LOCALE will usually be persisted in the session, so it is remembered and used to automatically translate content on subsequent requests to that site.
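
A minimal sketch of that persistence follows, assuming a hypothetical detectLocale() helper that implements one of the detection options above and a ?lang= query parameter for explicit user choice:

    <?php
    session_start();
    if (isset($_GET['lang'])) {
        // an explicit user choice always overrides automatic detection
        $_SESSION['locale'] = $_GET['lang'];
    } elseif (!isset($_SESSION['locale'])) {
        // first visit: detect once, then reuse the stored value on later requests
        $_SESSION['locale'] = detectLocale();
    }
    $locale = $_SESSION['locale'];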

How to translate content using LOCALE in PHP applications?

Once the LOCALE has been detected, a PHP web application must look up the matching translations. The way translations are stored comes in four basic flavours:

  1. translations are stored programmatically in locale-specific files. The file name or its folder corresponds to a LOCALE, whereas the file content is an associative array whose key is a language-generic translation keyword (aka KEY) and whose value is the language-specific translation text (aka VALUE). This way, for example in Laravel, {{__("KEY")}} statements in views are automatically replaced on compilation with the VALUE the KEY points to in the file corresponding to the LOCALE. Thanks to its implementation simplicity and platform independence, this became the solution used by most PHP frameworks (most of them inspired by the "mother of frameworks", Java's Spring, which pioneered the method). Popular as it is, though, especially in interpreted languages such as PHP, it comes with major disadvantages:
  2. translations are stored statically in XML or JSON files according to KEY and LOCALE. This only makes it easier for translations to be handled by non-programmers (solving issues #2 & #3 above), but it doesn't solve the main problem of intensive function calls and the full translation array being loaded in memory all the time.
  3. translations are stored in a DB according to KEY and LOCALE. This beautifully solves the maintenance issues (translators perform their job in specialized tools whose results are reflected in the DB) and solves the memory issues too (translations are queried from the DB), but speed-wise it is even worse than the two above (querying always has the highest latency in a web application).
  4. translations are stored in binary (.mo) files, themselves compiled from static (.po) files conceptually similar to those at point #2 above. We once again have the same principle (a KEY-VALUE store in static files, themselves stored according to LOCALE) with the major difference that, this time, every translation call (eg: KEY @ Lucinda Framework) has its value looked up in an optimized binary file; a minimal sketch of this flavour follows after this list. The difference from point #1 above is that _("KEY") goes to a native function call (written in C) instead of one implemented in PHP, so it taxes performance a lot less. Fast as it is, it does come with its own disadvantages:
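
For the last flavour, the sketch below shows the typical gettext workflow in PHP; the "messages" domain name and the ./locale folder are assumptions, and the .mo files must have been compiled from their .po sources beforehand (for example with msgfmt):

    <?php
    // Binary catalogue expected at ./locale/ro_RO/LC_MESSAGES/messages.mo
    $locale = 'ro_RO.UTF-8';
    putenv('LC_ALL=' . $locale);   // some platforms also need the environment variable
    setlocale(LC_ALL, $locale);
    bindtextdomain('messages', __DIR__ . '/locale');
    bind_textdomain_codeset('messages', 'UTF-8');
    textdomain('messages');

    echo _('WELCOME_MESSAGE'); // native C lookup in the compiled .mo file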

To keep maintenance effort to a minimum, translatable content needs to be broken up into reusable units, so you won't have to translate the same phrases over and over whenever the requested page changes. This is a functional necessity accepted by all PHP frameworks around the world: output content MUST be a combination of units, each associated with a KEY identifier whose VALUE differs by LOCALE.
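
As an illustration of that principle (the file names and KEYs below are made up, not any framework's convention), the same KEY is translated exactly once per LOCALE and reused by every page that needs it:

    <?php
    // locale/en.json -> {"GREETING": "Welcome, %s!", "LOGOUT": "Log out"}
    // locale/fr.json -> {"GREETING": "Bienvenue, %s !", "LOGOUT": "Déconnexion"}
    $translations = json_decode(file_get_contents(__DIR__ . '/locale/en.json'), true);
    echo sprintf($translations['GREETING'], 'Maria'); // the KEY is identical for every LOCALE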

What does Lucinda think about it?

Originally, the Internationalization & Localization API that is part of Lucinda Framework used the GETTEXT solution, but it soon became obvious that its maintenance disadvantages outweigh its performance advantages. The fact that it does not work well on Windows, an operating system used by most web developers, added to that impression. The newer 2.0 version thus uses method #2 above: a pure PHP implementation using JSON files to store translations.
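
A JSON-backed reader of that kind might look roughly like the class below. This is only an illustration of the general approach, not the actual Lucinda API; the class name, folder layout and method names are assumptions:

    <?php
    class JsonTranslator
    {
        /** @var array<string,string> KEY => VALUE pairs for one LOCALE */
        private array $translations;

        public function __construct(string $folder, string $locale)
        {
            $file = $folder . '/' . $locale . '.json';
            // missing LOCALE file: fall back to an empty dictionary
            $this->translations = is_file($file)
                ? json_decode(file_get_contents($file), true)
                : [];
        }

        public function translate(string $key): string
        {
            // untranslated KEYs are returned as-is so gaps remain visible
            return $this->translations[$key] ?? $key;
        }
    }

    // usage: $t = new JsonTranslator(__DIR__ . '/locale', 'fr_FR');
    // echo $t->translate('GREETING');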

