Internationalization and localization for PHP
Provide your application in multiple languages, to users in various countries, with different formats and conventions.
- PHP 5.6.0+
  - GNU gettext extension (`gettext`)
  - Internationalization extension (`intl`)
Note: On Windows, you may have to use the non-thread-safe (NTS) version of PHP.
- Include the library via Composer:
$ composer require delight-im/i18n
- Include the Composer autoloader:
require __DIR__ . '/vendor/autoload.php';
- What is a locale?
- Decide on your initial set of supported locales
- Creating a new instance
- Directory and file names for translation files
- Activating the correct locale for the user
- Enabling aliases for translation
- Identifying, marking and formatting translatable strings
- Extracting and updating translatable strings
- Translating the extracted strings
- Exporting translations to binary format
- Retrieving the active locale
- Information about locales
- Names of locales in the current language
- Native names of locales
- English names of locales
- Names of languages in the current language
- Native names of languages
- English names of languages
- Names of scripts in the current language
- Native names of scripts
- English names of scripts
- Names of regions in the current language
- Native names of regions
- English names of regions
- Language codes
- Script codes
- Region codes
- Directionality of text
- Controlling the leniency for lookups and comparisons of locales
Put simply, a locale is a set of user preferences and expectations, shared across larger communities in the world, and varying by geographic region. Notably, this includes a user’s language and their expectation of how numbers, dates and times are to be formatted.
Whatever set of languages, scripts and regions you decide to support at the beginning, you will be able to add or remove locales at any later time. So you may want to start with just one to three locales to get going faster.
You can find a list of various locale codes in the `Codes` class and use the corresponding constants to refer to the locales, which is the recommended solution. Alternatively, you may copy their string values, which use a subset of IETF BCP 47 (RFC 5646) or Unicode CLDR identifiers.
Prior to using your initial set of languages, you should ensure they’re installed on any machine you’d like to develop or deploy your application on, making sure they are known to the operating system:
$ locale -a
Make sure to pay attention to the exact syntax of the locale names used by your operating system, especially with hyphens, underscores and suffixes, e.g. `en-US` vs `en_US`.
If a certain locale is not installed yet, you can add it like the `es-AR` locale in the following example:
$ sudo locale-gen es_AR
$ sudo locale-gen es_AR.UTF-8
$ sudo update-locale
$ sudo service apache2 restart
Note: On Unix-like operating systems, the locale codes used during installation must use underscores.
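If you would like to check for installed locales from PHP, the listing printed by `locale -a` can be inspected programmatically. The following is only a minimal sketch and not part of this library: the function name `isLocaleInstalled` is hypothetical, and the sample listing is made up for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of the library): checks whether a locale
 * appears in the text printed by `locale -a`.
 *
 * @param string $localeListing raw output of `locale -a`, one name per line
 * @param string $localeCode    code with hyphens or underscores, e.g. "es-AR"
 * @return bool
 */
function isLocaleInstalled($localeListing, $localeCode)
{
    // Unix-like systems list locales with underscores, e.g. "es_AR"
    $wanted = strtolower(str_replace('-', '_', $localeCode));

    foreach (preg_split('/\R/', trim($localeListing)) as $line) {
        // Strip suffixes such as ".utf8", ".UTF-8" or "@euro" before comparing
        $name = strtolower(preg_replace('/[.@].*$/', '', trim($line)));

        if ($name === $wanted) {
            return true;
        }
    }

    return false;
}

// Made-up sample listing; on a real system you might pass shell_exec('locale -a')
$listing = "C\nC.UTF-8\nen_US.utf8\nes_AR\nes_AR.utf8\nPOSIX";

var_dump(isLocaleInstalled($listing, 'es-AR')); // bool(true)
var_dump(isLocaleInstalled($listing, 'da-DK')); // bool(false)
```

The comparison ignores case and encoding suffixes on purpose, since the exact spelling of those suffixes differs between systems.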
In order to create an instance of the `I18n` class, just provide your set of supported locales. The only special entry is the very first locale, which also serves as the default locale if no better match can be found for the user.
$i18n = new \Delight\I18n\I18n([
\Delight\I18n\Codes::EN_US,
\Delight\I18n\Codes::DA_DK,
\Delight\I18n\Codes::ES,
\Delight\I18n\Codes::ES_AR,
\Delight\I18n\Codes::KO,
\Delight\I18n\Codes::KO_KR,
\Delight\I18n\Codes::RU_RU,
\Delight\I18n\Codes::SW
]);
Your translation files will later have to be stored in the following location:
locale/<LOCALE_CODE>/LC_MESSAGES/messages.po
For example, with the `es-ES` locale, that would be:
locale/es_ES/LC_MESSAGES/messages.po
If you need to change the path to the `locale` directory or want to use a different name for that directory, just specify its path explicitly:
$i18n->setDirectory(__DIR__ . '/../translations');
The filename in the `LC_MESSAGES` directory, i.e. `messages.po`, is the name of the application module with the extension for PO (Portable Object) files. There’s usually no need to change it, but if you want to, simply call the following method:
$i18n->setModule('messages');
Note: On Unix-like operating systems, the locale codes used in the directory names have to use underscores, whereas on Windows, the codes have to use hyphens.
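To make the naming rules above concrete, here is a small sketch that builds the expected path of a PO file for a given locale code. It is not part of this library: the function `translationFilePath` and its parameters are hypothetical, and the Windows check via `DIRECTORY_SEPARATOR` is just one possible way to detect the platform.

```php
<?php

/**
 * Hypothetical helper (not part of the library): builds the expected path
 * of the PO file for a locale, following the layout described above.
 *
 * @param string    $baseDirectory path of the "locale" directory
 * @param string    $localeCode    code with hyphens or underscores, e.g. "es-ES"
 * @param string    $module        name of the application module
 * @param bool|null $isWindows     platform override; detected if null
 * @return string
 */
function translationFilePath($baseDirectory, $localeCode, $module = 'messages', $isWindows = null)
{
    if ($isWindows === null) {
        // Assumption: a backslash as directory separator indicates Windows
        $isWindows = DIRECTORY_SEPARATOR === '\\';
    }

    // Underscores on Unix-like systems, hyphens on Windows
    $separator = $isWindows ? '-' : '_';
    $directoryName = str_replace(['-', '_'], $separator, $localeCode);

    return rtrim($baseDirectory, '/\\') . '/' . $directoryName . '/LC_MESSAGES/' . $module . '.po';
}

echo translationFilePath('locale', 'es-ES', 'messages', false), "\n";
// locale/es_ES/LC_MESSAGES/messages.po

echo translationFilePath('locale', 'es-ES', 'messages', true), "\n";
// locale/es-ES/LC_MESSAGES/messages.po
```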
The easiest way to pick the most suitable locale for the user is to let this library decide based on various signals and options automatically:
$i18n->setLocaleAutomatically();
This will check and decide based on the following factors (in that order):
- Subdomain with locale code (e.g. `da-DK.example.com`)
  (Note: Locale codes in the (leftmost) subdomain are case-insensitive, i.e. `da-dk` works as well, and you can leave out region or script names, i.e. merely `da` would be sufficient here.)
- Path prefix with locale code (e.g. `http://www.example.com/pt-BR/welcome.html`)
  (Note: Locale codes in the path prefix are case-insensitive, i.e. `pt-br` works as well, and you can leave out region or script names, i.e. merely `pt` would be sufficient here.)
- Query string with locale code
  - the `locale` parameter
  - the `language` parameter
  - the `lang` parameter
  - the `lc` parameter
- Session field defined via `I18n#setSessionField` (e.g. `$i18n->setSessionField('locale');`)
- Cookie defined via `I18n#setCookieName` (e.g. `$i18n->setCookieName('lc');`), with an optional lifetime defined via `I18n#setCookieLifetime` (e.g. `$i18n->setCookieLifetime(60 * 60 * 24);`), where a value of `null` means that the cookie is to expire at the end of the current browser session
- HTTP request header `Accept-Language` (e.g. `en-US,en;q=0.5`)
You will usually choose a single one of these options to store and transport your locale codes, with other factors (specifically the last one) as fallback options. The first three options (and the last one) may provide advantages in terms of search engine optimization (SEO) and caching.
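To give an idea of what the last factor involves, the following sketch matches an `Accept-Language` header against a list of supported locales and falls back to the first entry as the default. This is a simplified illustration and not the library’s implementation: the function `pickLocale` is hypothetical, and the matching rule (exact match first, then same language) is an assumption that ignores the configurable leniency.

```php
<?php

/**
 * Hypothetical helper (not part of the library): picks a supported locale
 * for an "Accept-Language" header, or the first supported locale as default.
 *
 * @param string   $acceptLanguage header value, e.g. "en-US,en;q=0.5"
 * @param string[] $supported      non-empty list of supported locale codes
 * @return string
 */
function pickLocale($acceptLanguage, array $supported)
{
    $candidates = [];

    foreach (explode(',', $acceptLanguage) as $position => $entry) {
        $parts = explode(';', trim($entry));
        $tag = strtolower(trim($parts[0]));

        if ($tag === '' || $tag === '*') {
            continue;
        }

        // The quality value defaults to 1.0 when no "q" parameter is given
        $quality = 1.0;

        foreach (array_slice($parts, 1) as $parameter) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $parameter, $matches)) {
                $quality = (float) $matches[1];
            }
        }

        $candidates[] = ['tag' => $tag, 'quality' => $quality, 'position' => $position];
    }

    // Highest quality first; entries with equal quality keep the header order
    usort($candidates, function ($a, $b) {
        if ($a['quality'] === $b['quality']) {
            return $a['position'] - $b['position'];
        }

        return $a['quality'] < $b['quality'] ? 1 : -1;
    });

    foreach ($candidates as $candidate) {
        // First pass: exact match, ignoring case
        foreach ($supported as $locale) {
            if (strtolower($locale) === $candidate['tag']) {
                return $locale;
            }
        }

        // Second pass: same language, ignoring script and region
        $language = explode('-', $candidate['tag'])[0];

        foreach ($supported as $locale) {
            if (strtolower(explode('-', $locale)[0]) === $language) {
                return $locale;
            }
        }
    }

    // No match at all: the first supported locale serves as the default
    return $supported[0];
}

$supported = ['en-US', 'da-DK', 'es', 'es-AR'];

echo pickLocale('es-AR,es;q=0.8,en;q=0.5', $supported), "\n"; // es-AR
echo pickLocale('da,en-US;q=0.7', $supported), "\n";          // da-DK
echo pickLocale('fr-FR,fr;q=0.9', $supported), "\n";          // en-US
```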
Of course, you can also specify the locale for your users manually:
try {
$i18n->setLocaleManually('es-AR');
}
catch (\Delight\I18n\Throwable\LocaleNotSupportedException $e) {
die('The locale requested by the user is not supported');
}
Set up the following aliases in your application code to simplify your work with this library, to make your code more readable, and to enable support for the included tooling and other GNU gettext utilities:
function _f($text, ...$replacements) { global $i18n; return $i18n->translateFormatted($text, ...$replacements); }
function _fe($text, ...$replacements) { global $i18n; return $i18n->translateFormattedExtended($text, ...$replacements); }
function _p($text, $alternative, $count) { global $i18n; return $i18n->translatePlural($text, $alternative, $count); }
function _pf($text, $alternative, $count, ...$replacements) { global $i18n; return $i18n->translatePluralFormatted($text, $alternative, $count, ...$replacements); }
function _pfe($text, $alternative, $count, ...$replacements) { global $i18n; return $i18n->translatePluralFormattedExtended($text, $alternative, $count, ...$replacements); }
function _c($text, $context) { global $i18n; return $i18n->translateWithContext($text, $context); }
function _m($text) { global $i18n; return $i18n->markForTranslation($text); }
If the variable holding your global `I18n` instance is not named `$i18n`, you have to adjust each occurrence of `$i18n` in the snippet above accordingly, of course.
In order to internationalize your code base, you have to identify and mark strings that can be translated, and use formatting with more complex strings. Afterwards, these marked strings can be extracted automatically, to be translated outside of the actual code, and will be inserted again during runtime by this library.
In general, you should follow these simple rules when marking strings for translations:
- Use units of text as large as possible. This could be a single word (e.g. “Save” on a button), several words (e.g. “Create a new account” in a headline), or full sentences (e.g. “Your account has been created.”).
- Strive to treat entire sentences as atomic units whenever possible, and don’t compose sentences from multiple translated words or parts unless absolutely necessary.
- Use string formatting via one of the dedicated functions and methods instead of resorting to string concatenation or string interpolation.
- Handle singular and plural forms using the dedicated functions and methods, which work even for languages with complex plural rules, which are not always as simple as the binary English rule.
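To illustrate why the last rule matters: English only distinguishes “one” from “everything else”, whereas Russian, for example, uses three forms. The sketch below implements the selection rule that is commonly given for Russian in gettext `Plural-Forms` headers. The function names are hypothetical and only serve as an illustration; when you use the dedicated plural functions, gettext applies such rules for you.

```php
<?php

// English: form 0 for exactly one item, form 1 for everything else
function englishPluralIndex($n)
{
    return $n !== 1 ? 1 : 0;
}

// Russian: rule as commonly given in gettext "Plural-Forms" headers
function russianPluralIndex($n)
{
    if ($n % 10 === 1 && $n % 100 !== 11) {
        return 0; // 1, 21, 31, ...
    }

    if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
        return 1; // 2-4, 22-24, 32-34, ...
    }

    return 2; // 0, 5-20, 25-30, ...
}

// The three Russian forms of the word "file"
$forms = ['файл', 'файла', 'файлов'];

foreach ([1, 2, 5, 11, 21, 22] as $count) {
    echo $count, ' ', $forms[russianPluralIndex($count)], "\n";
}
// 1 файл
// 2 файла
// 5 файлов
// 11 файлов
// 21 файл
// 22 файла
```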
Wrap the sentences, phrases and labels of your user interface inside of the `_` function:
_('Welcome to our online store!');
// Welcome to our online store!
_('Create account');
// Create account
_('You have been successfully logged out.');
// You have been successfully logged out.
Wrap the sentences, phrases and labels of your user interface inside of the `_f` function:
_f('This is %1$s.', 'Bob');
// This is Bob.
_f('This is %1$d.', 3);
// This is 3.
_f('This is %1$05d.', 3);
// This is 00003.
_f('This is %1$ 5d.', 3);
// This is     3.
// This is ␣␣␣␣3.
_f('This is %1$+d.', 3);
// This is +3.
_f('This is %1$+06d.', 3);
// This is +00003.
_f('This is %1$+ 6d.', 3);
// This is     +3.
// This is ␣␣␣␣+3.
_f('This is %1$f.', 3.14);
// This is 3.140000.
_f('This is %1$012f.', 3.14);
// This is 00003.140000.
_f('This is %1$010.4f.', 3.14);
// This is 00003.1400.
_f('This is %1$ 12f.', 3.14);
// This is     3.140000.
// This is ␣␣␣␣3.140000.
_f('This is %1$ 10.4f.', 3.14);
// This is     3.1400.
// This is ␣␣␣␣3.1400.
_f('This is %1$+f.', 3.14);
// This is +3.140000.
_f('This is %1$+013f.', 3.14);
// This is +00003.140000.
_f('This is %1$+011.4f.', 3.14);
// This is +00003.1400.
_f('This is %1$+ 13f.', 3.14);
// This is     +3.140000.
// This is ␣␣␣␣+3.140000.
_f('This is %1$+ 11.4f.', 3.14);
// This is     +3.1400.
// This is ␣␣␣␣+3.1400.
_f('Hello %s!', 'Jane');
// Hello Jane!
_f('%1$s is %2$d years old.', 'John', 30);
// John is 30 years old.
Note: This uses the “printf” format string syntax, known from the C language (and also from PHP). In order to escape the percent sign (to use it literally), simply double it, as in `50 %%`.
Note: When your format strings have more than one placeholder and replacement, always number the placeholders to avoid ambiguity and to allow for flexibility during translation. For example, instead of `%s is from %s`, use `%1$s is from %2$s`.
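The benefit of numbered placeholders can be seen with PHP’s built-in `sprintf`, which uses the same format string syntax. The following sketch does not call this library at all, and the German translation is merely an illustrative example of a translation that has to swap the order of the two values.

```php
<?php

// Original string from the source code, with numbered placeholders
$original = '%1$s is from %2$s';

// Illustrative translation that mentions the second value first
$translated = '%2$s ist die Heimat von %1$s';

echo sprintf($original, 'Anna', 'Berlin'), "\n";
// Anna is from Berlin

echo sprintf($translated, 'Anna', 'Berlin'), "\n";
// Berlin ist die Heimat von Anna

// A doubled percent sign yields a literal percent sign
echo sprintf('%1$d %% done', 50), "\n";
// 50 % done
```

With unnumbered placeholders such as `%s is from %s`, the translator could not change the order of the values without changing the calling code.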
Wrap the sentences, phrases and labels of your user interface inside of the `_fe` function:
_fe('This is {0}.', 'Bob');
// This is Bob.
_fe('This is {0, number}.', 1003.14);
// This is 1,003.14.
_fe('This is {0, number, percent}.', 0.42);
// This is 42%.
_fe('This is {0, date}.', -14182916);
// This is Jul 20, 1969.
_fe('This is {0, date, short}.', -14182916);
// This is 7/20/69.
_fe('This is {0, date, medium}.', -14182916);
// This is Jul 20, 1969.
_fe('This is {0, date, long}.', -14182916);
// This is July 20, 1969.
_fe('This is {0, date, full}.', -14182916);
// This is Sunday, July 20, 1969.
_fe('This is {0, time}.', -14182916);
// This is 1:18:04 PM.
_fe('This is {0, time, short}.', -14182916);
// This is 1:18 PM.
_fe('This is {0, time, medium}.', -14182916);
// This is 1:18:04 PM.
_fe('This is {0, time, long}.', -14182916);
// This is 1:18:04 PM GMT-7.
_fe('This is {0, time, full}.', -14182916);
// This is 1:18:04 PM GMT-07:00.
_fe('This is {0, spellout}.', 314159);
// This is three hundred fourteen thousand one hundred fifty-nine.
_fe('This is {0, ordinal}.', 314159);
// This is 314,159th.
_fe('Hello {0}!', 'Jane');
// Hello Jane!
_fe('{0} is {1, number} years old.', 'John', 30);
// John is 30 years old.
Note: This uses the ICU “MessageFormat” syntax. In order to escape curly brackets (to use them literally), wrap them in single quotes, as in `'{'` or `'}'`. In order to escape single quotes (to use them literally), simply double them, as in `it''s`. If you use single quotes for your string literals in PHP, you also have to escape the inserted single quotes with backslashes, as in `\'{\'`, `\'}\'` or `it\'\'s`.
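The two layers of escaping can be tried out with the `MessageFormatter` class from PHP’s `intl` extension, which also uses the ICU “MessageFormat” syntax. This sketch bypasses the library and its aliases entirely and only demonstrates the escaping rules; the pattern itself is made up for illustration.

```php
<?php

// The same ICU pattern, written with both kinds of PHP string literals
$singleQuoted = 'It\'\'s {0}, not \'{\'0\'}\'.';
$doubleQuoted = "It''s {0}, not '{'0'}'.";

var_dump($singleQuoted === $doubleQuoted);
// bool(true)

// MessageFormatter is provided by the "intl" extension
if (class_exists('MessageFormatter')) {
    echo MessageFormatter::formatMessage('en_US', $singleQuoted, ['Bob']), "\n";
    // It's Bob, not {0}.
}
```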
Wrap the sentences, phrases and labels of your user interface inside of the `_p` function:
_p('cat', 'cats', 1);
// cat
_p('cat', 'cats', 2);
// cats
_p('cat', 'cats', 3);
// cats
_p('The file has been saved.', 'The files have been saved.', 1);
// The file has been saved.
_p('The file has been saved.', 'The files have been saved.', 2);
// The files have been saved.
_p('The file has been saved.', 'The files have been saved.', 3);
// The files have been saved.
Wrap the sentences, phrases and labels of your user interface inside of the `_pf` function:
_pf('There is %d monkey.', 'There are %d monkeys.', 0);
// There are 0 monkeys.
_pf('There is %d monkey.', 'There are %d monkeys.', 1);
// There is 1 monkey.
_pf('There is %d monkey.', 'There are %d monkeys.', 2);
// There are 2 monkeys.
_pf('There is %1$d monkey in %2$s.', 'There are %1$d monkeys in %2$s.', 3, 'Anytown');
// There are 3 monkeys in Anytown.
_pf('You have %d new message', 'You have %d new messages', 0);
// You have 0 new messages
_pf('You have %d new message', 'You have %d new messages', 1);
// You have 1 new message
_pf('You have %d new message', 'You have %d new messages', 32);
// You have 32 new messages
Note: This uses the “printf” format string syntax, known from the C language (and also from PHP). In order to escape the percent sign (to use it literally), simply double it, as in `50 %%`.
Wrap the sentences, phrases and labels of your user interface inside of the `_pfe` function:
_pfe('There is {0, number} monkey.', 'There are {0, number} monkeys.', 0);
// There are 0 monkeys.
_pfe('There is {0, number} monkey.', 'There are {0, number} monkeys.', 1);
// There is 1 monkey.
_pfe('There is {0, number} monkey.', 'There are {0, number} monkeys.', 2);
// There are 2 monkeys.
_pfe('There is {0, number} monkey in {1}.', 'There are {0, number} monkeys in {1}.', 3, 'Anytown');
// There are 3 monkeys in Anytown.
_pfe('You have {0, number} new message', 'You have {0, number} new messages', 0);
// You have 0 new messages
_pfe('You have {0, number} new message', 'You have {0, number} new messages', 1);
// You have 1 new message
_pfe('You have {0, number} new message', 'You have {0, number} new messages', 32);
// You have 32 new messages
Note: This uses the ICU “MessageFormat” syntax. In order to escape curly brackets (to use them literally), wrap them in single quotes, as in `'{'` or `'}'`. In order to escape single quotes (to use them literally), simply double them, as in `it''s`. If you use single quotes for your string literals in PHP, you also have to escape the inserted single quotes with backslashes, as in `\'{\'`, `\'}\'` or `it\'\'s`.
Wrap the sentences, phrases and labels of your user interface inside of the `_c` function:
_c('Order', 'sorting');
// or
_c('Order', 'purchase');
// or
_c('Order', 'mathematics');
// or
_c('Order', 'classification');
_c('Address:', 'location');
// or
_c('Address:', 'www');
// or
_c('Address:', 'email');
// or
_c('Address:', 'letter');
// or
_c('Address:', 'speech');
Wrap the sentences, phrases and labels of your user interface inside of the `_m` function. This is a no-op instruction, i.e. at first glance, it does nothing. But it marks the wrapped text for later translation. This is useful if the text should not be translated immediately but will later be translated from a variable, usually at the latest point in time possible:
_m('User');
// User
This return value could be inserted into your database, for example, and will always use the original string from the source code. Later, you could then use the following call to translate that string from a variable:
$text = 'User';
_($text);
// User
In order to extract all translatable strings from your PHP files, you can use the built-in tool for this task. Again, make sure to pay attention to the exact syntax of the locale names used by your operating system, especially with hyphens, underscores and suffixes, e.g. `en-US` vs `en_US`. If you are not sure, check the output of the `locale -a` command on the CLI again.
# For the `mr-IN` locale, with the default directory, with the default domain, and with fuzzy matching
$ bash ./i18n.sh mr-IN
# For the `sq-MK` locale, with the directory 'translations', with the default domain, and with fuzzy matching
$ bash ./i18n.sh sq-MK translations
# For the `yo-NG` locale, with the default directory, with the domain 'plugin', and with fuzzy matching
$ bash ./i18n.sh yo-NG "" plugin
# For the `fr-FR` locale, with the default directory, with the default domain, and without fuzzy matching
$ bash ./i18n.sh fr-FR "" "" nofuzzy
This creates or updates a PO (Portable Object) file for the specified language, which you can then translate, share with your translation team, or send to external translators.
If you only need a generic POT (Portable Object Template) file with all extracted strings, which is not specific to a certain language, just leave out the argument with the locale code (or set it to an empty string):
# With the default directory, with the default domain, and with fuzzy matching
$ bash ./i18n.sh
# With the directory 'translations', with the default domain, and with fuzzy matching
$ bash ./i18n.sh "" translations
# With the default directory, with the domain 'plugin', and with fuzzy matching
$ bash ./i18n.sh "" "" plugin
# With the default directory, with the default domain, and without fuzzy matching
$ bash ./i18n.sh "" "" "" nofuzzy
Whoever handles the actual task of translating the extracted strings, whether it’s you, your translation team, or external translators, the people in charge will need the PO (Portable Object) file for their language, or, in some cases, the generic POT (Portable Object Template) file.
Just open the file in question and search for strings with `msgstr ""` below them. These are the strings with empty translations that you still need to work on. In addition to that, any string with `#, fuzzy` above it has had a translation before but the original string in the source code changed, so the translation must be reviewed (and the “fuzzy” flag or comment removed).
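For larger files, a quick count of the remaining work may be helpful. The following sketch scans the text of a PO file for empty `msgstr` lines and `fuzzy` flags. It is not part of this library or of the GNU gettext utilities: the function `summarizePoFile` is hypothetical, the sample content is made up, and the scan is deliberately simple, so it counts empty `msgstr` lines (including individual plural forms) rather than parsing complete entries.

```php
<?php

/**
 * Hypothetical helper (not part of the library): counts empty translations
 * and "fuzzy" flags in the text of a PO file.
 *
 * @param string $poContents contents of a PO file
 * @return array with the keys "untranslated" and "fuzzy"
 */
function summarizePoFile($poContents)
{
    $lines = preg_split('/\R/', $poContents);
    $untranslated = 0;
    $fuzzy = 0;

    foreach ($lines as $index => $line) {
        // Flag comments start with "#," and may list "fuzzy" among other flags
        if (strpos($line, '#,') === 0 && strpos($line, 'fuzzy') !== false) {
            $fuzzy++;
        }

        if (preg_match('/^msgstr(\[\d+\])? ""\s*$/', $line)) {
            $next = isset($lines[$index + 1]) ? $lines[$index + 1] : '';

            // A following line in double quotes continues a multi-line string,
            // as in the header entry, so the translation is not actually empty
            if (strpos(ltrim($next), '"') !== 0) {
                $untranslated++;
            }
        }
    }

    return ['untranslated' => $untranslated, 'fuzzy' => $fuzzy];
}

// Made-up sample content of a PO file
$po = implode("\n", [
    'msgid ""',
    'msgstr ""',
    '"Language: es_AR\n"',
    '',
    'msgid "Create account"',
    'msgstr ""',
    '',
    '#, fuzzy',
    'msgid "Welcome to our online store!"',
    'msgstr "Bienvenido a nuestra tienda"',
    '',
    'msgid "Save"',
    'msgstr "Guardar"',
]);

print_r(summarizePoFile($po));
// Array
// (
//     [untranslated] => 1
//     [fuzzy] => 1
// )
```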
After you have worked on your translations and saved the PO (Portable Object) file for a language, you need to run the command from “Extracting and updating translatable strings” again in order to export these translations to a binary format.
They will then be stored in a MO (Machine Object) file alongside your PO (Portable Object) file, ready to be automatically picked up and inserted in place of the original strings.
$i18n->getLocale();
// en-US
$i18n->getSystemLocale();
// en_US.utf8
$i18n->getLocaleName();
// English (United States)
$i18n->getLocaleName('fr-BE');
// French (Belgium)
\Delight\I18n\Locale::toName('nb-NO');
// Norwegian Bokmål (Norway)
$i18n->getNativeLocaleName();
// English (United States)
$i18n->getNativeLocaleName('fr-BE');
// français (Belgique)
\Delight\I18n\Locale::toNativeName('nb-NO');
// norsk bokmål (Norge)
\Delight\I18n\Locale::toEnglishName('nb-NO');
// Norwegian Bokmål (Norway)
$i18n->getLanguageName();
// English
$i18n->getLanguageName('fr-BE');
// French
\Delight\I18n\Locale::toLanguageName('nb-NO');
// Norwegian Bokmål
$i18n->getNativeLanguageName();
// English
$i18n->getNativeLanguageName('fr-BE');
// français
\Delight\I18n\Locale::toNativeLanguageName('nb-NO');
// norsk bokmål
\Delight\I18n\Locale::toEnglishLanguageName('nb-NO');
// Norwegian Bokmål
\Delight\I18n\Locale::toScriptName('nb-Latn-NO');
// Latin
\Delight\I18n\Locale::toNativeScriptName('nb-Latn-NO');
// latinsk
\Delight\I18n\Locale::toEnglishScriptName('nb-Latn-NO');
// Latin
\Delight\I18n\Locale::toRegionName('nb-NO');
// Norway
\Delight\I18n\Locale::toNativeRegionName('nb-NO');
// Norge
\Delight\I18n\Locale::toEnglishRegionName('nb-NO');
// Norway
\Delight\I18n\Locale::toLanguageCode('nb-Latn-NO');
// nb
\Delight\I18n\Locale::toScriptCode('nb-Latn-NO');
// Latn
\Delight\I18n\Locale::toRegionCode('nb-Latn-NO');
// NO
\Delight\I18n\Locale::isRtl('ur-PK');
// true
\Delight\I18n\Locale::isLtr('ln-CD');
// true
When using `I18n#setLocaleAutomatically` to determine and activate the correct locale for the user automatically, you can control which locales to consider similar or related. Thus you can control the way lookups and comparisons of locales work.
If the default behavior doesn’t work for you, simply provide the optional first argument, called `$leniency`, to `I18n#setLocaleAutomatically`. The following table lists the minimum leniency value that is required to match the two locale codes in question:
| | sr | sr-RS | sr-BA | sr-Cyrl | sr-Latn | sr-Cyrl-RS | sr-Cyrl-BA | sr-Latn-RS | sr-Latn-BA |
|---|---|---|---|---|---|---|---|---|---|
| sr | Leniency::NONE | Leniency::EXTREMELY_LOW | Leniency::EXTREMELY_LOW | Leniency::LOW | Leniency::LOW | Leniency::MODERATE | Leniency::MODERATE | Leniency::MODERATE | Leniency::MODERATE |
| sr_RS | Leniency::EXTREMELY_LOW | Leniency::NONE | Leniency::VERY_LOW | Leniency::MODERATE | Leniency::MODERATE | Leniency::LOW | Leniency::HIGH | Leniency::LOW | Leniency::HIGH |
| sr_BA | Leniency::EXTREMELY_LOW | Leniency::VERY_LOW | Leniency::NONE | Leniency::MODERATE | Leniency::MODERATE | Leniency::HIGH | Leniency::LOW | Leniency::HIGH | Leniency::LOW |
| sr_Cyrl | Leniency::LOW | Leniency::MODERATE | Leniency::MODERATE | Leniency::NONE | Leniency::VERY_HIGH | Leniency::EXTREMELY_LOW | Leniency::EXTREMELY_LOW | Leniency::EXTREMELY_HIGH | Leniency::EXTREMELY_HIGH |
| sr_Latn | Leniency::LOW | Leniency::MODERATE | Leniency::MODERATE | Leniency::VERY_HIGH | Leniency::NONE | Leniency::EXTREMELY_HIGH | Leniency::EXTREMELY_HIGH | Leniency::EXTREMELY_LOW | Leniency::EXTREMELY_LOW |
| sr_Cyrl_RS | Leniency::MODERATE | Leniency::LOW | Leniency::HIGH | Leniency::EXTREMELY_LOW | Leniency::EXTREMELY_HIGH | Leniency::NONE | Leniency::VERY_LOW | Leniency::VERY_HIGH | Leniency::FULL |
| sr_Cyrl_BA | Leniency::MODERATE | Leniency::HIGH | Leniency::LOW | Leniency::EXTREMELY_LOW | Leniency::EXTREMELY_HIGH | Leniency::VERY_LOW | Leniency::NONE | Leniency::FULL | Leniency::VERY_HIGH |
| sr_Latn_RS | Leniency::MODERATE | Leniency::LOW | Leniency::HIGH | Leniency::EXTREMELY_HIGH | Leniency::EXTREMELY_LOW | Leniency::VERY_HIGH | Leniency::FULL | Leniency::NONE | Leniency::VERY_LOW |
| sr_Latn_BA | Leniency::MODERATE | Leniency::HIGH | Leniency::LOW | Leniency::EXTREMELY_HIGH | Leniency::EXTREMELY_LOW | Leniency::FULL | Leniency::VERY_HIGH | Leniency::VERY_LOW | Leniency::NONE |
- Translations are usually cached, so it may be necessary to restart the web server for any changes to take effect.
All contributions are welcome! If you wish to contribute, please create an issue first so that your feature, problem or question can be discussed.
This project is licensed under the terms of the MIT License.