Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Translation System Improvements #252

Closed
kkomelin opened this issue Oct 15, 2016 · 34 comments
Closed

Translation System Improvements #252

kkomelin opened this issue Oct 15, 2016 · 34 comments

Comments

@kkomelin
Copy link

kkomelin commented Oct 15, 2016

This issue is created as follow-up to the discussion on Twitter

Problem/Motivation

Imagine we need to translate a string “Hello” from English to Spanish.

Current string translation system requires us to do the following:

  1. Create two translation files as follows:

/resources/lang/en/my.php

return [
    'greeting' => 'Hello'
];

/resources/lang/es/my.php

return [
    'greeting' => '¡Hola!'
];
  1. Use trans() function or @lang directive in our blade templates:
{{ trans('my.greeting') }}

or

@lang('my.greeting')

There are a few problems in this approach:

  • All developers of a team should remember the key my.greeting, otherwise they create duplicates in translation files.
  • The key name my.greeting does not tell us what kind of greeting stays behind of it. It can be “Hi” or “Hey” for example.
  • We should send our third-party translator two PHP files /resources/lang/en/my.php and /resources/lang/es/my.php and she has to manually compare them, which is not very convenient.

Obviously, there is room for improvement in this approach.

Proposed resolutions

I would like to propose a few improvements to Laravel string translation system.

1) Get rid of translation keys

First of all, we could get rid of an intermediate layer – translation keys, such as I used in my example above my.greeting.

It would be great to “teach” trans() function to receive a message instead of a key as a parameter, for example:

{{ trans('Hello’) }}

In that case, the translation system should automatically find Spanish translation when necessary and return it.

The Symfony Translation component, which is used in Laravel, does it the following way:

$translator->trans('Hello World')

or in templates:

{% trans %}Hello{% endtrans %}

It means we would just need to adapt Laravel’s wrapper around the Symfony Translation component.

2) [withdrawn] Switch translation storage to Gettext/i18n/PO files.

3) Rename trans() and trans_choice().

The names trans() and trans_choice() are too long to be used for every string in a multilingual application.
I really like the names which are used in laravel-gettext They are just __() and _n().

After that change our greeting example would look exactly like this:

{{ __('Hello') }}

More examples are in the documentation of the laravel-gettext package.

From my experience of working with multilingual sites (in Drupal), the shorter name, the more time you save on typing. And we all care about productivity, don’t we?

4) [withdrawn] Include laravel-gettext package into core

Important note: The described improvements can be integrated into Laravel one at a time together with deprecated functionality approach to make the transition seamless and painless for current users.

Useful links

Updates

2016-10-17: Laravel Gettext (xinax/laravel-gettext) 4 has been released and now it supports Laravel 5.3
See the notes for the release https://github.com/xinax/laravel-gettext/releases/tag/4.0.1

2016-11-16: __() function was introduced and support for json translations was added to Laravel 5.4 laravel/framework#16424

2016-12-02 I contacted the PoEdit tool's author through Twitter and politely asked to add support for Laravel Blade templates to avoid regenerating translations every time when we add a new translation to a template. The author replied with a few rude comments and later deleted them all.
In my projects I can't rely on tools and developers that have a bad approach, so from now on I can't evangelize PoEdit anymore and will correct my proposal accordingly. Also, for my current project we have decided to switch to Laravel 5.4 built-in Json translation system once it's ready.


Do you like the idea of improving Laravel translation system? Let's discuss in comments.

@kkomelin
Copy link
Author

@shaggyz Could you please join the discussion?

@brunogaspar
Copy link

I use the suggested package, but i find that this approach creates issues when translating packages or even a modular applications.

But the suggestion is good and it would be valuable to have this in the core.

👍 from me

@kkomelin
Copy link
Author

Thank you @brunogaspar for sharing you experience and opinion.

@franzliedke
Copy link

What about strings that are homonyms in one language, but not in another? With your proposed system, this would be a problem - that's what keys solve.

@kkomelin
Copy link
Author

kkomelin commented Oct 16, 2016

That's a good point @franzliedke. Thank you.

As I understand, homonyms are words, which can have different meaning depending on a phrase (context). Ideally it should be possible to save some attributes of the context, for example the page where the translating string is rendered or the line of code where the __('Translating string') call is located.

The task of storing context of a translating string can be tricky, so for the first iteration I would suggest keeping existing key-based approach and implementing __('Translating string') as an additional and optional approach. It would allow us to choose between them in each particular case (string). Later, if/when we found a way to store the context, we could mark the old approach as deprecated and get rid of it in some reasonable time or next major version of the framework.

@shaggyz
Copy link

shaggyz commented Oct 17, 2016

Hi @kkomelin, we're currently working on 4.x branch for laravel 5.3.x support (and replacement of php-gettext module in favor of symfony translator). I think it will be ready this week. You can see the progress in issues/PR boards.

Feel free to ask anything you need for this proposal.

@kkomelin
Copy link
Author

Hi @shaggyz,

Thank you for the update regarding your xinax/laravel-gettext project status. Sounds promising. Looking forward to testing 4 version once it's ready.

I will update the proposal according to your note.

Thanks,
Konstantin

@kkomelin
Copy link
Author

kkomelin commented Oct 17, 2016

Hooray, xinax/laravel-gettext 4 has been released with Symfony Translator under the hood and support for Laravel 5.3. I've tested it. It's working fine.
I've updated this issue description accordingly so we could take it into account.

@shaggyz and the team, thanks for the great job with 4th version! You might want to find a couple of my suggestions on how to improve it in the updated 4) point of the issue.

@guiwoda
Copy link

guiwoda commented Oct 18, 2016

What about strings that are homonyms in one language, but not in another? With your proposed system, this would be a problem - that's what keys solve.

You could have the best of both worlds:

{{ _("metals/Lead") }} is different from {{ _("actions/Lead") }}

Although this is also a very minor setback, as you probably don't translate single words that often anyway.

@dionysiosarvanitis
Copy link

dionysiosarvanitis commented Oct 31, 2016

Related to 1) Symfony uses both real and keyword methods. It also recommends keyword format. As of 3) personally I prefer trans() over one-letter functions as it is more fluent. Also __() seems kind of noisy to me.

Finally, I like the idea of adding gettext to the core. It would also be nice to support more sources as of Symfony's Message Catalogs.

@kkomelin
Copy link
Author

kkomelin commented Nov 1, 2016

Thank you @dionysiosarvanitis for your opinion.

Related to 1) Symfony uses both real and keyword methods. It also recommends keyword format.

Many people consider Symfony components as a standard. Of course, they are good and very useful. However, I think there is no limit to perfection, and we can improve even their approaches if we find it necessary. But it's good to know. Thanks.

@dionysiosarvanitis
Copy link

I agree with you @kkomelin. Symfony or any other implementation is just a case study that recommends keywords because you'll only have to change a single point if the message must be changed in the future.

@kkomelin
Copy link
Author

kkomelin commented Nov 1, 2016

a case study that recommends keywords because you'll only have to change a single point if the message must be changed in the future.

@dionysiosarvanitis With gettext and PO files, you don't need to manage changes yourself. PoEdit tracks changes automatically and highlights those strings which have been changed. Just try it https://poedit.net/ and you'll love it.

@barryvdh
Copy link

barryvdh commented Nov 2, 2016

FYI, my translation-manager is not as battle-tested as POEdit, but it should be able to scan and find most translations strings in your code and allow an easy interface: https://github.com/barryvdh/laravel-translation-manager

@kkomelin
Copy link
Author

kkomelin commented Nov 2, 2016

@barryvdh Can it generate PO files or allow their editing?

@barryvdh
Copy link

barryvdh commented Nov 2, 2016

Not that is for the current translation system, not for .po files. It just scans the files for trans() calls, stores them in the database to translate with a UI and then export back to the .php files in Laravel.

@kkomelin
Copy link
Author

kkomelin commented Nov 2, 2016

Not that is for the current translation system, not for .po files. It just scans the files for trans() calls, stores them in the database to translate with a UI and then export back to the .php files in Laravel.

@barryvdh Thanks for the clarification and the link. As I see, it's not related to this issue and proposed changes because it only improves the translator experience but leaves the developer experience the same.

@CyrilMazur
Copy link

Something that's lacking from both gettext and laravel's translator is support for grammatical gender and other inflections. If you decide to switch translation storage to .po files, you're definitely closing the door to inflections. With translations stored in .php files or .json files, we'd have more flexibility. We could do things like this for example:

...
'profile-updated' => [
  'gender.male' => ':username updated HIS profile.',
  'gender.female' => ':username updated HER profile.',
  'gender.neutral' => ':username updated THEIR profile.',
]
...

and use the translator like

{{ trans('messages.profile-updated')->withGender($user->gender) }}

@kkomelin
Copy link
Author

kkomelin commented Nov 18, 2016

@CyrilMazur good point.
Usually it is solved through conventions. For example, "her" and "their" are commonly used in English when referring to an undefined gender. In Russian, however, we use "he". Cases when you need to explicitly define gender are relatively rare. Being aware of cultural differences and the conventions, a professional translator can solve the problem easily.

@kkomelin
Copy link
Author

kkomelin commented Nov 18, 2016

Good news! A couple of days ago, @themsaid implemented translation system based on JSON files which will become a part of Laravel 5.4 release laravel/framework#16424
More information about the change on the @themsaid's blog.

I personally think that the new translation system is a good improvement that solves tasks (1) and (3) of my initial proposal. It gives us an ability to use shorter function name __() and real text strings instead of abstract keys, for example:

__('Hello');

Also, it makes work of a translator more pleasant because original strings and their translations are located in the same JSON file.

For complex cases and plural forms we can still use trans() and trans_choice().
Although, I'm not sure that the new translation function __() processes named parameters the same way as trans(). If not, it could be a possible future improvement in order to make the transition between these two functions smoother.

Anyway, I'd like to thank @themsaid and others who were involved in the change for their effort.
Now the community is free to build translation file generators (code scanners), translation management interfaces and other extensions.

@franzliedke
Copy link

A little bit of feedback from Flarum's I18n expert, @dcsjapan, who has spent a lot of time in making our I18n solution the best it can be.

That system also misses the big advantage of the one we've come up with, which is that every bit of language has a discrete key. As Toby pointed out, the lack of this abstraction makes it harder to split phrases based on context. It also makes it more likely that developers will tweak a phrase in one location and forget to change the same phrase in other locations, so they end up with missing translations somewhere.
Basically, the keys give us a rough and ready way to keep track of resource use; that's something that this proposal will have to supply via some other means if they expect Laravel to be used in any large-scale projects.
I rather liked that system with esoTalk, but that's largely because esoTalk didn't have all that much in the way of language resources. Flarum already has a lot more language, and it's doing more complex things with it, so I suspect we'd have to modify the proposed scheme quite a bit if we wanted to adapt it for use in Flarum anytime soon.

Basically, it seems like the new system is slightly nicer to use for developers, but comes at the expense of expressiveness and feature set.

So, I'd be happy if @themsaid and @taylorotwell could clarify some things:

  • Is this meant as a replacement for or an alternative to the current system?
  • Have you considered these downsides (I'm sure you have) and how do you intend to tackle them in the future?

@themsaid
Copy link
Member

It's simply an alternative way, developers can choose whatever system they prefer based on the size of the project and the languages involved.

@themsaid
Copy link
Member

Looking at a previous production project I'd say it's very useful to have a mix of both in a single project

@franzliedke
Copy link

Thanks for the clarification, @themsaid.

Can you provide some details on how you would mix both approaches, without getting totally inconsistent?

@kkomelin
Copy link
Author

kkomelin commented Nov 22, 2016

@franzliedke Thank you for your and your colleague's opinion.

Basically, it seems like the new system is slightly nicer to use for developers, but comes at the expense of expressiveness and feature set.

We should not forget about improved translator experience. From my initial proposal about current translation system:

We should send our third-party translator two PHP files /resources/lang/en/my.php and /resources/lang/es/my.php and she has to manually compare them, which is not very convenient.

With the change introduced by @themsaid, the translator only needs a single file.

It also makes it more likely that developers will tweak a phrase in one location and forget to change the same phrase in other locations, so they end up with missing translations somewhere.

You have the same probability to forget to rename your key in all places if necessary, so it's rather a matter of habit, from my point of view.

Can you provide some details on how you would mix both approaches, without getting totally inconsistent?

I didn't use both approaches together but I suppose the quote of @themsaid can answer:

If no translation was found in the JSON files, Laravel will start looking for translation keys in the old PHP array files located in resources/lang/en, in case a match was found it's returned to the user. That way the method is fully backward compatible.
(source laravel/framework#16424 (comment))

@kkomelin
Copy link
Author

kkomelin commented Dec 2, 2016

I contacted the PoEdit tool's author through Twitter and politely asked to add support for Laravel Blade templates to avoid regenerating translations every time when we add a new translation to a template. The author replied with a few rude comments and later deleted them all.

In my projects I can't rely on tools and developers that have a bad approach, so from now on I can't evangelize PoEdit anymore and will correct my proposal accordingly. Also, for my current project we have decided to switch to Laravel 5.4 built-in Json translation system once it's ready.

@dcsjapan
Copy link

dcsjapan commented Dec 3, 2016

I hope you don't mind my dropping in to add my thoughts here. I'm not a programmer, so not directly a user of Laravel. But I thought I might clarify a couple of the points made by @franzliedke, on the off chance that my ideas may be of use, somewhere down the road.

We should not forget about improved translator experience.

That what I liked about translating esoTalk, which used a similar system. But I think such a system will tend to become more unwieldy as the project gets bigger. Problems such as homonymous strings will crop up with greater frequency, and the need to organize phrases on some principle other than simple alphabetical order will become more apparent.

And it's not as if there aren't alternatives. For example, one might consider adding a comment line containing the original string above each translation in the PHP file. (I'm not sure I'd favor such an approach, but it does get everything in one file.) And when you get right down to it, looking at two text files side-by-side is not an insurmountable difficulty for most translators.

All this is purely for the sake of argument, since @themsaid has clarified that this system is being added as an alternative to the existing method. I think @franzliedke was mainly concerned that it was being proposed as a replacement, which could be problematic for some people.

You have the same probability to forget to rename your key in all places if necessary, so it's rather a matter of habit, from my point of view.

We don't have that problem with Flarum, since we've set up our system to use a unique key for nearly every translator call. (The only exceptions are the rare cases when the same string is used with exactly the same meaning in exactly the same context.) Each key, in addition to identifying the string to be used, gives information about where and how the string is used. This info helps the translator to locate the string in the application interface and decide how best to translate it.

Of course this means we frequently end up with two or more keys corresponding to the same string. We've dealt with that by having those keys reference a separate key in the "reused translations" section of the file; this referencing is resolved when the resources are compiled.

The upshot is that we have more keys than translations, and a somewhat complex key format ... which all translates to more work for developers. On the upside, however, our system gives translators the flexibility they need to deal with problems such as homonymous strings. When a string needs to be translated one way in one place and another way in a different place, the translator can simply replace the key reference with an appropriate translation ... all without ever needing to bother the devs to make changes to the code, which can be inconvenient for everyone.

Providing our translators with this flexibility was one of the goals we wanted to prioritize when thinking about i18n for Flarum.

If you'd like to know more about the approach we're taking, please feel free to check out our i18n docs. But I should mention that, as of this writing, we're working on some fairly big changes to our translation key structure; so the details of our system are still in flux.


I contacted the PoEdit tool's author through Twitter and politely asked to add support for Laravel Blade templates to avoid regenerating translations every time when we add a new translation to a template. The author replied with a few rude comments and later deleted them all.

I had a similar encounter with the author of Poedit a few years back. He was rude and unwilling to consider the ideas of other people. That experience was one of the reasons that I generally don't use scripts that employ the PO format.

The other reason is that I tend to avoid solutions that necessitate a specific tool or workflow. The really great thing about text files is ... there are so many tools available for editing them! 😄

@sisve
Copy link

sisve commented Dec 3, 2016

Do you realize how much coding would be required to support Blade in POEdit? You would need to instantiate a php engine, boot up the Laravel application, compile the Blade views into a temporary folder and then parse that folder. This is exactly what existing gettext integration packages does with an artisan command.

These steps are necessary because anyone can write a Blade macro that adds functionality. An editor cannot look at @mycommand and know if it's something that should be treated as html, or a custom Blade directive that calculates the final digit of pi and calls gettext. The only way for an editor to know this would be to execute everything, compile the view and look at the compiled result.

Claiming that it is "almost PHP/HTML" may be correct when writing the code, but it's much harder than that when it comes to parsing it.

You could always add a gulp watch task (or whatever javascript framework is used nowadays, I'm too old to keep up) that listens for changes to resources/views/* and call the existing artisan command to rebuild pot/po files with the latest changes.

@kkomelin
Copy link
Author

kkomelin commented Dec 4, 2016

Thank you for your comment @sisve.

Do you realize how much coding would be required to support Blade in POEdit? You would need to instantiate a php engine, boot up the Laravel application, compile the Blade views into a temporary folder and then parse that folder. This is exactly what existing gettext integration packages does with an artisan command.

I agree that compiling Blade templates is a big task for an external application. However, since I'm using wrappers like {{ __('Hello') }}, it's just a matter of parsing *.blade.php files with regular expressions. In fact, what I needed was just that PoEdit scanned *.blade.php files and used the same regexps which were used for PHP files.

Anyway, as mentioned before, PoEdit is not an option anymore. Do you know a cross-platform alternative to PoEdit for collecting strings and editing PO translations?

@kkomelin
Copy link
Author

kkomelin commented Apr 10, 2017

I've started a project, which collects translatable strings and creates json files with the strings.
Currently it doesn't take into account existing translations in json files and simply overrides them.
However, preserving existing translations is in my plans.
Here is the project https://github.com/kkomelin/laravel-translatable-string-exporter

@sisve
Copy link

sisve commented Apr 11, 2017

@kkomelin

  1. Your project seems to parse the blade files, not the resulting php files. This means it will miss all blade directives that may be modifying the output.
  2. It is using a regex on the entire blade file, meaning that it will pick up any __() occurrence, and not just those would be executed by php.

In the context of this issue I feel the need to point these things out. Mostly because it's a very hard thing to do non-Laravel application to handle, and somewhat tricky for a Laravel compatible solution. It is just not a regex you run on some blade files, that won't cover all the scenarios.

For a package to handle all the odd cases you need to compile the blade views and parse the resulting php code. If you still go with a regex solution (instead of a source parser/tokenizer) you need to take into account the scope of the match, if it's in a php scope (inside <?php ... ?>) or somewhere else.

@kkomelin
Copy link
Author

kkomelin commented Apr 11, 2017

Thank you @sisve for your valuable feedback.

Your project seems to parse the blade files, not the resulting php files. This means it will miss all blade directives that may be modifying the output.

Yes, I understand that and do not try to cover all edge cases. Edge cases are very rare and can be covered by adding some strings to the translation files manually.

It is using a regex on the entire blade file, meaning that it will pick up any __() occurrence, and not just those would be executed by php.

Yes, you're right. However, if it picks up some unnecessary occurrences, nothing bad will happen. You will just ignore those strings in the translation files. Moreover, I believe there won't be a lot of such occurrences. It's rather an exception.
I have a filter by JS and PHP files and I think it should filter out most of the unnecessary occurrences.

In the context of this issue I feel the need to point these things out. Mostly because it's a very hard thing to do non-Laravel application to handle, and somewhat tricky for a Laravel compatible solution. It is just not a regex you run on some blade files, that won't cover all the scenarios.

Let's not over-complicate things. I do not have an intention to cover all possible cases. Neither Laravel itself does it. My project is opinionated, as many others, and it solves my particular needs well enough. I would be happy if it could be useful to others too. That's why I shared the link here. And I would appreciate more feedback like yours. Thank you!

@barryvdh
Copy link

barryvdh commented May 5, 2017

As a question, why did we change the __() translations to use a JSON file, and not just a flat PHP file? Eg. lang/en.php or lang/en/all.php instead of lang/en.json? Seems that the effect would be the same (as you are just loading a flat array).

Eg. this could be done with just these 2 changes in other Laravel projects, using lang/nl/all.php

// Helper
 function __($key, array $replace = [], $locale = null)
    {
        return app('translator')->getFromAll($key, $replace, $locale);
    }

// In Translator
    public function getFromAll($key, array $replace = [], $locale = null)
    {

        $locale = $locale ?: $this->locale;
        // For JSON translations, there is only one file per locale, so we will simply load
        // that file and then we will be ready to check the array for the key. These are
        // only one level deep so we do not need to do any fancy searching through it.
        $this->load('*', 'all', $locale);
        $line = isset($this->loaded['*']['all'][$locale][$key])
            ? $this->loaded['*']['all'][$locale][$key] : null;
        // If we can't find a translation for the JSON key, we will attempt to translate it
        // using the typical translation file. This way developers can always just use a
        // helper such as __ instead of having to pick between trans or __ with views.
        if (! isset($line)) {
            $fallback = $this->get($key, $replace, $locale);
            if ($fallback !== $key) {
                return $fallback;
            }
        }
        return $this->makeReplacements($line ?: $key, $replace);
    }

Not that it matters much, but seems that PHP would be faster to load (with opcache etc) and would be consistent with the other files (eg. for translation guis)

@kkomelin
Copy link
Author

kkomelin commented May 5, 2017

@barryvdh I think @themsaid can better answer your question.
I will just give some links where @themsaid talks about the change:
http://themsaid.com/laravel-54-localization-json-20161117/
laravel/framework#16424 (comment)

I personally like the idea of JSON translation files.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests