Get preferred system locales #14

Open
3 of 5 tasks
jamiebuilds-signal opened this issue Jan 20, 2023 · 13 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jamiebuilds-signal

jamiebuilds-signal commented Jan 20, 2023

It is relatively common for a multilingual speaker to specify fallback languages on their system in case a program does not have strings for their most preferred language.

For example, a user speaks Farsi and French. They prefer Farsi, but many apps don't support it, so they would like to fall back to French in those cases. However, many apps do not respect this preference and instead fall back to the language they were written in, which is often a language the user does not speak, such as English.

This is unfortunate, and it would be nice if the Rust ecosystem had a cross-platform library for retrieving all of the user's preferred languages.
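To make the fallback behaviour concrete, here is a minimal Rust sketch of what an app could do once it has the full preference list instead of a single locale. `pick_language` is a hypothetical helper, not part of sys-locale, and it assumes the user's list and the app's translations use exactly matching strings.

```rust
/// Hypothetical helper: walk the user's languages in order and return the
/// first one the app has strings for. Only when none of them are supported
/// does the app drop to its own default language.
/// Assumes exact string matches (no "fr-CA" ~ "fr" subtag matching).
fn pick_language<'a>(preferred: &[&str], supported: &[&'a str], default: &'a str) -> &'a str {
    preferred
        .iter()
        .find_map(|want| supported.iter().copied().find(|have| have == want))
        .unwrap_or(default)
}
```

For the Farsi/French speaker, `pick_language(&["fa", "fr"], &["en", "fr", "de"], "en")` returns `"fr"`; English is only used if neither Farsi nor French is available.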

Links for platform APIs and other code examples:

Platform support currently contains:

  • iOS and macOS
  • Linux with a desktop environment (such as KDE)
  • Android
  • Windows
  • WASM
@complexspaces
Collaborator

Thanks for filing this. I agree it'd be nice to expose a preference list instead of "this locale or nothing." This isn't a high priority for me at the moment, but I'd happily accept a PR if someone gets there before me.

Regarding the implementation, WASM and iOS/macOS are already pretty close to this underneath. Notably, the Apple platforms already use preferredLanguages but currently just take the first item.

@complexspaces
Collaborator

complexspaces commented Jun 16, 2023

#22 implemented the API for this feature, along with the full implementations for WASM and Windows. There are still platforms that need to be fleshed out.

@complexspaces
Collaborator

#24 implemented this for our currently-supported Apple targets.

@complexspaces
Collaborator

@Dinnerbone @jamiebuilds-signal I published 0.3.1 a few minutes ago, so this new functionality should be available now for the currently supported platforms.

@jamiebuilds-signal
Author

Looking at the docs, I would caution against describing the order as the user's order of preference. As I understand it, you generally don't have to use the earliest listed language that you support. Apps often have better and worse quality translations, and can serve the better quality translation as long as it's still a match.
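One way to read this point, sketched in Rust under assumptions the thread does not spell out: treat every translation that appears anywhere in the user's list as acceptable, prefer the one with the best quality score, and use the user's order only as a tie-breaker. `Translation`, its `quality` score, and `pick_best` are illustrative names, not anything sys-locale provides.

```rust
/// A translation the app ships, with a hypothetical quality score
/// (e.g. 1.0 = fully reviewed, 0.3 = partial or machine-translated).
struct Translation {
    locale: &'static str,
    quality: f32,
}

/// Among translations matching *any* of the user's languages, return the
/// highest-quality one. Equal scores keep the language the user listed
/// earlier. Returns None when nothing matches, leaving the default to the caller.
fn pick_best<'a>(preferred: &[&str], available: &'a [Translation]) -> Option<&'a Translation> {
    let mut best: Option<&'a Translation> = None;
    for want in preferred {
        if let Some(t) = available.iter().find(|t| t.locale == *want) {
            let is_better = match best {
                None => true,
                // Strictly higher quality wins; ties keep the earlier preference.
                Some(b) => t.quality > b.quality,
            };
            if is_better {
                best = Some(t);
            }
        }
    }
    best
}
```

Quality only reorders languages the user already accepts: a high-scoring English translation is still never chosen for a user whose list contains only Farsi and French.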

@pasabanov
Contributor

Since there haven't been any updates in #28, I've decided to repost my thoughts here.

Based on the previously mentioned link https://github.com/electron/electron/blob/main/shell/common/language_util_linux.cc and the chromium/src/l10n/l10n_util.cc file it references, I think we should use the following precedence order in locale evaluation: LANGUAGE, LC_ALL, LC_MESSAGES, and LANG.
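A minimal Rust sketch of that precedence for the single-locale case, assuming we only want the first usable value. The `getenv` parameter is a hypothetical injection point so the logic can be tested without touching the real environment; in practice it could be `|k| std::env::var(k).ok()`. This is an illustration, not how sys-locale actually implements it.

```rust
/// Resolve one locale using the order LANGUAGE, LC_ALL, LC_MESSAGES, LANG.
/// `getenv` returns the variable's value, or None if it is unset.
fn resolve_locale(getenv: impl Fn(&str) -> Option<String>) -> Option<String> {
    for var in ["LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"] {
        if let Some(value) = getenv(var) {
            // Splitting on ':' yields the first entry of LANGUAGE's list;
            // the other variables hold a single value, so it is a no-op there.
            // Empty values are skipped, i.e. treated as unset.
            if let Some(first) = value.split(':').find(|s| !s.is_empty()) {
                return Some(first.to_string());
            }
        }
    }
    None
}
```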

Here are some relevant sources that might help clarify the role of the LANGUAGE and LC_MESSAGES variables:

  1. Debian mailing list discussion where it is mentioned that LANGUAGE primarily affects the locale for message translation and takes precedence over other variables, even LC_ALL.
  2. Baeldung article on Linux locale environment variables, which also states that LANGUAGE has priority over other locale-related variables. However, this may not be entirely accurate, as it should only hold in the context of message localization, which is what the article discusses.

Ideally, it would be useful to implement separate functions for determining the general locale and the message locale (or even one for each LC_* category). However, if we want to keep the locale determination process simple, one approach could be to give LANGUAGE priority over LC_ALL when determining the overall locale, because the locale is mainly used for message translation.
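For comparison, per-category resolution on POSIX systems follows a different rule: LC_ALL overrides everything, then the category's own LC_* variable, then LANG. The sketch below shows what a per-category function might look like; `category_locale` and the injected `getenv` are hypothetical. It deliberately leaves out LANGUAGE, which GNU gettext consults only for message translation.

```rust
/// Hypothetical per-category lookup following the POSIX rule:
/// LC_ALL, then the category's own variable (e.g. "LC_MESSAGES"), then LANG.
/// Empty values count as unset. None means the C library would fall back
/// to the "C"/"POSIX" locale.
fn category_locale(getenv: impl Fn(&str) -> Option<String>, category: &str) -> Option<String> {
    for var in ["LC_ALL", category, "LANG"] {
        match getenv(var) {
            Some(value) if !value.is_empty() => return Some(value),
            _ => {}
        }
    }
    None
}
```

With such a function, the message locale would be `category_locale(getenv, "LC_MESSAGES")`; the thread ultimately prefers a single cross-platform API instead.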

Additionally, if we aim to provide the user with a list of locales in descending order of preference (with the get_locales() function), LANGUAGE is well-suited for this purpose, as it literally stores such a list. If we want to go further, we could append the locale from LC_ALL, followed by the ones from LC_MESSAGES and then LANG, to the list from LANGUAGE. We might consider excluding LC_CTYPE, since it mainly concerns character classification and encoding rather than language.
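Since LANGUAGE is a colon-separated list (for example `fa:fr_FR:en`), turning it into an ordered preference list is mostly a matter of splitting it. A small sketch, with `parse_language_var` as a hypothetical name. Dropping empty entries also means a LANGUAGE set to the empty string behaves as if it were unset.

```rust
/// Split a LANGUAGE value such as "fa:fr_FR:en" into an ordered list,
/// dropping empty entries (from "", "fa::fr", or a trailing ':').
fn parse_language_var(value: &str) -> Vec<String> {
    value
        .split(':')
        .filter(|entry| !entry.is_empty())
        .map(|entry| entry.to_string())
        .collect()
}
```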

@complexspaces
Collaborator

> I think we should use the following precedence order in locale evaluation: LANGUAGE, LC_ALL, LC_MESSAGES, and LANG.

Yes, this makes sense to me, and I'm not aware of any historical reason this crate didn't use LANGUAGE before. I think we need to keep LC_CTYPE for compatibility, even if it's checked after LC_MESSAGES.

> Ideally, it would be useful to implement separate functions for determining the general locale and the message locale (or even one for each LC_* category).

While I agree in theory, I'm not sure every platform this crate needs to support has an equivalent of per-category locales, and my preference would be to keep sys-locale's API consistent.

> Additionally, if we aim to provide the user with a list of locales in descending order of preference (with the get_locales() function), LANGUAGE is well-suited for this purpose, as it literally stores such a list. If we want to go further, we could append the locale from LC_ALL, followed by the ones from LC_MESSAGES and then LANG, to the list from LANGUAGE.

The purpose and intent of LANGUAGE make sense to me, but from some brief testing, I think it's still necessary to check the other variables, because LANGUAGE is not set on an old KDE installation I have, nor in an Ubuntu 21 desktop VM.

@pasabanov
Contributor

> I think we need to keep LC_CTYPE for compatibility, even if it's checked after LC_MESSAGES.

Can you clarify the compatibility reasons for keeping LC_CTYPE?
To me, dropping LC_CTYPE seems more like a fix than a breaking change.
I think we can assume that if an application relies on LC_CTYPE for locale evaluation, then that evaluation is already incorrect.

> The purpose and intent of LANGUAGE make sense to me, but from some brief testing, I think it's still necessary to check the other variables, because LANGUAGE is not set on an old KDE installation I have, nor in an Ubuntu 21 desktop VM.

That's my suggestion! We can check the LANGUAGE variable first and then add the locales from the LC_ALL, LC_MESSAGES, and LANG variables to the list, following that priority order and skipping duplicates (if the same locale has already been added, there is no need to add it again).
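The merge-and-deduplicate approach could look roughly like this Rust sketch. It is not the code that landed in sys-locale: `preferred_locales` and the injected `getenv` are hypothetical, LC_CTYPE is deliberately not consulted, and duplicates are detected by exact string comparison, so `fr_FR` and `fr_FR.UTF-8` would both be kept unless values are normalized first.

```rust
/// Build an ordered, de-duplicated preference list:
/// all LANGUAGE entries first, then LC_ALL, LC_MESSAGES and LANG.
fn preferred_locales(getenv: impl Fn(&str) -> Option<String>) -> Vec<String> {
    let mut candidates: Vec<String> = Vec::new();

    // 1. LANGUAGE already holds an ordered, colon-separated list.
    if let Some(language) = getenv("LANGUAGE") {
        candidates.extend(language.split(':').map(|s| s.to_string()));
    }

    // 2. Single-valued fallbacks in precedence order (LC_CTYPE intentionally omitted).
    for var in ["LC_ALL", "LC_MESSAGES", "LANG"] {
        if let Some(value) = getenv(var) {
            candidates.push(value);
        }
    }

    // 3. Drop empty entries (e.g. KDE's LANGUAGE="") and keep only the
    //    first occurrence of each locale.
    let mut locales: Vec<String> = Vec::new();
    for candidate in candidates {
        if !candidate.is_empty() && !locales.contains(&candidate) {
            locales.push(candidate);
        }
    }
    locales
}
```

For example, with `LANGUAGE=fa:fr_FR`, `LC_MESSAGES=fr_FR`, `LC_CTYPE=ru_RU.UTF-8` and `LANG=en_US.UTF-8`, the result is `["fa", "fr_FR", "en_US.UTF-8"]`; in practice `getenv` could be `|k| std::env::var(k).ok()`.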

@complexspaces
Collaborator

> Can you clarify the compatibility reasons for keeping LC_CTYPE?
> To me, dropping LC_CTYPE seems more like a fix than a breaking change.

It's really because I don't know whether anyone has been relying on this less-than-perfect logic in the "wild", so starting to ignore LC_CTYPE in a patch release might be off-putting. I don't mind giving it a try, I'm just cautious 😅.

> That's my suggestion!

Got it, I just wanted to make sure we were on the same page.

I was playing around with this a bit more while writing this comment and found that KDE seems to set the $LANGUAGE variable to something non-empty if you configure more than one language in the Plasma settings. This has increased my confidence that this will be a useful change for real end-user systems.

It's quite strange that it's set to an empty string when only one language is configured, but maybe that was improved in more recent releases (my VM is pretty old).

@pasabanov
Contributor

> It's really because I don't know whether anyone has been relying on this less-than-perfect logic in the "wild", so starting to ignore LC_CTYPE in a patch release might be off-putting. I don't mind giving it a try, I'm just cautious 😅.

I think we can assume that in the overwhelming majority of cases, LC_CTYPE is the same as LC_ALL and/or LC_MESSAGES and is consistent with LANGUAGE and LANG; otherwise, it would indicate an incorrect locale configuration.
Basically, the user is telling the program: "I want messages to be translated into LC_MESSAGES and encoded with LC_CTYPE". In that case, it would be odd to translate the program's messages into the LC_CTYPE language.
I still see removing LC_CTYPE as a fix, so I would suggest doing it.

> I was playing around with this a bit more while writing this comment and found that KDE seems to set the $LANGUAGE variable to something non-empty if you configure more than one language in the Plasma settings. This has increased my confidence that this will be a useful change for real end-user systems.

KDE stores the LANGUAGE value in the .config/plasma-localerc file, under the [Translations] section.
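For reference, reading the value straight from KDE's config would mean parsing something like the sketch below. It assumes a simple INI layout with a `LANGUAGE=` key inside `[Translations]`, which is an assumption rather than something confirmed in this thread, and `kde_translations_language` is a hypothetical name. It is nowhere near a full KConfig parser; reading the LANGUAGE environment variable avoids needing any of this.

```rust
/// Hypothetical: return the raw `LANGUAGE=` value from the `[Translations]`
/// section of plasma-localerc-style contents, if present.
/// Not a full KConfig parser (no escapes, flags, or other KConfig features).
fn kde_translations_language(contents: &str) -> Option<String> {
    let mut in_translations = false;
    for raw in contents.lines() {
        let line = raw.trim();
        if line.starts_with('[') && line.ends_with(']') {
            // Entering a new section; only [Translations] is of interest.
            in_translations = line == "[Translations]";
        } else if in_translations {
            if let Some(value) = line.strip_prefix("LANGUAGE=") {
                return Some(value.trim().to_string());
            }
        }
    }
    None
}
```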

> It's quite strange that it's set to an empty string when only one language is configured

This is true for my version of Plasma 5.24.7.

@complexspaces
Collaborator

complexspaces commented Sep 26, 2024

> I still see removing LC_CTYPE as a fix, so I would suggest doing it.

Fair enough, we can go ahead and plan to do that then.

> KDE stores the LANGUAGE value in the .config/plasma-localerc file, under the [Translations] section.

I knew it stored this in a ~/.config file, but had never checked the variable. LANGUAGE is a much nicer option than having to implement KDE-specific data detection.

> This is true for my version of Plasma 5.24.7.

Odd, but at the end of the day not an actual problem, since we will still be looking at LANG.

@pasabanov
Contributor

Obtaining a list of preferred locales is now implemented for Linux.

@complexspaces
Collaborator

Linux support for preferred locale listing has been released in 0.3.2.
