Get preferred system locales #14

jamiebuilds-signal · 2023-01-20T22:00:30Z

It is relatively common for a multilingual speaker to specify fallback languages on their system in case a program does not have strings for their most preferred language.

For example, a user speaks Farsi and French. They prefer Farsi, but many apps don't support it, so they would like to fall back to French in those cases. However, many apps do not respect this preference and instead fall back to the language they were written in, often a language the user does not speak, such as English.

This is unfortunate, and it would be nice if the Rust ecosystem had a cross-platform library for retrieving all of the user's preferred languages.
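To make the Farsi/French scenario concrete, here is a minimal Rust sketch of fallback-aware language selection. It assumes the app already has the user's languages in preference order (which is what this issue asks a library to provide). `negotiate` is a hypothetical helper, and its matching rule (exact tag, then the primary language subtag) is a simplification, not a full BCP 47 lookup.

```rust
/// Return the first of the user's preferred languages that the app ships,
/// or `default` if none match. Hypothetical helper for illustration only.
fn negotiate<'a>(preferred: &[&str], available: &[&'a str], default: &'a str) -> &'a str {
    for &pref in preferred {
        // Exact (case-insensitive) match first, e.g. "fa-IR" == "fa-ir".
        if let Some(&hit) = available.iter().find(|a| a.eq_ignore_ascii_case(pref)) {
            return hit;
        }
        // Then try the primary language subtag, e.g. "fr-CA" -> "fr".
        let primary = pref.split(|c: char| c == '-' || c == '_').next().unwrap_or(pref);
        if let Some(&hit) = available.iter().find(|a| a.eq_ignore_ascii_case(primary)) {
            return hit;
        }
    }
    // Only reached when none of the user's languages are available.
    default
}
```

With `preferred = ["fa", "fr"]` and an app that ships only `"en"` and `"fr"`, this returns `"fr"` instead of falling straight back to `"en"`.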

Links for platform APIs and other code examples:

Windows:
- electron: https://github.com/electron/electron/blob/main/shell/common/language_util_win.cc
- winapi: https://docs.rs/winapi/0.3.9/winapi/um/winnls/fn.GetSystemPreferredUILanguages.html
macOS:
- electron: https://github.com/electron/electron/blob/main/shell/common/language_util_mac.mm
Linux:
- electron: https://github.com/electron/electron/blob/main/shell/common/language_util_linux.cc
WASM:
- web_sys: https://rustwasm.github.io/wasm-bindgen/api/web_sys/struct.Navigator.html#method.languages

Platform support currently contains:

complexspaces · 2023-03-05T23:25:27Z

Thanks for filing this, I agree it'd be nice to expose a preference list instead of "this locale or nothing." This isn't a high priority item for me at the moment but I'd happily accept a PR if someone gets there before me.

Regarding the implementation, WASM and iOS/macOS are already pretty close to this under the hood. Notably, the Apple platforms already use preferredLanguages but currently just take the first item.

complexspaces · 2023-06-16T06:46:18Z

#22 implemented the API for this feature, along with the full implementations for WASM and Windows. Some platforms still need to be fleshed out.

complexspaces · 2023-06-17T19:43:09Z

#24 implemented this for our currently-supported Apple targets.

complexspaces · 2023-08-27T20:28:18Z

@Dinnerbone @jamiebuilds-signal I've published 0.3.1 a few minutes ago, so this new functionality should be available now for the currently supported platforms.

jamiebuilds-signal · 2023-08-29T20:57:04Z

Looking at the docs, I would caution against describing the order as the user's order of preference. As I understand it, you generally don't have to use the earliest listed language that you support. Apps often have better- and worse-quality translations and can serve the better-quality translation if it's still a match.
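One way an app might act on this is sketched below in Rust. The `Translation` struct, its `completeness` score, and the threshold policy are all hypothetical. The app walks the user's list in order, skips matches whose translation falls below a quality bar, and only if none clears the bar falls back to the most complete match. Matching is exact (case-insensitive) tag comparison only.

```rust
/// A translation the app ships, with a hypothetical completeness score
/// between 0.0 and 1.0 (for example, the fraction of strings translated).
struct Translation {
    tag: &'static str,
    completeness: f32,
}

/// One possible policy: walk the user's list in order and return the first
/// match that meets `min_completeness`; if none does, return the most complete
/// match (earliest in the user's list on ties). `None` if nothing matches.
fn choose<'t>(
    preferred: &[&str],
    shipped: &'t [Translation],
    min_completeness: f32,
) -> Option<&'t Translation> {
    // Matching translations, kept in the user's order of preference.
    let matches: Vec<&'t Translation> = preferred
        .iter()
        .filter_map(|p| shipped.iter().find(|t| t.tag.eq_ignore_ascii_case(p)))
        .collect();

    matches
        .iter()
        .copied()
        .find(|t| t.completeness >= min_completeness)
        .or_else(|| {
            // `min_by` keeps the first of equal elements, so reversing the
            // comparison yields the most complete match, earliest on ties.
            matches
                .iter()
                .copied()
                .min_by(|a, b| b.completeness.total_cmp(&a.completeness))
        })
}
```

For a user who lists `fr-CA` before `fr`, an app with a mostly untranslated `fr-CA` and a complete `fr` would serve `fr` under a high threshold, which is the kind of choice described above.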

pasabanov · 2024-09-20T14:46:59Z

Since there haven't been any updates in #28, I've decided to restate my thoughts here.

Based on the link mentioned earlier (https://github.com/electron/electron/blob/main/shell/common/language_util_linux.cc) and the file it references (chromium/src/l10n/l10n_util.cc), I think we should evaluate the locale variables in the following order of precedence: LANGUAGE, LC_ALL, LC_MESSAGES, and LANG.
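A minimal Rust sketch of that precedence for picking a single locale is below. It assumes an empty value counts as unset (the situation behind #28, and the KDE behavior described later in this thread) and takes only the first entry of the colon-separated LANGUAGE list. `resolve_with` and `resolve_from_env` are hypothetical names, not sys-locale's API. The lookup is passed in as a closure so the order can be exercised without touching the process environment.

```rust
use std::env;

/// Locale variables in the proposed order of precedence, highest first.
const PRECEDENCE: [&str; 4] = ["LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"];

/// Return the first usable locale in precedence order. `lookup` returns a
/// variable's value, if set. Empty values are treated as unset, and for
/// LANGUAGE (a colon-separated list) only the first non-empty entry is used.
fn resolve_with(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    PRECEDENCE.iter().find_map(|&name| {
        let value = lookup(name)?;
        let first = if name == "LANGUAGE" {
            value.split(':').find(|s| !s.is_empty())?
        } else {
            value.as_str()
        };
        if first.is_empty() {
            None
        } else {
            Some(first.to_string())
        }
    })
}

/// The same lookup against the real process environment.
fn resolve_from_env() -> Option<String> {
    resolve_with(|name| env::var(name).ok())
}
```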

Here are some relevant sources that might help clarify the role of the LANGUAGE and LC_MESSAGES variables:

- A Debian mailing list discussion, which mentions that LANGUAGE primarily affects the locale used for message translation and takes precedence over the other variables, even LC_ALL.
- A Baeldung article on Linux locale environment variables, which also states that LANGUAGE takes priority over the other locale-related variables. This may not be entirely accurate in general, as it should only hold for message localization, which is what the article discusses.

Ideally, it would be useful to implement separate functions for determining the general locale and the message locale (and, ideally, for each specific LC_*** category). However, if we want to keep the locale determination process simple, one approach could be to give LANGUAGE priority over LC_ALL when determining the overall locale, because the locale is mainly used for message translation.
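For reference, the per-category rule POSIX specifies for the environment-based locale is: LC_ALL if it is set and non-empty, then the category's own LC_* variable, then LANG. A hypothetical Rust helper for one category might look like this. It treats empty values as unset and deliberately ignores LANGUAGE, which GNU gettext consults only for message translation.

```rust
/// POSIX-style resolution for a single category such as "MESSAGES" or "TIME":
/// LC_ALL wins if set, then LC_<category>, then LANG. Empty values count as
/// unset. `lookup` returns a variable's value, if any.
fn category_locale(category: &str, lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    let specific = format!("LC_{}", category);
    let names = ["LC_ALL", specific.as_str(), "LANG"];
    names
        .iter()
        .filter_map(|&name| lookup(name))
        .find(|value| !value.is_empty())
}
```

A per-category API would amount to calling something like this once per category; whether every platform sys-locale supports has an equivalent is the open question raised in the reply below.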

Additionally, if we aim to provide the user with a list of locales in descending order of preference (with the get_locales() function), LANGUAGE is well-suited for this purpose, as it literally stores such a list. If we want to go further, we could append the locale from LC_ALL, followed by those from LC_MESSAGES and then from LANG, to the list from LANGUAGE. We might consider excluding LC_CTYPE since it is more focused on specifying character encoding rather than language.

complexspaces · 2024-09-21T15:55:57Z

> I think we should evaluate the locale variables in the following order of precedence: LANGUAGE, LC_ALL, LC_MESSAGES, and LANG.

Yes, I think this makes sense, and I don't know of any historical reason why this crate didn't use LANGUAGE before. I think we need to keep LC_CTYPE for compatibility, even if it's checked after LC_MESSAGES.

> Ideally, it would be useful to implement separate functions for determining the general locale and the message locale (and, ideally, for each specific LC_*** category).

While I agree in theory, I'm not sure every platform this crate needs to support has an equivalent of per-category languages, and my preference would be to keep sys-locale's API consistent.

> Additionally, if we aim to provide the user with a list of locales in descending order of preference (with the get_locales() function), LANGUAGE is well-suited for this purpose, as it literally stores such a list. If we want to go further, we could append the locale from LC_ALL, followed by those from LC_MESSAGES and then from LANG, to the list from LANGUAGE.

The purpose and intent of LANGUAGE makes sense to me, but from some brief testing I think it's still necessary to check the other variables, because LANGUAGE is not set on an old KDE installation I have, nor in an Ubuntu 21 desktop VM.

pasabanov · 2024-09-21T18:39:33Z

> I think we need to keep LC_CTYPE for compatibility, even if it's checked after LC_MESSAGES.

Can you clarify the compatibility reasons for keeping LC_CTYPE?
To me, rejecting LC_CTYPE seems more like a fix rather than a breaking change.
I think we could assume that if LC_CTYPE is crucial in locale evaluation for some application, then this evaluation is already incorrect.

> The purpose and intent of LANGUAGE makes sense to me, but from some brief testing I think it's still necessary to check the other variables, because LANGUAGE is not set on an old KDE installation I have, nor in an Ubuntu 21 desktop VM.

That's my suggestion! We can first check the LANGUAGE variable and then add the locales from the LC_ALL, LC_MESSAGES, and LANG variables to the list in priority order, checking for duplicates (if the same locale has already been added, there is no need to add it again).
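This combined approach can be sketched in Rust as follows, assuming exact string comparison is enough to detect duplicates. `preferred_locales_with` and `preferred_locales` are hypothetical names, not the functions that eventually landed in sys-locale. Entries such as `fr` and `fr_FR.UTF-8` are not merged here; normalizing locale names before comparing them would be a separate step.

```rust
use std::env;

/// Build a de-duplicated list of locales in preference order: every entry of
/// the colon-separated LANGUAGE list, then LC_ALL, LC_MESSAGES and LANG.
/// Empty values are skipped; duplicates are detected by exact string match.
fn preferred_locales_with(lookup: impl Fn(&str) -> Option<String>) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let mut push = |locale: &str| {
        if !locale.is_empty() && !out.iter().any(|seen| seen == locale) {
            out.push(locale.to_string());
        }
    };

    if let Some(list) = lookup("LANGUAGE") {
        list.split(':').for_each(&mut push);
    }
    for name in ["LC_ALL", "LC_MESSAGES", "LANG"] {
        if let Some(value) = lookup(name) {
            push(value.as_str());
        }
    }
    out
}

/// The same list built from the real process environment.
fn preferred_locales() -> Vec<String> {
    preferred_locales_with(|name| env::var(name).ok())
}
```

When LANGUAGE is unset, as on the KDE and Ubuntu systems mentioned above, the list simply starts from whichever of LC_ALL, LC_MESSAGES, or LANG is set.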

complexspaces · 2024-09-26T05:40:35Z

> Can you clarify the compatibility reasons for keeping LC_CTYPE?
> To me, rejecting LC_CTYPE seems more like a fix rather than a breaking change.

It's really because I don't know whether anyone has been relying on this less-than-perfect logic in the "wild", so starting to ignore it in a patch release might be off-putting. I don't mind giving it a try, I'm just cautious 😅.

> That's my suggestion!

Got it, I just wanted to make sure we were on the same page.

I was playing around with this a bit more while writing this comment and found that KDE seems to set the $LANGUAGE variable to something non-empty if you configure more than one language in the Plasma settings. This has increased my confidence that this will be a useful change for real end-user systems.

It's quite strange that it's just set to an empty string when only one language is configured, but maybe that was improved in more recent releases (my VM is pretty old).

pasabanov · 2024-09-26T14:53:16Z

> It's really because I don't know whether anyone has been relying on this less-than-perfect logic in the "wild", so starting to ignore it in a patch release might be off-putting. I don't mind giving it a try, I'm just cautious 😅.

I think we can assume that in an overwhelming number of cases, LC_CTYPE is the same as LC_ALL and/or LC_MESSAGES and is consistent with LANGUAGE and LANG. Otherwise, it would indicate a wrong locale configuration.
Basically, the user is telling the program: "I want messages to be translated according to LC_MESSAGES and encoded according to LC_CTYPE". In that case, it would be odd for the program to translate its messages according to LC_CTYPE.
I still see removing LC_CTYPE as a fix, so I would suggest doing it.
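To illustrate why the encoding is separate from the language, here is a small Rust sketch that splits a locale name of the common `language[_territory][.codeset][@modifier]` form into its parts. The `LocaleName` struct and `parse_locale` function are hypothetical. The point is that the codeset, which is the main thing LC_CTYPE is about, can be isolated from the language and territory that matter for choosing translations.

```rust
/// Parts of a locale name of the form language[_territory][.codeset][@modifier],
/// e.g. "fr_FR.UTF-8@euro". Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct LocaleName<'a> {
    language: &'a str,
    territory: Option<&'a str>,
    codeset: Option<&'a str>,
    modifier: Option<&'a str>,
}

fn parse_locale(name: &str) -> LocaleName<'_> {
    // Peel the optional parts off from the right: "@modifier", then ".codeset".
    let (rest, modifier) = match name.split_once('@') {
        Some((rest, modifier)) => (rest, Some(modifier)),
        None => (name, None),
    };
    let (rest, codeset) = match rest.split_once('.') {
        Some((rest, codeset)) => (rest, Some(codeset)),
        None => (rest, None),
    };
    // What remains is "language" or "language_territory".
    let (language, territory) = match rest.split_once('_') {
        Some((language, territory)) => (language, Some(territory)),
        None => (rest, None),
    };
    LocaleName { language, territory, codeset, modifier }
}
```

Two values such as `fr_FR.UTF-8` and `fr_FR.ISO-8859-1` differ only in their codeset, which says nothing about which translation to show.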

> I was playing around with this a bit more while writing this comment and found that KDE seems to set the $LANGUAGE variable to something non-empty if you configure more than one language in the Plasma settings. This has increased my confidence that this will be a useful change for real end-user systems.

KDE sets the LANGUAGE variable in the ~/.config/plasma-localerc file, under the [Translations] section.

> It's quite strange that it's just set to an empty string when only one language is configured

This is true for my version of Plasma 5.24.7.

complexspaces · 2024-09-26T15:12:31Z

> I still see removing LC_CTYPE as a fix, so I would suggest doing it.

Fair enough, we can go ahead and plan to do that then.

> KDE sets the LANGUAGE variable in the ~/.config/plasma-localerc file, under the [Translations] section.

I knew it stored this in a ~/.config file, but had never checked the variable. LANGUAGE is a much nicer option than needing to implement KDE-specific data detection.

> This is true for my version of Plasma 5.24.7.

Odd, but at the end of the day not an actual problem since we will still be looking at LANG.

pasabanov · 2024-10-24T20:37:23Z

Obtaining a list of preferred locales is now implemented for Linux.

complexspaces · 2024-11-01T18:14:02Z

Linux support for preferred locale listing has been released in 0.3.2.

Dinnerbone mentioned this issue May 29, 2023

Add get_locales() method for preference-order list of locales #22 (Merged)

complexspaces added the enhancement ("New feature or request") and help wanted ("Extra attention is needed") labels Jun 16, 2023

complexspaces mentioned this issue Jun 17, 2023

Add full preference-order locale support to Apple targets #24 (Merged)

pasabanov mentioned this issue Aug 26, 2024

Error detecting locale with an empty LC_ALL #28 (Closed)

pasabanov mentioned this issue Sep 30, 2024

unix.rs: modified _get to return Iterator instead of Option #35 (Merged)
