Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unix.rs: modified _get to return Iterator instead of Option #35

Merged
merged 1 commit into from
Oct 21, 2024

Conversation

pasabanov
Copy link
Contributor

@pasabanov pasabanov commented Sep 30, 2024

Related: #14.

  1. Added the LANGUAGE and the LC_MESSAGES variables.
  2. Removed the LC_CTYPE variable.
  3. Modified the _get function to return an Iterator over locales instead of an Option.
  4. Updated tests accordingly.

@pasabanov
Copy link
Contributor Author

I updated the code via a force push to make it conform to rustfmt.

@pasabanov
Copy link
Contributor Author

pasabanov commented Oct 19, 2024

@complexspaces, can you review this PR please?

Copy link
Collaborator

@complexspaces complexspaces left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements! I really like the deep testing as well.

My only minor concern with this is that typo-ed locales will result in duplicates (ex en_US and EN_US, which doesn't seem like that uncommon of a scenario if a user sets these by hand. Do you think its worth using a separate HashSet with lowercased strings to do the duplication checking to prevent this?

@pasabanov
Copy link
Contributor Author

pasabanov commented Oct 20, 2024

My only minor concern with this is that typo-ed locales will result in duplicates (ex en_US and EN_US, which doesn't seem like that uncommon of a scenario if a user sets these by hand. Do you think its worth using a separate HashSet with lowercased strings to do the duplication checking to prevent this?

  1. I don't think we should use a HashSet here for deduplication checking, because Vec works faster for a small number of elements due to CPU caching and lower overhead. However, to say for sure, tests are necessary.
  2. Most users set locales via graphical settings applications, and those who manually edit configuration files likely follow predefined examples. For these reasons, I think the chance of such typos is quite low.
  3. Could it be that the user needs to specify several locales with different cases for some specific applications to work properly?
  4. I wouldn't perform a case check in this situation, as libraries that handle locales (which will receive data obtained from sys-locale) usually handle locales with different cases properly, and adding an additional check would complicate the logic and make the function's behavior less clear (users of the library might be confused about why the library returns a different list of locales than the one specified on their system).
  5. Additionally, performing such a check may not be as simple as it seems. If our library receives two locales with different cases, determining which case to keep can be a labor-intensive task. This problem is solved in the language-tags crate, but including it would significantly increase the size of the library.
  6. Perhaps it’s worth adding test cases addressing this aspect. Should I do it?

@complexspaces
Copy link
Collaborator

Those points all seem reasonable to me. I just wanted to raise the discussion point. I agree that we shouldn't bloat the library for such a minor thing.

Perhaps it’s worth adding test cases addressing this aspect. Should I do it?

If you'd like to, sure, but I don't think its a deal breaker for this PR merging.

1. Added `LANGUAGE` and `LC_MESSAGES` variables.
2. Removed `LC_CTYPE` variable.
3. Modified `_get` to return `Iterator` over locales instead of `Option`.
4. Updated tests accordingly.
@pasabanov
Copy link
Contributor Author

If you'd like to, sure, but I don't think its a deal breaker for this PR merging.

I have added the corresponding test case.

@pasabanov
Copy link
Contributor Author

pasabanov commented Oct 21, 2024

I could also add information to the documentation stating that the function neither validates system locales for compliance with the POSIX format nor validates output locales for compliance with the BCP 47 format.
And also add tests to verify that the function does not perform validation.
@complexspaces, what do you think?

P.S. But the function guarantees correct BCP 47 locales for valid POSIX locales. This could also be included in the documentation.

@complexspaces complexspaces merged commit de5d48c into 1Password:main Oct 21, 2024
42 checks passed
@complexspaces
Copy link
Collaborator

If we update the documentation, let's leave that for a followup PR. I think the changes here are solid and shouldn't be bikeshed too long.

But the function guarantees correct BCP 47 locales for valid POSIX locales. This could also be included in the documentation.

Do you mean the public get_locale and get_locales functions? I think "guarantees" is stronger then reality since at the end of the day the crate is interested in returning OS-provided data. There just happens to be good alignment from the platforms today.

@pasabanov
Copy link
Contributor Author

Do you mean the public get_locale and get_locales functions? I think "guarantees" is stronger then reality since at the end of the day the crate is interested in returning OS-provided data. There just happens to be good alignment from the platforms today.

I would like to propose adding documentation for the _get function in unix.rs, and it could also be beneficial to mention this in relation to Linux (Unix) in the get_locale and get_locales functions.

Regarding the term "guarantees": I draw this conclusion based on how the posix_to_bcp47 function transforms POSIX locales into BCP 47 ones. Fortunately, there is some alignment between these formats that supports this assertion. However, if this term seems too strong, a softer alternative can be used.

@complexspaces
Copy link
Collaborator

I would to accept a PR adding a line or two of documentation to the public functions that notes on Linux/BSD systems user-input may influence the format of the returned data.

@pasabanov
Copy link
Contributor Author

I would to accept a PR adding a line or two of documentation to the public functions that notes on Linux/BSD systems user-input may influence the format of the returned data.

I think I'm going to update the documentation regarding Linux/BSD after PR #37.
By the way, which term do you think is better: "Linux/BSD" or "Linux/Unix"? Or perhaps "Unix-like" or something else?

@complexspaces
Copy link
Collaborator

I prefer Linux/BSD to be explicit. macOS and Android are two examples of UNIX platforms that don't use the codepaths modified here.

@complexspaces
Copy link
Collaborator

Released in 0.3.2. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants