Specify an explicit locale for all to{Lower,Upper}Case calls
#17687
Java's `String#to{Lower,Upper}Case()` is locale-dependent, which can lead to unexpected results in locales with special case mappings in the ASCII range (e.g. in a Turkish locale, a capital ASCII `I` lowercases to a non-ASCII variant of `i`).

This is prevented by specifying a locale without such case mappings. This commit uses `Locale.ROOT` as the canonical choice; it has the same case mapping behavior as other common locales such as `Locale.ENGLISH` or `Locale.US`.

Follow-up changes could use Guava's `Ascii.to{Lower,Upper}Case` instead, but whether this is safe may depend on the context, which makes this replacement unsuitable to perform across the repo.

Fixes bazelbuild#17541
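For illustration, here is a minimal sketch of the behavior described above. It is not taken from the commit; the class name `LocaleCaseDemo` and the helper `lowerRoot` are hypothetical and only show the pattern of passing `Locale.ROOT` explicitly.

```java
import java.util.Locale;

public class LocaleCaseDemo {
  // Hypothetical helper: lowercase with an explicit, locale-neutral mapping
  // instead of relying on the JVM's default locale.
  static String lowerRoot(String s) {
    return s.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr");

    // Under Turkish case rules, ASCII 'I' lowercases to dotless 'ı' (U+0131),
    // so the result no longer equals the ASCII string "title".
    String turkishLower = "TITLE".toLowerCase(turkish);
    System.out.println(turkishLower.equals("title")); // false

    // With Locale.ROOT the result is the same regardless of the default locale.
    System.out.println(lowerRoot("TITLE").equals("title")); // true
  }
}
```

The no-argument `toLowerCase()` behaves like the first call whenever the JVM's default locale is Turkish, which is why comparisons against ASCII constants can fail only on some users' machines.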
@meteorcloudy Could you review this? It's a global change that mostly affects OSS users. Turns out Bazel was very broken with a Turkish locale.
@bazel-io flag
Looks a nice fix to me.
What context specifically? I can see |
@meteorcloudy As far as I understand, I would be surprised if |
@fmeum Understood, thanks!
I'm somewhat worried that even though this change fixes the issue at hand and many others that stem from the meaning of WDYT about sanitizing the environment of the Bazel server instead so that as far as the JVM is concerned, the locale is always the same, regardless of the locale the user sets? (I thought that we already do that, but we apparently don't)
@lberki Char locales are fixed for Java actions and also by the Java stub template, but I don't think Bazel itself is subject to any locale cleaning. While forcing a locale makes sense from the point of view of hermeticity, there are certain places in which having locale-dependent output may actually be desirable (e.g. An alternative I have thought about was to add an ErrorProne check for locale-less calls to |
That also works (@meteorcloudy do we have ErrorProne set up on our public CI?) I think consistent formatting of floats and the like (to the extent that Bazel emits floats, I don't think it happens a lot) is a positive development, isn't it? And it would remove a whole class of possible bugs; output is one thing, but I'm worried that we are also parsing things in a locale-dependent way right now, which isn't great. Ultimately, I'm fine with either approach, it's just that sanitizing the server environment seems to bring the most bang for the buck in terms of bugs fixed and possible bugs eliminated.
@cushon Has an ErrorProne check for locale-less |
I filed google/error-prone#3809
@bazel-io fork 6.2.0
Superseded by #17702