Fix encoding of non-ascii contents written to parameter files. #18972
Potential fix for #18792.
This seems logically correct but inefficient: writeContentUtf8 is a private method which has exactly 1 call site, so we can certainly avoid double-recoding.
I would suggest reverting the change to writeContent() and instead changing writeContentUtf8() to the following:
...
if (stringUnsafe.getCoder(line) == StringUnsafe.LATIN1 && isAscii(bytes)) {
  outputStream.write(bytes);
} else if (!StringUtil.decodeBytestringUtf8(line).equals(line)) {
  // We successfully decoded line from utf8 - meaning it was already encoded as utf8.
  // We do not want to double-encode.
  outputStream.write(bytes);
} else {
  ByteBuffer encodedBytes = encoder.encode(CharBuffer.wrap(line));
...
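For readers without the Bazel sources at hand, the branching suggested above can be approximated with the standard JDK alone. This is only a sketch under stated assumptions: the class ParamFileEncodingSketch and its methods are hypothetical names, it does not use Bazel's StringUtil or StringUnsafe helpers, and it folds the ASCII case into the pass-through branch rather than checking the string coder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Bazel code.
final class ParamFileEncodingSketch {

  /** True if every char fits in one byte and those bytes form valid UTF-8. */
  static boolean looksLikeUtf8Bytes(String line) {
    for (int i = 0; i < line.length(); i++) {
      if (line.charAt(i) > 0xFF) {
        return false; // a real multi-byte char, so this is not a byte string
      }
    }
    byte[] raw = line.getBytes(StandardCharsets.ISO_8859_1);
    try {
      // A strict decoder throws on malformed input instead of substituting U+FFFD.
      StandardCharsets.UTF_8
          .newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(raw));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  /** Returns the bytes to write for one line, avoiding a second UTF-8 encoding. */
  static byte[] encodeLine(String line) {
    if (looksLikeUtf8Bytes(line)) {
      // Already UTF-8 (or plain ASCII): write the bytes through unchanged.
      return line.getBytes(StandardCharsets.ISO_8859_1);
    }
    // Otherwise treat the line as ordinary Unicode text and encode it once.
    return line.getBytes(StandardCharsets.UTF_8);
  }
}
```

Note that a check like this cannot tell a byte string apart from genuine Unicode text whose chars happen to form valid UTF-8 when read as bytes; it treats both as already encoded.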
When args are written to parameter files, non-ASCII values are wrongly encoded as UTF-8 a second time. This appears to be unrelated to Bazel's upgrade to JDK 20 and has always happened.
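To make the symptom concrete, here is a small standalone illustration of double encoding. It is not the repro from this PR. It assumes the argument is held as a string whose chars each carry one UTF-8 byte, and the class name DoubleEncodingDemo is made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Bazel code.
final class DoubleEncodingDemo {

  /** Formats bytes as space-separated upper-case hex. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02X ", b & 0xFF));
    }
    return sb.toString().trim();
  }

  /** The intended result: encode the text as UTF-8 exactly once. */
  static byte[] encodeOnce(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  /** The faulty path: UTF-8 bytes stored one per char, then encoded again. */
  static byte[] encodeTwice(String text) {
    String bytestring =
        new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    return bytestring.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String text = "\u00e4"; // LATIN SMALL LETTER A WITH DIAERESIS
    System.out.println(hex(encodeOnce(text)));  // C3 A4
    System.out.println(hex(encodeTwice(text))); // C3 83 C2 A4
  }
}
```

The second line of output is what a reader of the parameter file would see as mojibake: each of the two original UTF-8 bytes has itself been expanded into a two-byte sequence.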
Repro: