Support encoding UTF-16 #6549
if encoding == encoding::UTF_16BE {
    for ch in rope.chunks().flat_map(|chunk| chunk.encode_utf16()) {
        writer.write_u16(ch).await?;
    }
} else if encoding == encoding::UTF_16LE {
    for ch in rope.chunks().flat_map(|chunk| chunk.encode_utf16()) {
        writer.write_u16_le(ch).await?;
    }
} else {
These are the only actual changes. Everything else is because of indentation.
Seems like a lot of duplication and not fully clear on why UTF-16 would need a custom workaround?
This is necessary because encoding_rs doesn't support encoding to UTF-16. It only supports decoding from UTF-16.
Hi. I tried this patch with the file supplied in the bug report and saw no changes after opening and saving.
% file utf16-6.txt
utf16-6.txt: ISO-8859 text
When to_writer is called on utf16.txt, it gets an
Due to this, the behaviour of editing utf16.txt is the same as on master.
Edit: This value is from
So it looks like
To detect BOM status in #6497 I used encoding_rs:
let encoding = encoding
    .or_else(|| {
        encoding::Encoding::for_bom(&buf)
            .map(|(encoding, _)| encoding)
    })
    .unwrap_or_else(|| {
        let mut encoding_detector = chardetng::EncodingDetector::new();
        encoding_detector.feed(&buf, is_empty);
        encoding_detector.guess(None, true)
    });
encoding_rs's for_bom can detect UTF-8, UTF-16BE and UTF-16LE for us, and we fall back on chardetng if no BOM is found.
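For anyone following along without encoding_rs at hand, here is a rough std-only sketch of what BOM sniffing amounts to. The `Bom` enum and `sniff_bom` function are hypothetical names made up for this illustration; they are not part of encoding_rs or of this PR, and the real code should keep using `Encoding::for_bom`.

```rust
/// Byte-order marks recognised by this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bom {
    Utf8,
    Utf16Be,
    Utf16Le,
}

/// Returns the detected BOM and its length in bytes if `buf` starts with one.
/// Hypothetical helper; it only mirrors the idea behind BOM detection.
fn sniff_bom(buf: &[u8]) -> Option<(Bom, usize)> {
    if buf.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((Bom::Utf8, 3))
    } else if buf.starts_with(&[0xFE, 0xFF]) {
        Some((Bom::Utf16Be, 2))
    } else if buf.starts_with(&[0xFF, 0xFE]) {
        Some((Bom::Utf16Le, 2))
    } else {
        // No BOM: the caller would fall back to a detector such as chardetng.
        None
    }
}
```

A file without a BOM returns `None` here, which is the case where the chardetng fallback decides the encoding.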
    }
} else if encoding == encoding::UTF_16LE {
    for ch in rope.chunks().flat_map(|chunk| chunk.encode_utf16()) {
        writer.write_u16_le(ch).await?;
Calling the writer directly is pretty inefficient and will cause a ton of syscalls. You want to use the buffering code below instead.
I think a better approach would be to essentially create an enum:
enum Encoder {
    Utf16Be,
    Utf16Le,
    EncodingRs(..),
}
and then implement UTF-16 encoding with the same interface that encoding_rs uses, so we don't duplicate the buffering logic here.
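To make the suggestion a bit more concrete, here is a minimal std-only sketch of the UTF-16 half of such an encoder. It is loosely modelled on the "encode from a `&str` into a caller-provided byte buffer" shape and returns a simplified `(read, written)` pair rather than encoding_rs's actual return type. `Utf16Encoder`, `encode_from_utf8` as defined here, and `encode_all` are hypothetical names for illustration only, and the `Vec<u8>` stands in for the real buffered writer.

```rust
/// Byte order for the hand-written UTF-16 path (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Utf16Encoder {
    Be,
    Le,
}

impl Utf16Encoder {
    /// Encodes as many whole characters of `src` as fit into `dst`.
    /// Returns (bytes of `src` consumed, bytes of `dst` written).
    fn encode_from_utf8(self, src: &str, dst: &mut [u8]) -> (usize, usize) {
        let mut read = 0;
        let mut written = 0;
        let mut units = [0u16; 2];
        for ch in src.chars() {
            // One or two UTF-16 code units per character.
            let encoded = ch.encode_utf16(&mut units);
            if written + encoded.len() * 2 > dst.len() {
                break; // output buffer is full; caller flushes and calls again
            }
            for &unit in encoded.iter() {
                let bytes = match self {
                    Utf16Encoder::Be => unit.to_be_bytes(),
                    Utf16Encoder::Le => unit.to_le_bytes(),
                };
                dst[written..written + 2].copy_from_slice(&bytes);
                written += 2;
            }
            read += ch.len_utf8();
        }
        (read, written)
    }
}

/// Drives the encoder through a fixed-size buffer, flushing each filled
/// buffer into `out` (a stand-in for one write call per buffer).
fn encode_all(encoder: Utf16Encoder, text: &str, buf_len: usize) -> Vec<u8> {
    let mut out = Vec::new();
    let mut buf = vec![0u8; buf_len];
    let mut rest = text;
    while !rest.is_empty() {
        let (read, written) = encoder.encode_from_utf8(rest, &mut buf);
        assert!(read > 0, "buffer too small to hold a single character");
        out.extend_from_slice(&buf[..written]);
        rest = &rest[read..];
    }
    out
}
```

The point of this shape is that the writer is touched once per filled buffer instead of once per code unit, so the same flushing loop could serve both the UTF-16 variants and the encoding_rs-backed variant.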
Resolves #6542