-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad span computations with unicode characters, should be handling them as graphemes #8706
Comments
They are still are not completely correct, since it does not handle graphemes at all, just codepoints, but at least it handles the common case correctly. The calculation was previously very wrong (rather than just a little bit wrong): it wasn't accounting for the fact that every character is 1 byte, and so multibyte characters were pretending to be zero width. cc rust-lang#8706
If #12489 lands, the compiler handles these slightly better, but it's just operating on codepoint counts and assuming they're all single width: i.e. still needs to be changed to work with graphemes. |
They are still are not completely correct, since it does not handle graphemes at all, just codepoints, but at least it handles the common case correctly. The calculation was previously very wrong (rather than just a little bit wrong): it wasn't accounting for the fact that every character is 1 byte, and so multibyte characters were pretending to be zero width. cc rust-lang#8706
Visiting for triage, still requires #7043 |
#7043 has been fixed, so this can actually make progress now! |
Because of #24428, this issue should be reopened. |
I agree with @huonw's comment on PR #24428: namely, we should revert the portions of that PR that regressed |
Triaging: Updating the code to fn main() {
let s = "ZͨA͑ͦ͒͋ͤ͑̚L̄͑͋Ĝͨͥ̿͒̽̈́Oͥ͛ͭ!̏"; while true { break; }
println!("{}", s);
} The error still does not point to the correct location:
|
Closing as fixed. fn main() {
let s = ~"ZͨA͑ͦL̄͑ĜͨOͥ͛!̏"; while true { break; }
} Now gives:
|
Never mind, closed prematurely. The
|
I'd like to work on this, but I could use some help. I've written a failing UI test, and I've pulled in the |
Display spans correctly when there are zero-width or wide characters Hopefully... * fixes #45211 * fixes #8706 --- Before: ``` error: invalid width `7` for integer literal --> unicode_2.rs:12:25 | 12 | let _ = ("a̐éö̲", 0u7); | ^^^ | = help: valid widths are 8, 16, 32, 64 and 128 error: invalid width `42` for integer literal --> unicode_2.rs:13:20 | 13 | let _ = ("아あ", 1i42); | ^^^^ | = help: valid widths are 8, 16, 32, 64 and 128 error: aborting due to 2 previous errors ``` After: ``` error: invalid width `7` for integer literal --> unicode_2.rs:12:25 | 12 | let _ = ("a̐éö̲", 0u7); | ^^^ | = help: valid widths are 8, 16, 32, 64 and 128 error: invalid width `42` for integer literal --> unicode_2.rs:13:20 | 13 | let _ = ("아あ", 1i42); | ^^^^ | = help: valid widths are 8, 16, 32, 64 and 128 error: aborting due to 2 previous errors ``` Spans might display incorrectly on the browser. r? @estebank
Seems like a part of the issue is still present. See #47380. |
@est31 this is a problem that might get introduced back every now and then by forgetting to test with unicode, but should only happen on new features (like the suggestions machinery that introduced the linked case). |
Fix formatting of `cast_abs_to_unsigned` docs The "use instead" section of the example was not being formatted as Rust code, and the "configuration" documentation was being formatted as Rust code. changelog: `[cast_abs_to_unsigned]` Fix example/configuration formatting
The proper fix for this requires grapheme handling (#7043), e.g. some graphemes are double width.
The text was updated successfully, but these errors were encountered: