Unescaping cleanups #103919
These have been bugging me for a while.
- `literal_text`: `src` is also used and is shorter and better.
- `first_char`: used even when "first" doesn't make sense; `c` is shorter and better.
- `curr`: `c` is shorter and better.
- `unescaped_char`: `result` is also used and is shorter and better.
- `second_char`: these have a single use and can be elided.
There is some subtlety here.
It's passed to numerous places where we just need an `is_byte` bool. Passing the bool avoids the need for some assertions. Also rename `is_bytes()` as `is_byte()`, to better match `Mode::Byte`, `Mode::ByteStr`, and `Mode::RawByteStr`.
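As a rough illustration of the `is_byte` change, here is a minimal sketch. The `Mode` enum is a simplified stand-in modeled on rustc_lexer's (the exact variant list is an assumption), and `check_char` is a hypothetical helper that takes only the `bool` it needs instead of the whole `Mode`:

```rust
/// Simplified stand-in for rustc_lexer's `Mode` (variant list assumed).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Char,
    Str,
    Byte,
    ByteStr,
    RawStr,
    RawByteStr,
}

impl Mode {
    /// True for the three byte modes. Named `is_byte` (not `is_bytes`)
    /// to match `Mode::Byte`, `Mode::ByteStr` and `Mode::RawByteStr`.
    pub fn is_byte(self) -> bool {
        matches!(self, Mode::Byte | Mode::ByteStr | Mode::RawByteStr)
    }
}

/// Hypothetical helper that only cares whether the literal is a byte
/// literal. Taking a `bool` instead of a `Mode` means it never has to
/// assert that it was handed a mode it knows how to handle.
pub fn check_char(c: char, is_byte: bool) -> Result<char, &'static str> {
    if is_byte && !c.is_ascii() {
        Err("non-ASCII character in byte literal")
    } else {
        Ok(c)
    }
}
```

Call sites then pass `mode.is_byte()`, e.g. `check_char(c, mode.is_byte())`.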
@@ -351,7 +338,7 @@ where
         }
     }
 
-fn byte_from_char(c: char) -> u8 {
+pub fn byte_from_char(c: char) -> u8 {
This one is a bit dicey. I agree that in the context of the compiler we don't really care about nice abstraction boundaries, and this is a simplification.
But we want `rustc_lexer` to also be a nice crates.io crate, and from that point of view this feels a bit too type-unsafe.
No strong opinion overall.
Though, if we do this, this one surely wants to be tagged as `#[inline]`: it's a tiny cross-crate fn.
It's a fair point. A couple of questions:
- Is `rustc_lexer` actually usable outside of rustc/rust-analyzer? The `rustc-lexer` crate is v0.1.0 and hasn't been updated in three years. I see some auto-updated variants but none of them have been updated in the past year.
- I'm not sure how this is type-unsafe. You'll get `u8`/`char` type mismatches if you do something wrong?
- Inlining is a good idea.
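A sketch of `byte_from_char` with the suggested `#[inline]` attribute. The body is an assumption, not necessarily the merged code. The `debug_assert!` marks where the type-safety concern bites: the signature accepts any `char`, so correctness depends on callers passing only chars that came out of byte-mode unescaping.

```rust
/// Converts a `char` produced by byte-mode unescaping into a `u8`.
/// Callers must only pass chars <= U+00FF: debug builds catch violations
/// via the assertion, release builds silently truncate.
#[inline]
pub fn byte_from_char(c: char) -> u8 {
    let res = c as u32;
    debug_assert!(res <= u8::MAX as u32, "byte-mode chars are always <= 0xFF");
    res as u8
}
```

Outside the compiler, `u8::try_from(c)` (the standard library's `TryFrom<char> for u8`, stable since Rust 1.59) returns an error instead of truncating, at the cost of a check in release builds.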
There are three kinds of "byte" literals: byte literals, byte string literals, and raw byte string literals. None are allowed to have non-ASCII chars in them.

Two `EscapeError` variants exist for when that constraint is violated:
- `NonAsciiCharInByte`: used for byte literals and byte string literals.
- `NonAsciiCharInByteString`: used for raw byte string literals.

As a result, the messages for raw byte string literals use different wording, without good reason. Also, byte string literals are incorrectly described as "byte constants" in some error messages.

This commit eliminates `NonAsciiCharInByteString` so the three cases are handled similarly, and described correctly. The `mode` is enough to distinguish them.

Note: Some existing error messages mention "byte constants" and some mention "byte literals". I went with the latter here, because it's a more correct name, as used by the Reference.
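A minimal sketch of that idea, using simplified stand-in types rather than rustc_lexer's real definitions: one `NonAsciiCharInByte` variant covers all three byte-literal kinds, and the `mode` alone chooses the wording. The helper names `check_byte_char` and `describe` are hypothetical, and the message strings are illustrative, not copied from rustc.

```rust
/// Simplified stand-in for rustc_lexer's `Mode` (variant list assumed).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Char,
    Str,
    Byte,
    ByteStr,
    RawStr,
    RawByteStr,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EscapeError {
    /// A single variant for byte, byte string, and raw byte string literals.
    NonAsciiCharInByte,
}

/// Rejects non-ASCII chars in any of the three byte modes.
fn check_byte_char(c: char, mode: Mode) -> Result<char, EscapeError> {
    match mode {
        Mode::Byte | Mode::ByteStr | Mode::RawByteStr if !c.is_ascii() => {
            Err(EscapeError::NonAsciiCharInByte)
        }
        _ => Ok(c),
    }
}

/// Builds the message; the literal kind comes from `mode`, not the variant.
fn describe(err: EscapeError, mode: Mode) -> String {
    match err {
        EscapeError::NonAsciiCharInByte => {
            let kind = match mode {
                Mode::Byte => "byte literal",
                Mode::ByteStr => "byte string literal",
                Mode::RawByteStr => "raw byte string literal",
                // Not produced for non-byte modes; kept so the match is total.
                Mode::Char | Mode::Str | Mode::RawStr => "literal",
            };
            format!("non-ASCII character in {kind}")
        }
    }
}
```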
Remove a low-value comment, remove a duplicate comment, and correct a third comment.
It deals with eight cases: ints, floats, and the six quoted types (char/byte/strings). For ints and floats we have an early return, and the other six types fall through to the code at the end, which makes the function hard to read. This commit rearranges things to avoid the early returns.
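The restructuring can be illustrated with a hypothetical, much smaller function (the real one also validates and unescapes, and is not reproduced here). Instead of returning early for ints and floats and letting the six quoted kinds fall through, every match arm yields a value and a single tail finishes the job. `LiteralKind`, `literal_contents`, and the `n_hashes: usize` field are assumptions for this sketch, and literal suffixes are ignored.

```rust
/// Hypothetical literal kinds mirroring the eight cases in the commit message.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LiteralKind {
    Int,
    Float,
    Char,
    Byte,
    Str,
    ByteStr,
    RawStr { n_hashes: usize },
    RawByteStr { n_hashes: usize },
}

/// Returns the literal's contents with its quoting stripped. Every arm
/// produces a value, so there is no early return for ints and floats.
pub fn literal_contents(kind: LiteralKind, text: &str) -> &str {
    // (delimiter bytes before the contents, delimiter bytes after)
    let (prefix, suffix) = match kind {
        LiteralKind::Int | LiteralKind::Float => (0, 0),
        LiteralKind::Char | LiteralKind::Str => (1, 1), // 'a', "a"
        LiteralKind::Byte | LiteralKind::ByteStr => (2, 1), // b'a', b"a"
        LiteralKind::RawStr { n_hashes } => (2 + n_hashes, 1 + n_hashes), // r#"a"#
        LiteralKind::RawByteStr { n_hashes } => (3 + n_hashes, 1 + n_hashes), // br#"a"#
    };
    &text[prefix..text.len() - suffix]
}
```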
It has a single callsite, and is fairly small. The `Float` match arm already has base-specific checking inline, so this makes things more consistent.
Thanks for the comments. I have updated.
This was a network error when downloading some crates.
It's easy to just use `unescape_literal` + `byte_from_char`.
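A minimal sketch of that composition, under heavy simplifying assumptions. `unescape_literal` below is a toy stand-in that understands only `\n`, `\\` and `\xHH`; its callback shape (a byte range plus `Result<char, EscapeError>`) is modeled on rustc_lexer's, but the body is not. `unescape_byte_str` is a hypothetical name for the composed helper: unescape in `Mode::ByteStr`, then map each `char` through `byte_from_char`.

```rust
use std::ops::Range;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Str,
    ByteStr,
}

impl Mode {
    pub fn is_byte(self) -> bool {
        matches!(self, Mode::ByteStr)
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EscapeError {
    InvalidEscape,
    NonAsciiCharInByte,
    OutOfRangeHexEscape,
}

/// Toy stand-in for `unescape_literal`: reports each unescaped char (or
/// error) together with the source byte range it came from.
pub fn unescape_literal<F>(src: &str, mode: Mode, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut chars = src.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        let res = if c == '\\' {
            match chars.next().map(|(_, e)| e) {
                Some('n') => Ok('\n'),
                Some('\\') => Ok('\\'),
                Some('x') => {
                    let hi = chars.next().and_then(|(_, d)| d.to_digit(16));
                    let lo = chars.next().and_then(|(_, d)| d.to_digit(16));
                    match (hi, lo) {
                        // Non-byte modes only allow \x00..=\x7F.
                        (Some(h), Some(l)) if mode.is_byte() || h * 16 + l <= 0x7F => {
                            // h * 16 + l <= 0xFF, so this is always a valid char.
                            Ok(char::from_u32(h * 16 + l).unwrap())
                        }
                        (Some(_), Some(_)) => Err(EscapeError::OutOfRangeHexEscape),
                        _ => Err(EscapeError::InvalidEscape),
                    }
                }
                _ => Err(EscapeError::InvalidEscape),
            }
        } else if mode.is_byte() && !c.is_ascii() {
            Err(EscapeError::NonAsciiCharInByte)
        } else {
            Ok(c)
        };
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        callback(start..end, res);
    }
}

/// Only valid for chars produced in a byte mode (all <= U+00FF).
#[inline]
pub fn byte_from_char(c: char) -> u8 {
    let res = c as u32;
    debug_assert!(res <= u8::MAX as u32);
    res as u8
}

/// Hypothetical helper: byte-string unescaping built from the two
/// functions above, with no dedicated byte-level unescaper.
pub fn unescape_byte_str(src: &str) -> Result<Vec<u8>, EscapeError> {
    let mut bytes = Vec::new();
    let mut first_err = None;
    unescape_literal(src, Mode::ByteStr, &mut |_range, res| match res {
        Ok(c) => bytes.push(byte_from_char(c)),
        Err(e) => {
            first_err.get_or_insert(e);
        }
    });
    match first_err {
        Some(e) => Err(e),
        None => Ok(bytes),
    }
}
```

Because byte mode rejects non-ASCII source chars and `\xHH` yields at most 0xFF, every `Ok` char fits in a `u8`, which is what makes `byte_from_char` safe at this call site.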
@bors r+
…=matklad Unescaping cleanups Some code improvements, and some error message improvements. Best reviewed one commit at a time. r? `@matklad`
The `usize` isn't needed in the error case.
Makes sense. We do use it here, but I think that’s not essential and using a zero offset would work. Btw, consider sending a PR to upgrade the crate in rust-analyzer once we get a new auto-published version; it might be helpful to learn how we use it from that side. @bors r+
Oh, I overlooked that. I thought rust-analyzer was getting built by the rustc build system. Is that not the case?
Can you expand on this? I don't really understand the rustc/rust-analyzer integration, so I don't understand everything in that sentence.
The main insight is that rust-analyzer is a bog-standard Rust project. It’s built using Cargo. In particular, this is how it gets rustc_lexer:

So what you’d need to do is wait for the next auto-published version (rustc crates are published every night, I think), and then change the dependency in Cargo.toml.

We could somehow entangle the rust-analyzer and rustc repos to make sure they use the same source, but that would break “ra is a normal Cargo project”, and I consider that to be a super-important property.

As an alternative, we could make both rustc and rust-analyzer depend on a crates.io version of the crate with a nice semver-stable API. In general, this is a bad idea for compiler crates (semver stability is very costly), but for the lexer specifically it might make sense: pulling the “official” Rust lexer from crates.io sounds nifty.

Finally, a third alternative would be to change
Thanks for the info. My rust repo has
I think we are using some fancy git feature here, but I don’t know; cc @Veykril. The source of truth for rust-analyzer is https://github.com/rust-lang/rust-analyzer (and I am most definitely not even thinking about reading the “optimizing rust-analyzer in 2022” blog post :) ).
rust-analyzer is pulled in as a git subtree nowadays (and we are currently having trouble with syncing changes from

Note that we are currently two versions behind the latest auto-published rustc lexer crate. Updating it requires some slight changes due to the token prefix handling that was added a while ago, if I recall correctly.
Rollup of 7 pull requests

Successful merges:
- rust-lang#103570 (Stabilize integer logarithms)
- rust-lang#103694 (Add documentation examples for `pointer::mask`)
- rust-lang#103919 (Unescaping cleanups)
- rust-lang#103933 (Promote {aarch64,i686,x86_64}-unknown-uefi to Tier 2)
- rust-lang#103952 (Don't intra linkcheck reference)
- rust-lang#104111 (rustdoc: Add mutable to the description)
- rust-lang#104125 (Const Compare for Tuples)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
Some code improvements, and some error message improvements.
Best reviewed one commit at a time.
r? @matklad