-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More lexer improvements #102302
More lexer improvements #102302
Conversation
`parse_token_tree` is basically a match with four arms: `Eof`, `OpenDelim`, `CloseDelim`, and "other". It has two call sites, and at each call site one of the arms is unreachable. It's also not inlined. This commit removes `parse_token_tree` by splitting it into four functions and inlining them. This avoids some repeated conditional tests and also some non-inlined function calls on the hot path.
It has no useful effect.
Currently does the "is this a `#!` at the start of the file?" check for every single token(!) This commit moves it so it only happens once.
The spacing computation is done in two parts. In the first part `next_token` and `bump` use `Spacing::Alone` to mean "preceded by whitespace" and `Spacing::Joint` to mean the opposite. In the second part `parse_token_tree_other` then adjusts the `spacing` value to mean the usual thing (i.e. "is the following token joinable punctuation?"). This shift in meaning is very confusing and it took me some time to understand what was going on. This commit changes the first part to use a bool, and adds some comments, which makes things much clearer.
It's an unnecessary layer that obfuscates when I am looking for optimizations.
Instead of replacing `TokenTreesReader::token` in two steps, we can just do it in one, which is both simpler and faster.
`TokenTreesReader` wraps a `StringReader`, but the `into_token_trees` function obscures this. This commit moves to a more straightforward control flow.
`Cursor` is currently hidden, and the main tokenization path uses `rustc_lexer::first_token` which involves constructing a new `Cursor` for every single token, which is weird. Also, `first_token` also can't handle empty input, so callers have to check for that first. This commit makes `Cursor` public, so `StringReader` can contain a `Cursor`, which results in a simpler structure. The commit also changes `StringReader::advance_token` so it returns an `Option<Token>`, simplifying the the empty input case.
This is a case where a small amount of repetition results in code that is faster and easier to read.
`Cursor` keeps track of the position within the current token. But it uses confusing names that don't make it clear that the "length consumed" is just within the current token. This commit renames things to make this clearer.
For alignment with `rust_ast::TokenKind::Eof`. Plus it's a bit faster, due to less `Option` manipulation in `StringReader::next_token`.
This is a small performance win, alas.
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit e21850681d4e48218ec0bd27599aef588ce46c33 with merge 1eb81491bc90d976ff779a3082c0ac465e647ee0... |
☀️ Try build successful - checks-actions |
Queued 1eb81491bc90d976ff779a3082c0ac465e647ee0 with parent 72f4923, future comparison URL. |
Finished benchmarking commit (1eb81491bc90d976ff779a3082c0ac465e647ee0): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
Add some comments, and mark one path as unreachable.
These make the delimiter processing clearer.
e218506
to
7f7e216
Compare
Best reviewed one commit at a time. r? @matklad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this all basically looks great to me!
One thing I am less certain about is that rustc_lexer's interface was specifically crafted for robust incremental re-lexing, but that's not really to important, as long as it continues to be valid in practice to create a fresh Cursor from input's midpoint.
@bors delegate+
I have addressed the comments except where explained above. Thank you for the fast and helpful review. @bors r=matklad |
☀️ Test successful - checks-actions |
Finished benchmarking commit (6201eab): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`
A follow-up to #99884.
r? @matklad