Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More lexer improvements #102302

Merged
merged 15 commits into from
Sep 28, 2022
Merged

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Sep 26, 2022

A follow-up to #99884.

r? @matklad

`parse_token_tree` is basically a match with four arms: `Eof`,
`OpenDelim`, `CloseDelim`, and "other". It has two call sites, and at
each call site one of the arms is unreachable. It's also not inlined.

This commit removes `parse_token_tree` by splitting it into four
functions and inlining them. This avoids some repeated conditional
tests and also some non-inlined function calls on the hot path.
It has no useful effect.
Currently does the "is this a `#!` at the start of the file?" check for
every single token(!)

This commit moves it so it only happens once.
The spacing computation is done in two parts. In the first part
`next_token` and `bump` use `Spacing::Alone` to mean "preceded by
whitespace" and `Spacing::Joint` to mean the opposite. In the second
part `parse_token_tree_other` then adjusts the `spacing` value to mean
the usual thing (i.e. "is the following token joinable punctuation?").
This shift in meaning is very confusing and it took me some time to
understand what was going on.

This commit changes the first part to use a bool, and adds some
comments, which makes things much clearer.
It's an unnecessary layer that obfuscates when I am looking for
optimizations.
Instead of replacing `TokenTreesReader::token` in two steps, we can just
do it in one, which is both simpler and faster.
`TokenTreesReader` wraps a `StringReader`, but the `into_token_trees`
function obscures this. This commit moves to a more straightforward
control flow.
`Cursor` is currently hidden, and the main tokenization path uses
`rustc_lexer::first_token` which involves constructing a new `Cursor`
for every single token, which is weird. Also, `first_token` also can't
handle empty input, so callers have to check for that first.

This commit makes `Cursor` public, so `StringReader` can contain a
`Cursor`, which results in a simpler structure. The commit also changes
`StringReader::advance_token` so it returns an `Option<Token>`,
simplifying the the empty input case.
This is a case where a small amount of repetition results in code that
is faster and easier to read.
`Cursor` keeps track of the position within the current token. But it
uses confusing names that don't make it clear that the "length consumed"
is just within the current token.

This commit renames things to make this clearer.
For alignment with `rust_ast::TokenKind::Eof`. Plus it's a bit faster,
due to less `Option` manipulation in `StringReader::next_token`.
This is a small performance win, alas.
@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 26, 2022
@bors
Copy link
Contributor

bors commented Sep 26, 2022

⌛ Trying commit e21850681d4e48218ec0bd27599aef588ce46c33 with merge 1eb81491bc90d976ff779a3082c0ac465e647ee0...

@bors
Copy link
Contributor

bors commented Sep 26, 2022

☀️ Try build successful - checks-actions
Build commit: 1eb81491bc90d976ff779a3082c0ac465e647ee0 (1eb81491bc90d976ff779a3082c0ac465e647ee0)

@rust-timer
Copy link
Collaborator

Queued 1eb81491bc90d976ff779a3082c0ac465e647ee0 with parent 72f4923, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (1eb81491bc90d976ff779a3082c0ac465e647ee0): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.3% [1.2%, 1.6%] 6
Improvements ✅
(primary)
-0.5% [-1.0%, -0.2%] 76
Improvements ✅
(secondary)
-0.7% [-2.0%, -0.3%] 74
All ❌✅ (primary) -0.5% [-1.0%, -0.2%] 76

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
2.3% [2.3%, 2.3%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.7% [-4.0%, -2.0%] 3
Improvements ✅
(secondary)
-2.7% [-3.2%, -2.1%] 6
All ❌✅ (primary) -1.4% [-4.0%, 2.3%] 4

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 26, 2022
@nnethercote nnethercote marked this pull request as draft September 26, 2022 20:31
Add some comments, and mark one path as unreachable.
These make the delimiter processing clearer.
@nnethercote nnethercote force-pushed the more-lexer-improvements branch from e218506 to 7f7e216 Compare September 27, 2022 05:56
@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Sep 27, 2022
@nnethercote nnethercote marked this pull request as ready for review September 27, 2022 05:56
@nnethercote
Copy link
Contributor Author

Best reviewed one commit at a time.

r? @matklad

Copy link
Member

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this all basically looks great to me!

One thing I am less certain about is that rustc_lexer's interface was specifically crafted for robust incremental re-lexing, but that's not really to important, as long as it continues to be valid in practice to create a fresh Cursor from input's midpoint.

@bors delegate+

compiler/rustc_parse/src/lexer/tokentrees.rs Outdated Show resolved Hide resolved
compiler/rustc_parse/src/lexer/tokentrees.rs Show resolved Hide resolved
compiler/rustc_parse/src/lexer/tokentrees.rs Outdated Show resolved Hide resolved
compiler/rustc_parse/src/lexer/mod.rs Show resolved Hide resolved
compiler/rustc_parse/src/lexer/mod.rs Show resolved Hide resolved
compiler/rustc_parse/src/lexer/tokentrees.rs Show resolved Hide resolved
compiler/rustc_lexer/src/lib.rs Outdated Show resolved Hide resolved
compiler/rustc_lexer/src/lib.rs Show resolved Hide resolved
compiler/rustc_lexer/src/cursor.rs Show resolved Hide resolved
compiler/rustc_parse/src/lexer/mod.rs Show resolved Hide resolved
@nnethercote
Copy link
Contributor Author

I have addressed the comments except where explained above. Thank you for the fast and helpful review.

@bors r=matklad

@bors
Copy link
Contributor

bors commented Sep 28, 2022

📌 Commit d0a26ac has been approved by matklad

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 28, 2022
@bors
Copy link
Contributor

bors commented Sep 28, 2022

⌛ Testing commit d0a26ac with merge 6201eab...

@bors
Copy link
Contributor

bors commented Sep 28, 2022

☀️ Test successful - checks-actions
Approved by: matklad
Pushing 6201eab to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 28, 2022
@bors bors merged commit 6201eab into rust-lang:master Sep 28, 2022
@rustbot rustbot added this to the 1.66.0 milestone Sep 28, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (6201eab): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.2% [2.2%, 2.2%] 1
Improvements ✅
(primary)
-0.5% [-1.0%, -0.2%] 80
Improvements ✅
(secondary)
-0.8% [-1.8%, -0.3%] 75
All ❌✅ (primary) -0.5% [-1.0%, -0.2%] 80

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.2% [4.2%, 4.2%] 1
Improvements ✅
(primary)
-3.3% [-3.3%, -3.3%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -3.3% [-3.3%, -3.3%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.1% [-2.1%, -2.1%] 1
Improvements ✅
(secondary)
-3.2% [-4.3%, -2.4%] 3
All ❌✅ (primary) -2.1% [-2.1%, -2.1%] 1

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the perf-regression Performance regression. label Sep 28, 2022
@nnethercote nnethercote deleted the more-lexer-improvements branch October 26, 2022 03:10
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
… r=matklad

More lexer improvements

A follow-up to rust-lang#99884.

r? `@matklad`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants