More lexer improvements #102302

nnethercote · 2022-09-26T09:28:23Z

A follow-up to #99884.

`parse_token_tree` is basically a match with four arms: `Eof`, `OpenDelim`, `CloseDelim`, and "other". It has two call sites, and at each call site one of the arms is unreachable. It's also not inlined. This commit removes `parse_token_tree` by splitting it into four functions and inlining them. This avoids some repeated conditional tests and also some non-inlined function calls on the hot path.

It has no useful effect.

Currently does the "is this a `#!` at the start of the file?" check for every single token(!) This commit moves it so it only happens once.

The spacing computation is done in two parts. In the first part `next_token` and `bump` use `Spacing::Alone` to mean "preceded by whitespace" and `Spacing::Joint` to mean the opposite. In the second part `parse_token_tree_other` then adjusts the `spacing` value to mean the usual thing (i.e. "is the following token joinable punctuation?"). This shift in meaning is very confusing and it took me some time to understand what was going on. This commit changes the first part to use a bool, and adds some comments, which makes things much clearer.

It's an unnecessary layer that obfuscates when I am looking for optimizations.

Instead of replacing `TokenTreesReader::token` in two steps, we can just do it in one, which is both simpler and faster.

`TokenTreesReader` wraps a `StringReader`, but the `into_token_trees` function obscures this. This commit moves to a more straightforward control flow.

`Cursor` is currently hidden, and the main tokenization path uses `rustc_lexer::first_token` which involves constructing a new `Cursor` for every single token, which is weird. Also, `first_token` also can't handle empty input, so callers have to check for that first. This commit makes `Cursor` public, so `StringReader` can contain a `Cursor`, which results in a simpler structure. The commit also changes `StringReader::advance_token` so it returns an `Option<Token>`, simplifying the the empty input case.

This is a case where a small amount of repetition results in code that is faster and easier to read.

`Cursor` keeps track of the position within the current token. But it uses confusing names that don't make it clear that the "length consumed" is just within the current token. This commit renames things to make this clearer.

For alignment with `rust_ast::TokenKind::Eof`. Plus it's a bit faster, due to less `Option` manipulation in `StringReader::next_token`.

This is a small performance win, alas.

nnethercote · 2022-09-26T09:28:49Z

@bors try @rust-timer queue

rust-timer · 2022-09-26T09:28:50Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-09-26T09:28:59Z

⌛ Trying commit e21850681d4e48218ec0bd27599aef588ce46c33 with merge 1eb81491bc90d976ff779a3082c0ac465e647ee0...

bors · 2022-09-26T10:53:14Z

☀️ Try build successful - checks-actions
Build commit: 1eb81491bc90d976ff779a3082c0ac465e647ee0 (1eb81491bc90d976ff779a3082c0ac465e647ee0)

rust-timer · 2022-09-26T10:53:15Z

Queued 1eb81491bc90d976ff779a3082c0ac465e647ee0 with parent 72f4923, future comparison URL.

rust-timer · 2022-09-26T12:09:42Z

Finished benchmarking commit (1eb81491bc90d976ff779a3082c0ac465e647ee0): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.3%	[1.2%, 1.6%]	6
Improvements ✅ (primary)	-0.5%	[-1.0%, -0.2%]	76
Improvements ✅ (secondary)	-0.7%	[-2.0%, -0.3%]	74
All ❌✅ (primary)	-0.5%	[-1.0%, -0.2%]	76

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.3%	[-2.3%, -2.3%]	1
All ❌✅ (primary)	-	-	0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	2.3%	[2.3%, 2.3%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.7%	[-4.0%, -2.0%]	3
Improvements ✅ (secondary)	-2.7%	[-3.2%, -2.1%]	6
All ❌✅ (primary)	-1.4%	[-4.0%, 2.3%]	4

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

Add some comments, and mark one path as unreachable.

These make the delimiter processing clearer.

nnethercote · 2022-09-27T05:57:34Z

Best reviewed one commit at a time.

r? @matklad

matklad

Yeah, this all basically looks great to me!

One thing I am less certain about is that rustc_lexer's interface was specifically crafted for robust incremental re-lexing, but that's not really to important, as long as it continues to be valid in practice to create a fresh Cursor from input's midpoint.

@bors delegate+

compiler/rustc_parse/src/lexer/tokentrees.rs

compiler/rustc_parse/src/lexer/mod.rs

compiler/rustc_parse/src/lexer/tokentrees.rs

compiler/rustc_lexer/src/lib.rs

compiler/rustc_lexer/src/cursor.rs

compiler/rustc_parse/src/lexer/mod.rs

nnethercote · 2022-09-28T01:37:46Z

I have addressed the comments except where explained above. Thank you for the fast and helpful review.

@bors r=matklad

bors · 2022-09-28T01:37:48Z

📌 Commit d0a26ac has been approved by matklad

It is now in the queue for this repository.

bors · 2022-09-28T08:14:07Z

⌛ Testing commit d0a26ac with merge 6201eab...

bors · 2022-09-28T11:11:32Z

☀️ Test successful - checks-actions
Approved by: matklad
Pushing 6201eab to master...

rust-timer · 2022-09-28T12:44:35Z

Finished benchmarking commit (6201eab): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.2%	[2.2%, 2.2%]	1
Improvements ✅ (primary)	-0.5%	[-1.0%, -0.2%]	80
Improvements ✅ (secondary)	-0.8%	[-1.8%, -0.3%]	75
All ❌✅ (primary)	-0.5%	[-1.0%, -0.2%]	80

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.2%	[4.2%, 4.2%]	1
Improvements ✅ (primary)	-3.3%	[-3.3%, -3.3%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-3.3%	[-3.3%, -3.3%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean¹	range	count²
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.1%	[-2.1%, -2.1%]	1
Improvements ✅ (secondary)	-3.2%	[-4.3%, -2.4%]	3
All ❌✅ (primary)	-2.1%	[-2.1%, -2.1%]	1

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`

nnethercote added 12 commits September 26, 2022 08:28

Remove unnecessary spacing assignment.

14281e6

It has no useful effect.

Move #! checking.

9640d1c

Currently does the "is this a `#!` at the start of the file?" check for every single token(!) This commit moves it so it only happens once.

Remove TokenTreesReader::bump.

5b2075e

It's an unnecessary layer that obfuscates when I am looking for optimizations.

Remove ast::Token::take.

33ba277

Instead of replacing `TokenTreesReader::token` in two steps, we can just do it in one, which is both simpler and faster.

[ui] Rearrange StringReader/TokenTreesReader creation.

33516ac

`TokenTreesReader` wraps a `StringReader`, but the `into_token_trees` function obscures this. This commit moves to a more straightforward control flow.

Use less DRY in cook_lexer_token.

ceb25d1

This is a case where a small amount of repetition results in code that is faster and easier to read.

Rename some things.

cc0022a

`Cursor` keeps track of the position within the current token. But it uses confusing names that don't make it clear that the "length consumed" is just within the current token. This commit renames things to make this clearer.

Add rustc_lexer::TokenKind::Eof.

da84f0f

For alignment with `rust_ast::TokenKind::Eof`. Plus it's a bit faster, due to less `Option` manipulation in `StringReader::next_token`.

Inline and remove cook_lexer_token.

fb4dba0

This is a small performance win, alas.

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 26, 2022

rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 26, 2022

nnethercote marked this pull request as draft September 26, 2022 20:31

nnethercote added 2 commits September 27, 2022 09:53

Minor improvements.

880ebb6

Add some comments, and mark one path as unreachable.

Rename some variables.

7f7e216

These make the delimiter processing clearer.

nnethercote force-pushed the more-lexer-improvements branch from e218506 to 7f7e216 Compare September 27, 2022 05:56

rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Sep 27, 2022

nnethercote marked this pull request as ready for review September 27, 2022 05:56

rust-highfive assigned matklad Sep 27, 2022

matklad approved these changes Sep 27, 2022

View reviewed changes

Address review comments.

d0a26ac

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 28, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 28, 2022

bors merged commit 6201eab into rust-lang:master Sep 28, 2022

rustbot added this to the 1.66.0 milestone Sep 28, 2022

bors mentioned this pull request Sep 28, 2022

Give tip for unicode chars which might not be visible when rendered #100439

Closed

rustbot removed the perf-regression Performance regression. label Sep 28, 2022

nnethercote deleted the more-lexer-improvements branch October 26, 2022 03:10

JohnTitor mentioned this pull request Dec 3, 2022

fix: Fix broken links rust-lang/rustc-dev-guide#1522

Merged

Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023

Auto merge of rust-lang#102302 - nnethercote:more-lexer-improvements,…

ebc5fd4

… r=matklad More lexer improvements A follow-up to rust-lang#99884. r? `@matklad`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More lexer improvements #102302

More lexer improvements #102302

nnethercote commented Sep 26, 2022 •

edited

Loading

nnethercote commented Sep 26, 2022

rust-timer commented Sep 26, 2022

bors commented Sep 26, 2022

bors commented Sep 26, 2022

rust-timer commented Sep 26, 2022

rust-timer commented Sep 26, 2022

nnethercote commented Sep 27, 2022

matklad left a comment

nnethercote commented Sep 28, 2022

bors commented Sep 28, 2022

bors commented Sep 28, 2022

bors commented Sep 28, 2022

rust-timer commented Sep 28, 2022

More lexer improvements #102302

More lexer improvements #102302

Conversation

nnethercote commented Sep 26, 2022 • edited Loading

nnethercote commented Sep 26, 2022

rust-timer commented Sep 26, 2022

bors commented Sep 26, 2022

bors commented Sep 26, 2022

rust-timer commented Sep 26, 2022

rust-timer commented Sep 26, 2022

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

Footnotes

nnethercote commented Sep 27, 2022

matklad left a comment

Choose a reason for hiding this comment

nnethercote commented Sep 28, 2022

bors commented Sep 28, 2022

bors commented Sep 28, 2022

bors commented Sep 28, 2022

rust-timer commented Sep 28, 2022

Overall result: ✅ improvements - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Footnotes

nnethercote commented Sep 26, 2022 •

edited

Loading