Speed up `TokenCursor` #96210

nnethercote · 2022-04-19T09:45:51Z

Plus a few related clean-ups.

This makes it more like `CursorRef::next_with_spacing`. There is no performance effect, just a consistency improvement.

They both have a single call site.

It has a single call site.

And likewise for the inlined variants. I did this for simplicity, but interesting it was a performance win as well.

In particular, avoid wrapping a token within `TokenTree::Token` and then immediately matching it and returning the token within. Just return the token immediately.

This will facilitate the change in the next commit. `boolean` arguments aren't great, but the function is only used in three places within this one file.

Instead of letting the next iteration of the loop handle it.

nnethercote · 2022-04-19T09:48:53Z

Some local instruction count measurements:

Benchmark	Profile	Scenario	% Change	Significance Factor?
time-macros-0.2.3	check	full	-4.31%	21.54x
async-std-1.10.0	check	full	-4.17%	20.85x
tt-muncher	check	full	-3.34%	16.68x
yansi-0.5.0	check	full	-2.16%	10.80x
quote-stress	check	full	-2.11%	10.57x
web-sys-0.3.56	check	full	-1.09%	5.46x
num-derive-0.3.3	check	full	-1.09%	5.45x
ctor-0.1.21	check	full	-1.05%	5.27x
futures-macro-0.3.19	check	full	-0.95%	4.76x
mockall_derive-0.11.0	check	full	-0.89%	4.47x
wasm-bindgen-backend-0.2.79	check	full	-0.88%	4.42x
tonic-build-0.7.0	check	full	-0.87%	4.33x
vsdb_derive-0.21.1	check	full	-0.84%	4.20x
pest_generator-2.1.3	check	full	-0.83%	4.13x
enum-as-inner-0.4.0	check	full	-0.81%	4.06x
stdweb-derive-0.5.3	check	full	-0.81%	4.03x
scroll_derive-0.11.0	check	full	-0.80%	4.02x
serde_derive-1.0.136	check	full	-0.75%	3.73x
clap_derive-3.1.7	check	full	-0.73%	3.67x
ref-cast-impl-1.0.6	check	full	-0.73%	3.63x
wayland-scanner-0.29.4	check	full	-0.71%	3.55x
prost-derive-0.10.0	check	full	-0.69%	3.47x
pyo3-macros-backend-0.16.3	check	full	-0.69%	3.46x
diesel_derives-1.4.1	check	full	-0.67%	3.36x
inotify-0.10.0	check	full	-0.66%	3.32x
funty-2.0.0	check	full	-0.63%	3.13x
enumflags2_derive-0.7.4	check	full	-0.61%	3.06x
deep-vector	check	full	-0.48%	2.40x
nix-0.23.1	check	full	-0.47%	2.33x
futures-lite-1.12.0	check	full	-0.33%	1.66x
externs	check	full	-0.31%	1.55x

The cycles counts are even better, e.g.:

Benchmark	Profile	Scenario	% Change	Significance Factor?
tt-muncher	check	full	-9.67%	48.36x
async-std-1.10.0	check	full	-8.13%	40.67x
quote-stress	check	full	-7.13%	35.65x
time-macros-0.2.3	check	full	-6.77%	33.85x
token-stream-stress	check	full	-3.95%	19.75x
yansi-0.5.0	check	full	-3.74%	18.72x
web-sys-0.3.56	check	full	-3.50%	17.48x
tokio-1.17.0	check	full	-3.22%	16.11x
log-0.4.16	check	full	-3.20%	15.98x
futures-macro-0.3.19	check	full	-3.18%	15.92x
inotify-0.10.0	check	full	-3.14%	15.69x
issue-88862	check	full	2.98%	14.89x
externs	check	full	-2.62%	13.08x
unify-linearly	check	full	-2.47%	12.36x
enumflags2_derive-0.7.4	check	full	-2.14%	10.69x
mockall_derive-0.11.0	check	full	-2.09%	10.43x
ref-cast-impl-1.0.6	check	full	-1.98%	9.88x
tonic-build-0.7.0	check	full	-1.92%	9.59x
scroll_derive-0.11.0	check	full	-1.86%	9.31x
stdweb-derive-0.5.3	check	full	-1.76%	8.79x

nnethercote · 2022-04-19T09:49:12Z

@bors try @rust-timer queue

rust-timer · 2022-04-19T09:49:14Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-04-19T09:49:21Z

⌛ Trying commit 36c7c829e99b3785de94e6113b4abb4ff0410e9c with merge efe204aef96605ab9df88a21434b2b571a770ede...

bors · 2022-04-19T11:13:12Z

☀️ Try build successful - checks-actions
Build commit: efe204aef96605ab9df88a21434b2b571a770ede (efe204aef96605ab9df88a21434b2b571a770ede)

rust-timer · 2022-04-19T11:13:13Z

Queued efe204aef96605ab9df88a21434b2b571a770ede with parent e2661ba, future comparison URL.

rust-timer · 2022-04-19T12:14:25Z

Finished benchmarking commit (efe204aef96605ab9df88a21434b2b571a770ede): comparison url.

Summary:

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 🎉 relevant improvements found

	Regressions 😿 (primary)	Regressions 😿 (secondary)	Improvements 🎉 (primary)	Improvements 🎉 (secondary)	All 😿 🎉 (primary)
count¹	0	0	22	20	22
mean²	N/A	N/A	-0.3%	-1.5%	-0.3%
max	N/A	N/A	-0.6%	-4.1%	-0.6%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

number of relevant changes ↩
the arithmetic mean of the percent change ↩

nnethercote · 2022-04-19T21:24:35Z

tt-muncher is most affected. The instruction count results for it are good, with 1-4% reductions. But the wall-time results are amazing, with 6-16% reductions. My guess would be this is due to fewer branch mispredictions.

It's not hot, so shouldn't be within the always inlined part.

nnethercote · 2022-04-19T22:54:55Z

I have fixed the tidy error.

Cachegrind indicates that the outsized wall-time speedup for tt-muncher is actually due to fewer loads and stores in and around Parser::bump(), though it was hard to tell exactly which part of the change was responsible.

The `DelimToken` here is `NoDelim`, which means the returned delim tokens will just be ignored by `Parser::bump()`. This commit changes things so the delim tokens won't be returned.

Because it's now always true.

The loop is there to handle a `NoDelim` open/close token. This commit changes `TokenCursor::inlined_next` so it never returns such a token. This is a performance win because the conditional test in `bump()` is removed. If the parser needs changing in the future to handle `NoDelim` tokens, then `inlined_next()` can easily be changed to return them.

compiler/rustc_parse/src/parser/mod.rs

petrochenkov · 2022-04-20T11:31:16Z

compiler/rustc_parse/src/parser/mod.rs

-        if self.prev_token.kind == TokenKind::Eof {
-            let msg = "attempted to bump the parser past EOF (may be stuck in a loop)";
-            self.span_bug(self.token.span, msg);
-        }


We 100% have seen this error, and relatively commonly, until relevant bugs in the parser were fixed.
But it was several years ago already.

compiler/rustc_parse/src/parser/mod.rs

Surprisingly, this is a non-trivial performance win.

This makes `CloseDelim` handling more like `OpenDelim` handling, which produces `OpenDelim` and pushes the stack at the same time. It requires some adjustment to `parse_token_tree` now that we don't remain within the frame after getting the `CloseDelim`.

This lets us clone just the parts within a `TokenTree` that need cloning, rather than the entire thing. This is a surprisingly large performance win, up to 4% on `async-std-1.10.0`.

nnethercote · 2022-04-21T05:28:28Z

I have added two new commits. The first changes how CloseDelim is handled, making things neater, and also putting all the NoDelim handling within inlined_next. The second avoids some cloning for more performance improvements. Updated instruction counts:

Benchmark	Profile	Scenario	% Change	Significance Factor?
async-std-1.10.0	check	full	-18.07%	90.33x
time-macros-0.2.3	check	full	-16.77%	83.87x
tt-muncher	check	full	-15.11%	75.53x
yansi-0.5.0	check	full	-9.16%	45.82x
ctor-0.1.21	check	full	-4.48%	22.42x
num-derive-0.3.3	check	full	-4.47%	22.33x
web-sys-0.3.56	check	full	-4.43%	22.14x
futures-macro-0.3.19	check	full	-3.90%	19.48x
tonic-build-0.7.0	check	full	-3.70%	18.51x
mockall_derive-0.11.0	check	full	-3.69%	18.46x
wasm-bindgen-backend-0.2.79	check	full	-3.68%	18.42x
vsdb_derive-0.21.1	check	full	-3.61%	18.03x
scroll_derive-0.11.0	check	full	-3.60%	18.02x
pest_generator-2.1.3	check	full	-3.53%	17.63x
stdweb-derive-0.5.3	check	full	-3.43%	17.16x
enum-as-inner-0.4.0	check	full	-3.41%	17.05x
serde_derive-1.0.136	check	full	-3.23%	16.17x
clap_derive-3.1.7	check	full	-3.22%	16.10x
ref-cast-impl-1.0.6	check	full	-3.12%	15.61x
prost-derive-0.10.0	check	full	-3.07%	15.36x
wayland-scanner-0.29.4	check	full	-3.05%	15.23x
pyo3-macros-backend-0.16.3	check	full	-3.02%	15.09x

petrochenkov · 2022-04-21T11:44:14Z

@bors r+

bors · 2022-04-21T11:44:16Z

📌 Commit 643e9f7 has been approved by petrochenkov

bors · 2022-04-21T13:45:18Z

⌛ Testing commit 643e9f7 with merge e35a533fb398e98b318502c2920f82e110aad604...

rust-log-analyzer · 2022-04-21T15:36:13Z

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

bors · 2022-04-21T15:36:35Z

💔 Test failed - checks-actions

lqd · 2022-04-21T15:41:32Z

@bors retry "The hosted runner: GitHub Actions 5 lost communication with the server" on dist-aarch64-apple

bors · 2022-04-21T15:42:53Z

⌛ Testing commit 643e9f7 with merge b04c532...

bors · 2022-04-21T18:31:51Z

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing b04c532 to master...

rust-timer · 2022-04-21T19:46:31Z

Finished benchmarking commit (b04c532): comparison url.

Summary:

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: 🎉 relevant improvements found

	Regressions 😿 (primary)	Regressions 😿 (secondary)	Improvements 🎉 (primary)	Improvements 🎉 (secondary)	All 😿 🎉 (primary)
count¹	0	0	94	83	94
mean²	N/A	N/A	-0.9%	-2.4%	-0.9%
max	N/A	N/A	-2.4%	-17.2%	-2.4%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

number of relevant changes ↩
the arithmetic mean of the percent change ↩

nnethercote added 7 commits April 19, 2022 17:02

Tweak Cursor::next_with_spacing.

ad566b7

This makes it more like `CursorRef::next_with_spacing`. There is no performance effect, just a consistency improvement.

Inline and remove TokenTree::{open_tt,close_tt}.

aefbbee

They both have a single call site.

Inline and remove Parser::next_tok().

89ec75b

It has a single call site.

Merge TokenCursor::{next,next_desugared}.

b1e6dee

And likewise for the inlined variants. I did this for simplicity, but interesting it was a performance win as well.

Rearrange TokenCursor::inlined_next().

0231754

In particular, avoid wrapping a token within `TokenTree::Token` and then immediately matching it and returning the token within. Just return the token immediately.

Add {open,close}_delim arguments to TokenCursorFrame::new().

29c78cc

This will facilitate the change in the next commit. `boolean` arguments aren't great, but the function is only used in three places within this one file.

Handle Delimited opening immediately.

d235ac7

Instead of letting the next iteration of the loop handle it.

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 19, 2022

rust-highfive assigned petrochenkov Apr 19, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 19, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2022

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2022

Move desugaring code into its own function.

f1c32c1

It's not hot, so shouldn't be within the always inlined part.

nnethercote force-pushed the speed-up-TokenCursor branch from 36c7c82 to e0b2272 Compare April 19, 2022 22:53

nnethercote added 5 commits April 20, 2022 11:48

Add a size assertion for Parser.

804103b

Use true for open_delim/close_delim in one spot.

86723d3

The `DelimToken` here is `NoDelim`, which means the returned delim tokens will just be ignored by `Parser::bump()`. This commit changes things so the delim tokens won't be returned.

Remove TokenCursorFrame::open_delim.

3cd5e34

Because it's now always true.

Inline Cursor::next_with_spacing.

5b653c1

petrochenkov reviewed Apr 20, 2022

View reviewed changes

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 20, 2022

nnethercote added 3 commits April 21, 2022 09:21

Avoid some tuple destructuring.

7a89255

Surprisingly, this is a non-trivial performance win.

Introduced Cursor::next_with_spacing_ref.

643e9f7

This lets us clone just the parts within a `TokenTree` that need cloning, rather than the entire thing. This is a surprisingly large performance win, up to 4% on `async-std-1.10.0`.

nnethercote force-pushed the speed-up-TokenCursor branch from e5d0a36 to 643e9f7 Compare April 21, 2022 05:25

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 21, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 21, 2022

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 21, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 21, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 21, 2022

bors merged commit b04c532 into rust-lang:master Apr 21, 2022

rustbot added this to the 1.62.0 milestone Apr 21, 2022

nnethercote deleted the speed-up-TokenCursor branch April 21, 2022 20:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up `TokenCursor` #96210

Speed up `TokenCursor` #96210

nnethercote commented Apr 19, 2022

nnethercote commented Apr 19, 2022

nnethercote commented Apr 19, 2022

rust-timer commented Apr 19, 2022

bors commented Apr 19, 2022

This comment has been minimized.

bors commented Apr 19, 2022

rust-timer commented Apr 19, 2022

rust-timer commented Apr 19, 2022

nnethercote commented Apr 19, 2022

nnethercote commented Apr 19, 2022

petrochenkov Apr 20, 2022

nnethercote commented Apr 21, 2022

petrochenkov commented Apr 21, 2022

bors commented Apr 21, 2022

bors commented Apr 21, 2022

rust-log-analyzer commented Apr 21, 2022

bors commented Apr 21, 2022

lqd commented Apr 21, 2022

bors commented Apr 21, 2022

bors commented Apr 21, 2022

rust-timer commented Apr 21, 2022

Speed up TokenCursor #96210

Speed up TokenCursor #96210

Conversation

nnethercote commented Apr 19, 2022

nnethercote commented Apr 19, 2022

nnethercote commented Apr 19, 2022

rust-timer commented Apr 19, 2022

bors commented Apr 19, 2022

This comment has been minimized.

bors commented Apr 19, 2022

rust-timer commented Apr 19, 2022

rust-timer commented Apr 19, 2022

Footnotes

nnethercote commented Apr 19, 2022

nnethercote commented Apr 19, 2022

petrochenkov Apr 20, 2022

Choose a reason for hiding this comment

nnethercote commented Apr 21, 2022

petrochenkov commented Apr 21, 2022

bors commented Apr 21, 2022

bors commented Apr 21, 2022

rust-log-analyzer commented Apr 21, 2022

bors commented Apr 21, 2022

lqd commented Apr 21, 2022

bors commented Apr 21, 2022

bors commented Apr 21, 2022

rust-timer commented Apr 21, 2022

Footnotes

Speed up `TokenCursor` #96210

Speed up `TokenCursor` #96210