A new matcher representation for use in `parse_tt` #95555

nnethercote · 2022-04-01T06:54:22Z

By transforming the matcher into a different form, parse_tt can run faster and be easier to understand.

`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this is a bad representation for the traversal. - `TokenTree` is nested, and there's a bunch of expensive and fiddly state required to handle entering and exiting nested submatchers. - There are three positions (sequence separators, sequence Kleene ops, and end of the matcher) that are represented by an index that exceeds the end of the `&[TokenTree]`, which is clumsy and error-prone. This commit introduces a new representation called `MatcherLoc` that is designed specifically for matching. It fixes all the above problems, making the code much easier to read. A `&[TokenTree]` is converted to a `&[MatcherLoc]` before matching begins. Despite the cost of the conversion, it's still a net performance win, because various pieces of traversal state are computed once up-front, rather than having to be recomputed repeatedly during the macro matching. Some improvements worth noting. - `parse_tt_inner` is *much* easier to read. No more having to compare `idx` against `len` and read comments to understand what the result means. - The handling of `Delimited` in `parse_tt_inner` is now trivial. - The three end-of-sequence cases in `parse_tt_inner` are now handled in three separate match arms, and the control flow is much simpler. - `nameize` is no longer recursive. - There were two places that issued "missing fragment specifier" errors: one in `parse_tt_inner()`, and one in `nameize()`. Presumably the latter was never executed. There's now a single place issuing these errors, in `compute_locs()`. - The number of heap allocations done for a `check full` build of `async-std-1.10.0` (an extreme example of heavy macro use) drops from 11.8M to 2.6M, and most of these occur outside of macro matching. - The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough that it no longer needs boxing, which partly accounts for the reduction in allocations. - The rest of the drop in allocations is due to the removal of `MatcherKind`, because we no longer need to record anything for the parent matcher when entering a submatcher. - Overall it reduces code size by 45 lines.

To match the order the variants are declared in.

nnethercote · 2022-04-04T07:23:32Z

r? @petrochenkov

Best reviewed one commit at a time. Apologies for the first commit being quite large; the change is big and invasive enough that there wasn't really a good way to break it into smaller pieces.

nnethercote · 2022-04-04T07:29:18Z

Here are some instruction count results for check full builds. Those marked with * are from the rustc-perf suite.

Benchmark	Profile	Scenario	% Change	Significance Factor?
async-std-1.10.0	check	full	-9.94%	49.68x
time-macros-0.2.3	check	full	-7.60%	38.01x
quote-stress	check	full	-7.17%	35.84x
yansi-0.5.0	check	full	-5.89%	29.43x
inotify-0.10.0	check	full	-3.52%	17.58x
*deep-vector	check	full	-1.90%	9.51x
*html5ever-0.26.0	check	full	-1.71%	8.56x
*token-stream-stress	check	full	-1.36%	6.80x
pest_generator-2.1.3	check	full	-0.41%	2.05x
futures-macro-0.3.19	check	full	-0.37%	1.84x
num-derive-0.3.3	check	full	-0.35%	1.76x
libc-0.2.121	check	full	-0.41%	2.04x
ctor-0.1.21	check	full	-0.35%	1.73x
wast-39.0.0	check	full	-0.33%	1.64x
syn-1.0.89	check	full	-0.32%	1.61x
mockall_derive-0.11.0	check	full	-0.30%	1.52x

@bors try @rust-timer queue

rust-timer · 2022-04-04T07:29:19Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-04-04T07:29:25Z

⌛ Trying commit 0bd47e8 with merge 6ac9d4cf75f3aa6be784c21636d0f83c9f8d7302...

bors · 2022-04-04T08:58:44Z

☀️ Try build successful - checks-actions
Build commit: 6ac9d4cf75f3aa6be784c21636d0f83c9f8d7302 (6ac9d4cf75f3aa6be784c21636d0f83c9f8d7302)

rust-timer · 2022-04-04T08:58:45Z

Queued 6ac9d4cf75f3aa6be784c21636d0f83c9f8d7302 with parent ec667fb, future comparison URL.

rust-timer · 2022-04-04T10:11:28Z

Finished benchmarking commit (6ac9d4cf75f3aa6be784c21636d0f83c9f8d7302): comparison url.

Summary:

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	Regressions 😿 (primary)	Regressions 😿 (secondary)	Improvements 🎉 (primary)	Improvements 🎉 (secondary)	All 😿 🎉 (primary)
count¹	4	10	19	11	23
mean²	0.3%	1.0%	-1.0%	-2.6%	-0.7%
max	0.4%	2.8%	-2.0%	-4.6%	0.4%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

number of relevant changes ↩
the arithmetic mean of the percent change ↩

petrochenkov · 2022-04-04T17:10:57Z

@bors r+

bors · 2022-04-04T17:10:59Z

📌 Commit 0bd47e8 has been approved by petrochenkov

bors · 2022-04-04T17:27:51Z

⌛ Testing commit 0bd47e8 with merge 6a9080b...

bors · 2022-04-04T19:51:44Z

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing 6a9080b to master...

rust-timer · 2022-04-04T21:05:57Z

Finished benchmarking commit (6a9080b): comparison url.

Summary:

Primary benchmarks: 🎉 relevant improvements found
Secondary benchmarks: mixed results

	Regressions 😿 (primary)	Regressions 😿 (secondary)	Improvements 🎉 (primary)	Improvements 🎉 (secondary)	All 😿 🎉 (primary)
count¹	3	10	18	11	21
mean²	0.4%	1.0%	-1.0%	-2.4%	-0.8%
max	0.4%	2.8%	-2.0%	-4.6%	-2.0%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

number of relevant changes ↩
the arithmetic mean of the percent change ↩

nnethercote · 2022-04-04T22:21:58Z

I looked at the regressions for wg-grammar. It's due to the conversion in compute_locs. In particular, wg-grammar has a very large machine-generated macro. It's squeezed onto a single line without any spaces, and is over 12,000 characters, with 84 rules. This is very much a worst-case scenario for compute_locs, and this comment applies.

Likewise for hyper-0.14.18, though the effect there is much smaller. There are a number of macros in that benchmark, I'm not sure which one(s) is responsible, but the Cachegrind diff shows that compute_locs accounts for much of the (very small) regression.

Overall I'm not worried because the speed benefits outweigh the regressions, but I'll still see if I can improve things some more, either by doing what the comment above suggests, or something else.

In rust-lang#95555 this was moved out of `parse_tt_inner` and `nameize` into `compute_locs`. But the next commit will be moving `compute_locs` outwards to a place that isn't suitable for the missing fragment identifier checking. So this reinstates the old checking.

…r-rule, r=petrochenkov Call `compute_locs` once per rule This fixes the small regressions on `wg-grammar` and `hyper-0.14.18` seen in rust-lang#95555. r? `@petrochenkov`

mark-i-m · 2022-04-07T15:16:55Z

This is a very nice code improvement! Thanks @nnethercote !

PR rust-lang#95159 unintentionally changed the behaviour of declarative macro matching for `NoDelim` delimited sequences. - Before rust-lang#95159, delimiters were implicit in `mbe::Delimited`. When doing macro matching, normal delimiters were generated out of thin air as necessary, but `NoDelim` delimiters were not. This was done within `TokenTree::{get_tt,len}`. - After rust-lang#95159, the delimiters were explicit. There was an unintentional change whereby `NoDelim` delimiters were represented explicitly just like normal delimeters. - rust-lang#95555 introduced a new matcher representation (`MatcherLoc`) and the `NoDelim` delimiters were made explicit within it, just like `mbe::Delimited`. - rust-lang#95797 then changed `mbe::Delimited` back to having implicit delimiters, but because matching is now being done on `MatcherLoc`, the behavioural change persisted. The fix is simple: remove the explicit `NoDelim` delimiters in the `MatcherLoc` representation. This gives back the original behaviour. As for why this took so long to be found: it seems that `NoDelim` sequences are unusual. It took a macro that generates another macro (from the `yarte_lexer` crate, found via a crater run) to uncover this. Fixes rust-lang#96305.

Remove box syntax from rustc_mir_dataflow and rustc_mir_transform Continuation of rust-lang#87781, inspired by rust-lang#97239. The usages that this PR removes have not appeared from nothing, instead the usage in `rustc_mir_dataflow` and `rustc_mir_transform` was from rust-lang#80522 which split up `rustc_mir`, and which was filed before I filed rust-lang#87781, so it was using the state from before my PR. But it was merged after my PR was merged, so the `box_syntax` uses were able to survive here. Outside of this introduction due to the code being outside of the master branch at the point of merging of my PR, there was only one other introduction of box syntax, in rust-lang#95159. That box syntax was removed again though in rust-lang#95555. Outside of that, `box_syntax` has not made its reoccurrance in compiler crates.

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 1, 2022

nnethercote marked this pull request as draft April 1, 2022 06:54

nnethercote added 2 commits April 4, 2022 17:01

Reorder match arms in parse_tt_inner.

0bd47e8

To match the order the variants are declared in.

nnethercote force-pushed the parse_tt-new-representation branch from 04103c6 to 0bd47e8 Compare April 4, 2022 07:21

nnethercote marked this pull request as ready for review April 4, 2022 07:22

rust-highfive assigned petrochenkov Apr 4, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 4, 2022

rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 4, 2022

Kobzol mentioned this pull request Apr 4, 2022

Correctly calculate largest change independent of its direction in PR summary table rust-lang/rustc-perf#1272

Merged

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 4, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 4, 2022

bors merged commit 6a9080b into rust-lang:master Apr 4, 2022

rustbot added this to the 1.62.0 milestone Apr 4, 2022

nnethercote deleted the parse_tt-new-representation branch April 4, 2022 20:51

rustbot removed the perf-regression Performance regression. label Apr 4, 2022

nnethercote mentioned this pull request Apr 5, 2022

Call compute_locs once per rule #95669

Merged

JohnTitor added a commit to JohnTitor/rustc-dev-guide that referenced this pull request Apr 15, 2022

Update parser_tt docs following rust-lang/rust#95555

1b06477

JohnTitor added a commit to JohnTitor/rustc-dev-guide that referenced this pull request Apr 15, 2022

Update parser_tt docs following rust-lang/rust#95555

b8debde

nnethercote mentioned this pull request Apr 22, 2022

Fix macro matching for NoDelim delimited sequences. #96307

Closed

est31 mentioned this pull request May 22, 2022

Remove box syntax from rustc_mir_dataflow and rustc_mir_transform #97281

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A new matcher representation for use in `parse_tt` #95555

A new matcher representation for use in `parse_tt` #95555

nnethercote commented Apr 1, 2022 •

edited

Loading

nnethercote commented Apr 4, 2022

nnethercote commented Apr 4, 2022

rust-timer commented Apr 4, 2022

bors commented Apr 4, 2022

bors commented Apr 4, 2022

rust-timer commented Apr 4, 2022

rust-timer commented Apr 4, 2022

petrochenkov commented Apr 4, 2022

bors commented Apr 4, 2022

bors commented Apr 4, 2022

bors commented Apr 4, 2022

rust-timer commented Apr 4, 2022

nnethercote commented Apr 4, 2022

mark-i-m commented Apr 7, 2022

A new matcher representation for use in parse_tt #95555

A new matcher representation for use in parse_tt #95555

Conversation

nnethercote commented Apr 1, 2022 • edited Loading

nnethercote commented Apr 4, 2022

nnethercote commented Apr 4, 2022

rust-timer commented Apr 4, 2022

bors commented Apr 4, 2022

bors commented Apr 4, 2022

rust-timer commented Apr 4, 2022

rust-timer commented Apr 4, 2022

Footnotes

petrochenkov commented Apr 4, 2022

bors commented Apr 4, 2022

bors commented Apr 4, 2022

bors commented Apr 4, 2022

rust-timer commented Apr 4, 2022

Footnotes

nnethercote commented Apr 4, 2022

mark-i-m commented Apr 7, 2022

A new matcher representation for use in `parse_tt` #95555

A new matcher representation for use in `parse_tt` #95555

nnethercote commented Apr 1, 2022 •

edited

Loading