Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc_macro/bridge: cache static spans in proc_macro's client thread-local state #98187

Merged
merged 3 commits into from
Jun 27, 2022

Conversation

mystor
Copy link
Contributor

@mystor mystor commented Jun 17, 2022

This is the second part of #86822, split off as requested in #86822 (review). This patch removes the RPC calls required for the very common operations of Span::call_site(), Span::def_site() and Span::mixed_site().

Some notes:

This part is one of the ones I don't love as a final solution from a design standpoint, because I don't like how the spans are serialized immediately at macro invocation. I think a more elegant solution might've been to reserve special IDs for call_site, def_site, and mixed_site at compile time (either starting at 1 or from u32::MAX) and making reading a Span handle automatically map these IDs to the relevant values, rather than doing extra serialization.

This would also have an advantage for potential future work to allow proc_macro to operate more independently from the compiler (e.g. to reduce the necessity of proc-macro2), as methods like Span::call_site() could be made to function without access to the compiler backend.

That was unfortunately tricky to do at the time, as this was the first part I wrote of the patches. After the later part (#98188, #98189), the other uses of InternedStore are removed meaning that a custom serialization strategy for Span is easier to implement.

If we want to go that path, we'll still need the majority of the work to split the bridge object and introduce the Context trait for free methods, and it will be easier to do after Span is the only user of InternedStore (after #98189).

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 17, 2022
@rust-highfive

This comment was marked as off-topic.

@rust-highfive

This comment was marked as outdated.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 17, 2022
@mystor
Copy link
Contributor Author

mystor commented Jun 17, 2022

r? @eddyb

@mystor

This comment was marked as duplicate.

@mystor mystor changed the title Cache static spans in proc_macro's client thread-local state proc_macro/bridge: cache static spans in proc_macro's client thread-local state Jun 18, 2022
@eddyb
Copy link
Member

eddyb commented Jun 18, 2022

I think a more elegant solution might've been to reserve special IDs for call_site, def_site, and mixed_site at compile time (either starting at 1 or from u32::MAX) and making reading a Span handle automatically map these IDs to the relevant values, rather than doing extra serialization.

That sounds nice, but it ends up with Spans that can be stored in TLS by the proc macro code, and they would still work across invocations (while changing meaning).

Despite looking "global", they're effectively inputs to the proc macro invocation (except Span::def_site perhaps, but that one depends on what proc macro it's called from so it's not really constant either).

But frankly a lot of these concerns would probably go away once we have thread isolation as a bare minimum, and we can start relying on !Send/!Sync for "constants" (though that wouldn't help at all with proc_macro API usage outside of invocations, and we would still probably want to check that the bridge is active and expanding an invocation).

@eddyb
Copy link
Member

eddyb commented Jun 18, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 18, 2022
@bors
Copy link
Contributor

bors commented Jun 18, 2022

⌛ Trying commit 6c67f256af135fa768c9a22e9b462de4d79aaf1c with merge ffc980468071ac4d6aec0110094de58df965751e...

@bors
Copy link
Contributor

bors commented Jun 18, 2022

☀️ Try build successful - checks-actions
Build commit: ffc980468071ac4d6aec0110094de58df965751e (ffc980468071ac4d6aec0110094de58df965751e)

@rust-timer
Copy link
Collaborator

Queued ffc980468071ac4d6aec0110094de58df965751e with parent 2cec687, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (ffc980468071ac4d6aec0110094de58df965751e): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-0.6% -2.1% 11
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -0.6% -2.1% 11

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.3% 2.3% 1
Improvements 🎉
(primary)
-0.9% -0.9% 1
Improvements 🎉
(secondary)
-1.6% -1.6% 1
All 😿🎉 (primary) -0.9% -0.9% 1

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
2.1% 2.1% 2
Regressions 😿
(secondary)
4.7% 4.7% 1
Improvements 🎉
(primary)
-2.3% -2.3% 1
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 0.6% -2.3% 3

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 18, 2022
Comment on lines 468 to 475
/// Context provided alongside the initial inputs for a macro expansion.
/// Provides values such as spans which are used frequently to avoid RPC.
#[derive(Clone)]
struct ExpnContext<S> {
def_site: S,
call_site: S,
mixed_site: S,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this a bunch and something like this might be good:

struct ExpandRequest<Span, I> {
    def_site: Span,
    call_site: Span,
    mixed_site: Span,
    input: I,
}

and then ExpandRequest<S::Span, S::TokenStream> and ExpandRequest<S::Span, (S::TokenStream, S::TokenStream)> would be used for the two kinds of proc macros.

And if that type is actually used in the API, the Context trait wouldn't be needed.

I'm not super happy with the names, but this kind of "request" type would fit with some other ideas around interacting with a proc macro crate as a whole with S->C "requests" (at that point the client/server terminology sadly starts getting in the way, but I hope it's clear what I'm referring to).

Copy link
Contributor Author

@mystor mystor Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been planning to use the Context trait more in part 4 (#98189). It's being used for handling the serialization/deserialization of Symbol types, as well as to provide a free function for validating Idents without requiring the proc-macro crate to depend on unicode_xid and unicode-normalization (relevant diff). I could introduce a similar trait in part 4 for that purpose though, and move the spans off of it in this part.

One downside of moving this directly into the API is that it will move more things out of the shared proc macro server and into the 3 seperate proc macro wrapper implementations (1, 2, 3), as we'd need to do the dance to get the spans there, or continue to compute them within the server type, then pull them out to pass alongside, though we can probably get away with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this a bit async, and agreed to move the methods to the Server trait instead of the Context trait. We'll revisit changing the shape of the Client API to make context passing as a parameter more explicit in the future, but it'll be a bit tricky to do in this patch.

This greatly improves the performance of the very frequently called
`call_site()` macro when running in a cross-thread configuration.
Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(if anyone is confused by the changes to the PR, @mystor and I have discussed some bits of it out-of-band)

@eddyb
Copy link
Member

eddyb commented Jun 26, 2022

@bors r+

@bors
Copy link
Contributor

bors commented Jun 26, 2022

📌 Commit e32ee19 has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 26, 2022
@bors
Copy link
Contributor

bors commented Jun 26, 2022

⌛ Testing commit e32ee19 with merge 3b0d481...

@bors
Copy link
Contributor

bors commented Jun 27, 2022

☀️ Test successful - checks-actions
Approved by: eddyb
Pushing 3b0d481 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 27, 2022
@bors bors merged commit 3b0d481 into rust-lang:master Jun 27, 2022
@rustbot rustbot added this to the 1.64.0 milestone Jun 27, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3b0d481): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-0.4% -0.7% 12
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -0.4% -0.7% 12

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
3.5% 4.1% 2
Regressions 😿
(secondary)
3.2% 3.8% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-2.2% -2.2% 1
All 😿🎉 (primary) 3.5% 4.1% 2

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
3.3% 3.4% 2
Improvements 🎉
(primary)
-1.8% -2.2% 2
Improvements 🎉
(secondary)
-3.5% -3.5% 1
All 😿🎉 (primary) -1.8% -2.2% 2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

mystor added a commit to mystor/rust that referenced this pull request Jul 29, 2022
…idge

This is done by having the crossbeam dependency inserted into the
proc_macro server code from the server side, to avoid adding a
dependency to proc_macro.

In addition, this introduces a -Z command-line option which will switch
rustc to run proc-macros using this cross-thread executor. With the
changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the
performance of the executor should be much closer to same-thread
execution.

In local testing, the crossbeam executor was substantially more
performant than either of the two existing CrossThread strategies, so
they have been removed to keep things simple.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 30, 2022
proc_macro: use crossbeam channels for the proc_macro cross-thread bridge

This is done by having the crossbeam dependency inserted into the `proc_macro` server code from the server side, to avoid adding a dependency to `proc_macro`.

In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the performance of the executor should be much closer to same-thread execution.

In local testing, the crossbeam executor was substantially more performant than either of the two existing `CrossThread` strategies, so they have been removed to keep things simple.

r? `@eddyb`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants