Introduce DynSend and DynSync auto trait for parallel compiler #107586
Conversation
```
@@ -952,6 +958,151 @@ fn analysis(tcx: TyCtxt<'_>, (): ()) -> Result<()> {
    Ok(())
}

fn non_par_analysis(tcx: TyCtxt<'_>) -> Result<()> {
```
This code duplication is really unfortunate.
That's right. I haven't thought of an elegant way to write it yet, but I'll fix that soon.
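For illustration only (not the approach this PR takes), one generic way to collapse the two paths is to choose the driver at a single call site. A minimal sketch with hypothetical names (`run_items`, `parallel`), using rayon as a stand-in for rustc's own parallel iteration helpers:

```rust
use rayon::prelude::*;

/// Run `f` over `items`, choosing the serial or parallel driver at runtime.
/// `run_items` and `parallel` are made-up names for this sketch.
fn run_items<T: Sync>(items: &[T], parallel: bool, f: impl Fn(&T) + Sync) {
    if parallel {
        // Parallel driver (rayon here; rustc has its own `par_for_each_in`).
        items.par_iter().for_each(|item| f(item));
    } else {
        // Serial driver: same body, no thread pool involved.
        items.iter().for_each(|item| f(item));
    }
}
```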
```rust
// runtime whether these non-shared data structures actually exist.
unsafe impl<'tcx> DynSendSyncCheck for TyCtxt<'tcx> {
    #[inline]
    fn check_send_sync(&self) {
```
Can you use `let GlobalCtxt { a, b, c } = self` for exhaustiveness checking?
Yea, it makes sense!
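For context, the exhaustiveness trick works because a struct pattern without `..` must name every field, so the check breaks at compile time whenever a field is added. A minimal sketch with made-up fields:

```rust
// Made-up fields for illustration; the real GlobalCtxt has many more.
struct GlobalCtxt {
    arena: usize,
    queries: Vec<String>,
}

impl GlobalCtxt {
    fn check_send_sync(&self) {
        // No `..` in the pattern: adding a field to GlobalCtxt without
        // updating this destructuring is a compile error, so every field
        // has to be explicitly considered for thread safety.
        let GlobalCtxt { arena, queries } = self;
        let _ = (arena, queries);
    }
}
```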
```rust
// Only set by the `-Z threads` compile option
pub unsafe fn set_parallel() {
    let p = SyncUnsafeCell::raw_get(&PARALLEL as *const _);
    *p = true;
}
```
First of all, it would be great to have a doc comment here, especially given that this is an `unsafe` function. Second of all, at first glance it seems like this can be more simply written as `*PARALLEL.get() = true`, am I missing something? Lastly, is `is_parallel` hot? Can we use an `AtomicUsize` instead?
Even if it's a little hot, it's unlikely that an atomic integer will have a performance impact, since this is just reading from it.
Thanks for the review! `PARALLEL` will only be set once, so I want to take advantage of that to minimize the cost of reading it. `with_context_opt` might be hot, but I doubt it's necessary to check thread safety there. Apart from that, I think `is_parallel()` is not hot, since it is only used in relatively top-level logic to determine whether to switch to a parallel environment.
You should benchmark it first before going for the more unsafe variant. Atomics have minimal to no overhead depending on the exact use and ordering (which I think can be Relaxed here, because we don't need to synchronize any other writes?).
OK, I changed it to an `AtomicBool` instead. Can you help run a perf? Thanks!
I think we can just use Relaxed, yea
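A minimal sketch of the `AtomicBool` variant being discussed (names assumed, not necessarily the exact code in the PR): the flag is stored once during startup and loaded with `Relaxed` ordering, since no other writes need to be synchronized through it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Assumed name: a single flag, written once while handling `-Z threads`.
static PARALLEL: AtomicBool = AtomicBool::new(false);

/// Only called once, early during startup, when `-Z threads > 1`.
pub fn set_parallel() {
    PARALLEL.store(true, Ordering::Relaxed);
}

#[inline]
pub fn is_parallel() -> bool {
    // Relaxed is enough here: we only need the flag's value, not any
    // happens-before relationship with other memory accesses.
    PARALLEL.load(Ordering::Relaxed)
}
```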
Force-pushed from fc5e931 to 4086eba
@bors try @rust-timer queue
⌛ Trying commit 4086ebae9c6b917e51b0f314c18a3dfd032d0e14 with merge 62ba597a41741055fcf131dcee8b691cc9445515...
☀️ Try build successful - checks-actions
Finished benchmarking commit (62ba597a41741055fcf131dcee8b691cc9445515): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage) results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
I think there may be two reasons for the regression:
I've been thinking over how to best approach specialization. I think the dynamic dispatch entry point to [...]. It seems like a good idea to land locks with a runtime switch first so there is an optimized baseline to compare with specialization. I suggest finishing my branch by extracting just the lock implementation and moving it to a new [...].
In my local test, [...]. Also, as I guessed, [...].
Can we run another perf? Thanks!
@bors try @rust-timer queue
⌛ Trying commit b2d7910a011fe52c5ab434c695a5caf1a5602725 with merge 2949dcde96d9502e79a5af27f252db8c97e8533e...
☀️ Try build successful - checks-actions
Force-pushed from 25b4906 to d8ca8e6
☔ The latest upstream changes (presumably #110243) made this pull request unmergeable. Please resolve the merge conflicts.
@SparrowLii if I read the comment thread correctly, @cjgillot should approve this PR after one last rebase? Thanks! @rustbot author
@cjgillot Can it be merged now? : ) I don't have privileges so I need your help
@bors r+ rollup=never
☀️ Test successful - checks-actions
Finished benchmarking commit (dd8ec9c): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count: This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage) results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This benchmark run did not return any relevant results for this metric.

Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 660.401s -> 660.339s (-0.01%)
Part of parallel-rustc #101566.

This PR introduces the `DynSend` / `DynSync` traits and the `FromDyn` / `IntoDyn` structures in `rustc_data_structures::marker`. `FromDyn` can dynamically check data structures for thread safety when switching to parallel environments (such as calling `par_for_each_in`). This happens only when `-Z threads > 1`, so it doesn't affect the compile efficiency of single-threaded mode.

r? @cjgillot
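A rough sketch of the pattern (simplified; the real implementation in `rustc_data_structures::marker` differs in details, e.g. the runtime assertion is gated on `-Z threads > 1`):

```rust
// Nightly-only features, as used inside rustc itself.
#![feature(auto_traits, negative_impls)]

// Auto traits mirroring Send/Sync: implemented automatically for most types,
// with explicit opt-outs for types that must stay on one thread, e.g.
//   impl !DynSend for SomeSingleThreadedType {}
pub unsafe auto trait DynSend {}
pub unsafe auto trait DynSync {}

/// Wrapper created at the boundary of a parallel region (e.g. before a
/// `par_for_each_in` call). It is only Send/Sync when the wrapped value is
/// DynSend/DynSync, which is what enforces thread safety for the data that
/// actually crosses threads.
pub struct FromDyn<T>(T);

impl<T> FromDyn<T> {
    #[inline]
    pub fn from(value: T) -> Self {
        // The real constructor also performs a runtime check that only
        // happens when the compiler runs with more than one thread.
        FromDyn(value)
    }

    #[inline]
    pub fn into_inner(self) -> T {
        self.0
    }
}

unsafe impl<T: DynSend> Send for FromDyn<T> {}
unsafe impl<T: DynSync> Sync for FromDyn<T> {}
```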