Speed up plug_leaks
#36917
(rust_highfive has picked a reviewer for you, use r? to override) |
BTW, I used Valgrind's "DHAT" profiler to find this. It told me that lots of malloc calls were happening in and beneath |
r? @eddyb |
    debug!("plug_leaks: result={:?}",
           result);
    if skol_map.is_empty() {
        result = value;
I think you can use early return to avoid indenting the entire function to the right.
This commit avoids the `fold_regions` call in `plug_leaks` when `skol_map` is empty, which is the common case. This gives speed-ups of up to 1.14x on some of the rustc-benchmarks.
This commit avoids the `resolve_type_vars_if_possible` call in `plug_leaks` when `skol_map` is empty, which is the common case. It also changes the signature of `plug_leaks` slightly to avoid the need for a `clone` of `value`. These changes give speed-ups of up to a few percent on some of the rustc-benchmarks.
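The fast path described in these two commit messages can be sketched roughly as follows. This is a minimal illustration and not rustc's actual code: `Region`, `SkolMap`, `Value`, and the body of `fold_regions` here are hypothetical stand-ins, and only the overall shape (return early when `skol_map` is empty, and take `value` by value so no `clone` is needed) is taken from the thread.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for illustration only; rustc's real types differ.
type Region = u32;
type SkolMap = BTreeMap<Region, Region>;

#[derive(Debug, Clone, PartialEq)]
struct Value {
    regions: Vec<Region>,
}

/// Stand-in for the expensive general path: rewrites every region that
/// appears in `skol_map`, allocating a new vector along the way.
fn fold_regions(value: Value, skol_map: &SkolMap) -> Value {
    Value {
        regions: value
            .regions
            .into_iter()
            .map(|r| *skol_map.get(&r).unwrap_or(&r))
            .collect(),
    }
}

/// Sketch of the specialised function. Taking `value` by value lets the
/// fast path hand it straight back without cloning.
fn plug_leaks(skol_map: SkolMap, value: Value) -> Value {
    // Fast path: with an empty map the fold would be a no-op, so skip it.
    // The early return keeps the rest of the function unindented.
    if skol_map.is_empty() {
        return value;
    }
    fold_regions(value, &skol_map)
}
```

With an empty `SkolMap`, `plug_leaks` returns its input unchanged; with a map containing `2 -> 20`, a value holding regions `[1, 2, 3]` comes back as `[1, 20, 3]`.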
6016e8c to 3779971
@bors r+ |
📌 Commit 3779971 has been approved by |
We should also be doing this for |
leak-check done by #36931. |
…eddyb

Speed up `plug_leaks`

Profiling shows that `plug_leaks` and the functions it calls are hot on some benchmarks. It's very common that `skol_map` is empty in this function, and we can specialize `plug_leaks` in that case for some big speed-ups.

The PR has two commits. I'm fairly confident that the first one is correct: I traced through the code to confirm that the `fold_regions` and `pop_skolemized` calls are no-ops when `skol_map` is empty, and I also temporarily added an assertion to check that `result` ends up having the same value as `value` in that case. This commit is responsible for most of the improvement.

I'm less confident about the second commit. The call to `resolve_type_vars_if_possible` can change `value` when `skol_map` is empty... but testing suggests that it doesn't matter if the call is omitted.

So, please check both patches carefully, especially the second one!

Here are the speed-ups for the first commit alone.

stage1 compiler (built with old rustc, using glibc malloc), doing debug builds:
```
futures-rs-test  4.710s vs  4.538s --> 1.038x faster (variance: 1.009x, 1.005x)
issue-32062-equ  0.415s vs  0.368s --> 1.129x faster (variance: 1.009x, 1.010x)
issue-32278-big  1.884s vs  1.808s --> 1.042x faster (variance: 1.020x, 1.017x)
jld-day15-parse  1.907s vs  1.668s --> 1.143x faster (variance: 1.011x, 1.007x)
piston-image-0. 13.024s vs 12.421s --> 1.049x faster (variance: 1.004x, 1.012x)
rust-encoding-0  3.335s vs  3.276s --> 1.018x faster (variance: 1.021x, 1.028x)
```

stage2 compiler (built with new rustc, using jemalloc), doing debug builds:
```
futures-rs-test  4.167s vs  4.065s --> 1.025x faster (variance: 1.006x, 1.018x)
issue-32062-equ  0.383s vs  0.343s --> 1.118x faster (variance: 1.012x, 1.016x)
issue-32278-big  1.680s vs  1.621s --> 1.036x faster (variance: 1.007x, 1.007x)
jld-day15-parse  1.671s vs  1.478s --> 1.131x faster (variance: 1.016x, 1.004x)
piston-image-0. 11.336s vs 10.852s --> 1.045x faster (variance: 1.003x, 1.006x)
rust-encoding-0  3.036s vs  2.971s --> 1.022x faster (variance: 1.030x, 1.032x)
```

I've omitted the benchmarks for which the change was negligible.

And here are the speed-ups for the first and second commit in combination.

stage1 compiler (built with old rustc, using glibc malloc), doing debug builds:
```
futures-rs-test  4.684s vs  4.498s --> 1.041x faster (variance: 1.012x, 1.012x)
issue-32062-equ  0.413s vs  0.355s --> 1.162x faster (variance: 1.019x, 1.006x)
issue-32278-big  1.869s vs  1.763s --> 1.060x faster (variance: 1.013x, 1.018x)
jld-day15-parse  1.900s vs  1.602s --> 1.186x faster (variance: 1.010x, 1.003x)
piston-image-0. 12.907s vs 12.352s --> 1.045x faster (variance: 1.005x, 1.006x)
rust-encoding-0  3.254s vs  3.248s --> 1.002x faster (variance: 1.063x, 1.045x)
```

stage2 compiler (built with new rustc, using jemalloc), doing debug builds:
```
futures-rs-test  4.183s vs  4.046s --> 1.034x faster (variance: 1.007x, 1.004x)
issue-32062-equ  0.380s vs  0.340s --> 1.117x faster (variance: 1.020x, 1.003x)
issue-32278-big  1.671s vs  1.616s --> 1.034x faster (variance: 1.031x, 1.012x)
jld-day15-parse  1.661s vs  1.417s --> 1.172x faster (variance: 1.013x, 1.005x)
piston-image-0. 11.347s vs 10.841s --> 1.047x faster (variance: 1.007x, 1.010x)
rust-encoding-0  3.050s vs  3.000s --> 1.017x faster (variance: 1.016x, 1.012x)
```

@eddyb: `git blame` suggests that you should review this. Thanks!
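For readers checking the benchmark figures above, the "N.NNNx faster" column appears to be the old time divided by the new time. That reading is an assumption based on the numbers themselves, not something the benchmark tooling is quoted as saying here. The helper names below are hypothetical, and because the listed times are already rounded to milliseconds, a recomputed ratio may differ from the reported one in the last digit.

```rust
/// Speed-up ratio, assuming it is defined as old time / new time.
/// (Hypothetical helper; not part of the rustc-benchmarks tooling.)
fn speedup(old_secs: f64, new_secs: f64) -> f64 {
    old_secs / new_secs
}

/// Formats the ratio in the same style as the tables in this thread.
fn format_speedup(old_secs: f64, new_secs: f64) -> String {
    format!("{:.3}x faster", speedup(old_secs, new_secs))
}
```

For example, the `futures-rs-test` row for the first commit on stage1 (4.710s vs 4.538s) gives `format_speedup(4.710, 4.538)`, which matches the reported 1.038x.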
How do you run the rustc benchmarks? I've been measuring bootstrap times. My patch short-cuts the fast path to |
@arielb1: I updated my local repo to get #36931 and the speed-ups from my local patch have now evaporated, so I guess its effect overlapped with the effect of your commit. My patch was simpler, though, adding just a As for running the benchmarks, I compared two stage1 compilers that were both configured with |
It should be functionally equivalent. Is it performance-equivalent (my patch skips a bunch of unneeded setup code)? [checking] |
The |