-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gvn: Promote/propagate const local array #126444
Conversation
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri |
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[WIP] gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used PromoteTemps pass. Fix rust-lang#73825 ### Current status - [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F). r? ghost
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e26c0b3): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 3.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -4.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 671.518s -> 674.73s (0.48%) |
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[WIP] gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used PromoteTemps pass. Fix rust-lang#73825 ### Current status - [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F). r? ghost
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (7e160d4): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 1.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 9.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 670.938s -> 673.147s (0.33%) |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used `PromoteTemps` pass. This allows promoting constant local arrays as anonymous constants. So that's in codegen for a local array, rustc outputs `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series of `store` on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case. See more in rust-lang#73825 or [zulip][opsem] for more info. [Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not. [Prior discussions on zulip][opsem]. This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`. ![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1) Fix rust-lang#73825 r? cjgillot ### Unresolved questions - [ ] Should we ignore nested arrays? I think that promoting nested arrays is bloating codegen. - [ ] Should stack_threshold be at least 32 bytes? Like the benchmark showed. If yes, the test should be updated to make arrays larger than 32 bytes. - [x] ~Is this concerning that `call(move _1)` is now `call(const [array])`?~ It reverted back to `call(move _1)` [opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [bench]: rust-lang/rust-clippy#12854 (comment) [perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (ac2e9cd): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -1.2%, secondary -0.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary -2.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary -0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 705.571s -> 705.908s (0.05%) |
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used `PromoteTemps` pass. This allows promoting constant local arrays as anonymous constants. So that's in codegen for a local array, rustc outputs `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series of `store` on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case. See more in rust-lang#73825 or [zulip][opsem] for more info. [Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not. [Prior discussions on zulip][opsem]. This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`. ![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1) Fix rust-lang#73825 r? cjgillot ### Unresolved questions - [ ] Should we ignore nested arrays? I think that promoting nested arrays is bloating codegen. - [ ] Should stack_threshold be at least 32 bytes? Like the benchmark showed. If yes, the test should be updated to make arrays larger than 32 bytes. - [x] ~Is this concerning that `call(move _1)` is now `call(const [array])`?~ It reverted back to `call(move _1)` [opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [bench]: rust-lang/rust-clippy#12854 (comment) [perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (b6d6d25): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -1.5%, secondary 2.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (secondary -0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 699.561s -> 699.867s (0.04%) |
From the last perf. run, it seems that there are no performance advantages to avoid LLVM de-duplicating arrays. |
@@ -418,9 +421,7 @@ impl<'body, 'tcx> VnState<'body, 'tcx> { | |||
self.ecx.copy_op(op, &field_dest).ok()?; | |||
} | |||
self.ecx.write_discriminant(variant.unwrap_or(FIRST_VARIANT), &dest).ok()?; | |||
self.ecx | |||
.alloc_mark_immutable(dest.ptr().provenance.unwrap().alloc_id()) | |||
.ok()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we stop marking as immutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong but I think let dest = dest.map_provenance(|prov| prov.as_immutable());
in the line below could serve the same purpose.
Well! Nothing to do here! |
Rewriting of #125916 which used
PromoteTemps
pass.This allows promoting constant local arrays as anonymous constants. So that's in codegen for
a local array, rustc outputs
llvm.memcpy
(which is easy for LLVM to optimize) instead of a seriesof
store
on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case.See more in #73825 or zulip for more info.
Here is a simple micro benchmark that shows the performance differences between promoting arrays or not.
Prior discussions on zulip.
This patch saves about -50.36 KiB (-0.038%) of
librustc_driver.so
.Fix #73825