Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reuse allocation for always_storage_live_locals bitset #107927

Closed
wants to merge 1 commit into from

Conversation

klensy
Copy link
Contributor

@klensy klensy commented Feb 11, 2023

This allows to reduce allocated block count about ~30% for ctfe-stress-5.

@rustbot
Copy link
Collaborator

rustbot commented Feb 11, 2023

r? @petrochenkov

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 11, 2023
@rustbot
Copy link
Collaborator

rustbot commented Feb 11, 2023

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Comment on lines +50 to +52

// reuse allocation for bit set
always_live_locals_cache: GrowableBitSet<mir::Local>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know good place for this, any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the most global state you can add it to.

However note that this InterpCx is re-created per constant, so if there are many constants to evaluate, then each will still do their own thing. I am also concerned that we might keep large buffers around for longer than we have to.

@@ -1,11 +1,12 @@
use rustc_index::bit_set::BitSet;
use rustc_index::bit_set::GrowableBitSet;
use rustc_middle::mir::{self, Local};

/// The set of locals in a MIR body that do not have `StorageLive`/`StorageDead` annotations.
///
/// These locals have fixed storage for the duration of the body.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should explain the unconventional signature for this function, and what the caller is supposed to pass for the locals argument.

@RalfJung
Copy link
Member

This allows to reduce allocated block count about ~30% for ctfe-stress-5.

Note that this is a very artificial stress test benchmark. We shouldn't optimize for it blindly. Is there evidence that this helps for regular code as well? Out of all the call sites for this function, it seems like only the CTFE interpreter actually benefits from the optimization.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 11, 2023
@bors
Copy link
Contributor

bors commented Feb 11, 2023

⌛ Trying commit 9d71a6f with merge 8842f892720a42837501b09631b12efcaf14c904...

@RalfJung
Copy link
Member

Cc @nnethercote for some compiler perf expertise

@bors
Copy link
Contributor

bors commented Feb 11, 2023

☀️ Try build successful - checks-actions
Build commit: 8842f892720a42837501b09631b12efcaf14c904 (8842f892720a42837501b09631b12efcaf14c904)

1 similar comment
@bors
Copy link
Contributor

bors commented Feb 11, 2023

☀️ Try build successful - checks-actions
Build commit: 8842f892720a42837501b09631b12efcaf14c904 (8842f892720a42837501b09631b12efcaf14c904)

@rust-timer

This comment has been minimized.

@@ -705,13 +710,19 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
let mut locals = IndexVec::from_elem(dummy, &body.local_decls);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is ~60% total allocated memory for ctfe-stress-5.

At first, i tried to cache it too, but it saved later at self.frame_mut().locals = locals;, so i don't see easy way to do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is ~60% total allocated memory for ctfe-stress-5.

Yeah that test involves a lot of CTFE stack variables.

But I don't think this is worth optimizing for. This is totally unrealistic for real code. We shouldn't make rustc harder to maintain just for the sake of meaningless benchmarks -- that is not the purpose of these stress tests.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8842f892720a42837501b09631b12efcaf14c904): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.4% [2.4%, 2.4%] 1
Regressions ❌
(secondary)
2.3% [1.4%, 3.6%] 3
Improvements ✅
(primary)
-2.7% [-4.5%, -0.9%] 2
Improvements ✅
(secondary)
-1.8% [-1.8%, -1.8%] 1
All ❌✅ (primary) -1.0% [-4.5%, 2.4%] 3

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.1% [-2.1%, -2.1%] 2
All ❌✅ (primary) - - 0

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 11, 2023
@RalfJung
Copy link
Member

RalfJung commented Feb 11, 2023

Looks like we are not getting a perf benefit from this even on the stress test. So... looks like it doesn't actually help? Basically what you did is write a small allocator for these IndexSet (with a simple 1-element cache), and sure that means the underlying jemalloc gets called less often -- but that doesn't actually seem to translate into any benefit, the cache isn't any faster than jemalloc. Or am I missing something?

@klensy
Copy link
Contributor Author

klensy commented Feb 11, 2023

No, result's looks boring.

@klensy klensy closed this Feb 11, 2023
@RalfJung
Copy link
Member

All right. Thanks for giving it a shot!

@nnethercote
Copy link
Contributor

@RalfJung: is there a better benchmark we could be using for CTFE perf evaluation?

@RalfJung
Copy link
Member

RalfJung commented Feb 12, 2023

I am not sure what would be a benchmark that reflects real-world CTFE use. (The stress test is good to ensure that CTFE changes do not regress performance, as it will strongly overemphasize CTFE perf changes.)

Maybe someone else in @rust-lang/wg-const-eval has an idea.

@oli-obk
Copy link
Contributor

oli-obk commented Feb 13, 2023

We could check out some of the crates that are hitting the const eval limit (see #93481, #67217, and #103814 for any links to those issues)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants