reuse allocation for always_storage_live_locals bitset #107927
Conversation
(rustbot has picked a reviewer for you, use r? to override)
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt
Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri
// reuse allocation for bit set
always_live_locals_cache: GrowableBitSet<mir::Local>,
I don't know a good place for this; any suggestions?
This is probably the most global state you can add it to.
However, note that this InterpCx is re-created per constant, so if there are many constants to evaluate, each will still do its own thing. I am also concerned that we might keep large buffers around for longer than we have to.
@@ -1,11 +1,12 @@
use rustc_index::bit_set::BitSet;
use rustc_index::bit_set::GrowableBitSet;
use rustc_middle::mir::{self, Local};

/// The set of locals in a MIR body that do not have `StorageLive`/`StorageDead` annotations.
///
/// These locals have fixed storage for the duration of the body.
The comment should explain the unconventional signature for this function, and what the caller is supposed to pass for the `locals` argument.
Note that this is a very artificial stress-test benchmark. We shouldn't optimize for it blindly. Is there evidence that this helps for regular code as well? Out of all the call sites for this function, it seems like only the CTFE interpreter actually benefits from the optimization.

@bors try @rust-timer queue
⌛ Trying commit 9d71a6f with merge 8842f892720a42837501b09631b12efcaf14c904...

Cc @nnethercote for some compiler perf expertise

☀️ Try build successful - checks-actions
@@ -705,13 +710,19 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
let mut locals = IndexVec::from_elem(dummy, &body.local_decls);
This line accounts for ~60% of total allocated memory for ctfe-stress-5.
At first, I tried to cache it too, but the vector is stored into the frame later at `self.frame_mut().locals = locals;`, so I don't see an easy way to do that.
> This line is ~60% total allocated memory for ctfe-stress-5.
Yeah that test involves a lot of CTFE stack variables.
But I don't think this is worth optimizing for. This is totally unrealistic for real code. We shouldn't make rustc harder to maintain just for the sake of meaningless benchmarks -- that is not the purpose of these stress tests.
Finished benchmarking commit (8842f892720a42837501b09631b12efcaf14c904): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

Instruction count: This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Looks like we are not getting a perf benefit from this even on the stress test. So... it looks like it doesn't actually help? Basically, what you did is write a small allocator for these IndexSets (with a simple one-element cache), and sure, that means the underlying jemalloc gets called less often -- but that doesn't actually seem to translate into any benefit; the cache isn't any faster than jemalloc. Or am I missing something?
No, the results look boring.
All right. Thanks for giving it a shot!

@RalfJung: is there a better benchmark we could be using for CTFE perf evaluation?

I am not sure what would be a benchmark that reflects real-world CTFE use. (The stress test is good to ensure that CTFE changes do not regress performance, as it will strongly overemphasize CTFE perf changes.) Maybe someone else in @rust-lang/wg-const-eval has an idea.
This reduces the allocated block count by about 30% for ctfe-stress-5.