interpret: Unify projections for MPlaceTy, PlaceTy, OpTy #114011

RalfJung · 2023-07-24T13:21:44Z

For ~forever, we didn't really have proper shared code for handling projections into those three types. This is mostly because PlaceTy projections require &mut self: they might have to force_allocate to be able to represent a project part-way into a local.

This PR finally fixes that, by enhancing Place::Local with an offset so that such an optimized place can point into a part of a place without having requiring an in-memory representation. If we later write to that place, we will still do force_allocate -- for now we don't have an optimized path in write_immediate that would avoid allocation for partial overwrites of immediately stored locals. But in write_immediate we have &mut self so at least this no longer pollutes all our type signatures.

(Ironically, I seem to distantly remember that many years ago, Place::Local did have an offset, and I removed it to simplify things. I guess I didn't realize why it was so useful... I am also not sure if this was actually used to achieve place projection on &self back then.)

The offset had type Option<Size>, where None represent "no projection was applied". This is needed because locals can be unsized (when they are arguments) but Place::Local cannot store metadata: if the offset is None, this refers to the entire local, so we can use the metadata of the local itself (which must be indirect); if a projection gets applied, since the local is indirect, it will turn into a Place::Ptr. (Note that even for indirect locals we can have Place::Local: when the local appears in MIR, we always start with Place::Local, and only check frame.locals later. We could eagerly normalize to Place::Ptr but I don't think that would actually simplify things much.)

Having done all that, we can finally properly abstract projections: we have a new Projectable trait that has the basic methods required for projecting, and then all projection methods are implemented for anything that implements that trait. We can even implement it for ImmTy! (Not that we need that, but it seems neat.) The visitor can be greatly simplified; it doesn't need its own trait any more but it can use the Projectable trait. We also don't need the separate Mut visitor any more; that was required only to reflect that projections on PlaceTy needed &mut self.

It is possible that there are some more &mut self that can now become &self... I guess we'll notice that over time.

r? @oli-obk

rustbot · 2023-07-24T13:21:51Z

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

RalfJung · 2023-07-24T13:25:38Z

src/tools/miri/src/helpers.rs

@@ -479,6 +480,16 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> {
                self.ecx
            }

+            fn aggregate_field_order(memory_index: &IndexVec<FieldIdx, u32>, idx: usize) -> usize {
+                // We need to do an *inverse* lookup: find the field that has position `idx` in memory order.
+                for (src_field, &mem_pos) in memory_index.iter_enumerated() {


This loop is unfortunate, it makes aggregate traversal n^2 in the number of fields. But at least this only affects SB (and I doubt it is a limiting factor there). I didn't find a way to avoid this without introducing overhead for the common case (where we don't care about the order). The only nice way I came up with required return-position impl trait in a trait.

RalfJung · 2023-07-24T13:27:46Z

compiler/rustc_const_eval/src/interpret/validity.rs

-        Ok(())
-    }
-
-    fn visit_aggregate(


I don't entirely understand why we hooked visit_aggregate for these optimizations, they seem entirely implementable by hooking visit_value...

RalfJung · 2023-07-24T13:36:47Z

compiler/rustc_const_eval/src/interpret/projection.rs

+            if base.layout().is_sized() {
+                // An unsized field of a sized type? Sure...
+                // But const-prop actually feeds us such nonsense MIR!
+                throw_inval!(ConstPropNonsense);
+            }


This is wild, took me quite a while to debug. It is triggered in tests/ui/const_prop/issue-86351.rs.

RalfJung · 2023-07-24T13:38:34Z

@bors try @rust-timer queue

bors · 2023-07-24T13:38:43Z

⌛ Trying commit f3f4cf470cd07fb0c53a261205edd3665d230038 with merge 5bc60a1ed7d987a7881444c4e80ed9ed697acd95...

RalfJung · 2023-07-24T15:56:30Z

@bors try @rust-timer queue

bors · 2023-07-24T15:56:40Z

⌛ Trying commit 2c296dcc079483d1cbc5845068f75b5645f16a72 with merge a367073b0b085658182079202c18c2b339640bbf...

RalfJung · 2023-07-24T16:13:45Z

compiler/rustc_const_eval/src/interpret/projection.rs

+        if layout.abi.is_uninhabited() {
+            // `read_discriminant` should have excluded uninhabited variants... but ConstProp calls
+            // us on dead code.
+            throw_inval!(ConstPropNonsense)


This failed somewhere during bootstrap. Sadly I don't know how to reproduce it.

rustbot · 2023-07-24T17:42:32Z

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

RalfJung · 2023-07-24T17:42:58Z

@bors try

oli-obk · 2023-07-25T07:53:11Z

The command can be found in the details of every perf run: in https://perf.rust-lang.org/detailed-query.html?commit=945ab669165d66b3ab6088bece01844e36f6a029&base_commit=fc8a3e357a0a5e317132e5ff8858ec70970fb07a&benchmark=ctfe-stress-5-opt&scenario=full it says "Local Profile (diff)", that command can then be run within the https://github.com/rust-lang/rustc-perf repo (after building it in release mode)

RalfJung · 2023-07-25T10:14:38Z

Okay, thanks. I can't promise when I will have the time to take a look at that.

oli-obk · 2023-07-25T12:21:03Z

Here's the entire cachegrind diff for the perf run I linked above:

516,954,467  ???:rustc_const_eval::interpret::eval_context::InterpCx::statement
 19,661,268  ???:rustc_const_eval::interpret::eval_context::InterpCx::write_immediate_no_validate
 19,634,772  ???:rustc_const_eval::interpret::eval_context::InterpCx::eval_place
  9,324,311  ???:rustc_const_eval::interpret::eval_context::InterpCx::terminator
  6,606,370  ???:rustc_const_eval::interpret::eval_context::InterpCx::pass_argument::<core::iter::adapters::filter::Filter<core::iter::adapters::zip::Zip<core::slice::iter::Iter<rustc_const_eval::interpret::terminator::FnArg>, core::slice::iter::Iter<rustc_target::abi::call::ArgAbi<rustc_middle::ty::Ty>>>, rustc_const_eval::interpret::eval_context::InterpCx::eval_fn_call::{closure
 -4,613,666  ???:rustc_const_eval::interpret::eval_context::InterpCx::unsize_into_ptr
  2,936,128  ???:rustc_const_eval::interpret::eval_context::InterpCx::eval_mir_constant
 -1,107,248  ???:rustc_const_eval::interpret::eval_context::InterpCx::eval_operand
  1,101,010  ???:rustc_const_eval::interpret::eval_context::InterpCx::push_stack_frame

so not very helpful I guess. Optimizer noise? Or actually new work being done?

RalfJung · 2023-07-25T12:30:02Z

Thanks!

eval_place became more expensive? That is very strange. I would have expected it to become cheaper since it doesn't need to force_allocation ever any more. But maybe that codepath is just not hit very much during the stress test?

write_immediate_no_validate is now doing an extra case distinction (but that's just a discriminant read so should be cheap) and it is doing the force_allocation that previously were done during projection, so that one becoming more expensive is somewhat expected. It should just be balanced by projection becoming cheaper...

compiler/rustc_middle/src/mir/interpret/error.rs

…e that for visitors

oli-obk · 2023-07-25T12:33:57Z

@bors r+

Perf is mixed, and I'd rather have the CTFE stress test regress a bit while generally having improvements, and massively improving the CTFE logic.

bors · 2023-07-25T12:33:58Z

📌 Commit d127600 has been approved by oli-obk

It is now in the queue for this repository.

bors · 2023-07-25T14:18:11Z

⌛ Testing commit d127600 with merge 4fc6b33...

bors · 2023-07-25T16:18:17Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 4fc6b33 to master...

rust-timer · 2023-07-25T20:54:30Z

Finished benchmarking commit (4fc6b33): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.8%	[2.4%, 3.2%]	6
Improvements ✅ (primary)	-0.7%	[-0.8%, -0.7%]	3
Improvements ✅ (secondary)	-1.0%	[-1.5%, -0.5%]	10
All ❌✅ (primary)	-0.7%	[-0.8%, -0.7%]	3

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.6%	[2.5%, 2.7%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	8.3%	[6.4%, 10.2%]	6
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 651.438s -> 650.083s (-0.21%)

rylev · 2023-08-05T13:10:40Z

@RalfJung the regressions that are least likely to be noise are in CTE stress tests. Given the stress nature of the test, the smallish regressions probably aren't worth investigating, but I'd just like to check with you. Is this something worth looking into?

@rustbot label: +perf-regression-triaged

RalfJung · 2023-08-05T13:47:57Z

Given that we have a bunch of improvements, in particular some on primary benchmarks, and given the code quality improvements, I am totally fine with accepting this perf loss on the ctfe stress test.

rustbot assigned oli-obk Jul 24, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 24, 2023

RalfJung commented Jul 24, 2023

View reviewed changes