Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interpret: Unify projections for MPlaceTy, PlaceTy, OpTy #114011

Merged
merged 5 commits into from
Jul 25, 2023

Conversation

RalfJung
Copy link
Member

For ~forever, we didn't really have proper shared code for handling projections into those three types. This is mostly because PlaceTy projections require &mut self: they might have to force_allocate to be able to represent a project part-way into a local.

This PR finally fixes that, by enhancing Place::Local with an offset so that such an optimized place can point into a part of a place without having requiring an in-memory representation. If we later write to that place, we will still do force_allocate -- for now we don't have an optimized path in write_immediate that would avoid allocation for partial overwrites of immediately stored locals. But in write_immediate we have &mut self so at least this no longer pollutes all our type signatures.

(Ironically, I seem to distantly remember that many years ago, Place::Local did have an offset, and I removed it to simplify things. I guess I didn't realize why it was so useful... I am also not sure if this was actually used to achieve place projection on &self back then.)

The offset had type Option<Size>, where None represent "no projection was applied". This is needed because locals can be unsized (when they are arguments) but Place::Local cannot store metadata: if the offset is None, this refers to the entire local, so we can use the metadata of the local itself (which must be indirect); if a projection gets applied, since the local is indirect, it will turn into a Place::Ptr. (Note that even for indirect locals we can have Place::Local: when the local appears in MIR, we always start with Place::Local, and only check frame.locals later. We could eagerly normalize to Place::Ptr but I don't think that would actually simplify things much.)

Having done all that, we can finally properly abstract projections: we have a new Projectable trait that has the basic methods required for projecting, and then all projection methods are implemented for anything that implements that trait. We can even implement it for ImmTy! (Not that we need that, but it seems neat.) The visitor can be greatly simplified; it doesn't need its own trait any more but it can use the Projectable trait. We also don't need the separate Mut visitor any more; that was required only to reflect that projections on PlaceTy needed &mut self.

It is possible that there are some more &mut self that can now become &self... I guess we'll notice that over time.

r? @oli-obk

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 24, 2023
@rustbot
Copy link
Collaborator

rustbot commented Jul 24, 2023

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@@ -479,6 +480,16 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> {
self.ecx
}

fn aggregate_field_order(memory_index: &IndexVec<FieldIdx, u32>, idx: usize) -> usize {
// We need to do an *inverse* lookup: find the field that has position `idx` in memory order.
for (src_field, &mem_pos) in memory_index.iter_enumerated() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is unfortunate, it makes aggregate traversal n^2 in the number of fields. But at least this only affects SB (and I doubt it is a limiting factor there). I didn't find a way to avoid this without introducing overhead for the common case (where we don't care about the order). The only nice way I came up with required return-position impl trait in a trait.

@rust-log-analyzer

This comment has been minimized.

Ok(())
}

fn visit_aggregate(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't entirely understand why we hooked visit_aggregate for these optimizations, they seem entirely implementable by hooking visit_value...

Comment on lines +104 to +107
if base.layout().is_sized() {
// An unsized field of a sized type? Sure...
// But const-prop actually feeds us such nonsense MIR!
throw_inval!(ConstPropNonsense);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wild, took me quite a while to debug. It is triggered in tests/ui/const_prop/issue-86351.rs.

@RalfJung
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 24, 2023
@bors
Copy link
Contributor

bors commented Jul 24, 2023

⌛ Trying commit f3f4cf470cd07fb0c53a261205edd3665d230038 with merge 5bc60a1ed7d987a7881444c4e80ed9ed697acd95...

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@RalfJung
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jul 24, 2023

⌛ Trying commit 2c296dcc079483d1cbc5845068f75b5645f16a72 with merge a367073b0b085658182079202c18c2b339640bbf...

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

if layout.abi.is_uninhabited() {
// `read_discriminant` should have excluded uninhabited variants... but ConstProp calls
// us on dead code.
throw_inval!(ConstPropNonsense)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failed somewhere during bootstrap. Sadly I don't know how to reproduce it.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Jul 24, 2023

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@RalfJung
Copy link
Member Author

@bors try

@oli-obk
Copy link
Contributor

oli-obk commented Jul 25, 2023

The command can be found in the details of every perf run: in https://perf.rust-lang.org/detailed-query.html?commit=945ab669165d66b3ab6088bece01844e36f6a029&base_commit=fc8a3e357a0a5e317132e5ff8858ec70970fb07a&benchmark=ctfe-stress-5-opt&scenario=full it says "Local Profile (diff)", that command can then be run within the https://github.com/rust-lang/rustc-perf repo (after building it in release mode)

@rust-log-analyzer

This comment has been minimized.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 25, 2023

Okay, thanks. I can't promise when I will have the time to take a look at that.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 25, 2023

Here's the entire cachegrind diff for the perf run I linked above:

516,954,467  ???:rustc_const_eval::interpret::eval_context::InterpCx::statement
 19,661,268  ???:rustc_const_eval::interpret::eval_context::InterpCx::write_immediate_no_validate
 19,634,772  ???:rustc_const_eval::interpret::eval_context::InterpCx::eval_place
  9,324,311  ???:rustc_const_eval::interpret::eval_context::InterpCx::terminator
  6,606,370  ???:rustc_const_eval::interpret::eval_context::InterpCx::pass_argument::<core::iter::adapters::filter::Filter<core::iter::adapters::zip::Zip<core::slice::iter::Iter<rustc_const_eval::interpret::terminator::FnArg>, core::slice::iter::Iter<rustc_target::abi::call::ArgAbi<rustc_middle::ty::Ty>>>, rustc_const_eval::interpret::eval_context::InterpCx::eval_fn_call::{closure
 -4,613,666  ???:rustc_const_eval::interpret::eval_context::InterpCx::unsize_into_ptr
  2,936,128  ???:rustc_const_eval::interpret::eval_context::InterpCx::eval_mir_constant
 -1,107,248  ???:rustc_const_eval::interpret::eval_context::InterpCx::eval_operand
  1,101,010  ???:rustc_const_eval::interpret::eval_context::InterpCx::push_stack_frame

so not very helpful I guess. Optimizer noise? Or actually new work being done?

@RalfJung
Copy link
Member Author

Thanks!

eval_place became more expensive? That is very strange. I would have expected it to become cheaper since it doesn't need to force_allocation ever any more. But maybe that codepath is just not hit very much during the stress test?

write_immediate_no_validate is now doing an extra case distinction (but that's just a discriminant read so should be cheap) and it is doing the force_allocation that previously were done during projection, so that one becoming more expensive is somewhat expected. It should just be balanced by projection becoming cheaper...

@oli-obk
Copy link
Contributor

oli-obk commented Jul 25, 2023

@bors r+

Perf is mixed, and I'd rather have the CTFE stress test regress a bit while generally having improvements, and massively improving the CTFE logic.

@bors
Copy link
Contributor

bors commented Jul 25, 2023

📌 Commit d127600 has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 25, 2023
@bors
Copy link
Contributor

bors commented Jul 25, 2023

⌛ Testing commit d127600 with merge 4fc6b33...

@bors
Copy link
Contributor

bors commented Jul 25, 2023

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 4fc6b33 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 25, 2023
@bors bors merged commit 4fc6b33 into rust-lang:master Jul 25, 2023
11 checks passed
@rustbot rustbot added this to the 1.73.0 milestone Jul 25, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4fc6b33): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.8% [2.4%, 3.2%] 6
Improvements ✅
(primary)
-0.7% [-0.8%, -0.7%] 3
Improvements ✅
(secondary)
-1.0% [-1.5%, -0.5%] 10
All ❌✅ (primary) -0.7% [-0.8%, -0.7%] 3

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.6% [2.5%, 2.7%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
8.3% [6.4%, 10.2%] 6
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 651.438s -> 650.083s (-0.21%)

@RalfJung RalfJung deleted the place-projection branch July 26, 2023 21:26
@rylev
Copy link
Member

rylev commented Aug 5, 2023

@RalfJung the regressions that are least likely to be noise are in CTE stress tests. Given the stress nature of the test, the smallish regressions probably aren't worth investigating, but I'd just like to check with you. Is this something worth looking into?

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Aug 5, 2023
@RalfJung
Copy link
Member Author

RalfJung commented Aug 5, 2023

Given that we have a bunch of improvements, in particular some on primary benchmarks, and given the code quality improvements, I am totally fine with accepting this perf loss on the ctfe stress test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants