Make `mem::replace` simpler in codegen #111010

scottmcm · 2023-04-30T06:56:47Z

Since they'd mentioned more intrinsics for simplifying stuff recently,
r? @WaffleLapkin

This is a continuation of me looking at foundational stuff that ends up with more instructions than it really needs. Specifically I noticed this one because Range::next isn't MIR-inlining, and one of the largest parts of it is a replace::<usize> that's a good dozen instructions instead of the two it could be.
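For context, the kind of code being described can be sketched roughly as follows. This is a simplified stand-in, not the standard library's actual `Range::next`; the `Counter` type and its fields are hypothetical and exist only to show where a `replace::<usize>` ends up inside a counting iterator.

```rust
use std::mem;

/// Hypothetical half-open counter over `start..end`, standing in for a
/// range of `usize` (an assumption for illustration, not std's type).
pub struct Counter {
    pub start: usize,
    pub end: usize,
}

impl Iterator for Counter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.start < self.end {
            // `start < end` implies `start + 1` cannot overflow.
            let n = self.start + 1;
            // Store the new start and return the old one in a single step.
            // Conceptually this needs only one load and one store.
            Some(mem::replace(&mut self.start, n))
        } else {
            None
        }
    }
}
```

Because the replaced value is a plain `usize`, any extra stack traffic generated around the `replace` call is pure overhead that the optimizer then has to clean up.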

With this change, ptr::write with a Copy type no longer generates worse IR than manually dereferencing (well, at least in LLVM -- MIR still has bonus pointer casts). As a result, we're finally down to just the two essential memcpys when emitting mem::replace for a large type, rather than the bonus alloca and three memcpys we emitted before this (or the 6 we currently emit in 1.69 stable). That said, LLVM does usually manage to optimize the extra code away. But it's still nice for it not to have to do as much, thanks to (for example) not going through an alloca when replacing a primitive like a usize.
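To make the "two essential memcpys" concrete, here is a minimal sketch of a replace written as a read followed by a write. The names `replace_sketch` and `Big` are hypothetical, and this is not claimed to be the exact library source. It only shows why two copies are the floor for a large type: one to move the old value out and one to move the new value in.

```rust
use std::ptr;

/// Hypothetical re-implementation of a replace operation, for illustration only.
pub fn replace_sketch<T>(dest: &mut T, src: T) -> T {
    // SAFETY: `dest` comes from a `&mut T`, so it is valid, aligned, and
    // initialized. The value read out is moved to the caller, and the slot is
    // overwritten right away without dropping the old contents, so nothing is
    // dropped twice.
    unsafe {
        let old = ptr::read(dest); // copy 1: `*dest` into the return value
        ptr::write(dest, src); // copy 2: `src` into `*dest`
        old
    }
}

/// A type large enough that moving it is typically done with `memcpy`.
pub struct Big(pub [u64; 64]);
```

For a primitive such as `usize` the same two steps should amount to a load and a store, which is the case the description calls out.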

(This is a new intrinsic, but one that's immediately lowered to existing MIR constructs, so not anything that MIRI or the codegen backends or MIR semantics needs to do work to handle.)

rustbot · 2023-04-30T06:56:54Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

scottmcm · 2023-04-30T07:00:42Z

tests/mir-opt/pre-codegen/mem_replace.mem_replace.PreCodegen.after.mir

+        _4 = &raw mut (*_1);             // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
+        StorageLive(_6);                 // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
+        (*_4) = _2;                      // scope 8 at $SRC_DIR/core/src/ptr/mod.rs:LL:COL
+        StorageDead(_6);                 // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL


These un-useful StorageLive(_6)+StorageDead(_6) will go away with #110702

rustbot · 2023-04-30T07:39:24Z

The Miri subtree was changed

cc @rust-lang/miri

scottmcm · 2023-04-30T08:43:08Z

@bors try @rust-timer queue

bors · 2023-04-30T08:43:17Z

⌛ Trying commit 328e1db036ffa25505d1c758566535b7daee3b29 with merge 585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d...

tests/codegen/mem-replace-big-type.rs

bors · 2023-04-30T10:52:48Z

☀️ Try build successful - checks-actions
Build commit: 585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d

est31 · 2023-04-30T11:14:31Z

This could maybe also be used inside the vec macro, instead of #[rustc_box]. See the discussion in #110715.

CC also #80290, which I think this PR is a reversal of. To be clear, I'm in favour of this PR.

scottmcm · 2023-04-30T18:46:26Z

Thanks, @est31 !

I think the difference that makes sense here is related to this note from that PR:

"This means we can also remove move_val_init implementations in codegen and Miri, and its special handling in the borrow checker."

If this PR needed to update the borrow checker particularly, but even just codegen or CTFE or Miri code, then it'd absolutely not be worth doing.

But using a new intrinsic that lowers to existing MIR functionality (instead of using intrinsics::forget!) seems reasonable to me.

scottmcm · 2023-04-30T18:50:23Z

tests/codegen/mem-replace-direct-memcpy.rs

-
-pub fn replace_byte(dst: &mut u8, src: u8) -> u8 {
-    std::mem::replace(dst, src)
-}


The "direct memcpy" name for this test no longer made sense, as it doesn't call memcpy any more.

What it was testing is subsumed by the new tests/codegen/mem-replace-simple-type.rs test below.

rust-timer · 2023-04-30T19:49:45Z

Finished benchmarking commit (585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.6%, 0.6%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.3%	[-1.5%, -1.0%]	3
Improvements ✅ (secondary)	-0.6%	[-0.6%, -0.6%]	1
All ❌✅ (primary)	-0.8%	[-1.5%, 0.6%]	4

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	6.0%	[3.1%, 10.3%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.4%	[-3.8%, -0.1%]	3
Improvements ✅ (secondary)	-2.2%	[-2.2%, -2.2%]	1
All ❌✅ (primary)	2.4%	[-3.8%, 10.3%]	7

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.6%	[-1.8%, -1.3%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.6%	[-1.8%, -1.3%]	3

library/core/src/ptr/mod.rs

WaffleLapkin

Changes look good to me.

Perf-wise, it looks like codegen_crate is now slower (I assume due to more lowering happening?) while LLVM is faster (I assume due to better LLVM IR being passed to it). This comes along with a lot of noise, probably because of changes in how the compiler itself is compiled. The ripgrep regression is weird, but overall I think this looks fine.

r=me, with or without the nit.

WaffleLapkin · 2023-05-01T10:29:10Z

library/core/src/intrinsics.rs

+    // to `dst` while `src` is owned by this function.
+    unsafe {
+        copy_nonoverlapping::<T>(&value, ptr, 1);
+        forget(value);


Not important, but could you use ManuallyDrop instead?

scottmcm · 2023-05-01

I probably could, but the preexisting implementation of write went out of its way to use intrinsics::forget (rust/library/core/src/ptr/mod.rs, line 1369 in 4b87ed9):

    intrinsics::forget(src);

even though mem::forget doesn't (rust/library/core/src/mem/mod.rs, lines 148 to 150 in 4b87ed9):

    pub const fn forget<T>(t: T) {
        let _ = ManuallyDrop::new(t);
    }

which is probably just because it has much better codegen, but in case it actually matters I'd rather just leave it like this since it sounds like you don't feel particularly strongly about it.

WaffleLapkin · 2023-05-01

Okay 👍🏻
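The two options weighed in this thread can be sketched side by side. Both functions below are hypothetical illustrations written against stable `std` (so they use `mem::forget` rather than the unstable `intrinsics::forget`). They are not the library's code, and whether they differ in generated code is exactly the open question above.

```rust
use std::mem::{self, ManuallyDrop};
use std::ptr;

/// Copies the bytes of `value` into `dst`, then forgets the local.
///
/// # Safety
/// `dst` must be valid for writes and properly aligned for `T`.
pub unsafe fn write_with_forget<T>(dst: *mut T, value: T) {
    // SAFETY: the caller guarantees `dst` is valid; `value` is a local of
    // this function, so it cannot overlap `*dst`.
    unsafe { ptr::copy_nonoverlapping(&value, dst, 1) };
    // Ownership has logically moved into `*dst`; skip the local's destructor.
    mem::forget(value);
}

/// Same effect, but the local is wrapped in `ManuallyDrop` up front.
///
/// # Safety
/// `dst` must be valid for writes and properly aligned for `T`.
pub unsafe fn write_with_manually_drop<T>(dst: *mut T, value: T) {
    // The wrapper suppresses the destructor for the rest of the function.
    let value = ManuallyDrop::new(value);
    // SAFETY: same reasoning as in `write_with_forget`.
    unsafe { ptr::copy_nonoverlapping(&*value, dst, 1) };
}
```

Either way the destination ends up owning the value and the source is never dropped. The difference is only in when the "do not drop" decision is expressed.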

WaffleLapkin · 2023-05-01T11:15:57Z

@bors r+ rollup=never

bors · 2023-05-01T11:15:59Z

📌 Commit 3456f77 has been approved by WaffleLapkin

It is now in the queue for this repository.

bors · 2023-05-01T14:29:18Z

⌛ Testing commit 3456f77 with merge 6db1e5e...

bors · 2023-05-01T17:34:17Z

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 6db1e5e to master...

rust-timer · 2023-05-01T18:49:51Z

Finished benchmarking commit (6db1e5e): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.2%, 0.6%]	3
Regressions ❌ (secondary)	0.3%	[0.2%, 0.5%]	2
Improvements ✅ (primary)	-1.3%	[-1.8%, -1.0%]	3
Improvements ✅ (secondary)	-0.5%	[-0.7%, -0.4%]	3
All ❌✅ (primary)	-0.5%	[-1.8%, 0.6%]	6

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	4.6%	[3.0%, 8.5%]	4
Regressions ❌ (secondary)	2.6%	[2.2%, 3.0%]	2
Improvements ✅ (primary)	-2.9%	[-5.0%, -0.1%]	3
Improvements ✅ (secondary)	-1.0%	[-1.0%, -0.9%]	2
All ❌✅ (primary)	1.4%	[-5.0%, 8.5%]	7

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.5%	[-1.6%, -1.3%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.5%	[-1.6%, -1.3%]	2

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.4%]	12
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.1%	[-1.0%, -0.0%]	37
Improvements ✅ (secondary)	-0.0%	[-0.1%, -0.0%]	12
All ❌✅ (primary)	-0.1%	[-1.0%, 0.4%]	49

Bootstrap: 656.574s -> 656.548s (-0.00%)

RalfJung · 2023-05-01T18:55:56Z

src/tools/miri/tests/fail/dangling_pointers/null_pointer_write_zst.stderr

  --> $DIR/null_pointer_write_zst.rs:LL:CC
   |
 LL |     unsafe { std::ptr::null_mut::<[u8; 0]>().write(zst_val) };
-   |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: null pointer is a dangling pointer (it has no provenance)
+   |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: null pointer is a dangling pointer (it has no provenance)


Almost seems like for nice error messages in Miri it'd be better not to lower this to an assignment. Currently it looks like *ptr = ..., so Miri sees the deref in *ptr and emits the error accordingly.

scottmcm · 2023-05-01

Hmm, I guess I didn't think of it being a copy instead of a deref as meaningful here.

One other thing I tried was implementing write(p, x) as *p.cast() = ManuallyDrop::new(x), which I think is also a perfectly reasonable no-intrinsic implementation -- more obviously a typed write, which I think this is, given that passing the parameter to the function is typed -- but it would have the same "dereferencing pointer" error in MIRI.

RalfJung · 2023-05-01

Fair, I guess this is really about write implementation details where the user can't tell whether a deref happens before the write or not. I guess 'dereferencing' is not so bad here, I was just primed because of rust-lang/miri#2859.
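The no-intrinsic alternative mentioned in this exchange can be sketched as below. The function name `write_via_assignment` is hypothetical and the target type of the cast is spelled out explicitly. This illustrates the idea under those assumptions and is not the code that was merged.

```rust
use std::mem::ManuallyDrop;

/// Writes `x` to `p` through a plain typed assignment.
///
/// # Safety
/// `p` must be valid for writes and properly aligned for `T`.
pub unsafe fn write_via_assignment<T>(p: *mut T, x: T) {
    // `ManuallyDrop<T>` has the same layout as `T` and no drop glue, so this
    // assignment overwrites the destination without dropping whatever was
    // there before, which is the contract of a raw write.
    // SAFETY: the caller guarantees `p` is valid and aligned.
    unsafe { *p.cast::<ManuallyDrop<T>>() = ManuallyDrop::new(x) };
}
```

Since the write is expressed as `*ptr = ...`, a checker that reports on the dereference would describe a failure the same way as for the lowered form, which is the point made above.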

nnethercote · 2023-05-02T00:03:47Z

The few perf improvements match or outweigh the few perf regressions.

@rustbot label: +perf-regression-triaged

rustbot assigned WaffleLapkin Apr 30, 2023

rustbot added the S-waiting-on-review, T-compiler, and T-libs labels Apr 30, 2023

rustbot added the S-waiting-on-perf label Apr 30, 2023

lukas-code reviewed Apr 30, 2023

tests/codegen/mem-replace-big-type.rs (outdated)

MIR pre-codegen test for mem::replace

ca3f742

scottmcm force-pushed the mem-replace-simpler branch from 328e1db to a0a0d69 Compare April 30, 2023 19:06

rustbot added perf-regression and removed S-waiting-on-perf labels Apr 30, 2023

JakobDegen approved these changes May 1, 2023

library/core/src/ptr/mod.rs (outdated)

scottmcm added 2 commits April 30, 2023 22:33

Codegen fewer instructions in mem::replace

5292d48

Update MIRI compiletests

3456f77

scottmcm force-pushed the mem-replace-simpler branch from a0a0d69 to 3456f77 Compare May 1, 2023 05:33

WaffleLapkin approved these changes May 1, 2023

bors added S-waiting-on-bors and removed S-waiting-on-review labels May 1, 2023

bors added the merged-by-bors label May 1, 2023

bors merged commit 6db1e5e into rust-lang:master May 1, 2023

rustbot added this to the 1.71.0 milestone May 1, 2023

scottmcm deleted the mem-replace-simpler branch May 1, 2023 17:35

RalfJung reviewed May 1, 2023

RalfJung mentioned this pull request May 1, 2023

Moving #[rustc_box] to move_val_init intrinsic #110715

Closed

rustbot added the perf-regression-triaged label May 2, 2023

scottmcm mentioned this pull request May 5, 2023

Allow reading a *mut without an internal cast #111233

Closed
