Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement intrinsic for swapping values #111744

Conversation

AngelicosPhosphoros
Copy link
Contributor

@AngelicosPhosphoros AngelicosPhosphoros commented May 18, 2023

This allows move target- and backend-specific optmization from library code to codegen.
Also, this should make MIR-optimization simpler.

Main optimization implemented in this PR makes backend generate swap without using allocas
removing unneccessary memory writes and reads and reducing stack usage.

One of the main optimizations is using larger integer chunks for swapping in x86_64 by utilizing unaligned reads/writes. It reduces code size (especially for debug builds) and prevent cases of ineffective vectorizations like load <4 x i8> (LLVM doesn't vectorize it further despite vectorizing load i32).

Also added more tests.

rustc_codegen_cranelift implementation is naive because bjorn3 promised to rewrite it later.

rustc_codegen_gcc uses same implementation via rustc_codegen_ssa.

Previous try: #98892

@rustbot
Copy link
Collaborator

rustbot commented May 18, 2023

r? @eholk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 18, 2023
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@AngelicosPhosphoros AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch 2 times, most recently from 569cb1a to 57714da Compare May 19, 2023 14:44
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

self.read_pointer(&args[0])?,
self.read_pointer(&args[1])?,
1,
self.deref_operand(&args[0])?.layout,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using args[0] twice here isn't good, each argument should be converted to its high-level representation exactly once.

Does let layout = self.layout_of(substs.type_at(0))? like in some of the other intrinsics help?

@@ -1222,6 +1222,76 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {

Ok(())
}

pub fn swap_memory(
Copy link
Member

@RalfJung RalfJung Jun 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mem_swap_nonoverlapping seems like a better name

EDIT: But first decide what the semantics should be -- typed copy or untyped copy.

let align = layout.align.abi;
let Some((x_alloc_id, x_offset, _)) = self.get_ptr_access(x_ptr, size, align)? else {
// Called on ZST so it is noop.
return Ok(())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not good. If the 2nd pointer is invalid you would miss the UB! It is important to check all conditions before short-circuiting. Roughly follow that mem_copy above does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @RalfJung !
Is is now correct way or do I need to call M::before_memory_read and M::before_memory_write too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to call these hooks the same way mem_copy does.

count: u64,
layout: ty::layout::TyAndLayout<'tcx>,
) -> InterpResult<'tcx> {
let size = layout.size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of the access is count * layout.size, isn't it?


let tmp_stack_alloc = {
let can_use_values = layout.ty.is_primitive() || layout.ty.is_slice();
if can_use_values {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need 2 code paths here?

let x_mplace = MPlaceTy::from_aligned_ptr(curr_x_ptr, layout);
let y_mplace = MPlaceTy::from_aligned_ptr(curr_y_ptr, layout);
if let Some((tmp_x_place, tmp_y_place)) = tmp_stack_alloc {
self.copy_op(&x_mplace.into(), &tmp_x_place.into(), false)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doing a typed swap! Is that really what we want? If it is, then this is the wrong file. memory.rs is for byte-level, untyped operations. This should probably be a function in intrinsics.rs then. Also it should take places as arguments, not pointers, if that is the level of abstraction it acts upon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judging from these docs the copy probably should be untyped. That means memory.rs is the right file and taking pointers makes sense. But it means you must not use copy_op in here as that would have too much UB. Use mem_copy instead.

///
/// Swaps 2 values using minimal extra memory depending on target.
/// Created to remove target/backend specific optimizations from library code to
/// avoid confusing const evaluation and MIRI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creation of this creates extra cost for CTFE and Miri, I don't understand the point of the comment here.

///
/// * The region of memory beginning at `x` with a size of `size_of::<T>()`
/// bytes must *not* overlap with the region of memory beginning at `y`
/// with the same size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should clarify whether the swapping happens typed or untyped. For example: if I swap two bool here and they are not 0 or 1, is that UB?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is used to implement std::ptr::swap_nonoverlapping? In that case ut must be an untyped copy (see the docs). So that should be stated here.

@eholk
Copy link
Contributor

eholk commented Jun 5, 2023

@RalfJung - do you mind if I reassign this to you? You seem to have a much better idea of what's needed than I do. Feel free to reroll if you want to review the whole thing. Thanks!

r? @RalfJung

@rustbot rustbot assigned RalfJung and unassigned eholk Jun 5, 2023
@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2023

Sorry, I can only review the interpreter parts of this, not the codegen parts. That will need someone from r? compiler.

@rustbot rustbot assigned b-naber and unassigned RalfJung Jun 6, 2023
This allows move target- and backend-specific optmization from library code to codegen.
Also, this should make const eval/miri evaluation simpler.

Main optimization implemented in this PR makes backend generate swap without using allocas
removing unneccessary memory writes and reads and reducing stack usage.

One of the main optimizations is using larger integer chunks for swapping in x86_64 by utilizing unaligned reads/writes. It reduces code size (especially for debug builds) and prevent cases of ineffective vectorizations like `load <4 x i8>` (LLVM doesn't vectorize it further despite vectorizing `load i32`).

Also added more tests.
@b-naber
Copy link
Contributor

b-naber commented Jun 13, 2023

Sorry also not that familiar with codegen stuff. r? compiler

@rustbot rustbot assigned TaKO8Ki and unassigned b-naber Jun 13, 2023
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already looks much better, we are getting there. :)

Comment on lines +1226 to +1231
pub fn mem_swap_nonoverlapping(
&mut self,
x_ptr: Pointer<Option<M::Provenance>>,
y_ptr: Pointer<Option<M::Provenance>>,
count: u64,
layout: ty::layout::TyAndLayout<'tcx>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an untyped copy, it shouldn't take a layout. As I said before, it should follow the style of mem_copy.

let elem_size = layout.size;
let align = layout.align.abi;

if count > i64::MAX as u64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be needed, you can't have a valid ptr for 2^63 bytes anyway.

throw_ub_format!("`count` argument to `swap_nonoverlapping_many` is too large.");
}

let first_ptr_acc = self.get_ptr_access(x_ptr, elem_size * count, align)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use consistent naming: the arguments are x and y but here it becomes first and second and later you talk about "left" and "right". Stick to a single naming scheme.

let align = layout.align.abi;
let Some((x_alloc_id, x_offset, _)) = self.get_ptr_access(x_ptr, size, align)? else {
// Called on ZST so it is noop.
return Ok(())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to call these hooks the same way mem_copy does.

return Ok(());
}

let tmp_stack_alloc = self.allocate(layout, MemoryKind::Stack)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this 'stack' memory is not right. This special magic memory.^^ We'll probably need a new MemoryKind, Scratch or something like that.

Copy link
Contributor Author

@AngelicosPhosphoros AngelicosPhosphoros Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to add this new MemoryKind?

P.S. What is so special about MemoryKind::Stack?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to add this new MemoryKind?

Yes.

P.S. What is so special about MemoryKind::Stack?

It's not special, but it's for stack memory. This here is not stack memory, so we shouldn't give a wrong MemoryKind.

@apiraino
Copy link
Contributor

Switching to waiting on author, seems there are some review comments waiting for feedback. Feel free to request a review with @rustbot ready, thanks!

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 13, 2023
@bors
Copy link
Contributor

bors commented Jul 29, 2023

☔ The latest upstream changes (presumably #114148) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC
Copy link
Member

@AngelicosPhosphoros any updates on this?

@Dylan-DPC
Copy link
Member

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 27, 2023
@Dylan-DPC Dylan-DPC closed this Oct 27, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 16, 2024
Let codegen decide when to `mem::swap` with immediates

Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.

Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.

r? oli-obk

Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 17, 2024
Let codegen decide when to `mem::swap` with immediates

Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.

Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.

r? oli-obk

Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 23, 2024
Let codegen decide when to `mem::swap` with immediates

Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.

Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.

r? oli-obk

Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 23, 2024
Let codegen decide when to `mem::swap` with immediates

Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.

Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.

r? oli-obk

Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 23, 2024
Let codegen decide when to `mem::swap` with immediates

Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.

Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.

r? oli-obk

Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
RenjiSann pushed a commit to RenjiSann/rust that referenced this pull request Mar 25, 2024
Let codegen decide when to `mem::swap` with immediates

Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.

Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.

r? oli-obk

Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.