-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement intrinsic for swapping values #111744
Implement intrinsic for swapping values #111744
Conversation
r? @eholk (rustbot has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
569cb1a
to
57714da
Compare
This comment has been minimized.
This comment has been minimized.
57714da
to
6d29013
Compare
This comment has been minimized.
This comment has been minimized.
6d29013
to
3ee1ae5
Compare
This comment has been minimized.
This comment has been minimized.
3ee1ae5
to
0b23a99
Compare
This comment has been minimized.
This comment has been minimized.
0b23a99
to
260908f
Compare
This comment has been minimized.
This comment has been minimized.
260908f
to
e611cd5
Compare
This comment has been minimized.
This comment has been minimized.
e611cd5
to
b1cb39f
Compare
This comment has been minimized.
This comment has been minimized.
a31c079
to
4bfaa1d
Compare
17714ca
to
3d0cc7b
Compare
a456d13
to
1a20b9a
Compare
self.read_pointer(&args[0])?, | ||
self.read_pointer(&args[1])?, | ||
1, | ||
self.deref_operand(&args[0])?.layout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using args[0]
twice here isn't good, each argument should be converted to its high-level representation exactly once.
Does let layout = self.layout_of(substs.type_at(0))?
like in some of the other intrinsics help?
@@ -1222,6 +1222,76 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> { | |||
|
|||
Ok(()) | |||
} | |||
|
|||
pub fn swap_memory( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mem_swap_nonoverlapping
seems like a better name
EDIT: But first decide what the semantics should be -- typed copy or untyped copy.
let align = layout.align.abi; | ||
let Some((x_alloc_id, x_offset, _)) = self.get_ptr_access(x_ptr, size, align)? else { | ||
// Called on ZST so it is noop. | ||
return Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not good. If the 2nd pointer is invalid you would miss the UB! It is important to check all conditions before short-circuiting. Roughly follow that mem_copy
above does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @RalfJung !
Is is now correct way or do I need to call M::before_memory_read
and M::before_memory_write
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to call these hooks the same way mem_copy
does.
count: u64, | ||
layout: ty::layout::TyAndLayout<'tcx>, | ||
) -> InterpResult<'tcx> { | ||
let size = layout.size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size of the access is count * layout.size
, isn't it?
|
||
let tmp_stack_alloc = { | ||
let can_use_values = layout.ty.is_primitive() || layout.ty.is_slice(); | ||
if can_use_values { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need 2 code paths here?
let x_mplace = MPlaceTy::from_aligned_ptr(curr_x_ptr, layout); | ||
let y_mplace = MPlaceTy::from_aligned_ptr(curr_y_ptr, layout); | ||
if let Some((tmp_x_place, tmp_y_place)) = tmp_stack_alloc { | ||
self.copy_op(&x_mplace.into(), &tmp_x_place.into(), false)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is doing a typed swap! Is that really what we want? If it is, then this is the wrong file. memory.rs
is for byte-level, untyped operations. This should probably be a function in intrinsics.rs
then. Also it should take places as arguments, not pointers, if that is the level of abstraction it acts upon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judging from these docs the copy probably should be untyped. That means memory.rs
is the right file and taking pointers makes sense. But it means you must not use copy_op
in here as that would have too much UB. Use mem_copy
instead.
library/core/src/intrinsics.rs
Outdated
/// | ||
/// Swaps 2 values using minimal extra memory depending on target. | ||
/// Created to remove target/backend specific optimizations from library code to | ||
/// avoid confusing const evaluation and MIRI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The creation of this creates extra cost for CTFE and Miri, I don't understand the point of the comment here.
/// | ||
/// * The region of memory beginning at `x` with a size of `size_of::<T>()` | ||
/// bytes must *not* overlap with the region of memory beginning at `y` | ||
/// with the same size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should clarify whether the swapping happens typed or untyped. For example: if I swap two bool
here and they are not 0 or 1, is that UB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is used to implement std::ptr::swap_nonoverlapping
? In that case ut must be an untyped copy (see the docs). So that should be stated here.
Sorry, I can only review the interpreter parts of this, not the codegen parts. That will need someone from r? compiler. |
This allows move target- and backend-specific optmization from library code to codegen. Also, this should make const eval/miri evaluation simpler. Main optimization implemented in this PR makes backend generate swap without using allocas removing unneccessary memory writes and reads and reducing stack usage. One of the main optimizations is using larger integer chunks for swapping in x86_64 by utilizing unaligned reads/writes. It reduces code size (especially for debug builds) and prevent cases of ineffective vectorizations like `load <4 x i8>` (LLVM doesn't vectorize it further despite vectorizing `load i32`). Also added more tests.
063220c
to
c6db014
Compare
Sorry also not that familiar with codegen stuff. r? compiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already looks much better, we are getting there. :)
pub fn mem_swap_nonoverlapping( | ||
&mut self, | ||
x_ptr: Pointer<Option<M::Provenance>>, | ||
y_ptr: Pointer<Option<M::Provenance>>, | ||
count: u64, | ||
layout: ty::layout::TyAndLayout<'tcx>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an untyped copy, it shouldn't take a layout
. As I said before, it should follow the style of mem_copy
.
let elem_size = layout.size; | ||
let align = layout.align.abi; | ||
|
||
if count > i64::MAX as u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't be needed, you can't have a valid ptr for 2^63 bytes anyway.
throw_ub_format!("`count` argument to `swap_nonoverlapping_many` is too large."); | ||
} | ||
|
||
let first_ptr_acc = self.get_ptr_access(x_ptr, elem_size * count, align)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use consistent naming: the arguments are x
and y
but here it becomes first
and second
and later you talk about "left" and "right". Stick to a single naming scheme.
let align = layout.align.abi; | ||
let Some((x_alloc_id, x_offset, _)) = self.get_ptr_access(x_ptr, size, align)? else { | ||
// Called on ZST so it is noop. | ||
return Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to call these hooks the same way mem_copy
does.
return Ok(()); | ||
} | ||
|
||
let tmp_stack_alloc = self.allocate(layout, MemoryKind::Stack)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this 'stack' memory is not right. This special magic memory.^^ We'll probably need a new MemoryKind, Scratch
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to add this new MemoryKind?
P.S. What is so special about MemoryKind::Stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to add this new MemoryKind?
Yes.
P.S. What is so special about MemoryKind::Stack?
It's not special, but it's for stack memory. This here is not stack memory, so we shouldn't give a wrong MemoryKind.
Switching to waiting on author, seems there are some review comments waiting for feedback. Feel free to request a review with @rustbot author |
☔ The latest upstream changes (presumably #114148) made this pull request unmergeable. Please resolve the merge conflicts. |
@AngelicosPhosphoros any updates on this? |
Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks |
Let codegen decide when to `mem::swap` with immediates Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea. Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs. r? oli-obk Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
Let codegen decide when to `mem::swap` with immediates Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea. Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs. r? oli-obk Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
Let codegen decide when to `mem::swap` with immediates Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea. Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs. r? oli-obk Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
Let codegen decide when to `mem::swap` with immediates Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea. Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs. r? oli-obk Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
Let codegen decide when to `mem::swap` with immediates Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea. Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs. r? oli-obk Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
Let codegen decide when to `mem::swap` with immediates Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea. Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs. r? oli-obk Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140
This allows move target- and backend-specific optmization from library code to codegen.
Also, this should make MIR-optimization simpler.
Main optimization implemented in this PR makes backend generate swap without using allocas
removing unneccessary memory writes and reads and reducing stack usage.
One of the main optimizations is using larger integer chunks for swapping in x86_64 by utilizing unaligned reads/writes. It reduces code size (especially for debug builds) and prevent cases of ineffective vectorizations like
load <4 x i8>
(LLVM doesn't vectorize it further despite vectorizingload i32
).Also added more tests.
rustc_codegen_cranelift
implementation is naive because bjorn3 promised to rewrite it later.rustc_codegen_gcc
uses same implementation viarustc_codegen_ssa
.Previous try: #98892