Separate immediate and in-memory ScalarPair representation #118991
Currently, we assume that ScalarPair is always represented using a two-element struct, both as an immediate value and when stored in memory.
This currently works fairly well, but runs into problems with rust-lang#116672, where a ScalarPair involving an i128 type can no longer be represented as a two-element struct in memory. For example, the tuple `(i32, i128)` needs to be represented in-memory as `{ i32, [3 x i32], i128 }` to satisfy alignment requirements. Using `{ i32, i128 }` instead will result in the second element being stored at the wrong offset (prior to LLVM 18).
Resolve this issue by no longer requiring that the immediate and in-memory type for ScalarPair are the same. The in-memory type will now look the same as for normal struct types (and will include padding filler and similar), while the immediate type stays a simple two-element struct type. This also means that booleans in immediate ScalarPair are now represented as i1 rather than i8, just like we do everywhere else.
The core change here is to llvm_type (which now treats ScalarPair as a normal struct) and immediate_llvm_type (which returns the two-element struct that llvm_type used to produce). The rest is fixing things up to no longer assume these are the same. In particular, this switches places that try to get pointers to the ScalarPair elements to use byte-geps instead of struct-geps.
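To make the offset problem concrete, here is a minimal sketch of the arithmetic for `(i32, i128)`. The helper names `align_to` and `second_offset` are hypothetical and not part of rustc. The 8-byte case is only an illustration of what happens when one side assumes a smaller i128 alignment than the other.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_to(offset: u64, align: u64) -> u64 {
    (offset + align - 1) & !(align - 1)
}

/// Byte offset of the second element of a pair, given the size of the
/// first element and the alignment of the second (both in bytes).
fn second_offset(first_size: u64, second_align: u64) -> u64 {
    align_to(first_size, second_align)
}

fn main() {
    // (i32, i128) with i128 aligned to 16 bytes: the second element sits at
    // offset 16, so 12 bytes of padding are needed, i.e. `[3 x i32]`.
    let with_align_16 = second_offset(4, 16);
    // The same pair if i128 is only treated as 8-byte aligned (assumption
    // for illustration): the element would be placed at a different offset.
    let with_align_8 = second_offset(4, 8);
    println!("align 16: offset {}, padding {}", with_align_16, with_align_16 - 4);
    println!("align 8:  offset {}, padding {}", with_align_8, with_align_8 - 4);
}
```

If the two sides disagree on the alignment, they disagree on where the i128 lives, which is why the in-memory type spells out the padding explicitly.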
r? @b-naber (rustbot has picked a reviewer for you, use r? to override)
I tested #116672 locally on top of this change, and the issue we were hitting before (building with built-in LLVM or LLVM-17 fails during stage 2) is resolved.
#[no_mangle]
pub fn pair_bool_bool(pair: (bool, bool)) -> (bool, bool) {
    pair
}

-// CHECK: define{{.*}}{ i8, i32 } @pair_bool_i32(i1 noundef zeroext %pair.0, i32 noundef %pair.1)
+// CHECK: define{{.*}}{ i1, i32 } @pair_bool_i32(i1 noundef zeroext %pair.0, i32 noundef %pair.1)
Oh, this is amazing! 🎉
Should save `trunc`s in the uses that (since there's no `trunc nuw`) I've sometimes seen cause poor codegen.
Perf results are interesting in #116672 (comment), but we should test this on its own too... @bors try @rust-timer queue
☀️ Try build successful - checks-actions
Finished benchmarking commit (8f49c16): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 671.894s -> 673.337s (0.21%)
(Note: most of the small benchmarks with regressions here are currently being slightly noisy)
Don't think I understand all the intricacies here. Maybe r? @davidtwco ?
@@ -24,7 +24,7 @@ pub fn test() {
     let _s = S;
     // Check that the personality slot alloca gets a lifetime start in each cleanup block, not just
     // in the first one.
-    // CHECK: [[SLOT:%[0-9]+]] = alloca { ptr, i32 }
+    // CHECK: [[SLOT:%[0-9]+]] = alloca { ptr, i32, [1 x i32] }
One thing I don’t quite understand is why this changed. Presumably alignment? But this thing looks like it could also be 32-bit aligned on 32-bit architectures, in which case `[1 x i32]` only serves to increase the size?
I think this is just because our struct generation makes all padding (including trailing padding) explicit. It makes no practical difference here (as long as Rust and LLVM data layout agree -- e.g. if the first element were an i128, then after the alignment change having a `[3 x i32]` at the end would be important, otherwise LLVM would create a too small allocation).
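As a rough illustration of "all padding (including trailing padding) explicit", the sketch below lays out a list of fields and prints every gap as a filler array. It is not rustc's actual struct generation. `Field`, `filler` and `explicit_struct` are hypothetical names, and the choice of `i32` or `i8` filler units is a simplification.

```rust
/// A scalar field: LLVM type name, size and alignment in bytes.
struct Field {
    ty: &'static str,
    size: u64,
    align: u64,
}

/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_to(offset: u64, align: u64) -> u64 {
    (offset + align - 1) & !(align - 1)
}

/// Hypothetical filler type: an `i32` array when the gap is a multiple of
/// four bytes, otherwise an `i8` array.
fn filler(bytes: u64) -> String {
    if bytes % 4 == 0 {
        format!("[{} x i32]", bytes / 4)
    } else {
        format!("[{} x i8]", bytes)
    }
}

/// Lay out `fields` in order and spell out every gap, including the tail.
fn explicit_struct(fields: &[Field]) -> String {
    let mut parts = Vec::new();
    let mut offset: u64 = 0;
    let mut struct_align: u64 = 1;
    for f in fields {
        let start = align_to(offset, f.align);
        if start > offset {
            parts.push(filler(start - offset));
        }
        parts.push(f.ty.to_string());
        offset = start + f.size;
        struct_align = struct_align.max(f.align);
    }
    // Trailing padding up to the struct's own alignment.
    let end = align_to(offset, struct_align);
    if end > offset {
        parts.push(filler(end - offset));
    }
    format!("{{ {} }}", parts.join(", "))
}

fn main() {
    let ptr64 = Field { ty: "ptr", size: 8, align: 8 };
    let int32 = Field { ty: "i32", size: 4, align: 4 };
    println!("{}", explicit_struct(&[ptr64, int32]));
}
```

Under these assumptions an 8-byte pointer followed by an `i32` yields `{ ptr, i32, [1 x i32] }`, while a 4-byte pointer would need no filler at all.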
@@ -179,7 +179,10 @@ impl<'ll, 'tcx> IntrinsicCallMethods<'tcx> for Builder<'_, 'll, 'tcx> {
     unsafe {
         llvm::LLVMSetAlignment(load, align);
     }
-    self.to_immediate(load, self.layout_of(tp_ty))
+    if !result.layout.is_zst() {
+        self.store(load, result.llval, result.align);
Why was this change necessary? Is it because there might now be trailing padding where previously there wasn’t any?
The volatile_load implementation blindly loads the value using the in-memory type, so it can produce loads of array and (arbitrary) struct types. These are not really valid immediates as far as rustc is concerned (and non-canonical as far as LLVM is concerned).
The way this ends up getting handled is that if the value has Scalar ABI we convert it to an immediate, while everything else is left alone. After this change, this no longer works for ScalarPair ABI, which would need an adjustment.
What this change does is to never treat the value as an immediate in the first place, and just directly store it back in its in-memory representation.
The implementation of volatile_load is really questionable in general (we really shouldn't be generating array/struct loads), but it's not really clear how it should be implemented given that we made the major design mistake of allowing volatile loads of arbitrary types, which is not a well-defined operation. We just leave it up to LLVM to interpret this in some way...
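For context on "volatile loads of arbitrary types": the stable library function `std::ptr::read_volatile` is generic over `T`, so user code can request a volatile read of a tuple or an array just as easily as of a scalar. The wrapper `volatile_get` below is a hypothetical helper for illustration only. It says nothing about how the compiler lowers the read.

```rust
use std::ptr;

/// Volatile read of a whole value of any `Copy` type.
fn volatile_get<T: Copy>(slot: &T) -> T {
    // SAFETY: `slot` is a valid, aligned reference to an initialized `T`.
    unsafe { ptr::read_volatile(slot) }
}

fn main() {
    // A ScalarPair-like tuple and an array, both read through the same API.
    let pair: (bool, i32) = (true, 7);
    let array = [1u8, 2, 3, 4, 5];
    println!("{:?}", volatile_get(&pair));
    println!("{:?}", volatile_get(&array));
}
```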

 let mut load = |i, scalar: abi::Scalar, layout, align, offset| {
-    let llptr = self.struct_gep(pair_ty, place.llval, i as u64);
+    let llptr = if i == 0 {
+        place.llval
Hm, this `if` seems like a code smell to me. Since there are just two calls to this closure (as far as I can tell) and this `if` is the sole use of the `i` argument, perhaps consider constructing the `llptr` in the caller and passing it into the closure instead?
The `i` argument is also used to determine `llty`.
I've tried moving this into the caller, but this ends up being rather ugly, especially as `self` is captured by the closure, so we can't use it outside without further changes.
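The shape of the `if i == 0` byte-offset addressing can be sketched in ordinary Rust, outside the codegen backend. This is only an analogy under stated assumptions: `Pair`, `element_ptr` and `read_pair` are hypothetical names, and the `#[repr(C)]` struct stands in for a ScalarPair whose second element lives at a known byte offset.

```rust
use std::mem::offset_of;

/// Stand-in for a two-scalar value with padding between the elements.
#[repr(C)]
struct Pair {
    first: i32,
    second: u64,
}

/// Address element `i` by byte offset: element 0 is the base pointer
/// itself, element 1 is the base plus the second element's byte offset.
fn element_ptr(base: *const u8, i: usize, second_offset: usize) -> *const u8 {
    if i == 0 {
        base
    } else {
        // `wrapping_add` keeps this helper safe; the result is only
        // dereferenced by callers that know it is in bounds.
        base.wrapping_add(second_offset)
    }
}

fn read_pair(pair: &Pair) -> (i32, u64) {
    let base = pair as *const Pair as *const u8;
    let off = offset_of!(Pair, second);
    let first = element_ptr(base, 0, off) as *const i32;
    let second = element_ptr(base, 1, off) as *const u64;
    // SAFETY: both pointers are in bounds of `pair` and aligned for their type.
    unsafe { (first.read(), second.read()) }
}

fn main() {
    let pair = Pair { first: -3, second: 99 };
    println!("{:?}", read_pair(&pair));
}
```

Because the element is addressed purely by byte offset, the code does not depend on which field index the second element has in the padded in-memory struct type.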
@bors r+
🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.
💔 Test failed - checks-actions
@bors r=nagisa
☀️ Test successful - checks-actions
Finished benchmarking commit (432fffa): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 670.411s -> 668.199s (-0.33%)