Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate immediate and in-memory ScalarPair representation #118991

Merged
merged 3 commits into from
Jan 5, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Dec 15, 2023

Currently, we assume that ScalarPair is always represented using a two-element struct, both as an immediate value and when stored in memory.

This currently works fairly well, but runs into problems with #116672, where a ScalarPair involving an i128 type can no longer be represented as a two-element struct in memory. For example, the tuple (i32, i128) needs to be represented in-memory as { i32, [3 x i32], i128 } to satisfy alignment requirements. Using { i32, i128 } instead will result in the second element being stored at the wrong offset (prior to LLVM 18).

Resolve this issue by no longer requiring that the immediate and in-memory type for ScalarPair are the same. The in-memory type will now look the same as for normal struct types (and will include padding filler and similar), while the immediate type stays a simple two-element struct type. This also means that booleans in immediate ScalarPair are now represented as i1 rather than i8, just like we do everywhere else.

The core change here is to llvm_type (which now treats ScalarPair as a normal struct) and immediate_llvm_type (which returns the two-element struct that llvm_type used to produce). The rest is fixing things up to no longer assume these are the same. In particular, this switches places that try to get pointers to the ScalarPair elements to use byte-geps instead of struct-geps.

Currently, we assume that ScalarPair is always represented using
a two-element struct, both as an immediate value and when stored
in memory.

This currently works fairly well, but runs into problems with
rust-lang#116672, where a ScalarPair
involving an i128 type can no longer be represented as a two-element
struct in memory. For example, the tuple `(i32, i128)` needs to be
represented in-memory as `{ i32, [3 x i32], i128 }` to satisfy
alignment requirement. Using `{ i32, i128 }` instead will result in
the second element being stored at the wrong offset (prior to
LLVM 18).

Resolve this issue by no longer requiring that the immediate and
in-memory type for ScalarPair are the same. The in-memory type
will now look the same as for normal struct types (and will include
padding filler and similar), while the immediate type stays a
simple two-element struct type. This also means that booleans in
immediate ScalarPair are now represented as i1 rather than i8,
just like we do everywhere else.

The core change here is to llvm_type (which now treats ScalarPair
as a normal struct) and immediate_llvm_type (which returns the
two-element struct that llvm_type used to produce). The rest is
fixing things up to no longer assume these are the same. In
particular, this switches places that try to get pointers to the
ScalarPair elements to use byte-geps instead of struct-geps.
@rustbot
Copy link
Collaborator

rustbot commented Dec 15, 2023

r? @b-naber

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 15, 2023
@maurer
Copy link
Contributor

maurer commented Dec 15, 2023

I tested #116672 locally on top of this change, and the issue we were hitting before (building with built-in LLVM or LLVM-17 fails during stage 2) is resolved.

#[no_mangle]
pub fn pair_bool_bool(pair: (bool, bool)) -> (bool, bool) {
pair
}

// CHECK: define{{.*}}{ i8, i32 } @pair_bool_i32(i1 noundef zeroext %pair.0, i32 noundef %pair.1)
// CHECK: define{{.*}}{ i1, i32 } @pair_bool_i32(i1 noundef zeroext %pair.0, i32 noundef %pair.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is amazing! 🎉

Should save truncs in the uses that (since there's no trunc nuw) I've sometimes seem cause poor codegen.

@cuviper
Copy link
Member

cuviper commented Dec 15, 2023

Perf results are interesting in #116672 (comment), but we should test this on its own too...

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 15, 2023
@bors
Copy link
Contributor

bors commented Dec 15, 2023

⌛ Trying commit c2fd26a with merge 8f49c16...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 15, 2023
Separate immediate and in-memory ScalarPair representation

Currently, we assume that ScalarPair is always represented using a two-element struct, both as an immediate value and when stored in memory.

This currently works fairly well, but runs into problems with rust-lang#116672, where a ScalarPair involving an i128 type can no longer be represented as a two-element struct in memory. For example, the tuple `(i32, i128)` needs to be represented in-memory as `{ i32, [3 x i32], i128 }` to satisfy alignment requirements. Using `{ i32, i128 }` instead will result in the second element being stored at the wrong offset (prior to LLVM 18).

Resolve this issue by no longer requiring that the immediate and in-memory type for ScalarPair are the same. The in-memory type will now look the same as for normal struct types (and will include padding filler and similar), while the immediate type stays a simple two-element struct type. This also means that booleans in immediate ScalarPair are now represented as i1 rather than i8, just like we do everywhere else.

The core change here is to llvm_type (which now treats ScalarPair as a normal struct) and immediate_llvm_type (which returns the two-element struct that llvm_type used to produce). The rest is fixing things up to no longer assume these are the same. In particular, this switches places that try to get pointers to the ScalarPair elements to use byte-geps instead of struct-geps.
@bors
Copy link
Contributor

bors commented Dec 16, 2023

☀️ Try build successful - checks-actions
Build commit: 8f49c16 (8f49c1689d271405ae5c5994468ab01b777caa70)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8f49c16): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.7% [0.7%, 0.8%] 5
Regressions ❌
(secondary)
0.4% [0.2%, 0.6%] 23
Improvements ✅
(primary)
-0.7% [-1.6%, -0.1%] 14
Improvements ✅
(secondary)
-0.3% [-0.3%, -0.3%] 1
All ❌✅ (primary) -0.4% [-1.6%, 0.8%] 19

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.9% [2.9%, 2.9%] 1
Regressions ❌
(secondary)
3.8% [3.0%, 4.3%] 3
Improvements ✅
(primary)
-2.7% [-2.7%, -2.7%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-2.7%, 2.9%] 2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.7% [0.7%, 0.7%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.0% [-1.4%, -0.6%] 3
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) -0.6% [-1.4%, 0.7%] 4

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.2%, 0.2%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.0% [-0.0%, -0.0%] 1
Improvements ✅
(secondary)
-1.3% [-1.3%, -1.3%] 1
All ❌✅ (primary) 0.1% [-0.0%, 0.2%] 4

Bootstrap: 671.894s -> 673.337s (0.21%)
Artifact size: 312.51 MiB -> 312.50 MiB (-0.00%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 16, 2023
@lqd
Copy link
Member

lqd commented Dec 16, 2023

(Note: most of the small benchmarks with regressions here are currently being slightly noisy)

@b-naber
Copy link
Contributor

b-naber commented Dec 22, 2023

Don't think I understand all the intricacies here. Maybe r? @davidtwco ?

@rustbot rustbot assigned davidtwco and unassigned b-naber Dec 22, 2023
@@ -24,7 +24,7 @@ pub fn test() {
let _s = S;
// Check that the personality slot alloca gets a lifetime start in each cleanup block, not just
// in the first one.
// CHECK: [[SLOT:%[0-9]+]] = alloca { ptr, i32 }
// CHECK: [[SLOT:%[0-9]+]] = alloca { ptr, i32, [1 x i32] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I don’t quite understand is why did this change? Presumably alignment? But this thing looks like it could also be 32-bit aligned on 32-bit achitectures in which case [1 x i32] only serves to increase the size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just because our struct generation makes all padding (including trailing padding) explicit. It makes no practical difference here (as long as Rust and LLVM data layout agree -- e.g. if the first element were an i128, then after the alignment change having an [3 x i32] at the end would be important, otherwise LLVM would create a too small allocation).

@@ -179,7 +179,10 @@ impl<'ll, 'tcx> IntrinsicCallMethods<'tcx> for Builder<'_, 'll, 'tcx> {
unsafe {
llvm::LLVMSetAlignment(load, align);
}
self.to_immediate(load, self.layout_of(tp_ty))
if !result.layout.is_zst() {
self.store(load, result.llval, result.align);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change necessary? Is it because there might now be trailing padding where previously there wasn’t any?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The volatile_load implementation blindly loads the value using the in-memory type, so it can produce loads of array and (arbitrary) struct types. These are not really valid immediates as far as rustc is concerned (and non-canonical as far as LLVM is concerned).

The way this ends up getting handled is that if the value has Scalar ABI we convert it to an immediate, while everything else is left alone. After this change, this no longer works for ScalarPair ABI, which would need an adjustment.

What this change does it to never treat the value as an immediate in the first place, and just directly store it back in in-memory representation.

The implementation of volatile_load is really questionable in general (we really shouldn't be generating array/struct loads), but it's not really clear how it should be implemented given that we made the major design mistake of allowing volatile loads of arbitrary types, which is not a well-defined operation. We just leave it up to LLVM to interpret this in some way...


let mut load = |i, scalar: abi::Scalar, layout, align, offset| {
let llptr = self.struct_gep(pair_ty, place.llval, i as u64);
let llptr = if i == 0 {
place.llval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this if seems like a code smell to me. Since there are just two calls to this closure (as far as I can tell) and this if is the sole use of the i argument, perhaps consider constructing the llptr in the caller and passing it into the closure instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The i argument is also used to determine llty.

I've tried moving this into the caller, but this ends up being rather ugly, especially as self is captured by the closure, so we can't use it outside without further changes.

compiler/rustc_codegen_llvm/src/type_of.rs Outdated Show resolved Hide resolved
compiler/rustc_codegen_ssa/src/mir/place.rs Show resolved Hide resolved
@nagisa
Copy link
Member

nagisa commented Jan 2, 2024

@bors r+

@bors
Copy link
Contributor

bors commented Jan 2, 2024

📌 Commit 8e64fc9 has been approved by nagisa

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Jan 2, 2024

🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 2, 2024
@bors
Copy link
Contributor

bors commented Jan 5, 2024

⌛ Testing commit 8e64fc9 with merge 8143a57...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 5, 2024
Separate immediate and in-memory ScalarPair representation

Currently, we assume that ScalarPair is always represented using a two-element struct, both as an immediate value and when stored in memory.

This currently works fairly well, but runs into problems with rust-lang#116672, where a ScalarPair involving an i128 type can no longer be represented as a two-element struct in memory. For example, the tuple `(i32, i128)` needs to be represented in-memory as `{ i32, [3 x i32], i128 }` to satisfy alignment requirements. Using `{ i32, i128 }` instead will result in the second element being stored at the wrong offset (prior to LLVM 18).

Resolve this issue by no longer requiring that the immediate and in-memory type for ScalarPair are the same. The in-memory type will now look the same as for normal struct types (and will include padding filler and similar), while the immediate type stays a simple two-element struct type. This also means that booleans in immediate ScalarPair are now represented as i1 rather than i8, just like we do everywhere else.

The core change here is to llvm_type (which now treats ScalarPair as a normal struct) and immediate_llvm_type (which returns the two-element struct that llvm_type used to produce). The rest is fixing things up to no longer assume these are the same. In particular, this switches places that try to get pointers to the ScalarPair elements to use byte-geps instead of struct-geps.
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jan 5, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 5, 2024
@nikic
Copy link
Contributor Author

nikic commented Jan 5, 2024

@bors r=nagisa

@bors
Copy link
Contributor

bors commented Jan 5, 2024

📌 Commit 3cd6cde has been approved by nagisa

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 5, 2024
@bors
Copy link
Contributor

bors commented Jan 5, 2024

⌛ Testing commit 3cd6cde with merge 432fffa...

@bors
Copy link
Contributor

bors commented Jan 5, 2024

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 432fffa to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 5, 2024
@bors bors merged commit 432fffa into rust-lang:master Jan 5, 2024
12 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 5, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (432fffa): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.6%, 0.6%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.0% [-1.5%, -0.6%] 13
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) -0.9% [-1.5%, 0.6%] 14

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-3.7% [-3.7%, -3.7%] 1
Improvements ✅
(secondary)
-1.3% [-1.3%, -1.3%] 2
All ❌✅ (primary) -3.7% [-3.7%, -3.7%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
7.7% [7.3%, 8.1%] 5
Improvements ✅
(primary)
-1.2% [-1.3%, -1.1%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.2% [-1.3%, -1.1%] 2

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.2%, 0.2%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.0% [-0.0%, -0.0%] 4
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.0%, 0.2%] 7

Bootstrap: 670.411s -> 668.199s (-0.33%)
Artifact size: 311.14 MiB -> 311.12 MiB (-0.01%)

@rustbot rustbot removed the perf-regression Performance regression. label Jan 5, 2024
LegNeato pushed a commit to LegNeato/rust-gpu that referenced this pull request Sep 17, 2024
Firestar99 pushed a commit to Firestar99/rust-gpu-new that referenced this pull request Sep 19, 2024
LegNeato pushed a commit to Rust-GPU/rust-gpu that referenced this pull request Sep 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.