-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop using LLVM struct types for alloca #122053
Conversation
Test changes will conflict with #122050. Additionally, I believe SROA still uses the alloca type in some cases, so this may cause perf regressions, and hence not be viable, until LLVM 19 (or later). |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Stop using LLVM struct types for alloca The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation. Split out from rust-lang#121577. r? `@ghost`
☀️ Try build successful - checks-actions |
1 similar comment
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (52e34ca): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 647.585s -> 646.111s (-0.23%) |
8515835
to
8536da4
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Stop using LLVM struct types for alloca The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation. Split out from rust-lang#121577. r? `@ghost`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (7a9b98b): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 671.201s -> 669.516s (-0.25%) |
Both perf runs look somewhat consistent. In terms of instruction count: Improvements to Regressions on a few other benchmarks--
that is, a bunch of inlining noise, and a real regression in Looking at cycle count, while Looking at the asm diff for mov rdx,QWORD PTR [rsi]
mov rdi,QWORD PTR [rsi+0xXXXX]
mov ecx,DWORD PTR [rsi+0xXXXX]
add ecx,edx
sub ecx,edi
+ shl r15,0x20
+ movzx r8d,r13b
+ shl r8d,0x18
+ or r8,r15
+ movzx r9d,r12b
+ shl r9d,0x10
+ or r9,r8
+ shl r14d,0x8
+ movzx r8d,r14w
+ or r8,r9
+ movzx r9d,bpl
+ or r9,r8
sub rdi,rdx
mov QWORD PTR [rsi+0xXXXX],rdi
- mov BYTE PTR [rax],r11b
- mov BYTE PTR [rax+0xXXXX],bpl
- mov BYTE PTR [rax+0xXXXX],r14b
- mov BYTE PTR [rax+0xXXXX],r15b
- mov DWORD PTR [rax+0xXXXX],r12d
+ mov QWORD PTR [rax],r9
mov DWORD PTR [rax+0xXXXX],ecx
add rsp,0x28
pop rbx This uses 7 more instructions, but it combines 5 stores into 1, which seems better. So, given that, and the lack of regressions in cycle count on those four benchmarks, I think this is not actually a regression, and we can go ahead with this change. |
Some changes occurred in compiler/rustc_codegen_gcc |
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
Looks like there are two extra variants of the stack protector test for Windows, which will need the same updates as the Linux test. |
@bors r- |
@rustbot review |
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (29a56a3): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 675.097s -> 673.216s (-0.28%) |
Spot-checking a few of those instruction count regressions, e.g.
this looks like the same thing I investigated above (#122053 (comment)), where we merge 5 stores into 1 in So given the relative lack of cycle count changes (a small improvement on one benchmark, and a small regression on another benchmark), I think this change doesn't have any significant perf impact. |
Edit: Deleting comment, meant to leave it on another PR. |
Stop using LLVM struct types for alloca The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation. Split out from rust-lang#121577. r? `@ghost`
The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using
[N x i8]
is a more direct way to specify that we wantN
bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.Split out from #121577.
r? @ghost