rustc should use llvm.lifetime intrinsics to reduce stack space #15665
cc @dotdash
Started working on this again.
dotdash added a commit to dotdash/rust that referenced this issue on Jul 22, 2014
bors added a commit that referenced this issue on Jul 22, 2014
Lifetime intrinsics help reduce stack usage, because LLVM can apply stack coloring to reuse the stack slots of dead allocas for new ones. For example, these two functions now use the same amount of stack, while previously `bar()` used five times as much as `foo()`:

````rust
fn foo() {
    println!("{}", 5);
}

fn bar() {
    println!("{}", 5);
    println!("{}", 5);
    println!("{}", 5);
    println!("{}", 5);
    println!("{}", 5);
}
````

On top of that, LLVM can also optimize out certain operations when it knows that memory is dead after a certain point. For example, it can sometimes remove the zeroing used to cancel the drop glue. This is possible when the drop glue call itself has already been removed because the zeroing dominated it. For example, in:

````rust
pub fn bar(x: (Box<int>, int)) -> (Box<int>, int) {
    x
}
````

With optimizations, this currently results in:

````llvm
define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 16, i32 8, i1 false)
  ret void
}
````

But with lifetime intrinsics, we get:

````llvm
define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.lifetime.end(i64 16, i8* %2)
  ret void
}
````

Fixes #15665
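The zeroing removal described in this commit message follows ordinary dead-store reasoning: a store to a slot is dead if the slot's lifetime ends before anything reads it. Below is a minimal sketch over a hypothetical three-operation IR. `Op`, `Slot`, and `eliminate_dead_stores` are illustrative names, not rustc or LLVM APIs, and unlike LLVM's real dead store elimination this only handles straight-line code.

```rust
use std::collections::HashSet;

/// Identifies a stack slot or pointer argument (hypothetical toy IR).
type Slot = u32;

/// The memcpy / memset / lifetime.end operations, reduced to what matters here.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// Copies `src` into `dst`: reads `src`, writes `dst`.
    Memcpy { dst: Slot, src: Slot },
    /// Fills `dst` with `val`: writes `dst`, reads nothing.
    Memset { dst: Slot, val: u8 },
    /// Declares that `slot` is dead from this point on.
    LifetimeEnd { slot: Slot },
}

/// Drops every `Memset` whose slot reaches a `LifetimeEnd` before being read.
/// Walks backwards, tracking slots whose current contents can no longer be observed.
fn eliminate_dead_stores(ops: &[Op]) -> Vec<Op> {
    let mut dead: HashSet<Slot> = HashSet::new();
    let mut kept = Vec::new();
    for op in ops.iter().rev() {
        match op {
            Op::LifetimeEnd { slot } => {
                dead.insert(*slot);
                kept.push(op.clone());
            }
            // Nothing reads this slot before it dies, so the store is dead.
            Op::Memset { dst, .. } if dead.contains(dst) => {}
            Op::Memset { .. } => kept.push(op.clone()),
            Op::Memcpy { src, .. } => {
                // Reading `src` makes its contents live again above this point.
                dead.remove(src);
                kept.push(op.clone());
            }
        }
    }
    kept.reverse();
    kept
}
```

Modeling the `bar` example with slot 0 as the `sret` pointer and slot 1 as the argument, the input `[Memcpy { dst: 0, src: 1 }, Memset { dst: 1, val: 0 }, LifetimeEnd { slot: 1 }]` reduces to the memcpy followed by the lifetime end, matching the shape of the second IR listing. Without the `LifetimeEnd`, the memset has to stay.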
bors added a commit that referenced this issue on Jul 24, 2014
The allocas used in match expressions currently don't get good lifetime markers. In fact, they only get lifetime start markers, because their lifetimes don't match cleanup scopes.

While the bindings themselves are bog-standard and just need a matching pair of start and end markers, they might need them twice: once for a guard clause and once for the match body. The `__llmatch` alloca, on the other hand, needs a single lifetime start marker, but when there's a guard clause it needs two end markers, because its lifetime ends either when the guard doesn't match or after the match body.

With these intrinsics in place, LLVM can now, for example, optimize code like this:

````rust
enum E {
    A1(int),
    A2(int),
    A3(int),
    A4(int),
}

pub fn variants(x: E) {
    match x {
        A1(m) => bar(&m),
        A2(m) => bar(&m),
        A3(m) => bar(&m),
        A4(m) => bar(&m),
    }
}
````

to a single call to `bar`, using only a single stack slot. It still fails to eliminate some of the checks:

````gas
.Ltmp5:
    .cfi_def_cfa_offset 16
    movb    (%rdi), %al
    testb   %al, %al
    je      .LBB3_5
    movzbl  %al, %eax
    cmpl    $1, %eax
    je      .LBB3_5
    cmpl    $2, %eax
.LBB3_5:
    movq    8(%rdi), %rax
    movq    %rax, (%rsp)
    leaq    (%rsp), %rdi
    callq   _ZN3bar20hcb7a0d8be8e17e37daaE@PLT
    popq    %rax
    retq
````

Refs #15665
bors added a commit to rust-lang-ci/rust that referenced this issue on Nov 13, 2023
…cola internal: De-`unwrap` `generate_function.rs` Fixes rust-lang/rust-analyzer#15398 (comment) cc `@Inicola`
It recently came up on dev-servo that Rust-compiled code uses an egregious amount of stack space. One possible solution that has been brought up is to use the `llvm.lifetime.start` and `llvm.lifetime.end` intrinsics to reduce the amount of stack space that LLVM uses for our functions: http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic. An incomplete attempt at this was made in an earlier PR, #12004, and the numbers there were promising.