Emit LLVM lifetime intrinsics to improve stack usage and codegen in general #15863

dotdash · 2014-07-21T17:14:41Z

Lifetime intrinsics help to reduce stack usage, because LLVM can apply
stack coloring to reuse the stack slots of dead allocas for new ones.

For example, these two functions now use the same amount of stack, whereas
previously bar() used five times as much as foo():

fn foo() {
  println!("{}", 5);
}

fn bar() {
  println!("{}", 5);
  println!("{}", 5);
  println!("{}", 5);
  println!("{}", 5);
  println!("{}", 5);
}
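
To make the stack coloring idea a bit more concrete, here is a minimal sketch in present-day Rust (not the 2014 dialect used in this PR, and not LLVM's actual algorithm). It assumes each alloca can be described by a hypothetical `Alloca` record whose `start`/`end` fields stand in for the positions of the lifetime markers, and it compares the frame size with and without slot reuse. The 48-byte size is made up for illustration.

```rust
/// Hypothetical model of one stack allocation: its size in bytes and the
/// half-open range [start, end) of program points where it is live.
#[derive(Clone, Copy)]
struct Alloca {
    size: u64,
    start: u32,
    end: u32,
}

/// Frame size when every alloca gets its own slot (no lifetime information).
fn frame_without_lifetimes(allocas: &[Alloca]) -> u64 {
    allocas.iter().map(|a| a.size).sum()
}

/// Frame size with a simple greedy reuse: an alloca may take over a slot
/// whose previous occupant is already dead when the new one becomes live.
fn frame_with_lifetimes(allocas: &[Alloca]) -> u64 {
    let mut sorted = allocas.to_vec();
    sorted.sort_by_key(|a| a.start);
    // Each slot records (slot size, end of its current occupant).
    let mut slots: Vec<(u64, u32)> = Vec::new();
    for a in sorted {
        match slots.iter().position(|s| s.1 <= a.start) {
            Some(idx) => {
                // Reuse the dead slot, growing it if the new alloca is larger.
                slots[idx].0 = slots[idx].0.max(a.size);
                slots[idx].1 = a.end;
            }
            None => slots.push((a.size, a.end)),
        }
    }
    slots.iter().map(|s| s.0).sum()
}

fn main() {
    // Five temporaries with back-to-back lifetimes, loosely modelled on the
    // five println! calls in bar().
    let bar: Vec<Alloca> = (0..5u32)
        .map(|i| Alloca { size: 48, start: i * 10, end: i * 10 + 10 })
        .collect();
    println!("without lifetimes: {} bytes", frame_without_lifetimes(&bar));
    println!("with lifetimes:    {} bytes", frame_with_lifetimes(&bar));
}
```

Because the five temporaries are never live at the same time, the greedy pass folds them into a single slot, which is the effect described above for bar() versus foo().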

On top of that, LLVM can also optimize out certain operations when it
knows that memory is dead after a certain point. For example, it can
sometimes remove the zeroing used to cancel the drop glue. This is
possible when the drop glue call itself has already been removed because
the zeroing dominated it. For example, in:

pub fn bar(x: (Box<int>, int)) -> (Box<int>, int) {
    x
}

With optimizations, this currently results in:

define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 16, i32 8, i1 false)
  ret void
}

But with lifetime intrinsics we get:

define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.lifetime.end(i64 16, i8* %2)
  ret void
}

Fixes #15665

alexcrichton · 2014-07-21T17:21:27Z

Amazing! 🍍

ghost · 2014-07-21T20:42:11Z

@dotdash Do you have the final numbers on the mem usage increase when building rustc?

dotdash · 2014-07-21T21:01:47Z

All I have right now is:

With 1654f08 (from PR #15871):
219.70user 1.02system 3:40.40elapsed 100%CPU (0avgtext+0avgdata 1738860maxresident)k

With df68c6f + PR #15871 + this PR:
232.58user 1.28system 3:55.48elapsed 99%CPU (0avgtext+0avgdata 1920924maxresident)k

dotdash · 2014-07-21T21:25:12Z

Emitting lifetime end intrinsics in unwind paths as well seems to give even better stack usage (I accidentally overwrote the previous rustc.s, so I can't tell for sure just now), but we get:

251.14user 1.44system 4:14.47elapsed 99%CPU (0avgtext+0avgdata 2153380maxresident)k
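
For a rough sense of scale, the relative overhead can be worked out from the `time` output quoted in this thread. This is only a back-of-the-envelope sketch in present-day Rust: the runs are single samples, the baseline commit differs from the commits under test, and `percent_increase` is a helper name made up for this example.

```rust
/// Relative increase of `new` over `base`, in percent.
fn percent_increase(base: f64, new: f64) -> f64 {
    (new - base) / base * 100.0
}

fn main() {
    // (user seconds, max resident set size in kilobytes) for the baseline run.
    let baseline = (219.70, 1_738_860.0);
    // (label, user seconds, max resident set size) for the two later runs.
    let runs = [
        ("lifetime intrinsics", 232.58, 1_920_924.0),
        ("plus unwind paths", 251.14, 2_153_380.0),
    ];
    for (label, user, rss) in runs {
        println!(
            "{}: user time +{:.1}%, max RSS +{:.1}%",
            label,
            percent_increase(baseline.0, user),
            percent_increase(baseline.1, rss)
        );
    }
}
```

With these inputs the first run comes out at roughly 6% more user time and 10% more peak memory than the baseline, and the unwind-path variant at roughly 14% and 24%.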

dotdash · 2014-07-22T07:20:33Z

llsize_of returns a C_uint which is correct for memcpy etc., but not for the lifetime intrinsics which always want i64, thus the failure on x86. Fixed.
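
The mismatch can be pictured with a small model. This is a sketch in present-day Rust with hypothetical names (`IntConst`, `size_as_target_uint`, `size_for_lifetime_intrinsic`), not rustc's actual trans code: a pointer-sized constant is 32 bits wide on x86, whereas the size operand of the lifetime intrinsics is `i64` regardless of target.

```rust
/// Hypothetical stand-in for an LLVM integer constant: bit width plus value.
#[derive(Debug, PartialEq)]
struct IntConst {
    bits: u32,
    value: u64,
}

/// Models a pointer-sized size constant, which is what memcpy and friends
/// expect on a target with the given pointer width.
fn size_as_target_uint(size: u64, pointer_bits: u32) -> IntConst {
    IntConst { bits: pointer_bits, value: size }
}

/// Models the size operand of the lifetime intrinsics, which is always i64.
fn size_for_lifetime_intrinsic(size: u64) -> IntConst {
    IntConst { bits: 64, value: size }
}

/// Renders a lifetime.end call in the form shown in the IR in this PR,
/// rejecting a size operand that is not 64 bits wide.
fn lifetime_end_call(size: &IntConst, ptr: &str) -> Result<String, String> {
    if size.bits != 64 {
        return Err(format!("size operand is i{}, expected i64", size.bits));
    }
    Ok(format!(
        "call void @llvm.lifetime.end(i64 {}, i8* {})",
        size.value, ptr
    ))
}

fn main() {
    // On a 32-bit target the pointer-sized constant has the wrong width.
    let wrong = size_as_target_uint(16, 32);
    println!("{:?}", lifetime_end_call(&wrong, "%2"));
    // Using a dedicated 64-bit constant works on every target.
    let right = size_for_lifetime_intrinsic(16);
    println!("{:?}", lifetime_end_call(&right, "%2"));
}
```

On a 64-bit target both constants happen to be 64 bits wide, which would explain why the problem only showed up on x86.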

nrc · 2014-09-12T03:46:31Z

@dotdash why do we not emit lifetime intrinsics in non-optimised builds? I would have thought we should leave that up to llvm.

thestinger · 2014-09-12T03:49:15Z

It adds a significant amount of IR so it will slow down builds. Clang does similar stuff like only outputting TBAA information when optimization is enabled.

dotdash · 2014-09-12T08:09:02Z

@nick29581 like @thestinger says, it's unnecessary overhead for non-optimised builds. The original proposal for those intrinsics also suggests this approach.

> Since this increases the size of the IR, it would make sense for a front-end
> to only generate this when in -O mode, not in -O0 mode.

http://nondot.org/sabre/LLVMNotes/MemoryUseMarkers.txt
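
In code, that gating amounts to something like the following sketch. It is written in present-day Rust, and `OptLevel`, `emit_lifetime_end` and the string-based IR buffer are invented for illustration; they are not rustc's real interfaces.

```rust
/// Hypothetical optimization levels of a front-end.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum OptLevel {
    No,
    Less,
    Default,
    Aggressive,
}

/// Appends a lifetime.end marker to the IR buffer, but only when optimizing.
/// At -O0 the marker would just be extra IR that no pass makes use of.
fn emit_lifetime_end(opt: OptLevel, ir: &mut Vec<String>, size: u64, ptr: &str) {
    if opt == OptLevel::No {
        return;
    }
    ir.push(format!("call void @llvm.lifetime.end(i64 {}, i8* {})", size, ptr));
}

fn main() {
    let mut debug_ir = Vec::new();
    emit_lifetime_end(OptLevel::No, &mut debug_ir, 16, "%2");
    println!("-O0 emitted {} marker(s)", debug_ir.len());

    let mut opt_ir = Vec::new();
    emit_lifetime_end(OptLevel::Default, &mut opt_ir, 16, "%2");
    println!("-O  emitted {} marker(s)", opt_ir.len());
}
```

Keeping the check in one place means unoptimized builds pay nothing for the markers, while optimized builds still get the stack coloring benefit.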

bors closed this Jul 22, 2014

bors merged commit 92d1f15 into rust-lang:master Jul 22, 2014

huonw mentioned this pull request Aug 16, 2014

Rust miscompiles Servo after upgrading to new revision of Rust #16366

Closed

dotdash deleted the lifetimes3 branch February 4, 2015 12:41
