Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit LLVM lifetime intrinsics to improve stack usage and codegen in general #15863

Merged
merged 1 commit into from
Jul 22, 2014

Conversation

dotdash
Copy link
Contributor

@dotdash dotdash commented Jul 21, 2014

Lifetime intrinsics help to reduce stack usage, because LLVM can apply
stack coloring to reuse the stack slots of dead allocas for new ones.

For example these functions now both use the same amount of stack, while
previous bar() used five times as much as foo():

fn foo() {
  println("{}", 5);
}

fn bar() {
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
}

On top of that, LLVM can also optimize out certain operations when it
knows that memory is dead after a certain point. For example, it can
sometimes remove the zeroing used to cancel the drop glue. This is
possible when the glue drop itself was already removed because the
zeroing dominated the drop glue call. For example in:

pub fn bar(x: (Box<int>, int)) -> (Box<int>, int) {
    x
}

With optimizations, this currently results in:

define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 16, i32 8, i1 false)
  ret void
}

But with lifetime intrinsics we get:

define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.lifetime.end(i64 16, i8* %2)
  ret void
}

Fixes #15665

@alexcrichton
Copy link
Member

Amazing! 🍍

@ghost
Copy link

ghost commented Jul 21, 2014

@dotdash Do you have the final numbers on the mem usage increase when building rustc?

@dotdash
Copy link
Contributor Author

dotdash commented Jul 21, 2014

All I have right now is:

With 1654f08 (from PR #15871):
219.70user 1.02system 3:40.40elapsed 100%CPU (0avgtext+0avgdata 1738860maxresident)k

With df68c6f + PR #15871 + this PR:
232.58user 1.28system 3:55.48elapsed 99%CPU (0avgtext+0avgdata 1920924maxresident)k

@dotdash
Copy link
Contributor Author

dotdash commented Jul 21, 2014

Emitting lifetime end intrinsics in unwind paths as well seems to give even better stack usage (I accidently overwrote the previous rustc.s so, I can't tell for sure just now), but we get:

251.14user 1.44system 4:14.47elapsed 99%CPU (0avgtext+0avgdata 2153380maxresident)k

…eneral

Lifetime intrinsics help to reduce stack usage, because LLVM can apply
stack coloring to reuse the stack slots of dead allocas for new ones.

For example these functions now both use the same amount of stack, while
previous `bar()` used five times as much as `foo()`:

````rust
fn foo() {
  println("{}", 5);
}

fn bar() {
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
}
````

On top of that, LLVM can also optimize out certain operations when it
knows that memory is dead after a certain point. For example, it can
sometimes remove the zeroing used to cancel the drop glue. This is
possible when the glue drop itself was already removed because the
zeroing dominated the drop glue call. For example in:

````rust
pub fn bar(x: (Box<int>, int)) -> (Box<int>, int) {
    x
}
````

With optimizations, this currently results in:

````llvm
define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 16, i32 8, i1 false)
  ret void
}
````

But with lifetime intrinsics we get:

````llvm
define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.lifetime.end(i64 16, i8* %2)
  ret void
}
````

Fixes rust-lang#15665
@dotdash
Copy link
Contributor Author

dotdash commented Jul 22, 2014

llsize_of returns a C_uint which is correct for memcpy etc., but not for the lifetime intrinsics which always want i64, thus the failure on x86. Fixed.

bors added a commit that referenced this pull request Jul 22, 2014
Lifetime intrinsics help to reduce stack usage, because LLVM can apply
stack coloring to reuse the stack slots of dead allocas for new ones.

For example these functions now both use the same amount of stack, while
previous `bar()` used five times as much as `foo()`:

````rust
fn foo() {
  println("{}", 5);
}

fn bar() {
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
  println("{}", 5);
}
````

On top of that, LLVM can also optimize out certain operations when it
knows that memory is dead after a certain point. For example, it can
sometimes remove the zeroing used to cancel the drop glue. This is
possible when the glue drop itself was already removed because the
zeroing dominated the drop glue call. For example in:

````rust
pub fn bar(x: (Box<int>, int)) -> (Box<int>, int) {
    x
}
````

With optimizations, this currently results in:

````llvm
define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 16, i32 8, i1 false)
  ret void
}
````

But with lifetime intrinsics we get:

````llvm
define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 {
"_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit":
  %2 = bitcast { i64*, i64 }* %1 to i8*
  %3 = bitcast { i64*, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false)
  tail call void @llvm.lifetime.end(i64 16, i8* %2)
  ret void
}
````

Fixes #15665
@bors bors closed this Jul 22, 2014
@bors bors merged commit 92d1f15 into rust-lang:master Jul 22, 2014
@nrc
Copy link
Member

nrc commented Sep 12, 2014

@dotdash why do we not emit lifetime intrinsics in non-optimised builds? I would have thought we should leave that up to llvm.

@thestinger
Copy link
Contributor

It adds a significant amount of IR so it will slow down builds. Clang does similar stuff like only outputting TBAA information when optimization is enabled.

@dotdash
Copy link
Contributor Author

dotdash commented Sep 12, 2014

@nick29581 like @thestinger says, it's unnecessary overhead for non-optimised builds. The original proposal for those intrinsics also suggests this approach.

Since this increases the size of the IR, it would make sense for a front-end
to only generate this when in -O mode, not in -O0 mode.

http://nondot.org/sabre/LLVMNotes/MemoryUseMarkers.txt

@dotdash dotdash deleted the lifetimes3 branch February 4, 2015 12:41
matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this pull request Feb 11, 2024
…repo-mode, r=Veykril

feature: Create `UnindexedProject` notification to be sent to the client

(Note that this branch contains commits from rust-lang/rust-analyzer#15830, which I'll rebase atop of as needed.)

Based on the discussion in rust-lang/rust-analyzer#15837, I've added a notification and off-by-default toggle to send that notification from `handle_did_open_text_document`. I'm happy to rename/tweak this as needed.

I've been using this for a little bit, and it does seem to cause a little bit more indexing/work in rust-analyzer, but it's something that I'll profile as needed, I think.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustc should use llvm.lifetime intrinsics to reduce stack space
5 participants