
MIR: Bootstrap up to a half-hour longer with --enable-orbit #33111

Closed
alexcrichton opened this issue Apr 20, 2016 · 7 comments
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html I-compiletime Issue: Problems and improvements with respect to compile times.

Comments

@alexcrichton
Member

I just took a look at our list of bots and was surprised that the builder lagging behind was the auto-win-msvc-64-opt-mir builder. Investigating, it looks like a MIR bootstrap (--enable-orbit) takes about half an hour longer than a normal bootstrap. For example, compare:

In the normal case a bootstrap took 58m while the MIR case took 1h22m. The difference is not always quite as drastic, but historically the normal builders are consistently faster than the MIR ones. The discrepancy also exists on the Linux MIR builders, although it's not quite as large as on the Windows machines (~15m slower).

cc @eddyb, @nagisa

@alexcrichton alexcrichton added the A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html label Apr 20, 2016
@nagisa
Member

nagisa commented Apr 20, 2016

Not surprising, honestly. We still generate worse LLVM IR (we lack cheap optimising/simplifying transformations) compared to the regular trans, so even though our translation takes a small fraction of the time old trans does, LLVM more than recoups the difference.

@nagisa nagisa added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Apr 20, 2016
@Aatch
Contributor

Aatch commented Apr 21, 2016

I checked against libsyntax, since it's a nice large crate that LLVM can chew through.

Using the MIR trans path, LLVM takes ~50% longer to optimise than with the normal one. Translating to LLVM IR takes about the same time in both, though.

It looks like the main issue is that we're simply generating more IR than before. Some of the LLVM passes have moved up in terms of relative time. One that has shot up quite a lot is SROA. I have a feeling that our handling of function arguments, especially fat pointers, is probably a big part of this.
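As an aside, the fat pointers mentioned above are two-word (data pointer, length) pairs; spilling and reloading such aggregates is exactly the kind of IR that SROA then has to scalarize. A minimal sketch confirming the layout (standard library only, no rustc internals):

```rust
// Slice and str references are "fat": a (data pointer, length) pair,
// twice the width of a thin pointer. Passing these by aggregate is one
// source of the extra work SROA sees under MIR trans.
use std::mem::size_of;

fn main() {
    // Thin pointer: one word. Fat pointer: two words.
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<&u8>());
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    println!("fat pointer = {} bytes", size_of::<&[u8]>());
}
```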

@sanxiyn sanxiyn added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels Apr 21, 2016
@sanxiyn
Member

sanxiyn commented May 4, 2016

nickel seems to exhibit an extreme form of this. With nightly-2016-04-29 (which includes #32980), LLVM takes nearly 8x longer.

$ git clone https://github.com/nickel-org/nickel.rs.git
$ cd nickel.rs
$ git checkout 0.8.0
$ cargo build
$ cargo clean -p nickel
$ cargo rustc -- -Z time-passes | grep LLVM
time: 0.740; rss: 144MB LLVM passes
$ cargo clean -p nickel
$ cargo rustc -- -Z orbit -Z time-passes | grep LLVM
time: 5.738; rss: 146MB LLVM passes

@MagaTailor

Apart from being slower, what's the reason for the much larger code size it generates? (e.g. a newly bootstrapped lib directory weighs in at 208M vs. the normal 180M)

@nagisa
Member

nagisa commented May 4, 2016

For the same reasons it takes longer to compile; I believe that fixing one will fix the other.

Then again, MIR already generates more correct code than the old trans in some corner cases. That correctness should cost some bytes too.

@dotdash
Contributor

dotdash commented May 10, 2016

@sanxiyn A major reason for the difference in nickel in particular is a C-like enum with about 800 variants and a derived PartialEq implementation. That leads to nested matches with about 800 arms each and a basic block for each match arm. So we get about 640k basic blocks in total. SimplifyCfg actually removes most of that going down to about 1.6k basic blocks, but before we enter trans, we break all critical edges and that blows it up to about the initial size again, making LLVM struggle.
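The pattern described here can be reproduced in miniature. A hypothetical 3-variant stand-in for the real ~800-variant enum: `#[derive(PartialEq)]` on a C-like enum expands to a match on the `(&self, &other)` tuple, which at the time lowered to nested matches with N arms each, roughly N² basic blocks before SimplifyCfg runs:

```rust
// Hypothetical miniature of the nickel.rs enum (the real one has ~800
// variants). The derived PartialEq ultimately reduces to a single
// discriminant comparison, but the intermediate CFG is quadratic in the
// number of variants until it is simplified.
#[derive(PartialEq)]
enum Method {
    Get,
    Post,
    Delete,
}

fn main() {
    assert!(Method::Get == Method::Get);
    assert!(Method::Post != Method::Delete);
    println!("ok");
}
```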

dotdash added a commit to dotdash/rust that referenced this issue May 11, 2016
Currently, to prepare for MIR trans, we break _all_ critical edges,
although we only actually need to do this for edges originating from a
call that gets translated to an invoke instruction in LLVM.

This has the unfortunate effect of undoing a bunch of the things that
SimplifyCfg has done. A particularly bad case arises when you have a
C-like enum with N variants and a derived PartialEq implementation.

In that case, the match on the (&lhs, &rhs) tuple gets translated into
nested matches with N arms each and a basic block each, resulting in N²
basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but
breaking the critical edges means that we go back to N².

In nickel.rs, there is such an enum with roughly N=800. So we get about
640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to
reduce that to the final "disr_a == disr_b".

So before this patch, we had 2.5M lines of IR with 640K basic blocks,
which took about 3.6s in LLVM to get optimized and translated.
After this patch, we get about 650K lines with about 1.6K basic blocks
and spend a little less than 0.2s in LLVM.

cc rust-lang#33111
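The "critical edge" notion this commit is about can be sketched as follows: an edge is critical when its source has more than one successor and its target has more than one predecessor. The CFG representation below (adjacency lists of block indices) is invented for illustration and is not rustc's actual MIR API:

```rust
// Find critical edges in a tiny CFG given as successor lists.
// An edge (src, tgt) is critical if src has >1 successor AND tgt has
// >1 predecessor; MIR trans used to split ALL of these, and the fix
// restricts splitting to edges out of call terminators only.
fn critical_edges(succs: &[Vec<usize>]) -> Vec<(usize, usize)> {
    // Count predecessors of each block.
    let mut preds = vec![0usize; succs.len()];
    for targets in succs {
        for &t in targets {
            preds[t] += 1;
        }
    }
    let mut edges = Vec::new();
    for (src, targets) in succs.iter().enumerate() {
        if targets.len() > 1 {
            for &t in targets {
                if preds[t] > 1 {
                    edges.push((src, t));
                }
            }
        }
    }
    edges
}

fn main() {
    // CFG: bb0 -> {bb1, bb2}, bb1 -> {bb2}.
    let succs = vec![vec![1, 2], vec![2], vec![]];
    let edges = critical_edges(&succs);
    // Only bb0 -> bb2 is critical: bb0 branches, and bb2 is also
    // reachable from bb1.
    assert_eq!(edges, vec![(0, 2)]);
    println!("{:?}", edges);
}
```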
arielb1 pushed a commit to arielb1/rust that referenced this issue May 11, 2016
dotdash added a commit to dotdash/rust that referenced this issue May 11, 2016
Currently, all switches in MIR are exhaustive, meaning that we can have
a lot of arms that all go to the same basic block; in the extreme case,
an if-let expression, which has just 2 interesting cases, might end up
with hundreds of arms for large enums.

To improve this situation and give LLVM less code to chew on, we can
detect whether there's a pre-dominant target basic block in a switch
and then promote this to be the default target, not translating the
corresponding arms at all.

In combination with rust-lang#33544 this makes unoptimized MIR trans of
nickel.rs as fast as using old trans and greatly improves the times for
optimized builds, which are only 30-40% slower instead of ~300%.

cc rust-lang#33111
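The promotion this commit describes can be sketched like so; `promote_default` and the arm representation (each arm's value is its index, each entry its target block) are hypothetical, not rustc's MIR types:

```rust
use std::collections::HashMap;

// Given the target block of each switch arm, find the predominant
// target and promote it to the `otherwise` (default) target, so the
// arms jumping to it need not be emitted at all.
fn promote_default(arm_targets: &[usize]) -> (usize, Vec<(usize, usize)>) {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &t in arm_targets {
        *counts.entry(t).or_insert(0) += 1;
    }
    // The most common target becomes the default.
    let (&default, _) = counts.iter().max_by_key(|entry| *entry.1).unwrap();
    // Keep only the (value, target) arms that do NOT jump to the default.
    let kept = arm_targets
        .iter()
        .enumerate()
        .filter(|&(_, &t)| t != default)
        .map(|(value, &t)| (value, t))
        .collect();
    (default, kept)
}

fn main() {
    // if-let over a 6-variant enum: one interesting arm, five identical.
    let (default, kept) = promote_default(&[7, 3, 3, 3, 3, 3]);
    assert_eq!(default, 3);
    assert_eq!(kept, vec![(0, 7)]);
}
```

With the predominant target as the default, the five identical arms above disappear from the emitted switch entirely, which is where the IR savings come from.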
Manishearth added a commit to Manishearth/rust that referenced this issue May 12, 2016
… r=Aatch

Only break critical edges where actually needed

Manishearth added a commit to Manishearth/rust that referenced this issue May 12, 2016
[MIR trans] Optimize trans for biased switches

eddyb added a commit to eddyb/rust that referenced this issue May 12, 2016
… r=Aatch

Only break critical edges where actually needed

eddyb added a commit to eddyb/rust that referenced this issue May 12, 2016
[MIR trans] Optimize trans for biased switches

eddyb added a commit to eddyb/rust that referenced this issue May 13, 2016
… r=Aatch

Only break critical edges where actually needed

eddyb added a commit to eddyb/rust that referenced this issue May 13, 2016
[MIR trans] Optimize trans for biased switches

Manishearth added a commit to Manishearth/rust that referenced this issue May 14, 2016
… r=Aatch

Only break critical edges where actually needed

Manishearth added a commit to Manishearth/rust that referenced this issue May 14, 2016
[MIR trans] Optimize trans for biased switches

@eddyb
Member

eddyb commented Jun 8, 2016

Looking at the latest builds, MIR takes 7 minutes less on x64 linux and 45 minutes less on x64 MSVC.

@eddyb eddyb closed this as completed Jun 8, 2016