Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[experiment] Benchmark incremental ThinLTO'd compiler. #56678

Closed

Conversation

michaelwoerister
Copy link
Member

I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is a building stuff.

@michaelwoerister
Copy link
Member Author

@bors try

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 10, 2018
@bors
Copy link
Contributor

bors commented Dec 10, 2018

⌛ Trying commit 208b9a7 with merge 5caf635...

bors added a commit that referenced this pull request Dec 10, 2018
[experiment] Benchmark incremental ThinLTO'd compiler.

I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is a building stuff.
@rust-highfive
Copy link
Collaborator

The job dist-x86_64-linux of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_fold:end:services

travis_fold:start:git.checkout
travis_time:start:004206c0
$ git clone --depth=2 --branch=try https://github.com/rust-lang/rust.git rust-lang/rust
---
[00:00:48] 
[00:00:48] Total download size: 4.9 M
[00:00:48] Downloading Packages:
[00:00:51] --------------------------------------------------------------------------------
[00:00:51] Total                                           1.8 MB/s | 4.9 MB     00:02     
[00:00:51] warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID e8562897
[00:00:51] Importing GPG key 0xE8562897 "CentOS-5 Key (CentOS 5 Official Signing Key) <centos-5-key@centos.org>" from /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
[00:00:51] Running Transaction Test
[00:00:51] Finished Transaction Test
[00:00:51] Transaction Test Succeeded
[00:00:51] Running Transaction
---
[00:03:07] + hide_output make install
[00:03:07] + set +x
[00:03:25] shared.sh: line 11:   351 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"
[00:03:25] + cd ..
[00:03:25] + rm -rf openssl-1.0.2k
[00:03:25] ./build-openssl.sh: line 25:  4114 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"  (wd: /tmp/openssl-1.0.2k)
[00:03:25] + ln -nsf /etc/pki/tls/cert.pem /rustroot/ssl/
[00:03:26] Removing intermediate container e0a6287fa86f
[00:03:26] Step 14/41 : COPY dist-x86_64-linux/build-curl.sh /tmp/
[00:03:26]  ---> f2f711068964
[00:03:26] Step 15/41 : RUN ./build-curl.sh
[00:03:26] Step 15/41 : RUN ./build-curl.sh
[00:03:26]  ---> Running in 24cce2d607f7
[00:03:26] + source shared.sh
[00:03:26] + VERSION=7.51.0
[00:03:26] + curl http://cool.haxx.se/download/curl-7.51.0.tar.bz2
[00:03:28]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[00:03:28]                                  Dload  Upload   Total   Spent    Left  Speed
[00:03:29] 
  0 2509k    0 14215    0     0   9439      0  0:04:32  0:00:01  0:04:31  9439
  0 2509k    0 14215    0     0   9439      0  0:04:32  0:00:01  0:04:31  9439
  2 2509k    2 59559    0     0  33949      0  0:01:15  0:00:01  0:01:14  177k
100 2509k  100 2509k    0     0   939k      0  0:00:02  0:00:02 --:--:-- 2140k
[00:03:29] + mkdir curl-build
[00:03:29] + cd curl-build
[00:03:29] + hide_output ../curl-7.51.0/configure --prefix=/rustroot --with-ssl=/rustroot --disable-sspi --disable-gopher --disable-smtp --disable-smb --disable-imap --disable-pop3 --disable-tftp --disable-telnet --disable-manual --disable-dict --disable-rtsp --disable-ldaps --disable-ldap
[00:03:49] + hide_output make -j10
[00:03:49] + set +x
[00:04:02] shared.sh: line 11:    12 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"
[00:04:02] + hide_output make install
---
 98 82.1M   98 80.9M    0     0  1004k      0  0:01:23  0:01:22  0:00:01 1101k
 99 82.1M   99 82.0M    0     0  1006k      0  0:01:23  0:01:23 --:--:-- 1079k
100 82.1M  100 82.1M    0     0  1006k      0  0:01:23  0:01:23 --:--:-- 1181k
[00:08:56] + cd gcc-4.8.5
[00:08:56] + sed -i 's|ftp://gcc\.gnu\.org/|http://gcc.gnu.org/|g' ./contrib/download_prerequisites
[00:08:56] --2018-12-10 15:30:50--  http://gcc.gnu.org/pub/gcc/infrastructure/mpfr-2.4.2.tar.bz2
[00:08:56] Resolving gcc.gnu.org... 209.132.180.131
[00:08:56] Connecting to gcc.gnu.org|209.132.180.131|:80... connected.
[00:08:56] HTTP request sent, awaiting response... 200 OK
---
 29 27.0M   29 8128k    0     0  6610k      0  0:00:04  0:00:01  0:00:03 6762k
 61 27.0M   61 16.4M    0     0  7572k      0  0:00:03  0:00:02  0:00:01 7665k
 89 27.0M   89 24.0M    0     0  7636k      0  0:00:03  0:00:03 --:--:-- 7701k
100 27.0M  100 27.0M    0     0  7873k      0  0:00:03  0:00:03 --:--:-- 7937k
[00:42:44] + cd llvm-7.0.0.src
[00:42:44] + mkdir -p tools/clang
[00:42:44] + curl https://releases.llvm.org/7.0.0/cfe-7.0.0.src.tar.xz
[00:42:44] + xz -d
[00:42:44] + tar xf - -C tools/clang --strip-components=1
[00:42:44]                                  Dload  Upload   Total   Spent    Left  Speed
[00:42:46] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 28 11.9M   28 3486k    0     0  4992k      0  0:00:02 --:--:--  0:00:02 5001k
---
[00:42:46]                                  Dload  Upload   Total   Spent    Left  Speed
[00:42:46] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  894k  100  894k    0     0  1926k      0 --:--:-- --:--:-- --:--:-- 1935k
[00:42:46] + mkdir ../clang-build
[00:42:46] + cd ../clang-build
[00:42:46] + INC=/rustroot/include
[00:42:46] + INC=/rustroot/include:/rustroot/lib/gcc/x86_64-unknown-linux-gnu/4.8.5/include-fixed
[00:42:46] + INC=/rustroot/include:/rustroot/lib/gcc/x86_64-unknown-linux-gnu/4.8.5/include-fixed:/usr/include
[00:42:46] + hide_output cmake ../llvm-7.0.0.src -DCMAKE_C_COMPILER=/rustroot/bin/gcc -DCMAKE_CXX_COMPILER=/rustroot/bin/g++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/rustroot -DLLVM_TARGETS_TO_BUILD=X86 -DC_INCLUDE_DIRS=/rustroot/include:/rustroot/lib/gcc/x86_64-unknown-linux-gnu/4.8.5/include-fixed:/usr/include
[00:43:16] Mon Dec 10 16:05:11 UTC 2018 - building ...
[00:43:18] + hide_output make -j10
[00:43:18] + set +x
[00:43:48] Mon Dec 10 16:05:42 UTC 2018 - building ...
---
[01:41:18] + set +x
[01:41:19] + hide_output make INSTALL_HDR_PATH=dest headers_install
[01:41:19] + set +x
[01:41:22] shared.sh: line 11:    10 Terminated              bash -c "while true; do sleep 30; echo \$(date) - building ...; done"
[01:41:22] + find dest/include '(' -name .install -o -name ..install.cmd ')' -delete
[01:41:22] + yes
[01:41:22] + yes
[01:41:22] + cp -fr dest/include/asm dest/include/asm-generic dest/include/drm dest/include/linux dest/include/mtd dest/include/rdma dest/include/scsi dest/include/sound dest/include/video dest/include/xen /usr/include
[01:41:22] + rm -rf linux-3.2.84
[01:41:23]  ---> f7766546ed27
[01:41:23] Removing intermediate container beae4c2a93c2
[01:41:23] Step 31/41 : COPY dist-x86_64-linux/build-perl.sh /tmp/
[01:41:23] Step 31/41 : COPY dist-x86_64-linux/build-perl.sh /tmp/
[01:41:23]  ---> d6259b67dfa1
[01:41:23] Step 32/41 : RUN ./build-perl.sh
[01:41:23]  ---> Running in fb6f61c82495
[01:41:24] + source shared.sh
[01:41:24] + curl https://www.cpan.org/src/5.0/perl-5.28.0.tar.gz
[01:41:24]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[01:41:24]                                  Dload  Upload   Total   Spent    Left  Speed
[01:41:25] 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors
Copy link
Contributor

bors commented Dec 10, 2018

💔 Test failed - status-travis

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 10, 2018
@alexcrichton
Copy link
Member

@bors: retry

@bors
Copy link
Contributor

bors commented Dec 10, 2018

⌛ Trying commit 208b9a7 with merge 8f33a3d05703e0a9180173ae6bd15875fc94f17e...

@bors
Copy link
Contributor

bors commented Dec 10, 2018

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

Hm I'm not entirely sure what's going on.... I couldn't easily tell if that's a normal timeout or a spurious timeout

@michaelwoerister
Copy link
Member Author

@bors retry

@bors
Copy link
Contributor

bors commented Dec 11, 2018

⌛ Trying commit 208b9a7 with merge 26f96e5...

bors added a commit that referenced this pull request Dec 11, 2018
[experiment] Benchmark incremental ThinLTO'd compiler.

I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is a building stuff.
@bors
Copy link
Contributor

bors commented Dec 11, 2018

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister
Copy link
Member Author

@rust-timer build 26f96e5

@rust-timer
Copy link
Collaborator

Success: Queued 26f96e5 with parent 3a31213, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 26f96e5

@alexcrichton
Copy link
Member

All things considered that's actually a really impressive benchmark. The top 4 regressions are probably just one or two missing #[inline] annotations

@michaelwoerister
Copy link
Member Author

Yes, the medium sized crates are hardly affected. Pretty cool!

@michaelwoerister
Copy link
Member Author

OK, so this gives us one good, real-world test case where incremental+ThinLTO gives roughly the same result as regular ThinLTO as far as the runtime performance of the generated code goes. We'll need more examples (@nnethercote mentioned that he might take a look at Firefox as another big, real-world test case) but if things keep looking the way they are looking now, we should think about making incremental compilation the default for optimized builds too because the speedups to be had here are substantial.

The following table compares compile times with and without incremental compilation for optimized builds (numbers taken from a random, recent perf.rlo run)

non-incr incr-base incr-small-change one-time cost speed up
piston-image 7.82s 8.18s 1.45s +5% 5.4x
webrender 20.64s 23.34s 4.63s +13% 4.5x
ripgrep 8.04s 8.5s 1.82s +6% 4.4x
encoding 1.5s 1.73s 0.43s +15% 3.5x
futures 0.95s 1.16s 0.31s +22% 3.1x
regex 4.63s 5.08s 2.02s +10% 2.3x
clap-rs 25.37s 26.47s 13.32s +4% 1.9x
sentry-cli 19.1s 21.06s 11.43s +10% 1.7x
cargo 38.95s 43.63s 24.5s +12% 1.6x
tokio-webpush-simple 4.21s 4.77s 3.09s +13% 1.4x
helloworld 0.15s 0.16s 0.15s +7% 1.0x

Initial compilation still, as expected, is slower with incr. comp. but subsequent compilation sessions can be much faster! We should really try to get this into the hands of end-users somehow. I mean, how often do we get a chance to make compilation up to five times faster? :D

If we had a debug-opt Cargo profile, this would be a no-brainer. Since we only have release and debug, it's a bit more complicated. Actually released code should still be compiled non-incrementally, with one codegen unit, and possibly with whole-crate-graph LTO enabled.

cc @rust-lang/core @rust-lang/cargo @rust-lang/wg-compiler-performance

@alexcrichton
Copy link
Member

Those indeed are some compelling numbers, and I could get behind making incremental release mode the default! I think we could even go further in release mode and turn on "infinite codegen units" by default as well, as that wouldn't have the one-time cost of incremental but would benefit compiling all crates.io crates in release mode which aren't compiled incrementally (and may help eat the one time cost of the local project being incremental!)

I would personally be in favor of pursuing an aggressive strategy of "simply turn on incremental by default in release mode". There's always going to be some configuration which eeks out a few percentage points of performance here and there (LTO, PGO, one CGU, implementing a mode in rustc that just reruns all LLVM passes, etc, etc). What I think this is very convincingly showing is that there's no major regressions with incrementally compiled code in release mode and it's still very competitive.

In terms of next steps I think we'd probably want to do a call on internals like we initially did for ThinLTO, basically having a very short set of instructions for how to test and ask folks to post various numbers for various projects. I suspect CARGO_INCREMENTAL=1 cargo build --release is enough to test this, but I haven't tested!

@michaelwoerister
Copy link
Member Author

In terms of next steps I think we'd probably want to do a call on internals like we initially did for ThinLTO, basically having a very short set of instructions for how to test and ask folks to post various numbers for various projects.

Heh, you mean like this one? :)
Maybe the instructions were too complicated.

@alexcrichton
Copy link
Member

Ha yes I mean exactly that one!

We may want to try messaging yeah with an easy-to-run thing at the top, and then perhaps also message simultaneously say that we're considering turning this on by default so it's important to test!

@Mark-Simulacrum
Copy link
Member

FWIW, I've been running with CARGO_INCREMENTAL=1 on all my smaller Rust projects and have never noticed any slowdowns either -- including across all the Advent of Code benchmarks that I've run (so tight-loop code).

I think we may be able to just go ahead and do this without any call to action, and see if people come complaining. I believe the Cargo team is working long-term on a reworking of profiles which would help with a true "release" build here -- though at least in my experience incremental release builds today are more than fast enough.

@Eh2406
Copy link
Contributor

Eh2406 commented Dec 12, 2018

As I said in internals, we have lolbench now. We have a way to accurately measure the change on performance of the compiled output over a large number benchmarks from real world code. (I don't have access to a Computer I can dedicate to running this experiment or I would do it myself.) We don't intentionally make breaking changes without asking crater. We can do the same for the performance of compiled output!

@michaelwoerister
Copy link
Member Author

A simple way to test this via lolbench would be to make incr. comp. the default. We are early in the cycle, after all, and can revert if performance takes too much of a hit. I don't know how much work a local lolbench run would be but it sounded like it wouldn't work out of the box.

@XAMPPRocky
Copy link
Member

Triage; @michaelwoerister Hello, have you been able to get back to this PR?

@michaelwoerister
Copy link
Member Author

This PR was just an experiment and we got some good numbers out of it. Once beta has branched off, we could look into making incr. comp. the default for release builds temporarily in order to get some numbers via lolbench.

bors added a commit to rust-lang/cargo that referenced this pull request Jan 24, 2019
…, r=alexcrichton

Make incremental compilation the default for all profiles.

This PR makes incremental compilation the default for all profiles, that is, also `release` and `bench`. `rustc` performs ThinLTO by default for incremental release builds for a while now and the [data we've gathered so far](rust-lang/rust#56678) indicates that the generated binaries exhibit roughly the same runtime performance as non-incrementally compiled ones. At the same time, incremental release builds can be 2-5 times as fast as non-incremental ones.

Making incremental compilation (temporarily) the default in `cargo` would be a simple way of gathering more data about runtime performance via [lolbench.rs](https://lolbench.rs). If the results look acceptable, we can just leave it on and give a massive compile time reduction to everyone. If not, we can revert the change and think about a plan B.

This strategy assumes that lolbench will actually use the nightly `cargo` version. Is that true, @anp?

r? @alexcrichton
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants