[experiment] Benchmark incremental ThinLTO'd compiler. #56678
Conversation
@bors try
(rust_highfive has picked a reviewer for you, use r? to override)
[experiment] Benchmark incremental ThinLTO'd compiler. I figure we can hijack perf.rlo to get some numbers on the runtime performance of code compiled with incremental ThinLTO. To this end, we compile the compiler incrementally and then measure how much slower it is at building stuff.
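For context, incremental compilation can be forced on from the environment regardless of the Cargo profile, which is one way to reproduce this kind of "how much slower is a warm rebuild" measurement locally. A minimal sketch, assuming an existing Cargo project (the build commands are illustrative and left commented out):

```shell
# CARGO_INCREMENTAL=1 is Cargo's env override that forces incremental
# compilation even for profiles where it is off by default.
export CARGO_INCREMENTAL=1

# Illustrative measurement loop (assumes a real Cargo project):
# cargo build --release           # cold build: populates the incremental cache
# touch src/lib.rs                # simulate a small edit
# time cargo build --release     # warm rebuild: the incremental speedup shows here
```

The interesting number is the warm-rebuild time relative to a non-incremental rebuild of the same change.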
💔 Test failed - status-travis
@bors: retry
⌛ Trying commit 208b9a7 with merge 8f33a3d05703e0a9180173ae6bd15875fc94f17e...
💔 Test failed - status-travis
Hm, I'm not entirely sure what's going on. I couldn't easily tell whether that's a normal timeout or a spurious one.
@bors retry
☀️ Test successful - status-travis
@rust-timer build 26f96e5
Success: Queued 26f96e5 with parent 3a31213, comparison URL.
Finished benchmarking try commit 26f96e5
All things considered, that's actually a really impressive benchmark. The top 4 regressions are probably just one or two missing …
Yes, the medium-sized crates are hardly affected. Pretty cool!
OK, so this gives us one good, real-world test case where incremental+ThinLTO gives roughly the same result as regular ThinLTO as far as the runtime performance of the generated code goes. We'll need more examples (@nnethercote mentioned that he might take a look at Firefox as another big, real-world test case) but if things keep looking the way they are looking now, we should think about making incremental compilation the default for optimized builds too because the speedups to be had here are substantial. The following table compares compile times with and without incremental compilation for optimized builds (numbers taken from a random, recent perf.rlo run)
Initial compilation is still, as expected, slower with incr. comp., but subsequent compilation sessions can be much faster! We should really try to get this into the hands of end-users somehow. I mean, how often do we get a chance to make compilation up to five times faster? If we had a … cc @rust-lang/core @rust-lang/cargo @rust-lang/wg-compiler-performance
Those are indeed some compelling numbers, and I could get behind making incremental release mode the default! I think we could even go further in release mode and turn on "infinite codegen units" by default as well, as that wouldn't have the one-time cost of incremental but would benefit compiling all crates.io crates in release mode, which aren't compiled incrementally (and may help eat the one-time cost of the local project being incremental!)

I would personally be in favor of pursuing an aggressive strategy of "simply turn on incremental by default in release mode". There's always going to be some configuration which ekes out a few percentage points of performance here and there (LTO, PGO, one CGU, implementing a mode in rustc that just reruns all LLVM passes, etc.). What I think this is very convincingly showing is that there are no major regressions with incrementally compiled code in release mode, and it's still very competitive.

In terms of next steps, I think we'd probably want to do a call on internals like we initially did for ThinLTO, basically having a very short set of instructions for how to test and asking folks to post numbers for various projects. I suspect …
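Both knobs discussed above are per-project Cargo profile settings today. A hedged sketch of what opting in might look like in a project's `Cargo.toml` (the specific `codegen-units` value is an assumption for illustration, not a recommendation):

```toml
# Hypothetical Cargo.toml fragment opting one project into the behavior
# discussed above; 256 mirrors rustc's default CGU count for incremental builds.
[profile.release]
incremental = true     # reuse work across `cargo build --release` runs
codegen-units = 256    # many CGUs: more parallelism; ThinLTO recovers cross-CGU perf
```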
Heh, you mean like this one?
Ha, yes, I mean exactly that one! We may want to retry messaging, yeah, with an easy-to-run thing at the top, and perhaps also simultaneously say that we're considering turning this on by default, so it's important to test!
FWIW, I've been running with … I think we may be able to just go ahead and do this without any call to action, and see if people come complaining. I believe the Cargo team is working long-term on a reworking of profiles which would help with a true "release" build here; though at least in my experience, incremental release builds today are more than fast enough.
As I said on internals, we have …
A simple way to test this via lolbench would be to make incr. comp. the default. We are early in the cycle, after all, and can revert if performance takes too much of a hit. I don't know how much work a local lolbench run would be, but it sounded like it wouldn't work out of the box.
Triage: @michaelwoerister, hello, have you been able to get back to this PR?
This PR was just an experiment, and we got some good numbers out of it. Once beta has branched off, we could look into making incr. comp. the default for release builds temporarily in order to get some numbers via lolbench.
…, r=alexcrichton Make incremental compilation the default for all profiles. This PR makes incremental compilation the default for all profiles, that is, also `release` and `bench`. `rustc` has performed ThinLTO by default for incremental release builds for a while now, and the [data we've gathered so far](rust-lang/rust#56678) indicates that the generated binaries exhibit roughly the same runtime performance as non-incrementally compiled ones. At the same time, incremental release builds can be 2-5 times as fast as non-incremental ones. Making incremental compilation (temporarily) the default in `cargo` would be a simple way of gathering more data about runtime performance via [lolbench.rs](https://lolbench.rs). If the results look acceptable, we can just leave it on and give a massive compile time reduction to everyone. If not, we can revert the change and think about a plan B. This strategy assumes that lolbench will actually use the nightly `cargo` version. Is that true, @anp? r? @alexcrichton
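If the new default ever proves problematic for a particular crate, Cargo's profile key makes the old behavior easy to restore per project. A minimal sketch:

```toml
# Hypothetical opt-out for a project that prefers non-incremental release
# builds, e.g. to squeeze maximum runtime performance out of shipped binaries.
[profile.release]
incremental = false
```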