Tracking Issue for making incremental compilation the default for Release Builds #57968
On Tue, Jan 29, 2019 at 02:36:31AM -0800, Michael Woerister wrote:
> ## Data on runtime performance of incrementally compiled release artifacts
> Apart from anecdotal evidence that runtime performance is "roughly the same" there have been two attempts to measure this in a more reliable way:
> 1. PR #56678 did an experiment where we compiled the compiler itself incrementally and then tested how the compiler's runtime performance was affected by this. The results are twofold:
>    1. In general performance drops by **1-2%** ([compare results](https://perf.rust-lang.org/compare.html?start=3a3121337122637fa11f0e5d42aec67551e8c125&end=26f96e5eea2d6d088fd20ebc14dc90bdf123e4a1) for `clean` builds)
>    2. For two of the small test cases (`helloworld`, `unify-linearly`) performance drops by 30%. It is known that these test cases are very sensitive to LLVM making the right inlining decisions, which we already saw when switching from single-CGU to non-incremental ThinLTO. This indicates that microbenchmarks may see performance drops unless the author of the benchmark takes care of marking bottleneck functions with `#[inline]`.

I'm not especially worried about the increases in compile time, as they
seem worth the cost. However, these regressions in runtime performance
don't seem reasonable to me; I don't think we should change the default
to something that has any runtime performance cost.
I'm not sure. The current default already has a quite significant runtime performance cost because it's using ThinLTO instead of |
@alexcrichton To avoid ambiguity, what do you mean by "fastest compilation mode" here? I certainly think we don't need to worry about compiling as fast as possible, but I don't think our default compile should pay a runtime performance penalty like this. |
Ah, by that I mean producing the fastest code possible. Producing the fastest code by default for |
So if |
Yeah, I'm honestly thinking that it may be time for a profile between debug and release, such that there are these use cases:
At the moment I'm seeing lots of people either sacrifice the debug profile for that "Development" use case (bumping optimization levels, but reducing the debuggability of the project) or sacrifice the release profile by reducing optimizations. Both are kind of suboptimal. |
rust-lang/cargo#2007 This came up a lot of times, but for some reason was never implemented. The discussions about it turned into talk about "workflows" and "profile overrides", although it's not very clear to me why:
Currently the compiler will produce an error if both incremental compilation and full fat LTO are requested. With recent changes and the advent of incremental ThinLTO, however, all the hard work is already done for us and it's actually not too bad to remove this error! This commit updates the codegen backend to allow incremental full fat LTO. The semantics are that the input modules to LTO are all produced incrementally, but the final LTO step is always done unconditionally regardless of whether the inputs changed or not. The only real incremental win we could have here is if zero of the input modules changed, but that's so rare it's unlikely to be worthwhile to implement such a code path. cc rust-lang#57968 cc rust-lang/cargo#6643
@lnicola I've updated my custom profiles implementation now, here: rust-lang/cargo#6676 . Maybe it can be useful for this issue. |
This is so cool 😍. If any perf regressions do emerge, I'd like to briefly plug sending a quick PR to lolbench so they can be caught in the future. |
@michaelwoerister I was wondering, are there more blockers to this that you know of? |
Just the ones listed in the original post. Although, maybe also Windows performance. I worked on Windows 10 for a few days and got the impression that incr. comp. is very slow there. That should be verified though. |
@michaelwoerister Could NTFS' infamous low performance on many small files be at fault here? |
I'd rather blame Windows Defender which is enabled by default on Windows 10. |
Re Defender, I've seen someone investigating parallel extraction of the installation packages to split the scanning across multiple cores, since it's synchronous. Does the compiler use multiple threads when writing the files? Or, going the other way, would sticking them in a SQLite database help? |
Visited during the compiler team's backlog bonanza. Reading over the discussion, it seems like we still haven't decided if it makes sense to enable incremental compilation by default for release builds. |
At this time, I am strongly against making incr. comp. the default for release builds because that's the mode that's usually being used for production builds -- and incr. comp. is not a good choice for that because of increased code size, less effective optimizations, and the added risks for running into incr. comp. specific compiler bugs. That being said, optimized incremental builds certainly have their use cases. My recommendation would be for people to use Cargo custom profiles in those cases. In my opinion the stabilization of Cargo custom profiles makes this issue obsolete. |
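For readers who want the optimized incremental builds recommended here, the custom-profile approach could look roughly like the following. This is a minimal sketch, not a configuration taken from the thread: the profile name `release-incremental` is hypothetical, and only the `inherits` and `incremental` profile keys are relied on.

```toml
# Cargo.toml
# Hypothetical profile name; any name other than the built-in profiles should work.
[profile.release-incremental]
inherits = "release"   # start from the stock release settings
incremental = true     # opt this profile in to incremental compilation
```

It would be selected with `cargo build --profile release-incremental`, leaving plain `--release` builds non-incremental for production artifacts.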
I think the conflation of "debug" and "dev" in Cargo was a bit of a mistake, especially since Having some kind of "devopt" profile by default would've been great, especially if it includes some minimal amount of debuginfo (which would help with e.g. profiling, something one might want to do for local development with optimizations turned on). I'm personally fine with adding stuff like this to all my workspace-level

```toml
# `release` but with enough debuginfo for `perf record --call-graph=dwarf`.
[profile.release-profiling]
inherits = "release"
debug = 1
```
Parts of what I said in my last comment above (#57968 (comment)) keep coming up for me. Most recently, a coworker was wondering why some backtrace was missing almost any useful information (there's even some frames there that make no sense, so the frame pointer is probably not used consistently either). You may ask "what does debuginfo have to do with incremental"? But that's not why I bring it up - the connection to this issue is that both "missing incremental while developing" and "missing debuginfo while developing" are side-effects of "Development" vs "Release"

Cargo came with two profiles (until the recent custom ones):
Three different aspects of compilation, toggled with one convenient switch. But if we were to treat those aspects as independent, we may get this:
Instead, we fossilized
We discussed a little about having a |
I would argue you basically would never write The development workflow isn't "three profiles", it's two (i.e. the development ones, with "release" being "not development") - or just one, really (I almost always end up using debug mode to avoid typing The summary makes me worried the discussion overfocused on the If I had the time and energy I would probably make an RFC, but also this may be one of those things where nothing short of a time machine will really fix it. OTOH we have editions, and the Cargo resolver 2.0 saga is a good precedent - it might be nice to deprecate ¹ (or any other way to spell "optimized development", I'm not that attached to
Sadly there's no good way to do this for profiles without ending up with |
if you want development with optimizations, why not just set an optimization level? Similarly, if you want release with split debug, isn't that something you can set in the profile? (i admit I've never had to set split debug, on windows it's always split). |
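What this question suggests can be sketched as two profile tweaks in `Cargo.toml`. The values below are illustrative assumptions, not settings anyone in the thread posted, and `split-debuginfo` support varies by platform and toolchain.

```toml
# Cargo.toml
# Development builds with some optimization, keeping the other `dev` defaults.
[profile.dev]
opt-level = 1                # illustrative value; higher levels cost more build time

# Release builds that keep debuginfo, stored outside the main binary.
[profile.release]
debug = 1                    # illustrative; same level as the `debug = 1` used elsewhere in the thread
split-debuginfo = "packed"   # assumption: accepted values and behavior depend on the platform
```

Note that each project has to add these settings itself, which is the gap the surrounding comments are arguing about.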
I go into far more detail above, but the obvious choices today are:
One trick I've never seen used, would be to either set And for all of those options (unless you start with the The closest thing to what I want is a custom profile, except for it not already existing in everyone's config. Not to sound like a broken record but my point was that we screwed up with the "dev=debug vs opt=release" split, and that if we want to address that, we have to switch people off With a Of course there's nobody stopping anyone from taking that to
Not something I've ever done personally, so I can't answer this very well, but AFAIK this is generally handled by the release pipeline you have around the |
Sorry, the post with the
I would suggest to turn I agree with you that cargo's two-profile system is inadequate. I just set |
Since incremental compilation supports being used in conjunction with ThinLTO, the runtime performance of incrementally built artifacts is (presumably) roughly on par with non-incrementally built code. At the same time, building things incrementally is often significantly faster (1.4-5x according to perf.rlo). As a consequence it might be a good idea to make Cargo default to incremental compilation for release builds.
Possible caveats that need to be resolved:
- … `debug` and `check` builds everybody seems to be fine with this already.
- … `style-servo`, are always slower to compile with incr. comp., even if there is just a small change. In the case of `style-servo` that is 62 seconds versus 64-69 seconds on perf.rlo. It is unlikely that this would improve before we make incr. comp. the default. We need to decide if this is a justifiable price to pay for improvements in other projects.
- … `CARGO_INCREMENTAL` flag or a local Cargo config. However, this might not be common knowledge, the same as it isn't common knowledge that one can improve runtime performance by forcing the compiler to use just one codegen unit.

## Data on runtime performance of incrementally compiled release artifacts
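Apart from anecdotal evidence that runtime performance is "roughly the same" there have been two attempts to measure this in a more reliable way:
1. PR #56678 did an experiment where we compiled the compiler itself incrementally and then tested how the compiler's runtime performance was affected by this. The results are twofold:
   1. In general performance drops by **1-2%** ([compare results](https://perf.rust-lang.org/compare.html?start=3a3121337122637fa11f0e5d42aec67551e8c125&end=26f96e5eea2d6d088fd20ebc14dc90bdf123e4a1) for `clean` builds)
   2. For two of the small test cases (`helloworld`, `unify-linearly`) performance drops by 30%. It is known that these test cases are very sensitive to LLVM making the right inlining decisions, which we already saw when switching from single-CGU to non-incremental ThinLTO. This indicates that microbenchmarks may see performance drops unless the author of the benchmark takes care of marking bottleneck functions with `#[inline]`.

One more experiment we should do is compiling Firefox because it is a large Rust codebase with an excellent benchmarking infrastructure (cc @nnethercote).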
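The two escape hatches mentioned in the caveats, opting out of incremental compilation through a local Cargo config and forcing a single codegen unit, might be written as follows. This is a sketch under the assumption that the settings live in the two files named in the comments; nothing here is taken from the issue text beyond the option names.

```toml
# .cargo/config.toml (local Cargo config)
[build]
incremental = false    # opt out of incremental compilation for builds using this config

# Cargo.toml (project manifest, a separate file)
[profile.release]
codegen-units = 1      # a single codegen unit: slower builds, potentially faster code
```

Setting the environment variable `CARGO_INCREMENTAL=0` for a single invocation has the same opt-out effect without editing any file.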
cc @rust-lang/core @rust-lang/cargo @rust-lang/compiler