-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] Enable ThinLTO with incremental compilation. #52309
[wip] Enable ThinLTO with incremental compilation. #52309
Conversation
r? @oli-obk (rust_highfive has picked a reviewer for you, use r? to override) |
@bors try |
⌛ Trying commit c1cc0ea41470916761bfa45723719fe2a264de05 with merge 285c985debb917383c41f3032115eac7700d932a... |
💔 Test failed - status-travis |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
1 similar comment
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
I don't understand what's going on here. The |
The error is on llvm-emscripten, based on LLVM 4.0 which doesn't have |
Ah, that clears it up. Thanks, @cuviper! |
c1cc0ea
to
4e28b6c
Compare
@bors try |
[wip] Enable ThinLTO with incremental compilation. This (work-in-progress) PR allows for combining ThinLTO and incremental compilation. I'll write up something more detailed once the kinks are worked out. For now, I'm mostly interested in what this does to compile times.
☀️ Test successful - status-travis |
@rust-timer build 02e01bd |
Success: Queued 02e01bd with parent 68c39b9, comparison URL. |
Looks like cache hits aren't happening? |
I think the results are kind of expected because of how coarse-grained this still is. Right now if something changes in a module then all modules important anything from that module (even something that has not really changed) need to completely re-compiled. There's room for improvement: If we track changes for every |
I updated the PR description: #52309 (comment) @rust-lang/compiler @rust-lang/wg-compiler-performance |
In terms of testing, you could try to build the compiler incrementally with ThinLTO and see if it's tests still pass, potentially doing a local perf run to compare against a incrementally compiled compiler (w/o ThinLTO), as well. |
Thanks @michaelwoerister! I wonder though, I think that incremental ThinLTO upstream at least isn't supposed to have such bad performance, so I think we're missing a step in impelmenting this? I thought incremental ThinLTO would looks something like this:
It looks like the implementation here is invalidating too eagerly? The construction above would avoid retranslating/optimizing modules that depend on changed modules but don't import any of the changes. In that the cache invalidation needs to only happen for the ThinLTO passes, not the entire module's original optimization passes. |
@alexcrichton, your approach would increase the size of our incr. comp. cache quite a bit (~50%) because we would have to cache LLVM IR before and after ThinLTO. Currently we only cache the later. I'm not strictly opposed to doing that but cache size already is a concern for big projects. It's certainly worth a try though. Also note that we actually don't have to hash anything, or incr. comp. framework will tell us already which modules have changed. With some tweaking, it should also be possible to know exactly which items within a module have changed. I'll have another look at how the linker-plugins do it though. |
@michaelwoerister indeed! Although I think what I mentioned above is how incremental ThinLTO was designed to work? In that if you're using a linker plugin then all your intermediate files are the IR files and then the final link also has a cache directory with a lot of object files? I think we have some other various methods to greatly improve our cache directory sizes, for example we should probably be using thin archives which just point to the object files elsewhere on disk, and that way object files aren't copied around. (additionally we probably save way too much IR in too many places!) Within our own incr. comp. framework I think ThinLTO would roughly look like:
While it's probably possible to integrate it into our own incr. comp. framework and have reuse "just work" it may also be good to shoehorn as close as we can to upstream incremental ThinLTO as that's how it's written to work in the sense of fixing bugs and getting future developments. I sort of assumed that the way we'd implement incremental ThinLTO was to basically implement custom incrementality in ThinLTO passes (like upstream LLVM does) and basically just get the main incremental directory from the compilation session. I do think we have to solve the problem here in the sense if I change one CGU that shouldn't force all CGUs which may import the item to be fully recompiled. Instead only those which are known to import the changed item should be recompiled (and even then, not fully recompiled, just post-ThinLTO-passes). |
It’s not just the incremental cache and it’s not just Rust code, but as a single data point I regularly fill the 500 GB on my desktop machine that is dedicated to compilation. Part of this is due to Cargo not having any garbage collection outside of |
If by "item" you mean CGU then this is already what this PR does (except of course re-use the pre-ThinLTO optimization work because we don't cache it).
That's what I think we should do too. This PR already makes LLVM IR generation not be a query anymore. It's pretty much a "standard ThinLTO" pipeline. |
@michaelwoerister ah yeah sorry by item I mean per-function. For example if there are two CGUs A and B where A inlines function I may be misunderstanding what this PR does though? The description leads me to believe that in the above scenario no matter what changes in B then A is fully recompiled unconditionally. Is that not the case though? |
That's the case. The PR does NOY yet implement change tracking at the function level. Do you know if linker-based LTO computes a hash per function? I don't have the code at hand at the moment. The description here sounds like no: http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html. |
Aha it appears I'm misremembering! I've gone off this list before, and you're right in that it only looks at entire modules, not hashes of imported contents. In light of that I think there's still a "main thing" for this PR to implement, right? Which is that we shouldn't recodegen/reoptimize entire modules, but rather only rerun the ThinLTO passes when an input changes, right? Does that make sense? |
Yes, that makes sense. That should put |
Ping from triage, @michaelwoerister we haven't heard from you for a while, will you have time to work on this PR? |
This has been slightly de-prioritized until linker-plugin-based LTO is under wraps. Since this will need a little design work still, I'm going to close this PR for now. |
…hton Enable ThinLTO with incremental compilation. This is an updated version of #52309. This PR allows `rustc` to use (local) ThinLTO and incremental compilation at the same time. In theory this should allow for getting compile-time improvements for small changes while keeping the runtime performance of the generated code roughly the same as when compiling non-incrementally. The difference to #52309 is that this version also caches the pre-LTO version of LLVM bitcode. This allows for another layer of caching: 1. if the module itself has changed, we have to re-codegen and re-optimize. 2. if the module itself has not changed, but a module it imported from during ThinLTO has, we don't need to re-codegen and don't need to re-run the first optimization phase. Only the second (i.e. ThinLTO-) optimization phase is re-run. 3. if neither the module itself nor any of its imports have changed then we can re-use the final, post-ThinLTO version of the module. (We might have to load its pre-ThinLTO version though so it's available for other modules to import from)
This PR allows
rustc
to use ThinLTO and incremental compilation at the same time. In theory this should allow for getting compile-time improvements for small changes while keeping the runtime performance of the generated code roughly the same as when compiling non-incrementally.How does it work?
ModuleSummaryIndex
. This index contains a map of which module imported stuff from which other modules during ThinLTO.has_changed(module.inputs)
tohas_changed(module.inputs) || any(has_changed(imported_module.inputs))
.Performance
The effect on compile times can be seen here: https://perf.rust-lang.org/compare.html?start=68c39b9fec17846005da9a8e42991422c08c377c&end=02e01bd388a35a90002e37851aa26ba59c593a42&stat=wall-time
Some observations:
baseline incremental-opt
case is always slower. This is expected since we are running ThinLTO now but didn't do so before.patched incremental: println-opt
cases are a lot slower. E.g.clap-rs-opt
which goes from 4.47s to 24.65s, i.e. it takes more than 5 times as long. The small change in these cases probably invalidates a module that many other modules import things from.patched incremental: println-opt
are hardly affected. E.g.encoding-opt
goes from 0.38s to 0.42s, i.e. 8% slower. It is still 10 times faster than compiling from scratch, in theory without sacrificing runtime performance.debug
builds are also negatively affected (e.g.regex-debug / patched incremental: compile one
). I suspect this is because in it's current form, the PR allows for less parallelism during codegen. It first sequentially checks all modules for modifications and only then starts codegen and LLVM. Without this PR, modification checking is done more lazily, while LLVM is already running. This should not be hard to fix for the non-ThinLTO case.How to proceed?
cc @rust-lang/compiler @rust-lang/wg-compiler-performance
This (work-in-progress) PR allows for combining ThinLTO and incremental compilation. I'll write up something more detailed once the kinks are worked out. For now, I'm mostly interested in what this does to compile times.