[wip] Enable ThinLTO with incremental compilation. #52309

Closed

michaelwoerister
Member

@michaelwoerister michaelwoerister commented Jul 12, 2018

This PR allows rustc to use ThinLTO and incremental compilation at the same time. In theory this should allow for getting compile-time improvements for small changes while keeping the runtime performance of the generated code roughly the same as when compiling non-incrementally.

How does it work?

  • ThinLTO computes the so-called ModuleSummaryIndex. This index contains a map of which module imported stuff from which other modules during ThinLTO.
  • The implementation in this PR takes this "import map" after it has been generated and saves it in the incr. comp. cache directory.
  • In a subsequent compilation session, we load the previous import map and use it to do enhanced cache invalidation: because ThinLTO pulls in things from other modules, we have to re-compile an object file not only if its corresponding source code has changed but also if something has changed in a module that it imported something from.
  • The cache invalidation logic thus changes from has_changed(module.inputs) to has_changed(module.inputs) || any(has_changed(imported_module.inputs)).
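
The invalidation rule above can be sketched in Rust. This is a minimal illustration, not the code from this PR: the `ImportMap` type and the `needs_recompile` function are hypothetical names, and modules are identified by plain strings rather than by whatever the compiler uses internally.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the import map saved from the previous session:
/// module name -> names of the modules it imported from during ThinLTO.
pub type ImportMap = HashMap<String, HashSet<String>>;

/// Returns true if `module` has to be recompiled, i.e.
/// has_changed(module.inputs) || any(has_changed(imported_module.inputs)).
/// `changed` is the set of modules whose own inputs changed.
pub fn needs_recompile(
    module: &str,
    changed: &HashSet<String>,
    prev_imports: &ImportMap,
) -> bool {
    // The module's own inputs changed.
    if changed.contains(module) {
        return true;
    }
    // Or any module it imported from in the previous session changed.
    prev_imports
        .get(module)
        .map_or(false, |imports| imports.iter().any(|m| changed.contains(m)))
}
```

With a module `a` that imported from `b`, a change to `b` alone marks both `a` and `b` for recompilation, while a module with no imports from `b` is left alone.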

Performance

The effect on compile times can be seen here: https://perf.rust-lang.org/compare.html?start=68c39b9fec17846005da9a8e42991422c08c377c&end=02e01bd388a35a90002e37851aa26ba59c593a42&stat=wall-time

Some observations:

  • The baseline incremental-opt case is always slower. This is expected since we are running ThinLTO now but didn't do so before.
  • Some of the patched incremental: println-opt cases are a lot slower. E.g. clap-rs-opt which goes from 4.47s to 24.65s, i.e. it takes more than 5 times as long. The small change in these cases probably invalidates a module that many other modules import things from.
  • Some of the patched incremental: println-opt are hardly affected. E.g. encoding-opt goes from 0.38s to 0.42s, i.e. 8% slower. It is still 10 times faster than compiling from scratch, in theory without sacrificing runtime performance.
  • Some of the debug builds are also negatively affected (e.g. regex-debug / patched incremental: compile one). I suspect this is because, in its current form, the PR allows for less parallelism during codegen. It first sequentially checks all modules for modifications and only then starts codegen and LLVM. Without this PR, modification checking is done more lazily, while LLVM is already running. This should not be hard to fix for the non-ThinLTO case.

How to proceed?

  • Bring back the performance for the incremental non-ThinLTO builds. This should be done before merging this PR.
  • Find out how the heck to test this properly. This would also be good to do before merging.
  • Maybe merge the PR and then ask people on irlo to compile their code with it. This way we would get numbers on the runtime performance of code generated this way. It would be good to know if this is even worth the trouble.
  • Once we know that runtime performance is good, we can decide whether to turn incremental compilation on by default also for release builds. Just merging this PR won't do so.
  • Investigate if we can make cache invalidation smarter. The ThinLTO import map tells us exactly which functions it imported into a given module, so we could try to check things at the function- instead of at the module-level. I'm sure though that that's not an easy task.

cc @rust-lang/compiler @rust-lang/wg-compiler-performance

This (work-in-progress) PR allows for combining ThinLTO and incremental compilation. I'll write up something more detailed once the kinks are worked out. For now, I'm mostly interested in what this does to compile times.

@rust-highfive
Collaborator

r? @oli-obk

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 12, 2018
@michaelwoerister
Member Author

@bors try

@bors
Contributor

bors commented Jul 12, 2018

⌛ Trying commit c1cc0ea41470916761bfa45723719fe2a264de05 with merge 285c985debb917383c41f3032115eac7700d932a...

@bors
Contributor

bors commented Jul 12, 2018

💔 Test failed - status-travis

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 12, 2018
@rust-highfive
Collaborator

The job dist-x86_64-linux of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_fold:end:services

travis_fold:start:git.checkout
travis_time:start:023bee34
$ git clone --depth=2 --branch=try https://github.com/rust-lang/rust.git rust-lang/rust
---
[00:26:37]    Compiling rustc_llvm v0.0.0 (file:///checkout/src/librustc_llvm)
[00:26:42] error: failed to run custom build command for `rustc_llvm v0.0.0 (file:///checkout/src/librustc_llvm)`
[00:26:42] process didn't exit successfully: `/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-rustc/release/build/rustc_llvm-42df884059483cad/build-script-build` (exit code: 101)
[00:26:42] --- stdout
[00:26:42] cargo:rerun-if-changed=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm-emscripten/bin/llvm-config
[00:26:42] cargo:rerun-if-env-changed=LLVM_CONFIG
[00:26:42] cargo:rustc-cfg=llvm_component="asmparser"
[00:26:42] cargo:rustc-cfg=llvm_component="bitreader"
[00:26:42] cargo:rustc-cfg=llvm_component="bitwriter"
[00:26:42] cargo:rustc-cfg=llvm_component="instrumentation"
[00:26:42] cargo:rustc-cfg=llvm_component="interpreter"
[00:26:42] cargo:rustc-cfg=llvm_component="ipo"
[00:26:42] cargo:rustc-cfg=llvm_component="jsbackend"
[00:26:42] cargo:rustc-cfg=llvm_component="linker"
[00:26:42] cargo:rustc-cfg=llvm_component="lto"
[00:26:42] cargo:rustc-cfg=llvm_component="mcjit"
[00:26:42] cargo:rerun-if-changed-env=LLVM_RUSTLLVM
[00:26:42] cargo:rerun-if-changed=../rustllvm/ArchiveWrapper.cpp
[00:26:42] cargo:rerun-if-changed=../rustllvm/Linker.cpp
[00:26:42] cargo:rerun-if-changed=../rustllvm/llvm-rebuild-trigger
[00:26:42] cargo:rerun-if-changed=../rustllvm/RustWrapper.cpp
[00:26:42] cargo:rerun-if-changed=../rustllvm/rustllvm.h
[00:26:42] cargo:rerun-if-changed=../rustllvm/README
[00:26:42] cargo:rerun-if-changed=../rustllvm/.editorconfig
[00:26:42] cargo:rerun-if-changed=../rustllvm/PassWrapper.cpp
[00:26:42] OPT_LEVEL = Some("2")
[00:26:42] TARGET = Some("x86_64-unknown-linux-gnu")
[00:26:42] HOST = Some("x86_64-unknown-linux-gnu")
[00:26:42] TARGET = Some("x86_64-unknown-linux-gnu")
[00:26:42] TARGET = Some("x86_64-unknown-linux-gnu")
[00:26:42] TARGET = Some("x86_64-unknown-linux-gnu")
[00:26:42] HOST = Some("x86_64-unknown-linux-gnu")
[00:26:42] CXX_x86_64-unknown-linux-gnu = Some("sccache clang++")
[00:26:42] HOST = Some("x86_64-unknown-linux-gnu")
[00:26:42] CXXFLAGS_x86_64-unknown-linux-gnu = Some("-ffunction-sections -fdata-sections -fPIC --target=x86_64-unknown-linux-gnu")
[00:26:42] DEBUG = Some("false")
[00:26:42] DEBUG = Some("false")
[00:26:42] running: "sccache" "clang++" "-O2" "-ffunction-sections" "-fdata-sections" "-fPIC" "-ffunction-sections" "-fdata-sections" "-fPIC" "--target=x86_64-unknown-linux-gnu" "--target=x86_64-unknown-linux-gnu" "-I/checkout/obj/build/x86_64-unknown-linux-gnu/llvm-emscripten/include" "-ffunction-sections" "-fdata-sections" "-fPIC" "--target=x86_64-unknown-linux-gnu" "-fPIC" "-fvisibility-inlines-hidden" "-Wall" "-W" "-Wno-unused-parameter" "-Wwrite-strings" "-Wcast-qual" "-Wmissing-field-initializers" "-pedantic" "-Wno-long-long" "-Wcovered-switch-default" "-Wnon-virtual-dtor" "-Wdelete-non-virtual-dtor" "-Wno-comment" "-Wstring-conversion" "-Werror=date-time" "-std=c++11" "-ffunction-sections" "-fdata-sections" "-O3" "-DNDEBUG" "-fno-exceptions" "-fno-rtti" "-D_GNU_SOURCE" "-D__STDC_CONSTANT_MACROS" "-D__STDC_FORMAT_MACROS" "-D__STDC_LIMIT_MACROS" "-DLLVM_COMPONENT_ASMPARSER" "-DLLVM_COMPONENT_BITREADER" "-DLLVM_COMPONENT_BITWRITER" "-DLLVM_COMPONENT_INSTRUMENTATION" "-DLLVM_COMPONENT_INTERPRETER" "-DLLVM_COMPONENT_IPO" "-DLLVM_COMPONENT_JSBACKEND" "-DLLVM_COMPONENT_LINKER" "-DLLVM_COMPONENT_LTO" "-DLLVM_COMPONENT_MCJIT" "-DLLVM_RUSTLLVM" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/build/rustc_llvm-d574027f8db565f3/out/../rustllvm/PassWrapper.o" "-c" "../rustllvm/PassWrapper.cpp"
[00:26:42] cargo:warning=../rustllvm/PassWrapper.cpp:1133:50: error: no member named 'keys' in 'llvm::StringMap<std::map<unsigned long, unsigned int, std::less<unsigned long>, std::allocator<std::pair<const unsigned long, unsigned int> > >, llvm::MallocAllocator>'
[00:26:42] cargo:warning=    for (const auto imported_module_id : imports.keys()) {
[00:26:42] cargo:warning=                                         ~~~~~~~ ^
[00:26:42] cargo:warning=1 error generated.
[00:26:42] 
[00:26:42] --- stderr
[00:26:42] thread 'main' panicked at '
[00:26:42] 
[00:26:42] 
[00:26:42] Internal error occurred: Command "sccache" "clang++" "-O2" "-ffunction-sections" "-fdata-sections" "-fPIC" "-ffunction-sections" "-fdata-sections" "-fPIC" "--target=x86_64-unknown-linux-gnu" "--target=x86_64-unknown-linux-gnu" "-I/checkout/obj/build/x86_64-unknown-linux-gnu/llvm-emscripten/include" "-ffunction-sections" "-fdata-sections" "-fPIC" "--target=x86_64-unknown-linux-gnu" "-fPIC" "-fvisibility-inlines-hidden" "-Wall" "-W" "-Wno-unused-parameter" "-Wwrite-strings" "-Wcast-qual" "-Wmissing-field-initializers" "-pedantic" "-Wno-long-long" "-Wcovered-switch-default" "-Wnon-virtual-dtor" "-Wdelete-non-virtual-dtor" "-Wno-comment" "-Wstring-conversion" "-Werror=date-time" "-std=c++11" "-ffunction-sections" "-fdata-sections" "-O3" "-DNDEBUG" "-fno-exceptions" "-fno-rtti" "-D_GNU_SOURCE" "-D__STDC_CONSTANT_MACROS" "-D__STDC_FORMAT_MACROS" "-D__STDC_LIMIT_MACROS" "-DLLVM_COMPONENT_ASMPARSER" "-DLLVM_COMPONENT_BITREADER" "-DLLVM_COMPONENT_BITWRITER" "-DLLVM_COMPONENT_INSTRUMENTATION" "-DLLVM_COMPONENT_INTERPRETER" "-DLLVM_COMPONENT_IPO" "-DLLVM_COMPONENT_JSBACKEND" "-DLLVM_COMPONENT_LINKER" "-DLLVM_COMPONENT_LTO" "-DLLVM_COMPONENT_MCJIT" "-DLLVM_RUSTLLVM" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/build/rustc_llvm-d574027f8db565f3/out/../rustllvm/PassWrapper.o" "-c" "../rustllvm/PassWrapper.cpp" with args "clang++" did not execute successfully (status code exit code: 1).
[00:26:42] ', /cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/cc-1.0.17/src/lib.rs:2180:5
[00:26:42] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[00:26:42] 
[00:26:42] 
[00:26:42] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--manifest-path" "/checkout/src/librustc_codegen_llvm/Cargo.toml" "--features" " jemalloc emscripten" "--message-format" "json"
[00:26:42] expected success, got: exit code: 101
[00:26:42] thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1117:9
travis_time:start:stage0-rustc_codegen_llvm
travis_fold:end:stage0-rustc_codegen_llvm

[00:26:42] note: Run with `RUST_BACKTRACE=1` for a backtrace.
---
travis_time:end:0d938ba0:start=1531416772331973760,finish=1531416772338225524,duration=6251764
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0f685200
$ head -30 ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
head: cannot open ‘./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers’ for reading: No such file or directory
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:25811a6c
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@michaelwoerister
Member Author

I don't understand what's going on here. The keys() method is right there in the source code. Is the dist build doing something weird with LLVM? cc @rust-lang/release

@cuviper
Member

cuviper commented Jul 12, 2018

The error is on llvm-emscripten, based on LLVM 4.0 which doesn't have keys(). LLVM 5 does have it.

@michaelwoerister
Member Author

Ah, that clears it up. Thanks, @cuviper!

@michaelwoerister
Member Author

@bors try

@bors
Contributor

bors commented Jul 13, 2018

⌛ Trying commit 4e28b6c with merge 02e01bd...

bors added a commit that referenced this pull request Jul 13, 2018
[wip] Enable ThinLTO with incremental compilation.

This (work-in-progress) PR allows for combining ThinLTO and incremental compilation. I'll write up something more detailed once the kinks are worked out. For now, I'm mostly interested in what this does to compile times.
@bors
Contributor

bors commented Jul 13, 2018

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister
Member Author

@rust-timer build 02e01bd

@rust-timer
Collaborator

Success: Queued 02e01bd with parent 68c39b9, comparison URL.

@alexcrichton
Member

Looks like cache hits aren't happening?

@michaelwoerister
Member Author

I think the results are kind of expected because of how coarse-grained this still is. Right now, if something changes in a module, then all modules importing anything from that module (even something that has not really changed) need to be completely re-compiled.

There's room for improvement: If we track changes for every CodegenItem separately (instead of for whole CodegenUnits at a time) then we (probably) can use the more detailed information available in the ThinLTO import map.

@michaelwoerister
Member Author

I updated the PR description: #52309 (comment)

@rust-lang/compiler @rust-lang/wg-compiler-performance

@Mark-Simulacrum
Member

In terms of testing, you could try to build the compiler incrementally with ThinLTO and see if its tests still pass, potentially also doing a local perf run to compare against an incrementally compiled compiler (w/o ThinLTO).

@alexcrichton
Member

Thanks @michaelwoerister! I wonder though: I think that incremental ThinLTO upstream at least isn't supposed to have such bad performance, so I think we're missing a step in implementing this? I thought incremental ThinLTO would look something like this:

  • As usual, incrementally translate and optimize a bunch of LLVM modules. This may have a bunch of cache hits (especially in the println case)
  • Next we calculate the ThinLTO module summary index for everything (unconditional work)
  • Next we take this index and generate a unique hash for each module which includes items imported from other modules (unconditional work)
  • If there is an object file matching this unique hash we use it from the cache, otherwise we generate a new object file via codegen.
  • Finally, all the output object files here are linked together

It looks like the implementation here is invalidating too eagerly? The construction above would avoid retranslating/optimizing modules that depend on changed modules but don't import any of the changes. That is, cache invalidation needs to happen only for the ThinLTO passes, not for the entire module's original optimization passes.
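
One way to picture the hash-based reuse check in the steps above is the following sketch. Everything in it is an assumption made for illustration: the function name `thin_lto_cache_key`, the representation of imported items as a name-to-bytes map, and the use of `DefaultHasher` (whose output is not guaranteed to be stable across Rust releases, so a real on-disk cache would need a stable hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key for one module's post-ThinLTO object file.
/// It covers the module's own optimized IR plus every item it imports
/// from other modules, so the key changes when either of them changes.
pub fn thin_lto_cache_key(own_ir: &[u8], imported: &BTreeMap<String, Vec<u8>>) -> u64 {
    let mut hasher = DefaultHasher::new();
    own_ir.hash(&mut hasher);
    // BTreeMap iterates in sorted key order, so the key does not depend
    // on the order in which imports were recorded.
    for (name, ir) in imported {
        name.hash(&mut hasher);
        ir.hash(&mut hasher);
    }
    hasher.finish()
}
```

If an object file stored under this key exists in the cache, it would be reused; otherwise the ThinLTO passes and codegen would run for that module and the result would be stored under the new key.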

@michaelwoerister
Member Author

@alexcrichton, your approach would increase the size of our incr. comp. cache quite a bit (~50%) because we would have to cache LLVM IR before and after ThinLTO. Currently we only cache the latter. I'm not strictly opposed to doing that but cache size already is a concern for big projects. It's certainly worth a try though.

Also note that we actually don't have to hash anything: our incr. comp. framework will already tell us which modules have changed. With some tweaking, it should also be possible to know exactly which items within a module have changed. I'll have another look at how the linker-plugins do it though.

@alexcrichton
Member

@michaelwoerister indeed! Although I think what I mentioned above is how incremental ThinLTO was designed to work? In that if you're using a linker plugin then all your intermediate files are the IR files and then the final link also has a cache directory with a lot of object files? I think we have some other various methods to greatly improve our cache directory sizes, for example we should probably be using thin archives which just point to the object files elsewhere on disk, and that way object files aren't copied around. (additionally we probably save way too much IR in too many places!)

Within our own incr. comp. framework I think ThinLTO would roughly look like:

  1. There's a query to get the final set of object files. Each object file is the same as our already-existing scheme to partition into CGUs.
  2. Each CGU's object file query queries for the pre-ThinLTO-IR of the object file (optimized) as well as the module summary for this object file.
  3. Querying the module summary queries for the global module summary and then selectively queries for specific objects from other IR modules. Note that here you don't depend on the entire foreign IR module, just what you're importing.

While it's probably possible to integrate it into our own incr. comp. framework and have reuse "just work" it may also be good to shoehorn as close as we can to upstream incremental ThinLTO as that's how it's written to work in the sense of fixing bugs and getting future developments.

I sort of assumed that the way we'd implement incremental ThinLTO was to basically implement custom incrementality in ThinLTO passes (like upstream LLVM does) and basically just get the main incremental directory from the compilation session.

I do think we have to solve the problem here in the sense if I change one CGU that shouldn't force all CGUs which may import the item to be fully recompiled. Instead only those which are known to import the changed item should be recompiled (and even then, not fully recompiled, just post-ThinLTO-passes).

@SimonSapin
Contributor

> cache size already is a concern for big projects

It’s not just the incremental cache and it’s not just Rust code, but as a single data point I regularly fill the 500 GB on my desktop machine that is dedicated to compilation. Part of this is due to Cargo not having any garbage collection outside of cargo clean. Upgrading to a new Rust Nightly changes the hash in some directory names and can effectively double the size of target/.

@michaelwoerister
Member Author

> I do think we have to solve the problem here in the sense if I change one CGU that shouldn't force all CGUs which may import the item to be fully recompiled. Instead only those which are known to import the changed item should be recompiled (and even then, not fully recompiled, just post-ThinLTO-passes).

If by "item" you mean CGU then this is already what this PR does (except of course re-use the pre-ThinLTO optimization work because we don't cache it).

> I sort of assumed that the way we'd implement incremental ThinLTO was to basically implement custom incrementality in ThinLTO passes (like upstream LLVM does) and basically just get the main incremental directory from the compilation session.

That's what I think we should do too. This PR already makes LLVM IR generation not be a query anymore. It's pretty much a "standard ThinLTO" pipeline.

@alexcrichton
Member

@michaelwoerister ah yeah sorry, by item I mean per-function. For example, if there are two CGUs A and B where A inlines function foo from B but nothing else, then if B's foo function changes, A should be re-ThinLTO'd but not recodegen'd or reoptimized. If another function in B changes, though, then A shouldn't be touched at all and the previously cached version should be used (as we know that the previous version of A didn't import the symbol from B).
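
The A/B scenario can be written down as a small decision function. This is only a sketch of the per-function idea described here, not something the PR implements; `Action`, `FunctionImports`, and `action_for_unchanged_cgu` are hypothetical names, and CGUs and functions are identified by plain strings.

```rust
use std::collections::{HashMap, HashSet};

/// What to do with a CGU whose own source did not change.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Nothing it imported changed: use the cached post-ThinLTO result.
    ReuseCached,
    /// Something it imported changed: re-run only the ThinLTO passes.
    RerunThinLto,
}

/// Hypothetical per-function import record for one CGU:
/// source CGU name -> names of the functions imported from it.
pub type FunctionImports = HashMap<String, HashSet<String>>;

/// `changed` maps each CGU name to the functions that changed in it.
pub fn action_for_unchanged_cgu(
    imports: &FunctionImports,
    changed: &HashMap<String, HashSet<String>>,
) -> Action {
    // The CGU is affected only if a changed function is one it imported.
    let touched = imports.iter().any(|(cgu, fns)| {
        changed.get(cgu).map_or(false, |c| !c.is_disjoint(fns))
    });
    if touched {
        Action::RerunThinLto
    } else {
        Action::ReuseCached
    }
}
```

For A importing only `foo` from B, a change to `foo` yields `RerunThinLto`, while a change to any other function in B yields `ReuseCached`.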

I may be misunderstanding what this PR does though? The description leads me to believe that in the above scenario no matter what changes in B then A is fully recompiled unconditionally. Is that not the case though?

@michaelwoerister
Member Author

> The description leads me to believe that in the above scenario no matter what changes in B then A is fully recompiled unconditionally.

That's the case. The PR does NOT yet implement change tracking at the function level. Do you know if linker-based LTO computes a hash per function? I don't have the code at hand at the moment. The description here sounds like no: http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html.

@alexcrichton
Member

Aha it appears I'm misremembering! I've gone off this list before, and you're right in that it only looks at entire modules, not hashes of imported contents.

In light of that I think there's still a "main thing" for this PR to implement, right? Which is that we shouldn't recodegen/reoptimize entire modules, but rather only rerun the ThinLTO passes when an input changes, right? Does that make sense?

@michaelwoerister
Member Author

> In light of that I think there's still a "main thing" for this PR to implement, right? Which is that we shouldn't recodegen/reoptimize entire modules, but rather only rerun the ThinLTO passes when an input changes, right? Does that make sense?

Yes, that makes sense. That should put rustc on par with the linker plugin.

@stokhos

stokhos commented Jul 21, 2018

Ping from triage, @michaelwoerister we haven't heard from you for a while, will you have time to work on this PR?

@michaelwoerister
Member Author

This has been slightly de-prioritized until linker-plugin-based LTO is under wraps. Since this will need a little design work still, I'm going to close this PR for now.

bors added a commit that referenced this pull request Sep 3, 2018
…hton

Enable ThinLTO with incremental compilation.

This is an updated version of #52309. This PR allows `rustc` to use (local) ThinLTO and incremental compilation at the same time. In theory this should allow for getting compile-time improvements for small changes while keeping the runtime performance of the generated code roughly the same as when compiling non-incrementally.

The difference to #52309 is that this version also caches the pre-LTO version of LLVM bitcode. This allows for another layer of caching:
1. if the module itself has changed, we have to re-codegen and re-optimize.
2. if the module itself has not changed, but a module it imported from during ThinLTO has, we don't need to re-codegen and don't need to re-run the first optimization phase. Only the second (i.e. ThinLTO-) optimization phase is re-run.
3. if neither the module itself nor any of its imports have changed then we can re-use the final, post-ThinLTO version of the module. (We might have to load its pre-ThinLTO version though so it's available for other modules to import from)
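
The three cases above amount to a small decision table. The sketch below uses hypothetical names (`Reuse`, `reuse_level`) and takes the two change flags as plain booleans; it is meant only to illustrate the layering, not to reproduce the merged implementation.

```rust
/// How much cached work can be reused for one module.
#[derive(Debug, PartialEq, Eq)]
pub enum Reuse {
    /// Case 1: re-codegen and re-optimize from scratch.
    No,
    /// Case 2: reuse the cached pre-ThinLTO bitcode, re-run only the ThinLTO phase.
    PreLto,
    /// Case 3: reuse the final, post-ThinLTO version of the module.
    PostLto,
}

/// `module_changed`: the module's own inputs changed.
/// `any_import_changed`: a module it imported from during ThinLTO changed.
pub fn reuse_level(module_changed: bool, any_import_changed: bool) -> Reuse {
    if module_changed {
        Reuse::No
    } else if any_import_changed {
        Reuse::PreLto
    } else {
        Reuse::PostLto
    }
}
```

Note that a change to the module itself takes precedence: once it has to be re-codegen'd, whether its imports changed no longer matters for the reuse decision.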