rustc: Implement ThinLTO #44841

Merged 1 commit into rust-lang:master on Oct 8, 2017

Conversation

alexcrichton
Member

This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Today LTO works by merging all relevant LLVM modules into one and then
running optimization passes. "Thin" LTO operates differently: the work is more
sharded, which opens up opportunities for parallelism between optimizing
codegen units. Further down the road, Thin LTO also allows *incremental* LTO,
which should enable even faster release builds without compromising on the
performance we have today.

This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO (see the invocation sketch after the list):

  • In one mode we'll only perform ThinLTO over the codegen units produced in a
    single compilation. That is, we won't load upstream rlibs, but we'll instead
    just perform ThinLTO amongst all codegen units produced by the compiler for
    the local crate. This is intended to emulate a desired end point where we have
    codegen units turned on by default for all crates and ThinLTO allows us to do
    this without performance loss.

  • In another mode, like full LTO today, we'll optimize all upstream dependencies
    in "thin" mode. Unlike today, however, this LTO step is fully parallelized, so it
    should finish much more quickly.
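
A rough invocation sketch of the two modes; the flag combinations below are an
assumption (the local-crate mode is assumed to kick in with multiple codegen
units plus `-Z thinlto`, and the cross-crate mode when `-C lto` is also passed):

    # Local-crate ThinLTO: thin LTO across this crate's own codegen units,
    # without loading upstream rlibs.
    rustc -O -C codegen-units=8 -Z thinlto foo.rs

    # Cross-crate "thin" LTO, analogous to full LTO today but parallelized.
    rustc -O -C lto -Z thinlto foo.rs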

There are a good number of comments about what the implementation is doing and
where it came from, but the tl;dr is that most of the support here is currently
copied from upstream LLVM. This code duplication is done for a number of
reasons:

  • Controlling parallelism means we can use the existing jobserver support to
    avoid overloading machines.
  • We will likely want a slightly different form of incremental caching which
    integrates with our own incremental strategy, but this is yet to be
    determined.
  • This buys us some flexibility about when/where we run ThinLTO, as well as
    having it tailored to fit our needs for the time being.
  • Finally, this allows us to reuse some artifacts such as our `TargetMachine`
    creation, where not all of the options we use today are necessarily supported
    by upstream LLVM yet.

My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!

@rust-highfive
Collaborator

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member Author

r? @michaelwoerister

Note that the first commit here is from #44783, so it shouldn't need extra review; the thin support should all be in the second. Example output of the modified time_graph module is here (that's a thin LTO measurement for Cargo), and I've also collected various data about ThinLTO, codegen units, compile time, and runtime.

if codegen_unit_name.contains(NUMBERED_CODEGEN_UNIT_MARKER) {
    // If we use the numbered naming scheme for modules, we don't want
    // the files to look like <crate-name><extra>.<crate-name>.<index>.<ext>
    // but simply <crate-name><extra>.<index>.<ext>
@alexcrichton (Member Author)

@michaelwoerister mind helping me understand what was going on here? For ThinLTO we need to make sure that all the objects and such have unique names, which this comment seems to indicate we would have achieved (although in practice I didn't see crate name hashes and such in those names).

With this name munging left in, though, I found that lots of objects were overwriting one another by accident; I think the backend shuffling of files around was "getting weird". I couldn't find a downside to removing this logic, though, so I was curious if you knew what this was originally added for?

Member

This logic looks like a measure for keeping filenames/paths short. We could probably just get rid of the "numbered" codegen unit naming scheme (incremental does not use it). It was only introduced to stay close to the then-existing behavior. That would have the downside of codegen units having somewhat misleading names, but they should be unique.

@michaelwoerister
Member

That's a nice surprise :)

  • The compilation speedups look great. How many cores does the machine you're testing on have?
  • Do you know how runtime speeds compare for debug builds with different # of codegen units? Or asked differently, is there any real downside to using multiple codegen units for debug builds?
  • Runtime performance is worse, in some cases significantly so, it seems. I think this is acceptable, but we might want to think about how we introduce this feature, what recommendations we give, etc.
  • Increasing the number of codegen units seems to not have a detrimental effect on runtime performance. This is great for peak memory consumption (because we can split work into smaller chunks) and also great for incremental compilation.
  • This is awesome 🎉

cc @rust-lang/compiler

@alexcrichton
Member Author

How many cores does the machine you're testing on have?

Oops, an excellent question! I had 8 cores.

Or asked differently, is there any real downside to using multiple codegen units for debug builds?

AFAIK no; I realized when doing all this that we should just turn this on by default. I was going to do that in a separate PR, orthogonal to this one, though.

Runtime performance is worse, in some cases significantly so it seems.

Hm, sort of! I think it's a lot better than it looks, though. For example, here's a comparison between 1 and 16 codegen units without ThinLTO enabled (of the regex benchmark suite, showing regressions of at least 5%). And then here's the same comparison when those 16 CGUs are compiled with ThinLTO. Notably, nearly every benchmark regresses (sometimes by 10x) with just vanilla codegen units, whereas with ThinLTO the worst regression is 18ns/iter -> 27ns/iter, and ThinLTO even improved the performance in one case!

In other words, my conclusion is that runtime performance is "basically the same", modulo compiler wizardry details. My guess is that any performance loss could easily be fixed with #[inline] where truly necessary, but in most cases I doubt it will be. AFAIK ThinLTO is supposed to match full LTO perf-wise at the LLVM layer, so it may well continue to improve at that layer of the implementation too.

Overall, though, I definitely wouldn't classify this as a significant performance loss, only a very minor one in some esoteric situations at worst. From what I've seen it does everything you'd expect it to do across the board. Then again, that's why this is unstable to start out with :)
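
As a small illustration of that #[inline] escape hatch (a sketch, not code from this PR):

    // In a real scenario `squared` would live in an upstream crate or land in
    // a different codegen unit; #[inline] makes its body available for
    // inlining across crate and codegen-unit boundaries even without LTO.
    #[inline]
    pub fn squared(x: u32) -> u32 {
        x * x
    }

    fn main() {
        // With ThinLTO enabled (or with #[inline]) the optimizer can inline
        // this call in optimized builds instead of leaving an out-of-line call.
        assert_eq!(squared(7), 49);
    }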

@bors
Contributor

bors commented Sep 25, 2017

☔ The latest upstream changes (presumably #44085) made this pull request unmergeable. Please resolve the merge conflicts.

@hanna-kruppe
Contributor

hanna-kruppe commented Sep 26, 2017

Amazing!!

On the subject of performance, note that ThinLTO has historically been missing some optimizations of classical LTO (just due to them not being implemented). One class of such optimizations specifically called out on LLVM's "open projects" page is "propagating more global informations across the program". I assume upstream has made progress on this since LLVM 4.0, so maybe we'll see improved performance just by updating to LLVM 5.0?

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Sep 26, 2017
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:

* Parallel translation and codegen, enabling codegen units to be worked on even
  more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
  incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
  system.

The largest benefit of codegen units has always been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!

In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.

Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen), but it requires rust-lang#44841 at least to be landed and
stabilized first. Stay tuned if you're interested in that aspect!
@mersinvald
Contributor

@alexcrichton I've just run some build-time benchmarks on my project, which uses a lot of popular Rust libraries and codegen (diesel, hyper, serde, tokio, futures, reqwest), on my Intel Core i5 laptop (Skylake, 2c/4t) and got these results:

1 unit:    92 secs
2 units:   81 secs
4 units:   83 secs
8 units:   85 secs
16 units:  90 secs
32 units:  102 secs

Cargo profile:

# The development profile, used for `cargo build`.
[profile.dev]
opt-level = 0
debug = true 
lto = false
debug-assertions = true 
codegen-units = 4

rustc 1.22.0-nightly (17f56c5 2017-09-21)

As expected, the best result is with the number of codegen units equal to the number of system threads, and 32 is way too much for an average machine.

Did you consider an option to select the number of codegen units depending on the number of CPUs, with the num_cpus crate?

Thank you for working on compile times!
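
A minimal sketch of that suggestion, assuming the num_cpus crate as a dependency (purely an illustration of the idea, not something rustc does today):

    // Clamp a requested codegen-unit count to the number of logical CPUs, so
    // a 2c/4t laptop is never asked to juggle 32 LLVM modules at once.
    extern crate num_cpus;

    fn effective_codegen_units(requested: usize) -> usize {
        let cpus = num_cpus::get(); // logical CPU count
        requested.min(cpus).max(1)
    }

    fn main() {
        println!("codegen-units = {}", effective_codegen_units(32));
    }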

@alexcrichton
Member Author

@mersinvald very interesting! Did you mean to comment on #44853 though? If so, maybe we can continue over there?

@mersinvald
Contributor

mersinvald commented Sep 26, 2017

@alexcrichton ok, sorry :)

@arielb1 arielb1 added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 26, 2017
@alexcrichton alexcrichton force-pushed the thinlto branch 3 times, most recently from d8ae4f8 to d3b0c49 on September 26, 2017 21:29
@sgrif
Contributor

sgrif commented Sep 26, 2017

@alexcrichton FYI if you were trying to convince us that you weren't a robot, this and #44853 back-to-back aren't helping. ;)

bors added a commit that referenced this pull request Sep 29, 2017

rustc: Default 32 codegen units at O0

@bors
Contributor

bors commented Sep 29, 2017

☔ The latest upstream changes (presumably #44853) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton alexcrichton added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 30, 2017
@alexcrichton
Member Author

Rebased and should be ready for review!

r? @michaelwoerister

@bors
Contributor

bors commented Oct 6, 2017

⌛ Testing commit 5f66c99efd92eef321dbe007e5a71b7b757c8f8b with merge f8577605e7e59dd1354533f9465a365cc4cb814d...

@bors
Contributor

bors commented Oct 6, 2017

💔 Test failed - status-travis

@kennytm
Member

kennytm commented Oct 6, 2017

Multiple incremental tests on x86_64-gnu ICE'd.

...
[00:58:39] ---- [incremental] incremental/spike.rs stdout ----
[00:58:39] 	
[00:58:39] error in revision `rpass2`: compilation failed!
[00:58:39] status: exit code: 101
[00:58:39] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/incremental/spike.rs" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental" "--target=x86_64-unknown-linux-gnu" "--cfg" "rpass2" "-Z" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/spike.inc" "--error-format" "json" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/spike.stage2-x86_64-unknown-linux-gnu" "-Crpath" "-O" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Z" "query-dep-graph" "-Z" "query-dep-graph" "-Zincremental-info" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/spike.stage2-x86_64-unknown-linux-gnu.incremental.libaux"
[00:58:39] stdout:
[00:58:39] ------------------------------------------
[00:58:39] 
[00:58:39] ------------------------------------------
[00:58:39] stderr:
[00:58:39] ------------------------------------------
[00:58:39] error: internal compiler error: unexpected panic
[00:58:39] 
[00:58:39] note: the compiler unexpectedly panicked. this is a bug.
[00:58:39] 
[00:58:39] note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
[00:58:39] 
[00:58:39] note: rustc 1.22.0-dev running on x86_64-unknown-linux-gnu
[00:58:39] 
[00:58:39] incremental: session directory: 6 files hard-linked
[00:58:39] incremental: session directory: 0 files copied
[00:58:39] thread 'rustc' panicked at 'no entry found for key', /checkout/src/libcore/option.rs:839:4
[00:58:39] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[00:58:39] 
[00:58:39] 
[00:58:39] ------------------------------------------
[00:58:39] 
[00:58:39] thread '[incremental] incremental/spike.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2433:8
[00:58:39] 
[00:58:39] 
[00:58:39] failures:
[00:58:39]     [incremental] incremental/add_private_fn_at_krate_root_cc/struct_point.rs
[00:58:39]     [incremental] incremental/cache_file_headers.rs
[00:58:39]     [incremental] incremental/change_add_field/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_fn/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_fn_cc/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_impl_method/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_impl_method_cc/struct_point.rs
[00:58:39]     [incremental] incremental/change_pub_inherent_method_body/struct_point.rs
[00:58:39]     [incremental] incremental/change_pub_inherent_method_sig/struct_point.rs
[00:58:39]     [incremental] incremental/change_symbol_export_status.rs
[00:58:39]     [incremental] incremental/commandline-args.rs
[00:58:39]     [incremental] incremental/issue-35593.rs
[00:58:39]     [incremental] incremental/issue-38222.rs
[00:58:39]     [incremental] incremental/krate-inherent.rs
[00:58:39]     [incremental] incremental/krate-inlined.rs
[00:58:39]     [incremental] incremental/remapped_paths_cc/main.rs
[00:58:39]     [incremental] incremental/remove-private-item-cross-crate/main.rs
[00:58:39]     [incremental] incremental/spans_in_type_debuginfo.rs
[00:58:39]     [incremental] incremental/spike.rs
[00:58:39] 
[00:58:39] test result: FAILED. 59 passed; 19 failed; 0 ignored; 0 measured; 0 filtered out

@alexcrichton
Member Author

@bors: r=michaelwoerister

@bors
Contributor

bors commented Oct 6, 2017

📌 Commit 1cb0c99 has been approved by michaelwoerister

@bors
Contributor

bors commented Oct 7, 2017

⌛ Testing commit 1cb0c99d055072c9dfcaf922b26fc66f25cbbb43 with merge 54b42bae05a52565bcf4ebb2780d7f94deae900d...

@bors
Contributor

bors commented Oct 7, 2017

💔 Test failed - status-travis

let foo = foo as usize as *const u8;
let bar = bar::bar as usize as *const u8;

assert_eq!(*foo, *bar);
Member

asm.js failed on this line...

[01:25:14] failures:
[01:25:14] 
[01:25:14] ---- [run-pass] run-pass/thin-lto-inlines.rs stdout ----
[01:25:14] 	
[01:25:14] error: test run failed!
[01:25:14] status: exit code: 101
[01:25:14] command: "/emsdk-portable/node/4.1.1_64bit/bin/node" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/thin-lto-inlines.stage2-asmjs-unknown-emscripten.js"
[01:25:14] stdout:
[01:25:14] ------------------------------------------
[01:25:14] 3 3
[01:25:14] 
[01:25:14] ------------------------------------------
[01:25:14] stderr:
[01:25:14] ------------------------------------------
[01:25:14] thread 'main' panicked at 'assertion failed: `(left == right)`
[01:25:14]   left: `115`,
[01:25:14]  right: `99`', /checkout/src/test/run-pass/thin-lto-inlines.rs:36:8
[01:25:14] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[01:25:14] 
[01:25:14] ------------------------------------------
[01:25:14] 
[01:25:14] thread '[run-pass] run-pass/thin-lto-inlines.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2433:8
[01:25:14] note: Run with `RUST_BACKTRACE=1` for a backtrace.

@alexcrichton
Member Author

@bors: r=michaelwoerister

@bors
Contributor

bors commented Oct 7, 2017

📌 Commit 4ca1b19 has been approved by michaelwoerister

@bors
Contributor

bors commented Oct 7, 2017

⌛ Testing commit 4ca1b19 with merge 33f1f8654b50e49db838b64e76c4af59bc55ddb5...

@bors
Contributor

bors commented Oct 7, 2017

💔 Test failed - status-appveyor

@retep998
Member

retep998 commented Oct 7, 2017

failures:
---- [run-make] run-make\issue-26092 stdout ----
	
error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
PATH="/c/projects/rust/build/x86_64-pc-windows-gnu/test/run-make/issue-26092.stage2-x86_64-pc-windows-gnu:C:\projects\rust\build\x86_64-pc-windows-gnu\stage2\bin:/c/projects/rust/build/x86_64-pc-windows-gnu/stage0-tools/x86_64-pc-windows-gnu/release/deps:/c/projects/rust/build/x86_64-pc-windows-gnu/stage0-sysroot/lib/rustlib/x86_64-pc-windows-gnu/lib:/c/Program Files (x86)/Inno Setup 5:/c/Python27:/c/projects/rust/mingw64/bin:/usr/bin:/c/Perl/site/bin:/c/Perl/bin:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/Program Files/7-Zip:/c/Program Files/Microsoft/Web Platform Installer:/c/Tools/GitVersion:/c/Tools/PsTools:/c/Program Files/Git LFS:/c/Program Files (x86)/Subversion/bin:/c/Program Files/Microsoft SQL Server/120/Tools/Binn:/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/110/Tools/Binn:/c/Program Files (x86)/Microsoft SQL Server/120/Tools/Binn:/c/Program Files/Microsoft SQL Server/120/DTS/Binn:/c/Program Files (x86)/Microsoft SQL Server/120/Tools/Binn/ManagementStudio:/c/Tools/WebDriver:/c/Program Files (x86)/Microsoft SDKs/TypeScript/1.4:/c/Program Files (x86)/Microsoft Visual Studio 12.0/Common7/IDE/PrivateAssemblies:/c/Program Files (x86)/Microsoft SDKs/Azure/CLI/wbin:/c/Ruby193/bin:/c/Tools/NUnit/bin:/c/Tools/xUnit:/c/Tools/MSpec:/c/Tools/Coverity/bin:/c/Program Files (x86)/CMake/bin:/c/go/bin:/c/Program Files/Java/jdk1.8.0/bin:/c/Python27:/c/Program Files/nodejs:/c/Program Files (x86)/iojs:/c/Program Files/iojs:/c/Users/appveyor/AppData/Roaming/npm:/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/c/Program Files (x86)/MSBuild/14.0/Bin:/c/Tools/NuGet:/c/Program Files (x86)/Microsoft Visual Studio 14.0/Common7/IDE/CommonExtensions/Microsoft/TestWindow:/c/Program Files/Microsoft DNX/Dnvm:/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/130/Tools/Binn:/c/Program Files (x86)/Microsoft SQL Server/130/Tools/Binn:/c/Program Files (x86)/Microsoft SQL Server/130/DTS/Binn:/c/Program Files/Microsoft SQL Server/130/DTS/Binn:/c/Program Files (x86)/Microsoft SQL Server/110/DTS/Binn:/c/Program Files (x86)/Microsoft SQL Server/120/DTS/Binn:/c/Program Files (x86)/Apache/Maven/bin:/c/Python27/Scripts:/c/Tools/NUnit3:/c/Program Files/Mercurial:/c/Program Files/LLVM/bin:/c/Program Files/dotnet:/c/Program Files/erl8.3/bin:/c/Tools/curl/bin:/c/Program Files/Amazon/AWSCLI:/c/Program Files (x86)/Microsoft SQL Server/140/DTS/Binn:/c/Program Files (x86)/Microsoft Visual Studio 14.0/Common7/IDE/Extensions/Microsoft/SQLDB/DAC/140:/c/Program Files (x86)/Yarn/bin:/c/Program Files/Git/cmd:/c/Program Files/Git/usr/bin:/c/ProgramData/chocolatey/bin:/c/Tools/vcpkg:/c/Program Files (x86)/nodejs:/c/Program Files/Microsoft Service Fabric/bin/Fabric/Fabric.Code:/c/Program Files/Microsoft SDKs/Service Fabric/Tools/ServiceFabricLocalClusterManager:/c/Users/appveyor/AppData/Local/Yarn/bin:/c/Users/appveyor/AppData/Roaming/npm:/c/Program Files/AppVeyor/BuildAgent:/c/projects/rust:/c/projects/rust/handle" 'C:\projects\rust\build\x86_64-pc-windows-gnu\stage2\bin\rustc.exe' --out-dir /c/projects/rust/build/x86_64-pc-windows-gnu/test/run-make/issue-26092.stage2-x86_64-pc-windows-gnu -L /c/projects/rust/build/x86_64-pc-windows-gnu/test/run-make/issue-26092.stage2-x86_64-pc-windows-gnu  -o "" blank.rs 2>&1 | \
		grep -i 'No such file or directory'
------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:4: all] Error 1
------------------------------------------
thread '[run-make] run-make\issue-26092' panicked at 'explicit panic', src\tools\compiletest\src\runtest.rs:2433:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
failures:
    [run-make] run-make\issue-26092
test result: FAILED. 160 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

@petrochenkov
Contributor

Spurious. (#43402)
@bors retry

@bors
Contributor

bors commented Oct 7, 2017

⌛ Testing commit 4ca1b19 with merge ac76206...

bors added a commit that referenced this pull request Oct 7, 2017
rustc: Implement ThinLTO

@bors
Contributor

bors commented Oct 8, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing ac76206 to master...

@bors bors merged commit 4ca1b19 into rust-lang:master Oct 8, 2017
@alexcrichton alexcrichton deleted the thinlto branch October 8, 2017 02:08
@michaelwoerister
Member

🎉 🎉 🎉 🎉 🎉
