save LTO import info and check it when trying to reuse build products #67020

Merged

pnkfelix
Member

@pnkfelix pnkfelix commented Dec 4, 2019

Fix #59535

Previous runs of LTO optimization on the previous incremental build can import larger portions of the dependence graph into a codegen unit than the current compilation run is choosing to import. We need to take that into account when we choose to reuse PostLTO-optimization object files from previous compiler invocations.

This PR accomplishes that by serializing the LTO import information on each incremental build. We load up the previous LTO import data as well as the current LTO import data. Then as we decide whether to reuse previous PostLTO objects or redo LTO optimization, we check whether the LTO import data matches. After we finish with this decision process for every object, we write the LTO import data back to disk.
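The load/compare/write-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `ImportMap` alias, the one-line-per-module text format, and the helper names (`import_set`, `serialize`, `deserialize`, `can_reuse_post_lto`) are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: for each codegen unit, the set of modules LTO imported into it.
type ImportMap = BTreeMap<String, BTreeSet<String>>;

/// Helper for building an import set from string literals.
fn import_set(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|s| s.to_string()).collect()
}

/// Write one line per module: `name: import1 import2 ...` (format assumed for this sketch).
fn serialize(map: &ImportMap) -> String {
    let mut out = String::new();
    for (module, imported) in map {
        out.push_str(module);
        out.push(':');
        for name in imported {
            out.push(' ');
            out.push_str(name);
        }
        out.push('\n');
    }
    out
}

/// Parse the format produced by `serialize`; lines without a colon are skipped.
fn deserialize(text: &str) -> ImportMap {
    let mut map = ImportMap::new();
    for line in text.lines() {
        if let Some((module, imported)) = line.split_once(':') {
            let set = imported.split_whitespace().map(String::from).collect();
            map.insert(module.to_string(), set);
        }
    }
    map
}

/// Reuse the post-LTO object only if the module is otherwise unchanged (`green`)
/// and its previous import set is known and equal to the current one.
fn can_reuse_post_lto(module: &str, green: bool, prev: &ImportMap, curr: &ImportMap) -> bool {
    green && prev.get(module).is_some() && prev.get(module) == curr.get(module)
}

fn main() {
    // Pretend this string is the file written by the previous session.
    let mut prev = ImportMap::new();
    prev.insert("cgu_a".to_string(), import_set(&["cgu_b", "cgu_d"]));
    let on_disk = serialize(&prev);

    // Current session: load the old data, compute the new data, compare.
    let loaded = deserialize(&on_disk);
    let mut curr = ImportMap::new();
    curr.insert("cgu_a".to_string(), import_set(&["cgu_b"]));
    println!("reuse cgu_a: {}", can_reuse_post_lto("cgu_a", true, &loaded, &curr));

    // After every decision has been made, the current map replaces the old file.
    let _next_on_disk = serialize(&curr);
}
```

The point of the sketch is only the ordering: the previous map is read before any reuse decision, and the current map is written afterwards without merging.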


What is the scenario where comparing against past LTO import information is necessary?

I've tried to capture it in the comments in the regression test, but here's yet another attempt from me to summarize the situation:

  1. Consider a call-graph like [A] -> [B -> D] <- [C] (where the letters are functions and the modules are enclosed in [])
  2. In our specific instance, the earlier compilations were inlining the call to B into A; thus A ended up with an external reference to the symbol D in its object code, to be resolved at subsequent link time. The LTO import information provided by LLVM for those runs reflected that: it explicitly says that, during those runs, the definition of B and the declaration of D were imported into [A].
  3. The change between incremental builds was that the call D <- C was removed.
  4. That change, coupled with other decisions within rustc, made the compiler decide to make D an internal symbol (since it was no longer accessed from other codegen units, this makes sense locally). And then the definition of D was inlined into B and D itself was eliminated entirely.
  5. The current LTO import information reported that B alone is imported into [A] for the current compilation. So when the Rust compiler surveyed the dependence graph, it determined that nothing [A] imports changed since the last build (and [A] itself has not changed either), so it chooses to reuse the object code generated during the previous compilation.
  6. But that previous object code has an unresolved reference to D, and that causes a link time failure!
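To make the six steps concrete, here is a small sketch of the two reuse checks applied to the data from the scenario. It models imports at the symbol level, as the description does, and both `old_check` and `new_check` are paraphrases written for this example rather than functions from rustc.

```rust
use std::collections::BTreeSet;

/// Helper for building a set of names from string literals.
fn set<'a>(names: &[&'a str]) -> BTreeSet<&'a str> {
    names.iter().copied().collect()
}

/// Paraphrase of step 5: [A] itself is unchanged and nothing it *currently*
/// imports has changed since the last build.
fn old_check(a_is_green: bool, current_imports: &BTreeSet<&str>, changed: &BTreeSet<&str>) -> bool {
    a_is_green && current_imports.is_disjoint(changed)
}

/// The stricter check: additionally require that the import set recorded for
/// the previous build is identical to the current one.
fn new_check(
    a_is_green: bool,
    prev_imports: &BTreeSet<&str>,
    current_imports: &BTreeSet<&str>,
    changed: &BTreeSet<&str>,
) -> bool {
    old_check(a_is_green, current_imports, changed) && prev_imports == current_imports
}

fn main() {
    // Step 2: the earlier build imported B's definition and D's declaration into [A].
    let prev_imports = set(&["B", "D"]);
    // Step 5: the current build imports B alone.
    let current_imports = set(&["B"]);
    // Step 3: only C was edited.
    let changed = set(&["C"]);

    // The old check accepts reuse, which leaves the stale reference to D in place.
    println!("old check reuses [A]: {}", old_check(true, &current_imports, &changed));
    // The new check rejects reuse because the import sets differ.
    println!(
        "new check reuses [A]: {}",
        new_check(true, &prev_imports, &current_imports, &changed)
    );
}
```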

The interesting thing is that it's quite hard to actually observe the above scenario arising, which is probably why no one has noticed this bug in the year or so since incremental LTO support landed (PR #53673).

I've literally spent days trying to observe the bug on my local machine, but haven't managed to find the magic combination of factors that gets LLVM and rustc to make just the right set of inlining and internal-reclassification choices for this particular problem to arise.


Also, I have tried to be careful about injecting new bugs with this PR. Specifically, I was/am worried that we could get into a scenario where overwriting the current LTO import data with past LTO import data would cause us to "forget" a current import. To guard against this, the PR as currently written always asserts, at overwrite time, that the past LTO import-set is a superset of the current LTO import-set. This way, the overwriting process should always be safe to run.

  • The previous note was written based on the first version of this PR. It has since been revised to use a simpler strategy, where we never attempt to merge the past LTO import information into the current one. We just compare them, and act accordingly.
  • Also, as you can see from the comments on the PR itself, I was quite right to be worried about forgetting past imports; that scenario was observable via a trivial transformation of the regression test I had devised.

@rust-highfive
Collaborator

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 4, 2019
@pnkfelix
Member Author

pnkfelix commented Dec 4, 2019

r? @michaelwoerister

@pnkfelix pnkfelix added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Dec 4, 2019
@pnkfelix
Member Author

pnkfelix commented Dec 5, 2019

I was/am worried that we could get into a scenario where overwriting the current LTO import data with past LTO import data would cause us to "forget" a current import. To guard against this, the PR as currently written always asserts, at overwrite time, that the past LTO import-set is a superset of the current LTO import-set. This way, the overwriting process should always be safe to run.

I've been thinking more about this assertion this morning. I think I should just remove it.

Here is my thinking: assuming the LTO import set computation is always based on the current state of the code, and we observe when going from revision₁ to revision₂ that the import set for (green) module A has gone from set₁ to set₂ where set₂ ⊂ set₁, then logically one should be able to reverse the direction of the change (going from revision₂ to revision₁), and LLVM may (or even should) compute a corresponding change in the import set for that same (still green) module A where the import set now goes from set₂ to set₁ (where we still have set₁ ⊃ set₂). That would cause the assertion to fail.

Instead, I should just trust the underlying logic: the serialized imports for codegen unit A correspond to whatever information the LTO used for that compilation. If we reuse previously compiled A, then we keep the serialized LTO imports. If we recompile (or re-run LTO optimization) for A, then we use its newly computed LTO imports.
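A minimal sketch of why the superset assertion cannot hold in both directions, taking set₁ to be the strictly larger import set. The set contents and the helper names `set` and `past_covers_current` are illustrative assumptions, not code from this PR.

```rust
use std::collections::BTreeSet;

/// Helper for building a set of names from string literals.
fn set<'a>(names: &[&'a str]) -> BTreeSet<&'a str> {
    names.iter().copied().collect()
}

/// The guard from the first draft: the past import set must contain the current one.
fn past_covers_current(past: &BTreeSet<&str>, current: &BTreeSet<&str>) -> bool {
    past.is_superset(current)
}

fn main() {
    let set1 = set(&["B", "D"]); // imports of A at revision 1 (assumed contents)
    let set2 = set(&["B"]); // imports of A at revision 2 (assumed contents)

    // revision 1 -> revision 2: past = set1, current = set2, so the guard holds.
    println!("forward holds: {}", past_covers_current(&set1, &set2));
    // revision 2 -> revision 1: past = set2, current = set1, so the guard fails.
    println!("reverse holds: {}", past_covers_current(&set2, &set1));
}
```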

@pnkfelix
Member Author

pnkfelix commented Dec 5, 2019

logically one should be able to reverse the direction of the change (going from revision₂ to revision₁)

eek! Yes: One can trivially construct a test showing this with slight modification of the test that I already have on this PR.

Fixing that up now.

@michaelwoerister
Member

I'm wondering, do we even need to "overwrite" anything? Or could we make the decision once, before ThinLTO, based on the previous import map? Then save the current map to disk as is, without any merging. It should be correct and deterministic as it is always generated from the entire set of modules, including the cached ones.

@michaelwoerister
Member

That is an excellent description of the error scenario btw ❤️ ❤️ ❤️

@michaelwoerister
Member

michaelwoerister commented Dec 5, 2019

Here is a version that makes re-use decisions based solely on the previous import map:
https://github.com/rust-lang/rust/pull/52309/files#diff-7495f8d1204a6939d7aa2d406839e509R949-R1008
(EDIT: Clicking the link doesn't scroll down to the function in question, it seems. Look for fn determine_cgu_reuse)

It's from #52309, which was never merged.

@pnkfelix
Member Author

pnkfelix commented Dec 5, 2019

I'm wondering, do we even need to "overwrite" anything? Or could we make the decision once, before ThinLTO, based on the previous import map? Then save the current map to disk as is, without any merging. It should be correct and deterministic as it is always generated from the entire set of modules, including the cached ones.

My understanding of how the current import map is generated leads me to think that we cannot get by using the previous imports and storing the current ones. Otherwise, you'll still end up in the same scenario, just one incremental compile-cycle later.

I can try to spell this out in more detail, if need be.

@michaelwoerister
Member

I think the problem arises from making decisions based on the previous state of the object files in combination with the current import map. If the two things match up, i.e. making decisions about the previous object files based on the previous import map (which always accurately describes them), then the problem would not arise.

I'm generally very uneasy about keeping around data from sessions N-x where x > 1. That could add a kind of indeterminism unless you re-create the entire version history of compiling a crate (which would make for a really nasty debugging experience).

@michaelwoerister
Member

If you want, I can try to sketch an alternative PR tomorrow morning, so we'd see if the bug gets fixed.

@pnkfelix
Member Author

pnkfelix commented Dec 5, 2019

I think the problem arises from making decisions based on the previous state of the object files in combination with the current import map. If the two things match up, i.e. making decisions about the previous object files based on the previous import map (which always accurately describes them), then the problem would not arise.

I'm generally very uneasy about keeping around data from sessions N-x where x > 1. That could add a kind of indeterminism unless you re-create the entire version history of compiling a crate (which would make for a really nasty debugging experience).

But that is indeed the problem: the so-called "previous object files" could have originated from arbitrarily distant compilation sessions, not just the immediately preceding session. So I do not see how we can avoid accumulating information from an arbitrary number of sessions.

@pnkfelix pnkfelix force-pushed the issue-59535-accumulate-past-lto-imports branch from 001cf76 to f29e4bd Compare December 5, 2019 12:56
@michaelwoerister
Member

But that is indeed the problem: the so-called "previous object files" can come from arbitrarily distant compilation sessions.

Yes, but the import map is always re-computed from scratch from the entire set of modules, including the older ones. That will make it up-to-date.

@michaelwoerister
Member

In other words: The import map should always be the same, regardless of what your cache looks like and even if there is no cache. That makes it a function of the current state of the source code. (<= that describes more accurately what I mean).

@pnkfelix
Member Author

pnkfelix commented Dec 5, 2019

Okay. I think I see what you are saying. It seems like the import map can change in non-local ways though, right? I mean, the [A] -> [B -> C] <- [D] example seems to illustrate that a change to D can cause the import map for [A] to change.

This non-locality property doesn't contradict what you said. I just want to have a clear mental model.

Update: the non-locality property is what made me worry that we would need to accumulate imports from arbitrarily far back. However, on Zulip, @michaelwoerister and I have sketched out a different approach for fixing this that we're both happy with.

@michaelwoerister
Member

Hurray! I was able to construct a pretty small test case for this:

// ./src/test/incremental/thinlto/import_removed.rs

// revisions: cfail1 cfail2
// compile-flags: -O -Zhuman-readable-cgu-names -Cllvm-args=-import-instr-limit=10
// build-pass

// TODO: Add proper description.

fn main() {
    foo::foo();
    bar::baz();
}

mod foo {

    // In cfail1, foo() gets inlined into main.
    // In cfail2, ThinLTO decides that foo() does not get inlined into main, and
    // instead bar() gets inlined into foo(). But faulty logic in our incr.
    // ThinLTO implementation thought that `main()` is unchanged and thus reused
    // the object file still containing a call to the now non-existent bar().
    pub fn foo(){
        bar()
    }

    // This function needs to be big so that it does not get inlined by ThinLTO
    // but *does* get inlined into foo() once it is declared `internal` in
    // cfail2.
    pub fn bar(){
        println!("quux1");
        println!("quux2");
        println!("quux3");
        println!("quux4");
        println!("quux5");
        println!("quux6");
        println!("quux7");
        println!("quux8");
        println!("quux9");
    }
}

mod bar {

    #[inline(never)]
    pub fn baz() {
        #[cfg(cfail1)]
        {
            crate::foo::bar();
        }
    }
}

This test case fails with the expected error on a recent nightly. I have not tested whether the fix in this PR makes it pass, but it should.

@pnkfelix
Member Author

pnkfelix commented Dec 6, 2019

(okay this is ready for re-review; I incorporated all feedback and adapted the test case, thanks @michaelwoerister !)

@JohnCSimon
Member

Ping from triage:
@michaelwoerister - can you please review this PR?

// are doing the ThinLTO in this current compilation cycle.)
//
// See rust-lang/rust#59535.
if let (Some(prev_import_map), true) =
Member

Interesting pattern :)
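For readers unfamiliar with the pattern being remarked on: matching a tuple in `if let` combines an `Option` test and a boolean test in a single condition. A minimal sketch follows; the right-hand side of the real expression is not shown in the excerpt above, so the function `describe` and its parameters are hypothetical.

```rust
/// Takes the first branch only when a previous map exists *and* the flag is true.
/// Both parameters are placeholders for this sketch.
fn describe(prev_import_map: Option<&str>, flag: bool) -> String {
    if let (Some(map), true) = (prev_import_map, flag) {
        format!("compare against {}", map)
    } else {
        "skip comparison".to_string()
    }
}

fn main() {
    println!("{}", describe(Some("previous map"), true));
    println!("{}", describe(Some("previous map"), false));
    println!("{}", describe(None, true));
}
```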

@pnkfelix pnkfelix force-pushed the issue-59535-accumulate-past-lto-imports branch from aecb511 to 42b00a4 Compare December 20, 2019 03:47
@pnkfelix
Member Author

(I went ahead and squashed since I was rebasing anyway to remove the run-make based test.)

@pnkfelix
Member Author

@bors r=mw

@bors
Contributor

bors commented Dec 20, 2019

📌 Commit 42b00a4 has been approved by mw

@bors
Contributor

bors commented Dec 20, 2019

🌲 The tree is currently closed for pull requests below priority 100, this pull request will be tested once the tree is reopened

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 20, 2019
@Centril
Contributor

Centril commented Dec 20, 2019

@bors p=200

@bors
Contributor

bors commented Dec 20, 2019

⌛ Testing commit 42b00a4 with merge ccd2383...

bors added a commit that referenced this pull request Dec 20, 2019
…ts, r=mw

save LTO import info and check it when trying to reuse build products

@bors
Contributor

bors commented Dec 21, 2019

☀️ Test successful - checks-azure
Approved by: mw
Pushing ccd2383 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 21, 2019
@bors bors merged commit 42b00a4 into rust-lang:master Dec 21, 2019
pnkfelix added a commit to pnkfelix/rust that referenced this pull request Apr 14, 2020
…file in

incremental compilation.

This is symmetric to PR rust-lang#67020, which handled the case where the LLVM module's
*imports* changed. This commit builds upon the infrastructure added there; the
export map is just the inverse of the import map, so we can build the export map
at the same time that we load the serialized import map.

Fix rust-lang#69798
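
The remark that the export map is just the inverse of the import map can be sketched as follows. This is an illustration under assumed types: `ModuleMap`, `names`, and `invert` are hypothetical and are not the data structures used in the compiler.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical shape shared by both maps: module name -> set of module names.
type ModuleMap = BTreeMap<String, BTreeSet<String>>;

/// Helper for building a set of owned names from string literals.
fn names(list: &[&str]) -> BTreeSet<String> {
    list.iter().map(|s| s.to_string()).collect()
}

/// If `importer` imports from `source`, then `source` exports to `importer`.
fn invert(imports: &ModuleMap) -> ModuleMap {
    let mut exports = ModuleMap::new();
    for (importer, sources) in imports {
        for source in sources {
            exports
                .entry(source.clone())
                .or_default()
                .insert(importer.clone());
        }
    }
    exports
}

fn main() {
    let mut imports = ModuleMap::new();
    imports.insert("a".to_string(), names(&["b", "c"]));
    imports.insert("d".to_string(), names(&["b"]));

    // Print each exporting module together with the modules it exports to.
    for (source, importers) in &invert(&imports) {
        println!("{} exports to {:?}", source, importers);
    }
}
```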
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 17, 2020
…lto-products-when-exports-change, r=nagisa

Do not reuse post LTO products when exports change

Do not reuse post lto products when exports change

Generalizes code from PR rust-lang#67020, which handled the case when imports change.

Fix rust-lang#69798
Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this pull request Apr 17, 2020
…file in

incremental compilation.
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 21, 2020
…s-all-green, r=nagisa

attempt to recover perf by removing `exports_all_green`

attempt to recover perf by removing `exports_all_green` flag.

cc rust-lang#71248

(My hypothesis is that my use of this flag was an overly conservative generalization of PR rust-lang#67020.)
Aaron1011 added a commit to Aaron1011/rust that referenced this pull request Sep 18, 2020
During incremental ThinLTO compilation, we attempt to re-use the
optimized (post-ThinLTO) bitcode file for a module if it is 'safe' to do
so.

Up until now, 'safe' has meant that the set of modules that our current
module imports from/exports to is unchanged from the previous
compilation session. See PR rust-lang#67020 and PR rust-lang#71131 for more details.

However, this turns out to be insufficient to guarantee that it's safe
to reuse the post-LTO module (i.e. that optimizing the pre-LTO module
would produce the same result). When LLVM optimizes a module during
ThinLTO, it may look at other information from the 'module index', such
as whether a (non-imported!) global variable is used. If this
information changes between compilation runs, we may end up re-using an
optimized module that (for example) had dead-code elimination run on a
function that is now used by another module.

Fortunately, LLVM implements its own ThinLTO module cache, which is used
when ThinLTO is performed by a linker plugin (e.g. when clang is used to
compile a C project). Using this cache directly would require extensive
refactoring of our code - but fortunately for us, LLVM provides a
function that does exactly what we need.

The function `llvm::computeLTOCacheKey` is used to compute a SHA-1 hash
from all data that might influence the result of ThinLTO on a module.
In addition to the module imports/exports that we manually track, it
also hashes information about global variables (e.g. their liveness)
which might be used during optimization. By using this function, we
shouldn't have to worry about new LLVM passes breaking our module re-use
behavior.

In LLVM, the output of this function forms part of the filename used to
store the post-ThinLTO module. To keep our current filename structure
intact, this PR just writes out the mapping 'CGU name -> Hash' to a
file. To determine if a post-LTO module should be reused, we compare
hashes from the previous session.

This should unblock PR rust-lang#75199 - by sheer chance, it seems to have hit
this issue due to the particular CGU partitioning and optimization
decisions that end up getting made.
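
The 'CGU name -> Hash' comparison described above can be sketched like this. The sketch does not call LLVM: the standard library's `DefaultHasher` stands in for `llvm::computeLTOCacheKey`, and the `LtoInputs` struct, its fields, and the helper names are assumptions chosen to mirror the inputs the commit message lists.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical summary of the data that may influence ThinLTO for one CGU.
#[derive(Hash)]
struct LtoInputs<'a> {
    imports: &'a [&'a str],
    exports: &'a [&'a str],
    live_globals: &'a [&'a str],
}

/// Stand-in for the real cache key: hash every input that could affect the result.
fn cache_key(inputs: &LtoInputs) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    hasher.finish()
}

/// Reuse the post-LTO module only if both sessions recorded a key and the keys match.
fn can_reuse(cgu: &str, prev: &BTreeMap<String, u64>, curr: &BTreeMap<String, u64>) -> bool {
    match (prev.get(cgu), curr.get(cgu)) {
        (Some(p), Some(c)) => p == c,
        _ => false,
    }
}

fn main() {
    // Same imports and exports in both sessions; only a global's liveness differs.
    let before = LtoInputs { imports: &["m1"], exports: &["m2"], live_globals: &["G"] };
    let after = LtoInputs { imports: &["m1"], exports: &["m2"], live_globals: &[] };

    let mut prev = BTreeMap::new();
    prev.insert("cgu_a".to_string(), cache_key(&before));
    let mut curr = BTreeMap::new();
    curr.insert("cgu_a".to_string(), cache_key(&after));

    println!("reuse cgu_a: {}", can_reuse("cgu_a", &prev, &curr));
}
```

Comparing a single key per CGU means any input folded into the hash can invalidate reuse, including inputs that an import/export comparison alone would miss.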
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 11, 2020
…twco,nikic

Use llvm::computeLTOCacheKey to determine post-ThinLTO CGU reuse
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

Incremental compilation results in linker error when method use is removed
7 participants