
Faster resolver: clean code and the backtrack_stack #5187

Merged · 8 commits · Mar 17, 2018

Conversation

@Eh2406 (Contributor) commented Mar 15, 2018

This is a small extension to #5168 and is inspired by #4834 (comment)

After #5168, these work (and don't on cargo from nightly):

  • safe_core = "=0.22.4"
  • safe_vault = "=0.13.2"

But these don't work (and do on cargo from this PR):

  • crust = "=0.24.0"
  • elastic = "=0.3.0"
  • elastic = "=0.4.0"
  • elastic = "=0.5.0"
  • safe_vault = "=0.14.0"

It took some work to figure out why they are not working and to make a test case.

This PR removes the use of conflicting_activations before it is extended with the conflicts from next.
#5187 (comment)
However, the find_candidate( is still needed, so it now gets the conflicts from next before being called.

It often happens that a candidate whose child will fail, leading to its own failure, has older siblings that have already set up backtrack_frames. The candidate knows that its failure cannot be saved by its siblings, but sometimes we activate the child anyway for the error messages. Unfortunately, the child does not know that its uncles can't save it, so it backtracks to one of them, leading to a combinatorial loop.

The solution is to clear the backtrack_stack if we are activating just for the error messages.
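
A minimal sketch of that idea (hypothetical names and types, not the actual resolver code):

struct BacktrackFrame; // stand-in for the resolver's real frame type

fn activate(backtrack_stack: &mut Vec<BacktrackFrame>, just_for_error_messages: bool) {
    if just_for_error_messages {
        // This activation is already known to be doomed, and no saved frame can
        // rescue its children, so drop the frames to avoid the combinatorial loop.
        backtrack_stack.clear();
    }
    // ... proceed with the normal activation of the child ...
}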

Edit: the original end of this message is no longer accurate.
#5168 means that when we find a permanent problem we will never activate its parent again. In practice there is often a lot of work and backtrack_frames between the problem and reactivating its parent. This PR removes backtrack_frames where its parent and the problem are present. This means that when we find a permanent problem we will never backtrack into it again.

An alternative is to scan all cached problems while backtracking, but this seemed more efficient.

@rust-highfive

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton (Member) left a comment

Ok I've tried to sit down and understand all this and I think this is indeed right! Sorry I'm having a lot of trouble keeping this all in my head...

parent: Option<&PackageId>,
conflicting_activations: &HashMap<PackageId, ConflictReason>,
) -> bool {
parent.map(|p| self.is_active(p)).unwrap_or(true)
Member

This might be a little tidier as:

conflicting_activations.keys().chain(parent).all(|id| self.is_active(id))
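
(Side note: this works because Option implements IntoIterator, yielding zero or one items, so the parent can be chained straight onto the keys iterator. A tiny standalone illustration, not resolver code:)

let parent: Option<&str> = Some("parent-id");
let ids = vec!["a", "b"];
// chain(parent) appends the parent id only when it is Some.
assert!(ids.iter().copied().chain(parent).all(|id| !id.is_empty()));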

Contributor Author

done

@@ -1091,12 +1088,9 @@ fn activate_deps_loop(
.filter_map(|(_, (deb, _, _))| {
Member

I think the filter_map + next combo can be replaced with find

Contributor Author

something like?

.filter_map(|(_, (new_dep, _, _))| past_conflicting_activations.get(&new_dep))
.flat_map(|x| x)
.find(|con| cx.is_conflicting(None, con))

I'd love to find the elegant way to do this, just haven't found it yet.

Member

Oh right yeah, don't worry about this it's fine as-is!

Member

Submitted a PR to the std to make this code prettier :) rust-lang/rust#49098

@@ -1091,12 +1088,9 @@ fn activate_deps_loop(
.filter_map(|(_, (deb, _, _))| {
past_conflicting_activations.get(&deb).and_then(|past_bad| {
Member

This may actually be a great place to use ?:

past_conflicting_activations.get(&deb)?
    .iter()
    .find(|conflicts| cx.is_conflicting(None, conflicts))

(also this may want to rename deb to dep)
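
For anyone reading along: ? works here because the closure passed to filter_map returns an Option, so a missing cache entry just short-circuits to None. A standalone toy version of the pattern (not the resolver code):

use std::collections::HashMap;

fn main() {
    let cache: HashMap<&str, Vec<i32>> = HashMap::from([("dep_a", vec![1, 5])]);
    let hit = ["dep_b", "dep_a"]
        .iter()
        .filter_map(|name| {
            // `?` bails out of this closure with None when the key is missing.
            cache.get(name)?.iter().find(|&&v| v > 1)
        })
        .next();
    assert_eq!(hit, Some(&5));
}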

Contributor Author

So much better!

@@ -1091,12 +1088,9 @@ fn activate_deps_loop(
.filter_map(|(_, (deb, _, _))| {
past_conflicting_activations.get(&deb).and_then(|past_bad| {
// for each dependency check all of its cashed conflicts
Member

s/cashed/cached/

Contributor Author

Thanks, spelling is not my strong suit! I knew I'd mess this one up at some point.

Member

oh no worries, it's also not mine!

@Eh2406 (Contributor, Author) commented Mar 16, 2018

No need to apologize! I am enjoying this because it feels like I can just barely get it all in my head, but every time it does something unexpected, proving that I in fact have not got it all.

@matklad (Member) commented Mar 16, 2018

Awesome work @Eh2406! I wonder if the module-level docs could be updated after the optimization work in this and other PRs is concluded?

Currently the docs say

The algorithm employed here is fairly simple, we simply do a DFS,

but looks like the reality has changed a bit since three years ago when that was written :) It would be awesome to have a high-level overview of the recent developments to allow future contributors to jump into resolve code more quickly! And if you are by any chance really into writing docs, there's also ARCHITECTURE.md which could benefit from ultra high-level resolve procedure description as well :)

@Eh2406 (Contributor, Author) commented Mar 16, 2018

That is a good point. I will add it to my list, and we'll see if I get there.

By the way, please don't merge this PR until I've had a chance to play with removing code duplication inspired by #5168 (comment).

@Eh2406 (Contributor, Author) commented Mar 16, 2018

I pushed a commit that successfully removes the duplicated adding to the cache. But now I am getting really suspicious about where the extra frames are coming from, and why find_candidate is not removing them.

);
past.push(conflicting_activations.clone());
}
// we have not activated ANY candidates.
Member

Hm ok so if it's ok with you I'm gonna zero in on this block and see if I can understand it. The comment here mentions that we haven't activated any candidates, but we've activated candidate above, right? We're concluding here, however, that this activation is doomed to fail, which leads us to filter out the backtrack stack.

How come we conclude here, after activating, to filter the backtrack stack? I'd naively expect this logic to be above when we fail to activate a frame. Basically my thinking would be that we, above, conclude that conflicting activations make this dependency fail to activate. That naturally means that any backtrack frame which has all our conflicts activated is also doomed to fail.

In light of that, do we still need the !has_another condition and the !backtracked condition here? I'm not 100% sure what those are guarding against, but I'm sure I'm missing something!
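
For reference, a sketch of the pruning being discussed, with stand-in types and names rather than the actual resolver API:

use std::collections::HashSet;

struct Frame {
    active: HashSet<&'static str>, // packages already activated in this frame
}

// A frame is doomed if the failing parent and every conflicting package are
// already active in it, so backtracking to it cannot help; drop such frames.
fn prune(stack: &mut Vec<Frame>, parent: &str, conflicts: &HashSet<&'static str>) {
    stack.retain(|frame| {
        !(frame.active.contains(parent)
            && conflicts.iter().all(|c| frame.active.contains(c)))
    });
}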

Contributor Author

I think you are understanding this as well as I am. I.E. I am confused as well.

"but we've activated candidate above, right?" Yes.
"expect this logic to be above when we fail to activate a frame" We don't need it up above because find_candidate basically does it for us.
"In light of that, do we still need" any of this block? I don't know why we do. I pushed a commit removing it, time to figure out a better fix for the new test.

Member

Oh interesting! So removing this retain meant the test case added here still executed quickly? I see what you mean about find_candidate basically doing this already, yeah.

@alexcrichton (Member):

One sort of meta comment as well is that one reason I've been hesitant to tweak the resolver historically (apart from it being complicated and difficult to page back in) is that I'm relatively confident the resolver originally at least attempted to find all solutions. In the sense that if there was a successful resolution graph and given infinite time, Cargo would reach it.

One of the problems we want to keep our eyes peeled for is violating this assumption. For example, I think it'd amount to a really subtle bug if we accidentally didn't visit a particular dependency graph due to it being pruned by accident. That would lead to probably super confusing "this graph can't be resolved" error messages when it in fact could be resolved!

Now to be clear I don't think we've hit such a case yet in the refactorings you've been doing @Eh2406, just something to look out for!

@Eh2406 (Contributor, Author) commented Mar 16, 2018

I wholeheartedly agree! I am trying to add a test for each algorithmic change. There is lots of room for subtle bugs!
I wish we had a crater-like tool to compare the lock files from two different cargos over all public code. I wish we had randomized/quickcheck testing.
I will make a big announcement when this work makes it to nightly and ask people to compare and report regressions.

@alexcrichton (Member):

I will make a big announcement when this work makes it to nightly and ask people to compare and report regressions.

Sounds fantastic!

And again to be clear, everything done so far is amazingly awesome, I have no qualms with anything :)

@Eh2406 (Contributor, Author) commented Mar 16, 2018

Thanks @alexcrichton for the great questions! I got suspicious of all use of conflicting_activations before it is extended with the conflicts from next. I removed adding a meta-skip so we only add to past_conflicting_activations in the one place, and it mostly went well. I removed the backtrack_stack.retain( that is the new part of this PR, and predictably the new test started failing. Then for completeness I removed the second find_candidate(. Then I went to find a way to get the test to pass... but... that is odd... the test is passing. And all the examples from the OP are as well.

So this PR is now "Remove 2 of the more confusing optimizations from #5168 and have things go faster." I will go edit the OP.

@Eh2406 changed the title from "Faster resolver: clean the backtrack_stack" to "Faster resolver: clean code and remove bad optimizations" on Mar 16, 2018
@alexcrichton (Member):

@Eh2406 ok interesting! Even more interesting is the CI failure...

---- build::incompatible_dependencies stdout ----
	running `C:\projects\cargo\target\debug\cargo.exe build`
thread 'build::incompatible_dependencies' panicked at '
Expected: execs
    but: expected to find:
error: failed to select a version for `bad`.
    ... required by package `baz v0.1.0`
    ... which is depended on by `incompatible_dependencies v0.0.1 ([..])`
versions that meet the requirements `>= 1.0.1` are: 1.0.2, 1.0.1
all possible versions conflict with previously selected packages.
  previously selected package `bad v1.0.0`
    ... which is depended on by `bar v0.1.0`
    ... which is depended on by `incompatible_dependencies v0.0.1 ([..])`
failed to select a version for `bad` which could resolve this conflict
did not find in output:
    Updating registry `file:///C:/projects/cargo/target/cit/t224/registry`
error: failed to select a version for `baz`.
    ... required by package `incompatible_dependencies v0.0.1 (file:///C:/projects/cargo/target/cit/t224/transitive_load_test)`
versions that meet the requirements `^0.1.0` are: 0.1.2, 0.1.1, 0.1.0
all possible versions conflict with previously selected packages.
  previously selected package `bad v1.0.0`
    ... which is depended on by `bar v0.1.0`
    ... which is depended on by `incompatible_dependencies v0.0.1 (file:///C:/projects/cargo/target/cit/t224/transitive_load_test)`
failed to select a version for `baz` which could resolve this conflict
', tests\testsuite\hamcrest.rs:13:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.

@Eh2406 (Contributor, Author) commented Mar 16, 2018

That makes sense. :-(
We are relying on this optimization even when it's user visible; that's why things are suddenly going faster.
Time to come up with a more systematic solution.

@Eh2406 changed the title from "Faster resolver: clean code and remove bad optimizations" to "Faster resolver: clean code and the backtrack_stack" on Mar 17, 2018
@Eh2406 (Contributor, Author) commented Mar 17, 2018

Put the find_candidate( back in, with a separate conflicting_activations extended with the conflicts from next. Added a new version of cleaning the backtrack_stack, this time with a lot more comments. Edited the OP to describe why the cleaning is necessary.

@alexcrichton (Member):

@bors: r+

Ok awesome that all makes sense to me, thanks so much!

@bors (Contributor) commented Mar 17, 2018

📌 Commit c771a4c has been approved by alexcrichton

@bors (Contributor) commented Mar 17, 2018

⌛ Testing commit c771a4c with merge bdc6fc2...

bors added a commit that referenced this pull request Mar 17, 2018
Faster resolver: clean code and the `backtrack_stack`

@bors (Contributor) commented Mar 17, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing bdc6fc2 to master...
