
Faster resolver: clean code and the backtrack_stack #5187

Merged · 8 commits · Mar 17, 2018

Conversation

@Eh2406 (Contributor) commented Mar 15, 2018

This is a small extension to #5168 and is inspired by #4834 (comment)

After #5168, these work (and don't on cargo from nightly):

  • safe_core = "=0.22.4"
  • safe_vault = "=0.13.2"

But these don't work (and do on cargo from this PR):

  • crust = "=0.24.0"
  • elastic = "=0.3.0"
  • elastic = "=0.4.0"
  • elastic = "=0.5.0"
  • safe_vault = "=0.14.0"

It took some work to figure out why they are not working and to make a test case.

This PR removes the use of conflicting_activations before it is extended with the conflicts from next.
#5187 (comment)
However, the find_candidate( is still needed, so it now gets the conflicts from next before being called.

It often happens that a candidate whose child will fail, leading to its own failure, has older siblings that have already set up backtrack_frames. The candidate knows that its failure cannot be saved by its siblings, but sometimes we activate the child anyway for the error messages. Unfortunately, the child does not know that its uncles can't save it, so it backtracks to one of them, leading to a combinatorial loop.

The solution is to clear the backtrack_stack if we are activating just for the error messages.
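
A minimal sketch of that idea (hypothetical names and types, not the actual resolver code):

struct BacktrackFrame; // stand-in for the resolver's real frame type

fn activate(backtrack_stack: &mut Vec<BacktrackFrame>, just_for_error_messages: bool) {
    if just_for_error_messages {
        // This activation is already known to be doomed, and no saved frame can
        // rescue its children, so drop the frames to avoid the combinatorial loop.
        backtrack_stack.clear();
    }
    // ... proceed with the normal activation of the child ...
}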

Edit: the original end of this message is no longer accurate.
#5168 means that when we find a permanent problem we will never activate its parent again. In practice there is often a lot of work and backtrack_frames between the problem and reactivating its parent. This PR removes backtrack_frames where its parent and the problem are present. This means that when we find a permanent problem we will never backtrack into it again.

An alternative is to scan all cached problems while backtracking, but this seemed more efficient.

@rust-highfive

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton (Member) left a comment

Ok I've tried to sit down and understand all this and I think this is indeed right! Sorry I'm having a lot of trouble keeping this all in my head...

parent: Option<&PackageId>,
conflicting_activations: &HashMap<PackageId, ConflictReason>,
) -> bool {
parent.map(|p| self.is_active(p)).unwrap_or(true)
Member

This might be a little tidier as:

conflicting_activations.keys().chain(parent).all(|id| self.is_active(id))
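
(Side note: this works because Option implements IntoIterator, yielding zero or one items, so the parent can be chained straight onto the keys iterator. A tiny standalone illustration, not resolver code:)

let parent: Option<&str> = Some("parent-id");
let ids = vec!["a", "b"];
// chain(parent) appends the parent id only when it is Some.
assert!(ids.iter().copied().chain(parent).all(|id| !id.is_empty()));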

Contributor Author

done

@@ -1091,12 +1088,9 @@ fn activate_deps_loop(
.filter_map(|(_, (deb, _, _))| {
Member

I think the filter_map + next combo can be replaced with find

Contributor Author

something like?

.filter_map(|(_, (new_dep, _, _))| past_conflicting_activations.get(&new_dep))
.flat_map(|x| x)
.find(|con| cx.is_conflicting(None, con))

I'd love to find the elegant way to do this, just haven't found it yet.

Member

Oh right yeah, don't worry about this it's fine as-is!

Member

Submitted a PR to the std to make this code prettier :) rust-lang/rust#49098

@@ -1091,12 +1088,9 @@ fn activate_deps_loop(
.filter_map(|(_, (deb, _, _))| {
past_conflicting_activations.get(&deb).and_then(|past_bad| {
Member

This may actually be a great place to use ?:

past_conflicting_activations.get(&deb)?
    .iter()
    .find(|conflicts| cx.is_conflicting(None, conflicts))

(also this may want to rename deb to dep)
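
For anyone reading along: ? works here because the closure passed to filter_map returns an Option, so a missing cache entry just short-circuits to None. A standalone toy version of the pattern (not the resolver code):

use std::collections::HashMap;

fn main() {
    let cache: HashMap<&str, Vec<i32>> = HashMap::from([("dep_a", vec![1, 5])]);
    let hit = ["dep_b", "dep_a"]
        .iter()
        .filter_map(|name| {
            // `?` bails out of this closure with None when the key is missing.
            cache.get(name)?.iter().find(|&&v| v > 1)
        })
        .next();
    assert_eq!(hit, Some(&5));
}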

Contributor Author

So much better!

@@ -1091,12 +1088,9 @@ fn activate_deps_loop(
.filter_map(|(_, (deb, _, _))| {
past_conflicting_activations.get(&deb).and_then(|past_bad| {
// for each dependency check all of its cashed conflicts
Member

s/cashed/cached/

Contributor Author

Thanks, spelling is not my strong suit! I knew I'd mess this one up at some point.

Member

oh no worries, it's also not mine!

@Eh2406 (Contributor, Author) commented Mar 16, 2018

No need to apologize! I am enjoying this because it feels like I can just barely get it all in my head, but every time it does something unexpected, proving that I in fact have not got it all.

@matklad (Member) commented Mar 16, 2018

Awesome work @Eh2406! I wonder if the module-level docs could be updated after the optimization work in this and other PRs is concluded?

Currently the docs say

The algorithm employed here is fairly simple, we simply do a DFS,

but looks like the reality has changed a bit since three years ago when that was written :) It would be awesome to have a high-level overview of the recent developments to allow future contributors to jump into resolve code more quickly! And if you are by any chance really into writing docs, there's also ARCHITECTURE.md which could benefit from ultra high-level resolve procedure description as well :)

@Eh2406 (Contributor, Author) commented Mar 16, 2018

That is a good point. I will add it to my list, and we'll see if I get there.

By the way, please don't merge this PR until I've had a chance to play with removing code duplication inspired by #5168 (comment).

@Eh2406 (Contributor, Author) commented Mar 16, 2018

I pushed a commit that successfully removes the duplicated adding to the cache. But now I am getting really suspicious about where the extra frames are coming from, and why find_candidate is not removing them.

);
past.push(conflicting_activations.clone());
}
// we have not activated ANY candidates.
Member

Hm ok so if it's ok with you I'm gonna zero in on this block and see if I can understand it. The comment here mentions that we haven't activated any candidates, but we've activated candidate above, right? We're concluding here, however, that this activation is doomed to fail, which leads us to filter out the backtrack stack.

How come we conclude here, after activating, to filter the backtrack stack? I'd naively expect this logic to be above when we fail to activate a frame. Basically my thinking would be that we, above, conclude that conflicting activations make this dependency fail to activate. That naturally means that any backtrack frame which has all our conflicts activated is also doomed to fail.

In light of that, do we still need the !has_another condition and the !backtracked condition here? I'm not 100% sure what those are guarding against, but I'm sure I'm missing something!
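
For reference, a sketch of the pruning being discussed, with stand-in types and names rather than the actual resolver API:

use std::collections::HashSet;

struct Frame {
    active: HashSet<&'static str>, // packages already activated in this frame
}

// A frame is doomed if the failing parent and every conflicting package are
// already active in it, so backtracking to it cannot help; drop such frames.
fn prune(stack: &mut Vec<Frame>, parent: &str, conflicts: &HashSet<&'static str>) {
    stack.retain(|frame| {
        !(frame.active.contains(parent)
            && conflicts.iter().all(|c| frame.active.contains(c)))
    });
}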

Contributor Author

I think you are understanding this as well as I am. I.E. I am confused as well.

"but we've activated candidate above, right?" Yes.
"expect this logic to be above when we fail to activate a frame" We don't need it up above because find_candidate basically does it for us.
"In light of that, do we still need" any of this block? I don't know why we do. I pushed a commit removing it, time to figure out a better fix for the new test.

Member

Oh interesting! So removing this retain meant the test case added here still executed quickly? I see what you mean about find_candidate basically doing this already, yeah.

@alexcrichton (Member):

One sort of meta comment as well is that one reason I've been hesitant to tweak the resolver historically (apart from it being complicated and difficult to page back in) is that I'm relatively confident the resolver originally at least attempted to find all solutions. In the sense that if there was a successful resolution graph and given infinite time, Cargo would reach it.

One of the problems we want to keep our eyes peeled for is violating this assumption. For example, I think it'd amount to a really subtle bug if we accidentally didn't visit a particular dependency graph due to it being pruned by accident. That would lead to probably super confusing "this graph can't be resolved" error messages when it in fact could be resolved!

Now to be clear I don't think we've hit such a case yet in the refactorings you've been doing @Eh2406, just something to look out for!

@Eh2406 (Contributor, Author) commented Mar 16, 2018

I wholeheartedly agree! I am trying to add a test for each algorithmic change. There is lots of room for subtle bugs!
I wish we had a crater-like tool to compare the lock files from two different cargos over all public code. I wish we had randomized/quickcheck testing.
I will make a big announcement when this work makes it to nightly and ask people to compare and report regressions.

@alexcrichton (Member):

I will make a big announcement when this work makes it to nightly and ask people to compare and report regressions.

Sounds fantastic!

And again to be clear, everything done so far is amazingly awesome, I have no qualms with anything :)

@Eh2406 (Contributor, Author) commented Mar 16, 2018

Thanks @alexcrichton for the great questions! I got suspicious of all use of conflicting_activations before it is extended with the conflicts from next. I removed adding a meta-skip so we only add to past_conflicting_activations in the one place, and it mostly went well. I removed the backtrack_stack.retain( that is the new part of this PR, and predictably the new test started failing. Then for completeness I removed the second find_candidate(. Then I went to find a way to get the test to pass... but... that is odd... the test is passing. And all the examples from the OP are as well.

So this PR is now "Remove 2 of the more confusing optimizations from #5168 and have things go faster." I will go edit the OP.

@Eh2406 changed the title from "Faster resolver: clean the backtrack_stack" to "Faster resolver: clean code and remove bad optimizations" on Mar 16, 2018
@alexcrichton (Member):

@Eh2406 ok interesting! Even more interesting is the CI failure...

---- build::incompatible_dependencies stdout ----
	running `C:\projects\cargo\target\debug\cargo.exe build`
thread 'build::incompatible_dependencies' panicked at '
Expected: execs
    but: expected to find:
error: failed to select a version for `bad`.
    ... required by package `baz v0.1.0`
    ... which is depended on by `incompatible_dependencies v0.0.1 ([..])`
versions that meet the requirements `>= 1.0.1` are: 1.0.2, 1.0.1
all possible versions conflict with previously selected packages.
  previously selected package `bad v1.0.0`
    ... which is depended on by `bar v0.1.0`
    ... which is depended on by `incompatible_dependencies v0.0.1 ([..])`
failed to select a version for `bad` which could resolve this conflict
did not find in output:
    Updating registry `file:///C:/projects/cargo/target/cit/t224/registry`
error: failed to select a version for `baz`.
    ... required by package `incompatible_dependencies v0.0.1 (file:///C:/projects/cargo/target/cit/t224/transitive_load_test)`
versions that meet the requirements `^0.1.0` are: 0.1.2, 0.1.1, 0.1.0
all possible versions conflict with previously selected packages.
  previously selected package `bad v1.0.0`
    ... which is depended on by `bar v0.1.0`
    ... which is depended on by `incompatible_dependencies v0.0.1 (file:///C:/projects/cargo/target/cit/t224/transitive_load_test)`
failed to select a version for `baz` which could resolve this conflict
', tests\testsuite\hamcrest.rs:13:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.

@Eh2406 (Contributor, Author) commented Mar 16, 2018

That makes sense. :-(
We are relying on this optimization even when it's user visible; that's why things are suddenly going faster.
Time to come up with a more systematic solution.

@Eh2406 changed the title from "Faster resolver: clean code and remove bad optimizations" to "Faster resolver: clean code and the backtrack_stack" on Mar 17, 2018
@Eh2406 (Contributor, Author) commented Mar 17, 2018

Put the find_candidate( back in, with a separate conflicting_activations extended with the conflicts from next. Added a new version of cleaning the backtrack_stack, this time with a lot more comments. Edited the OP to describe why the cleaning is necessary.

@alexcrichton (Member):

@bors: r+

Ok awesome that all makes sense to me, thanks so much!

@bors (Contributor) commented Mar 17, 2018

📌 Commit c771a4c has been approved by alexcrichton

@bors (Contributor) commented Mar 17, 2018

⌛ Testing commit c771a4c with merge bdc6fc2...

bors added a commit that referenced this pull request Mar 17, 2018
Faster resolver: clean code and the `backtrack_stack`

@bors (Contributor) commented Mar 17, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing bdc6fc2 to master...
