Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only allow one of each links attribute in resolver #4978

Merged
merged 13 commits into from
Feb 24, 2018
Merged

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Jan 25, 2018

This is a start on fixing the links part of #4902. It needs code that adds the links attribute to the index. But, I wanted to get feedback.

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@alexcrichton
Copy link
Member

Awesome, thanks!

It looks though like there's some CI failures? In addition to that, could some tests be added about successfully choosing a resolution when two possible graphs would have otherwise been possible?

The final comment I'd have for this is is that resolution tends to be a pretty performance sensitive path in Cargo, so would it be possible to avoid the HashSet construction on each iteration?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 25, 2018

The CI failures are test looking for the old links checker, they will probably need to be updated. Maybe we should remove the old cheker entirely? That is a good suggestion for added test.

As to performance:

  • Do we have a benchmark?
  • We could use a Vec as n is likely to be small.
  • We could add it as a field of Context, but we would have to be careful it got updated in skink with activations.
  • Why are we using the default hasher instead of fnv for all these hash maps?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 26, 2018

We cant remove the old checker, mostall indexes don't have the new field. :-)

@alexcrichton
Copy link
Member

@Eh2406 yeah I thin kit's fine to remove the old checker entirely. Although I will say though that the error messages from the old checker are quite good, and it'd be a shame to lose those... Fixing that may require a more general overhaul to resolver error messages which already aren't the best unfortunately :(

In any case unfortunately I don't have many benchmarks for the resolver. The main one I've used in the past is "how long does it take to resolve Servo's dependency graph?" but without information in the index it'll be deceptively fast I think. In general though we just need to avoid allocations if possible on each turn of the loop if we can, or otherwise make sure the allocations are pretty small. Would it be possible to have a field that's updated over time rather than recreated each time?

The allocations have been most of the runtime so far, I've never seen the hashing part of a hash map show up in profiles.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 26, 2018

The first time I tried this I had a persistent set, and got myself confused. Then I tried making the set dynamically and submit the pr. This time I got the persistent set to work. I think the third time's the charm!

As to the old checker, I don't think we can remove it until we are sure that all of the index (and clones of the index) are updated to have the new field. I could be wrong, I am new to this codebase. If I am wrong, please enlighten me!

Switching from the good old checker error messages to the resolver messages is a shame. But what can we do. :-(

How hard is it to resolve Servo's dependency graph in isolation as someone that has never touched Servo's code befor?

@@ -1047,6 +1053,9 @@ impl<'a> Context<'a> {
.or_insert(Vec::new());
if !prev.iter().any(|c| c == summary) {
self.resolve_graph.push(GraphNode::Add(id.clone()));
if let Some(link) = summary.links() {
self.links.insert(link.to_owned());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this assert that we're not replacing somethign that was already there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make it assert!(self.links.insert(link.to_owned())); but I think this can get triggered if updating a lock file that already has multiple crates with the same links attribute. Is this acceptable? Is there some way to get better error messages?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm that would indeed be bad! Thinking through how this can happen I think we could trigger this if an older cargo created a lock file with a bunch of new crates (that all have links in the manifest) and then a newer cargo tries to use that lock file.

In that case, sounds like we should skip the assert for now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er well, but if we do actually run into that case the previous project surely couldn't build previously becuase it would have been caught later...

How about this, could we return an error from here to entirely abort resolution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have that working now.

@alexcrichton
Copy link
Member

Ok great! I think this looks good to me in terms of perf, and we can always continue to tweak as more cases come up.

I've actually found it somewhat easy to profile with Servo before, but unfortunately it may not be too interesting to profile with this PR as-is (as there's no links keys in the index). If you'd like though you could perhaps manufacture a links for all Summary instances created (a unique name in links) temporarily to test the performance?

Otherwise I've historically just cd'd into Servos' checkout and run cargo generate-lockfile or something like cargo update -p servo which shouldn't hit het network after the first time and stresseses the path of "we've already got a Cargo.lock so this shoudl be fast". Perhaps worth testing if you can with some synthetic links keys?

@alexcrichton
Copy link
Member

Oh right, some further thoughts:

  • Let's actually not remove the old system just yet. Because many crates on crates.io don't have the links key in the index we won't otherwise catch those errors if we remove that logic. We can eventually remove it one day perhaps but only after we've published a lot.

  • This will need to be coupled with an update to how we publish crates to the registry. That way information will actually go into the index!

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 27, 2018

So I got Servo set up so I can run the test. I benchmarked 4 versions. I am running windows so I don't have a time command. I am just looking at wall time for now. For each one I am running cargo update -p servo.

  1. The base, the commit before my PR. Takes approximately 3 second to run. Is it usually this fast? Maybe I just got a lucky checkout of Servo.
  2. The initial version of this PR, with a new HashSet for each loop. I killed it after a minute had already gone by. So, ya that is not acceptable.
  3. The current head of this PR. This tests rolling out this PR without backfilling the index. Takes approximately 2 second to run.
  4. If I add a line links.or_else(|| if name.starts_with("sys-") {Some(name.to_string())} else {None}) to registry\index.rs. This tests after the index is filled in. Takes approximately 4 second to run.

@alexcrichton
Copy link
Member

Thanks for investigating @Eh2406! I'm actually surprised it took 3 seconds at all, that sort of update should happen in milliseconds ather than seconds...

Well in any case sounds like the current PR as-is is certainly acceptable speed-wise!

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 28, 2018

I added the links attribute to the NewCrate struct then followed the error messages. Please let me know if I did that correctly.

@alexcrichton
Copy link
Member

Ok thanks @Eh2406! Thinking about this some more...

I'm getting more and more worried as I think about it about the error messages here. It's pretty well-known at this point I believe that the resolve error messages coming out of Cargo are pretty bad (almost un-actionable!). That's also coupled with the fact that non-resolvable crate graphs look like hangs I think could get us into a bit of a tight spot with this feature.

For example we've, since inception, had requests and an implementation for better error messages around the links key. That implies to me that this situation is actually coming up in the wild a nontrivial number of times. If we were to move the errors to graph resolution then I fear that all of these cases coming up in the wild will become un-debuggable.

In that sense I'm not entirely sure how to balance this implementation. On one hand I definitely believe it's technically correct (although would be good to have a test to assert that this error can guide crate graph resolution). On the other hand though I'm not sure that this is worth the loss in error message quality. From my experience cases like #4902 are coming up pretty rarely where there's a resolvable crate graph but Cargo isn't finding it immediately.

I'm curious what you think about this as well?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 29, 2018

I have a lot to say in response to your comment. How much will actually be relevant to this pr I won't know till I try and say it. But I hope to do a lot of work on the resolver and it will be relevant to that larger context. I will try and respond point by point.

It's pretty well-known at this point I believe that the resolve error messages coming out of Cargo are pretty bad (almost un-actionable!).

I agree! I would love to work on fixing them when I understand the resolver deeply so I can see how to improve it. I started working on this PR so I could begin to understand the resolver. If I begin with some low hanging problems, even if I never understand the resolver deeply I will still have contributed.

I fear that making good error messages for resolver will be hard. Resolver currently returns some random crate it can not resolve, usually one that the user did not even know was being depend upon. It is not clear how to synthesize it into a larger coherent user centric message.

that non-resolvable crate graphs look like hangs

I think this is surprisingly tightly tied to the error messages. I think the solution to failing fast on most non-resolvable crate graphs in valves taking the fact that some random crate can not be resolved synthesize it into a larger coherent problem then back track until the bigger problem goes away. If we had a good way of building a larger coherent problem we would have them both solved.

For example we've, since inception, had requests and an implementation for better error messages around the links key. That implies to me that this situation is actually coming up in the wild a nontrivial number of times. If we were to move the errors to graph resolution then I fear that all of these cases coming up in the wild will become un-debuggable.

I think all the cases that are debugabal in the current system, would just resolve into something that worked. I.E. If you can fix it by specifying a specific version of a dependency, then it will just no longer be an issue. The process of backtracking will find the dependency that works.

Of course the resolver error messages are offil so if you have a bigger problem than pinning a dependency the error messages will get a lot worse.

although would be good to have a test to assert that this error can guide crate graph resolution

I'd love some guidance on making a test in which cargo has multiple versions to choose from only the oldest of witch does not have a links attribute that conflicts with the main crate. Or any other test you would like to suggest.

From my experience cases like #4902 are coming up pretty rarely where there's a resolvable crate graph but Cargo isn't finding it immediately.

I don't think that is an accurate description of #4902. I don't see that particular report mentioning cargo being slow. I see it reporting that cargo always builds a lock file that it then cannot use. With, in that case, no way to fix it. That is a really bad user experience.

I think over all this replaces friendly error messages that clearly tell you what problem cargo made for you with cargo silently not making the error in the first place.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 30, 2018

I think this is a ergonomic win, but reasonable people can disagree. If you decide to postpone this until we somehow get better error messages then I will open up PRs related to other issues with the resolver and let this one bit rot. Knowing me, If I end up in a holding pattern I will likely get distracted, so one way or the other I'd like to keep things moving so I can contribute.

@alexcrichton
Copy link
Member

Thanks for the writeup @Eh2406! (and sorry for taking a few days to get back to you)

Mostly that all sounds great to me, I'd be thrilled to help you learn about the resolver if you'd like!

One point though is...

I think over all this replaces friendly error messages that clearly tell you what problem cargo made for you with cargo silently not making the error in the first place.

Almost! A few days ago we talked about this more broadly in the cargo triage meeting and what to do with this PR. Our conclusion though was that we probably don't want to land this yet, but we're curious to learn more information!

My fear, more specifically, is that entirely unresolvable crate graphs will start showing up (producing bad error messages). I suspect that, because we've improved the error messages here, it's somewhat often coming up in practice that there's multiple native libraries of the same name that can't get built together. While in #4902 it's possible that Cargo could have actually found a successful resolution I suspect that is not the case for most of the times this comes up in practice. Whenever I've seen this before it's been impossible to resolve the crate graph with this constraint, but rather a dependency simply needed to be updated.

Along those lines though, @SergioBenitez as the reporter of #4902 is this an issue that's coming up in practice for you a lot? The solution in this PR will solve #4902 but will unfortunately create some other problems as well, so we're trying to figure out how to best weigh the priorities here.

@Eh2406 if you'd like to tackle resolution one of the highest priority bits of it I think would be the error messages, and I'd be more than willing to help mentor you on improvements to that!

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 5, 2018

Just a note for when someone/I come back to this, in addition to the error messages, It will take some work to rebase/merge this with #4834, and that is a much bigger/more common UX improvement.

@bors
Copy link
Contributor

bors commented Feb 6, 2018

☔ The latest upstream changes (presumably #4834) made this pull request unmergeable. Please resolve the merge conflicts.

# Conflicts:
#	src/cargo/core/resolver/mod.rs
#	tests/resolve.rs
@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 15, 2018

Worked on the error messages, dose that look better?

@bors
Copy link
Contributor

bors commented Feb 21, 2018

☔ The latest upstream changes (presumably #5063) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

@bors: r+

Looks amazing, thanks so much @Eh2406!

@bors
Copy link
Contributor

bors commented Feb 24, 2018

📌 Commit 68a40ad has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Feb 24, 2018

⌛ Testing commit 68a40ad with merge fe0c18b...

bors added a commit that referenced this pull request Feb 24, 2018
Only allow one of each links attribute in resolver

This is a start on fixing the `links` part of #4902. It needs code that adds the links attribute to the index. But, I wanted to get feedback.
@alexcrichton
Copy link
Member

@Eh2406 also, would you be up for following through with the crates.io changes needed here as well?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 24, 2018

Um befor we merge, how do we test that adding links to the index will not break old versions of cargo?
@joshtriplett asked us to test it explicitly.

@bors
Copy link
Contributor

bors commented Feb 24, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing fe0c18b to master...

@bors bors merged commit 68a40ad into rust-lang:master Feb 24, 2018
@alexcrichton
Copy link
Member

@Eh2406 oh I'd be fine adding a test yeah, although I'm quite confident that it won't require any code changes.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 25, 2018

I don't think it's as simple as adding a test. Adding a test will test that it works with current cargo. I guess we would need to manually alter the index, and then every previous version of cargo.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 26, 2018

Ok so I am trying to test if the links attribute will brake old cargo. This is a log of my progress.

  1. set up a new project called testlinks, use rustup override set 1.0.0-gnu to make sure we are using an old cargo.
  2. dell my C:\Users\...\.cargo\registry then run cargo search winapi inside testlinks. This checks out the index exactly as rust 1.0.0 wanted it.
  3. Add a dependency to testlinks on cpython = {version = "*", features=["python27-sys"]} python27-sys = "0.0.6" and generate a lock file. This set of dependencies are chosen to lead to having both [[package]] name = "python27-sys" version = "0.0.6" and [[package]] name = "python27-sys" version = "0.1.2"
  4. edit the index to add ,"links":"python27"} to the end of each row of the python27-sys file. save and commit.
  5. rerun testlinks>cargo generate-lockfile and insure it has the same result.
  6. try with a fresh build of cargo from master. witch should error but gives the same results, something is wrong. looks like something throw out my commit. re committing and still nothing.

This test failed! In 2 ways:

  1. With cargo from 1.0.0 the Updating registry throws out my changes making the test useless.
  2. With cargo from master it doesn't seem to be reading the attribute.

TLDR:
How do I get cargo from 1.0.0 to see an index with a links attribute? Why is cargo from master not reading the links attribute? Help!?

@alexcrichton
Copy link
Member

@Eh2406 IIRC Cargo at 1.0.0 actually unconditionally reads the registry, so you should be able to do:

  1. Use Cargo 1.0.0 to generate a lock file
  2. Edit the index. Take a crate that's in the lock file and add an index key to the row in the index for that crate
  3. Rerun Cargo 1.0.0, it'll read the registry entry and ingore the index key

For master cargo are you using nightly or building from source? I think these changes haven't made their way into nightly yet unfortunately.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 26, 2018

I think that is what I tried, but when Cargo 1.0.0 is Updating registry it does a clean checkout that removes my change to the index. I.E. change files to have the links and commit. Check that files are changed, And that commit in log. Run testlinks>cargo generate-lockfile. Index files do not have links, nor is my commit in the git log.

For the master test I am building from source.

@alexcrichton
Copy link
Member

Hm I'm not so sure in that case :(. Been awhile since I used Cargo 1.0.0... In any case using Cargo 1.20.0 or something like that (not too old) is probably sufficient. No one's using Rust 1.0.0 in production right now I'd imagine.

As to why master isn't reading the attribute... not sure!

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 26, 2018

made a clone of the index at https://github.com/Eh2406/crates.io-index. use .cargo/config to override. pushed https://github.com/Eh2406/crates.io-index/commit/2d969f4712c5aaa9083e5e964879b73c8ed59ae2

Good news:

testlinks>rustup override set 1.0.0-x86_64-pc-windows-gnu
info: using existing install for '1.0.0-x86_64-pc-windows-gnu'
testlinks>cargo generate-lockfile
    Updating registry `https://github.com/Eh2406/crates.io-index`

Bad news:
from source, master cargo generate-lockfile still is not returning an error.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 26, 2018

Good news I figured it out, and all is well!
cpython = {version = "*", features=["python27-sys"]} python27-sys = "0.0.6" Does have a solution it is to use an old version of cpython No error message needed.

Switching to cpython = {version = "0.1.0", features=["python27-sys"]} python27-sys = "0.0.6" gives error: failed to select a version for `python27-sys`. as intended!

@alexcrichton
Copy link
Member

Nice!

bors-voyager bot added a commit to rust-lang/crates.io that referenced this pull request Feb 27, 2018
1248: Try to add links to the index r=carols10cents

This is a companion to rust-lang/cargo#4978
It is adding the links attribute to the index git. I am working on windows so cannot test locally, I.E. this may involve some sparing with the CI to get this green. Advice and help are welcome!
@ehuss ehuss added this to the 1.26.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants