Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo build updates the git commit #8468

Closed
bkchr opened this issue Jul 8, 2020 · 24 comments · Fixed by #8522
Closed

cargo build updates the git commit #8468

bkchr opened this issue Jul 8, 2020 · 24 comments · Fixed by #8522
Labels
C-bug Category: bug

Comments

@bkchr
Copy link

bkchr commented Jul 8, 2020

Hey, so we have the following problem:

cargo build in our repo with a Cargo.lock leads to the following output:

    Updating git repository `https://github.com/paritytech/substrate`
    Updating crates.io index

It tries to update the git commit, which leads to some other dependency problems.

Using cargo 1.46.0-nightly (fede83ccf 2020-07-02).

Doing the same with cargo 2020-06-26, it just starts building without updating the git commit.

I tracked it down to the following pr: #8364

If I revert this pr, it works.

TLDR: cargo now always updates the git commit using cargo build.

CC @alexcrichton as you have created the pr.

@bkchr bkchr added the C-bug Category: bug label Jul 8, 2020
@bkchr bkchr changed the title cargo build updates the git reference while it should not do this cargo build updates the git commit Jul 8, 2020
@bkchr
Copy link
Author

bkchr commented Jul 8, 2020

FYI: We have everywhere branch = "master" in the Cargo.toml files.

@bkchr
Copy link
Author

bkchr commented Jul 8, 2020

Okay, I think I know what the problem is:

source = "git+https://github.com/paritytech/substrate#59ee76a0f342ae0dc1c6a6777d24d7811d9261ef"
source = "git+https://github.com/paritytech/substrate?branch=master#d076f4705ee257b61c02c918755e63e3b34272ad"

The new syntax requires the branch=master?

@alexcrichton
Copy link
Member

Thanks for the report! Could you expand a bit on how you're depending on the substrate git repo? Those two different source URLs looks like there's one dependency without branch and one with branch perhaps? That still shouldn't cause issues for Cargo (it's a bug that issues are happening), but I'm curious how we could reproduce locally to investigate.

@bkchr
Copy link
Author

bkchr commented Jul 8, 2020

https://github.com/paritytech/polkadot/tree/v0.8.13

You can check this out and try to compile it.(it is enough to just compile for a second, you don't need to compile the full project) It will work with a cargo without your pr, if you add your pr it will fail to resolve a depdency (because it updated the substrate commit). I always deleted the target dir in between the runs.

What I have posted is the change that happened on master, after I updated the commit by intention. In the old format the master branch was not added to the cargo.lock and now it is added because it is not the default anymore. I think this is the reason why cargo tries to update the cargo.lock with your pr included. (but just guessing here)

@alexcrichton
Copy link
Member

Ok so I've done some digging here and I think I see what's going on.

Before #8364 specifying branch = "master" in your Cargo.toml was 100% equivalent to specifying nothing. Specifying nothing desugared to as if you did branch = "master" internally. After #8364, however, these are now different ways to specify a dependency, where omitting a specifier means "default branch" whereas branch = "master" is now specifying you specifically want the master branch.

This had implications on Cargo.lock as well. Different branches in a git repo are different "sources" internally in Cargo, so we need to encode this information differently in Cargo.lock to differentiate (so you can pull in the same name/version from two branches if you want). The encoding of Cargo.lock includes ?branch=foo in the URL of git repositories for this purpose. In an attempt to be "pretty", however, ?branch=master was omitted. This means that your Cargo.lock does not mention ?branch=master anywhere inside of it.

With #8364, however, branch = "master" in Cargo.toml now encodes as ?branch=master in Cargo.lock. Specifying nothing is the only way to omit extra specifiers in Cargo.lock.

Anyway, putting all that together, checking out that repository and running cargo +nightly fetch fails. This is because Cargo sees the lock file but thinks it does not apply for any packages pointing to the substrate git repository. Cargo thinks nothing matches because the lock file says that source is the "default branch" whereas all the manifests say "use the master branch". Seeing this mismatch Cargo discards the lock file for those nodes, and attempts to do an incremental update. Cargo then fails the incremental update with an error message because things have changed in the meantime, and lots of other parts of the graph need to update first. All-in-all this is an encoding disagreement between Cargo versions, and due to the complexity of the dependency graph it's showing up as a dependency update error by accident.


Ok now what to do about this. Some possibilities I can think of:

I'd personally hope that we could continue to land the change as-is and see if updating y'all's Cargo.toml wouldn't be too bad, but I'm likely biased!

@bkchr
Copy link
Author

bkchr commented Jul 9, 2020

Could we support loading the old cargo.lock files with no branch=master until stable writes the same Cargo.lock format?
The problem is that our build process involves stable and nightly. The nightly reuses the cargo.lock from the stable build. So we would always run into this until both versions write the same cargo.lock.

In the end all our old releases are "broken". Removing master from the Cargo.toml files could work for the future, but I will first need to see if there is a tool that changes the branch of all workspace projects (which is currently just a find&sed)

@alexcrichton
Copy link
Member

If this is an issue for y'all the only real fix is to back everything out of Cargo for now. There's no way to roll out changes to the lock file quickly, so we'd have to slow this change down a lot.

You can work across stable/beta/nightly by omitting branch = 'master', but if you've got frozen release artifacts then it's true that historical ones will not build today with nightly.

@est31
Copy link
Member

est31 commented Jul 9, 2020

One could special case branch = "master" as GitReference::MasterBranch for the time being. Then make its lockfile encoding have the legacy behaviour. The commit would still be pinned in the lockfile, just now omitting the branch name can now indicate two things in Cargo.lock: either the branch name is omitted in Cargo.toml as well, or it's set to master. Fundamentally, as commits are pinned, there is no need to encode the branch name in Cargo.lock at all, except for cargo recognizing when you changed the branch name in Cargo.toml and re-downloading stuff automatically, without needing cargo update. So if the proposal is adopted, this would be the only downside, you'd have to run cargo update if you added branch = "master" (or removed such a specification), and master is not the default branch of the repo. But that would be an okay cost to pay to keep the PR inside.

In the long term, one could put version numbers into Cargo.lock and use that to create two lockfile encodings for MasterBranch and DefaultBranch. Such a transition would be what the change would buy time for.

@est31
Copy link
Member

est31 commented Jul 10, 2020

See #8475 .

@bkchr
Copy link
Author

bkchr commented Jul 10, 2020

Nice :)

est31 added a commit to est31/cargo that referenced this issue Jul 10, 2020
@eddyb
Copy link
Member

eddyb commented Jul 12, 2020

I came across this problem before seeing this issue and the explanations, and what I ended up doing is:

sed -i 's/branch = "master"/rev = "e7457b1eb9980596301fe1afd36478a6725157ef"/' ./**/Cargo.toml

That's because we've been collecting performance data (across ~750 Rust PRs by now) for a specific version of Polkadot, and if we allow any versions to change, the performance data wouldn't be as useful (as it's not comparing rustc for the exact same code).

I'll probably leave that hack in, especially with how uniform it is, just because I can trust that going back to any Rust/Cargo version, there's no chance of version updates accidentally succeeding (although I'm pretty sure it's guaranteed to fail, at this point).

@alexcrichton
Copy link
Member

We talked about this at the Cargo meeting today and we didn't reach a firm conclusion one way or another. One thing I wanted to ask, though, @bkchr what would y'all do if we didn't fix this? Is that a viable option for y'all or is this something we really need to fix one way or another?

@mathstuf
Copy link
Contributor

My CI pipeline for my Rust projects uses a single cargo update && cargo vendor to populate the dependencies. These are then fed to multiple toolchains to actually compile. Those stages use --frozen to avoid using other dependencies and my nightly builds are currently failing because of this problem.

@alexcrichton
Copy link
Member

@mathstuf is the fix of removing branch = "master" available to you?

@alexcrichton
Copy link
Member

I've also posted about this on internals to try to get more visibility into this issue as well to see if others have feedback.

@mathstuf
Copy link
Contributor

We're using just git = , so there's nothing to remove. We commit our Cargo.lock and we bump by doing cargo update -p explicitly for the repository's package.

@est31
Copy link
Member

est31 commented Jul 16, 2020

@mathstuf to make sure, you experience the problem with git = "https://github.com/something" alone? No branch = "master" at all? That's weird because it shouldn't happen. Maybe your dependencies use it?

@mathstuf
Copy link
Contributor

The dependency referenced by git is doing path= into its own workspace. Other than that, everything comes from crates.io or has a non-master branch being referenced.

@est31
Copy link
Member

est31 commented Jul 17, 2020

@mathstuf I think I found the cause for your issue. Technically it's a different bug than this one, but also a regression of #8364. Basically, if you have a .toml entry like this:

ogg = { git = "https://github.com/rustaudio/ogg" }

... and run cargo vendor on Rust stable before #8364, you'll get a message like:

To use vendored sources, add this to your .cargo/config for this project:

[source.crates-io]
replace-with = "vendored-sources"

[source."https://github.com/rustaudio/ogg"]
git = "https://github.com/rustaudio/ogg"
branch = "master"
replace-with = "vendored-sources"

If you run cargo vendor with Rust nightly after #8364, the message will be like:


To use vendored sources, add this to your .cargo/config for this project:

[source.crates-io]
replace-with = "vendored-sources"

[source."https://github.com/rustaudio/ogg"]
git = "https://github.com/rustaudio/ogg"
replace-with = "vendored-sources"

Notice the missing branch = "master".

This means that there's an incompatibility between cargo vendor output from prior to #8364 and after #8364, as long as any git dependency was used where a branch wasn't specified. This case is much more common than the case of branch = "master" and furthermore, it's also very serious because it means that many archived tarballs of cargo vendored output won't compile any more with newer Cargo's if you specified git repos in Cargo.toml without specifying a branch or rev.

The obvious workaround of specifying branch = "master" probably won't work (as you'd run into this bug), but specifying rev = "<hash>" should, which is what Firefox seems to be doing already according to a quick glance at searchfox.org, so likely they won't run into the bug.

As this affects much less of a niche use case, I'd recommend reverting #8364 for now from master and beta (it has already entered beta) and figuring out solutions that don't invalidate a large portion of old vendored cargo sources.

@alexcrichton
Copy link
Member

Thanks for the help investigating that @est31, I agree that the change should be reverted given that this is an entirely independent issue.

@alexcrichton
Copy link
Member

I've done a quick literal revert for beta, but I'm taking some time to make a more thought out change on nightly. I'll link the PR here when it's ready.

@bkchr
Copy link
Author

bkchr commented Jul 19, 2020

We talked about this at the Cargo meeting today and we didn't reach a firm conclusion one way or another. One thing I wanted to ask, though, @bkchr what would y'all do if we didn't fix this? Is that a viable option for y'all or is this something we really need to fix one way or another?

I think we would switch to the proposed solution of you. I would already have done this, but I was on vacations this week :) But now I would like to wait for your pr.

@alexcrichton
Copy link
Member

Ok I've opened up #8522 to close this issue.

alexcrichton added a commit to alexcrichton/cargo that referenced this issue Jul 23, 2020
This commit is intended to be an effective but not literal revert
of rust-lang#8364. Internally Cargo will still distinguish between
`DefaultBranch` and `Branch("master")` when reading `Cargo.toml` files,
but for almost all purposes the two are equivalent. This will namely fix
the issue we have with lock file encodings where both are encoded with
no `branch` (and without a branch it's parsed from a lock file as
`DefaultBranch`).

This will preserve the change that `cargo vendor` will not print out
`branch = "master"` annotations but that desugars to match the lock file
on the other end, so it should continue to work.

Tests have been added in this commit for the regressions found on rust-lang#8468.
@bors bors closed this as completed in b49ccad Jul 27, 2020
@de-sh
Copy link

de-sh commented Aug 5, 2020

I would like to report that this issue seems to persist in 1.47.0-nightly. I had opened an issue #8586 for the same. (It was prematurely closed due to confusion)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug
Projects
None yet
6 participants