Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove hacks in the Hash impl of PackageId #2423

Merged
merged 7 commits into from
Mar 4, 2016

Conversation

alexcrichton
Copy link
Member

All crates being compiled by Cargo are identified by a unique PackageId instance. This ID incorporates information such as the name, version, and source from where the crate came from. Package ids are allowed to have path sources to depend on local crates on the filesystem. The package id itself encodes the path of where the crate came from.

Historically, however, the "path source" from where these packages are learned had some interesting logic. Specifically one specific source would be able to return many packages within. In other words, a path source would recursively walk crate definitions and the filesystem attempting to find crates. Each crate returned from a source has the same source id, so consequently all packages from one source path would have the same source path id.

This in turn leads to confusing an surprising behavior, for example:

  • When crates are compiled the status message indicates the path of the crate root, not the crate being compiled
  • When viewed from two different locations (e.g. two different crate roots) the same package would have two different source ids because the id is based on the root location.

This hash mismatch has been papered over in the past to try to fix some spurious recompiles, but it unfortunately leaked back in. This is clearly indicative of the "hack" being inappropriate so instead these commits fix the root of the problem.


In short, these commits ensure that the package id for a package defined locally has a path that points precisely at that package. This was a relatively invasive change and had ramifications on a few specific portions which now require a bit of extra code to support.

The fundamental change here was to change PathSource to be non-recursive by default in terms of what packages it thinks it contains. There are still two recursive use cases, git repositories and path overrides, which are used for backwards compatibility. This meant, however, that the packaging step for crate no longer has knowledge of other crates in a repository to filter out files from. Some specific logic was added to assist in discovering a git repository as well as filtering out sibling packages.

Another ramification of this patch, however, is that special care needs to be taken when decoding a lockfile. We now need all path dependencies in the lockfile to point precisely at where the path dependency came from, and this information is not encoded in the lock file. The decoding support was altered to do a simple probe of the filesystem to recursively walk path dependencies to ensure that we can match up packages in a lock file to where they're found on the filesystem.

Overall, however, this commit closes #1697 and also addresses servo/servo#9794 where this issue was originally reported.

@rust-highfive
Copy link

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

r? @brson

cc @wycats -- some interesting changes to PathSource and lockfile decoding
cc @larsbergstrom -- this is the fix to servo/servo#9794

@rust-highfive rust-highfive assigned brson and unassigned huonw Mar 1, 2016
@larsbergstrom
Copy link

Awesome - thanks for getting on this so quickly!

@bors
Copy link
Contributor

bors commented Mar 3, 2016

☔ The latest upstream changes (presumably #2433) made this pull request unmergeable. Please resolve the merge conflicts.

@brson
Copy link
Contributor

brson commented Mar 3, 2016

Is fix decoding of lockfiles for backwards compatibility with previous Cargo's? Do new lockfiles incorprate the more precise source ids?

@brson
Copy link
Contributor

brson commented Mar 3, 2016

r=me but these are deep changes that I don't fully understand.

This does much more I/O than Package::for_path and this is also on its way out.
Right now there's a few hacks here and there to "correctly" hash package ids by
taking a package's root path into account instead of the path store in the
package id. The purpose of this was to solve issues where the same package
referenced from two locations ended up having two different hashes.

This hack leaked, however, into the implementation of fingerprints which in
turned ended up causing spurious rebuilds. Fix this problem once and for all by
just defining hashing on package ids the natural and expected way.
This mirrors the behavior that they have today. The `load` method for path
sources will by default return a non-recursive `PathSource` which unfortunately
isn't what we want here.
With the previous changes a path dependency must have the precise path to it
listed in its package id. Currently when decoding a lockfile, however, all path
dependencies have the same package id, which unfortunately causes a mismatch.

This commit alters the decoding of a lockfile to perform some simple path
traversals to probe the filesystem to understand where path dependencies are and
set the right package id for the found packages.
Currently the packaging logic depends on the old recursive nature of path
sources for a few points:

* Discovery of a git repository of a package.
* Filtering out of sibling packages for only including the right set of files.

For a non-recursive path source (now essentially the default) we can no longer
assume that we have a listing of all packages. Subsequently this logic was
tweaked to allow:

* Instead of looking for packages at the root of a repo, we instead look for a
  Cargo.toml at the root of a git repository.
* We keep track of all Cargo.toml files found in a repository and prune out all
  files which appear to be ancestors of that package.
The package id for path dependencies now has another path component pointing
precisely to the package being compiled, so lots of tests need their output
matches to get updated.
@alexcrichton
Copy link
Member Author

Yeah that commit is to preserve backwards compatibility. For now we don't encode more precise source ids, the lock file generation is the same as it was before.

@alexcrichton
Copy link
Member Author

@bors: r=brson 51e8696e4556f22012fe76c119a8f73e7610c1f2

@bors
Copy link
Contributor

bors commented Mar 3, 2016

⌛ Testing commit 51e8696 with merge 45df347...

@bors
Copy link
Contributor

bors commented Mar 3, 2016

💔 Test failed - cargo-linux-64

@alexcrichton
Copy link
Member Author

@bors: r=brson 57aeb23

@bors
Copy link
Contributor

bors commented Mar 3, 2016

⌛ Testing commit 57aeb23 with merge e671374...

@bors
Copy link
Contributor

bors commented Mar 3, 2016

💔 Test failed - cargo-win-msvc-64

When compiling a package from two separate locations it should be cached the
same way both times.
@alexcrichton
Copy link
Member Author

@bors: r=brson d3d206d

@bors
Copy link
Contributor

bors commented Mar 4, 2016

⌛ Testing commit d3d206d with merge 2eade62...

bors added a commit that referenced this pull request Mar 4, 2016
All crates being compiled by Cargo are identified by a unique `PackageId` instance. This ID incorporates information such as the name, version, and source from where the crate came from. Package ids are allowed to have path sources to depend on local crates on the filesystem. The package id itself encodes the path of where the crate came from.

Historically, however, the "path source" from where these packages are learned had some interesting logic. Specifically one specific source would be able to return many packages within. In other words, a path source would recursively walk crate definitions and the filesystem attempting to find crates. Each crate returned from a source has the same source id, so consequently all packages from one source path would have the same source path id.

This in turn leads to confusing an surprising behavior, for example:

* When crates are compiled the status message indicates the path of the crate root, not the crate being compiled
* When viewed from two different locations (e.g. two different crate roots) the same package would have two different source ids because the id is based on the root location.

This hash mismatch has been [papered over](#1697) in the past to try to fix some spurious recompiles, but it unfortunately [leaked back in](#2279). This is clearly indicative of the "hack" being inappropriate so instead these commits fix the root of the problem.

---

In short, these commits ensure that the package id for a package defined locally has a path that points precisely at that package. This was a relatively invasive change and had ramifications on a few specific portions which now require a bit of extra code to support.

The fundamental change here was to change `PathSource` to be non-recursive by default in terms of what packages it thinks it contains. There are still two recursive use cases, git repositories and path overrides, which are used for backwards compatibility. This meant, however, that the packaging step for crate no longer has knowledge of other crates in a repository to filter out files from. Some specific logic was added to assist in discovering a git repository as well as filtering out sibling packages.

Another ramification of this patch, however, is that special care needs to be taken when decoding a lockfile. We now need all path dependencies in the lockfile to point precisely at where the path dependency came from, and this information is not encoded in the lock file. The decoding support was altered to do a simple probe of the filesystem to recursively walk path dependencies to ensure that we can match up packages in a lock file to where they're found on the filesystem.

Overall, however, this commit closes #1697 and also addresses servo/servo#9794 where this issue was originally reported.
@bors
Copy link
Contributor

bors commented Mar 4, 2016

@bors bors merged commit d3d206d into rust-lang:master Mar 4, 2016
@larsbergstrom
Copy link

Yay! Thanks a ton :)

On Mar 3, 2016, at 7:15 PM, bors notifications@github.com wrote:

Merged #2423 #2423.


Reply to this email directly or view it on GitHub
#2423 (comment).

@alexcrichton alexcrichton deleted the fix-pkgid-hash branch March 28, 2016 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants