-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove hacks in the Hash impl of PackageId #2423
Conversation
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
r? @brson cc @wycats -- some interesting changes to |
Awesome - thanks for getting on this so quickly! |
1d2dfff
to
7a1f005
Compare
☔ The latest upstream changes (presumably #2433) made this pull request unmergeable. Please resolve the merge conflicts. |
7a1f005
to
579dee0
Compare
Is fix decoding of lockfiles for backwards compatibility with previous Cargo's? Do new lockfiles incorprate the more precise source ids? |
r=me but these are deep changes that I don't fully understand. |
This does much more I/O than Package::for_path and this is also on its way out.
Right now there's a few hacks here and there to "correctly" hash package ids by taking a package's root path into account instead of the path store in the package id. The purpose of this was to solve issues where the same package referenced from two locations ended up having two different hashes. This hack leaked, however, into the implementation of fingerprints which in turned ended up causing spurious rebuilds. Fix this problem once and for all by just defining hashing on package ids the natural and expected way.
This mirrors the behavior that they have today. The `load` method for path sources will by default return a non-recursive `PathSource` which unfortunately isn't what we want here.
With the previous changes a path dependency must have the precise path to it listed in its package id. Currently when decoding a lockfile, however, all path dependencies have the same package id, which unfortunately causes a mismatch. This commit alters the decoding of a lockfile to perform some simple path traversals to probe the filesystem to understand where path dependencies are and set the right package id for the found packages.
Currently the packaging logic depends on the old recursive nature of path sources for a few points: * Discovery of a git repository of a package. * Filtering out of sibling packages for only including the right set of files. For a non-recursive path source (now essentially the default) we can no longer assume that we have a listing of all packages. Subsequently this logic was tweaked to allow: * Instead of looking for packages at the root of a repo, we instead look for a Cargo.toml at the root of a git repository. * We keep track of all Cargo.toml files found in a repository and prune out all files which appear to be ancestors of that package.
The package id for path dependencies now has another path component pointing precisely to the package being compiled, so lots of tests need their output matches to get updated.
Yeah that commit is to preserve backwards compatibility. For now we don't encode more precise source ids, the lock file generation is the same as it was before. |
579dee0
to
51e8696
Compare
@bors: r=brson 51e8696e4556f22012fe76c119a8f73e7610c1f2 |
⌛ Testing commit 51e8696 with merge 45df347... |
💔 Test failed - cargo-linux-64 |
51e8696
to
57aeb23
Compare
@bors: r=brson 57aeb23 |
⌛ Testing commit 57aeb23 with merge e671374... |
💔 Test failed - cargo-win-msvc-64 |
When compiling a package from two separate locations it should be cached the same way both times.
57aeb23
to
d3d206d
Compare
All crates being compiled by Cargo are identified by a unique `PackageId` instance. This ID incorporates information such as the name, version, and source from where the crate came from. Package ids are allowed to have path sources to depend on local crates on the filesystem. The package id itself encodes the path of where the crate came from. Historically, however, the "path source" from where these packages are learned had some interesting logic. Specifically one specific source would be able to return many packages within. In other words, a path source would recursively walk crate definitions and the filesystem attempting to find crates. Each crate returned from a source has the same source id, so consequently all packages from one source path would have the same source path id. This in turn leads to confusing an surprising behavior, for example: * When crates are compiled the status message indicates the path of the crate root, not the crate being compiled * When viewed from two different locations (e.g. two different crate roots) the same package would have two different source ids because the id is based on the root location. This hash mismatch has been [papered over](#1697) in the past to try to fix some spurious recompiles, but it unfortunately [leaked back in](#2279). This is clearly indicative of the "hack" being inappropriate so instead these commits fix the root of the problem. --- In short, these commits ensure that the package id for a package defined locally has a path that points precisely at that package. This was a relatively invasive change and had ramifications on a few specific portions which now require a bit of extra code to support. The fundamental change here was to change `PathSource` to be non-recursive by default in terms of what packages it thinks it contains. There are still two recursive use cases, git repositories and path overrides, which are used for backwards compatibility. This meant, however, that the packaging step for crate no longer has knowledge of other crates in a repository to filter out files from. Some specific logic was added to assist in discovering a git repository as well as filtering out sibling packages. Another ramification of this patch, however, is that special care needs to be taken when decoding a lockfile. We now need all path dependencies in the lockfile to point precisely at where the path dependency came from, and this information is not encoded in the lock file. The decoding support was altered to do a simple probe of the filesystem to recursively walk path dependencies to ensure that we can match up packages in a lock file to where they're found on the filesystem. Overall, however, this commit closes #1697 and also addresses servo/servo#9794 where this issue was originally reported.
☀️ Test successful - cargo-linux-32, cargo-linux-64, cargo-mac-32, cargo-mac-64, cargo-win-gnu-32, cargo-win-gnu-64, cargo-win-msvc-32, cargo-win-msvc-64 |
Yay! Thanks a ton :) On Mar 3, 2016, at 7:15 PM, bors notifications@github.com wrote: — |
All crates being compiled by Cargo are identified by a unique
PackageId
instance. This ID incorporates information such as the name, version, and source from where the crate came from. Package ids are allowed to have path sources to depend on local crates on the filesystem. The package id itself encodes the path of where the crate came from.Historically, however, the "path source" from where these packages are learned had some interesting logic. Specifically one specific source would be able to return many packages within. In other words, a path source would recursively walk crate definitions and the filesystem attempting to find crates. Each crate returned from a source has the same source id, so consequently all packages from one source path would have the same source path id.
This in turn leads to confusing an surprising behavior, for example:
This hash mismatch has been papered over in the past to try to fix some spurious recompiles, but it unfortunately leaked back in. This is clearly indicative of the "hack" being inappropriate so instead these commits fix the root of the problem.
In short, these commits ensure that the package id for a package defined locally has a path that points precisely at that package. This was a relatively invasive change and had ramifications on a few specific portions which now require a bit of extra code to support.
The fundamental change here was to change
PathSource
to be non-recursive by default in terms of what packages it thinks it contains. There are still two recursive use cases, git repositories and path overrides, which are used for backwards compatibility. This meant, however, that the packaging step for crate no longer has knowledge of other crates in a repository to filter out files from. Some specific logic was added to assist in discovering a git repository as well as filtering out sibling packages.Another ramification of this patch, however, is that special care needs to be taken when decoding a lockfile. We now need all path dependencies in the lockfile to point precisely at where the path dependency came from, and this information is not encoded in the lock file. The decoding support was altered to do a simple probe of the filesystem to recursively walk path dependencies to ensure that we can match up packages in a lock file to where they're found on the filesystem.
Overall, however, this commit closes #1697 and also addresses servo/servo#9794 where this issue was originally reported.