Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in local registry crate unpacking #6591

Merged
merged 3 commits into from
Jan 29, 2019

Conversation

hugwijst
Copy link
Contributor

Copy crate and keep exclusive lock to it with local registries. Ensures
that only one instance will try to extract the source of a new package.

Fixes #6588.

Copy crate and keep exclusive lock to it with local registries. Ensures
that only one instance will try to extract the source of a new package.

Fixes rust-lang#6588.
@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dwijnand (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

let meta = dst.file().metadata()?;
if meta.len() > 0 {
return Ok(MaybeLock::Ready(dst));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is directly copied from RemoteRegistry. Probably not necessary, though I'm wondering how necessary it is in the current version of RemoteRegistry anyway.

@dwijnand
Copy link
Member

r? @alexcrichton

(Sorry, travelling)

@alexcrichton
Copy link
Member

Thanks for the PR! I'm may be confused though as I'm not sure how the original implementation here suffered from a bug. In the original implementation it's just calculating the checksum for a tarball, so I'm not sure where the race is for concurrent builds?

@hugwijst
Copy link
Contributor Author

hugwijst commented Jan 23, 2019

The race is in the unpacking, I think. LocalRegistry::download always returned a shared FileLock. If I understand correctly, the unpack logic in RegistrySource::unpack_package assumes though that the lock of the tarball implies that the destination directory that is being unpacked into is being locked exclusively. This holds for remote registries, but not for local registries.

I think a better solution might be to actually lock the unpacking instead of the tarball. This could be fairly easily done by locking the .cargo-ok file, maybe writing "ok" to it when it is correctly unpacked to detect an interrupted unpacking job? The main issue with this solution is upgrading from an older version of Cargo, as I'm not sure if unpacking over existing files would work correctly.

@alexcrichton
Copy link
Member

I'm trying to make sense of this, but I don't think this PR as-is is the right solution. It looks like unpacking is actually already racy even for remote downloads because the unpacking step isn't synchronized at all. We do some synchronization to ensure that only one Cargo downloads the crate during remote downloads, but even for remote downloads a shared read lock can be returned (as expected local registries always do).

Looking at the original bug report again it looks like there's a bug in Cargo's error reporting, because there should be an I/O error in the error message as to what actually failed in the underlying system. Currently it just shows that something failed to unpack, but it doesn't say why (or what os error hapened).

Perhaps the locking here could be moved to the unpacking step instead of the downloading step?

@hugwijst
Copy link
Contributor Author

hugwijst commented Jan 25, 2019

I've spent some more time on this and I think I understand a lot better what the issue is now.

Imagine two instances trying to use a package that is not yet unpacked:

  1. Instance 1 is first and starts unpacking.
  2. Instance 2 reaches RegistrySource::unpack_package, sees no .cargo-ok file and also starts unpacking.
  3. Instance 2 trying to unpack the same file instance 1 is trying to unpack results in an IO error.
    Alternatively:
  4. Instance 1 finishes unpacking an creates the .cargo-ok file.
  5. Instance 1 starts compiling the crate that instance 2 is still unpacking. As some files are being unpacked, compilation fails.

This only happens when using a local registry. With a remote registry the registry index gets locked, allowing only one instance at a time to access the registry cache.

I was able to reproduce this by:

  1. Create a local registry mirror of the crates needed by Cargo.
  2. Set CARGO_HOME to a hard disk or some other slower file system, as this will significantly increase the chance of the race resulting in issues.
  3. Create a .cargo/config to point to the local registry mirror.
  4. cargo clean
  5. Execute two cargo commands in parallel that don't share a build directory, eg cargo check & cargo check --release.
    You might need to execute the last two commands a couple of times, but it fails about 50% of the time on my machine.

My latest commit fixes the issue locally.

@hugwijst hugwijst changed the title Fix race condition in local registry crate downloading. Fix race condition in local registry crate unpacking Jan 25, 2019
@alexcrichton
Copy link
Member

That sounds right to me, and the fix looks almost there!

The only other thing I think we need to handle is that open_rw shouldn't happen first. In the fast path we should only use open_ro because this handles cases like:

  • Avoids unnecessary contention when the filesystem is actually ready to go (most of the time)
  • Avoids failing for already-unpacked packages on readonly filesystems (can happen in docker for example)

I think we'll want to start with an open_ro, and if that succeeds check the size of the file. If the file is zero size or nonexistent then we drop the lock and open_rw. After the open_rw we check the size again before proceeding to handle something else grabbing the lock in the middle.

@alexcrichton
Copy link
Member

@bors: r+

Looks great to me, thanks!

@bors
Copy link
Contributor

bors commented Jan 28, 2019

📌 Commit a23612d has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Jan 28, 2019

⌛ Testing commit a23612d with merge cbc159917ec4350853e241d0aa8f52950dfd4ea4...

@bors
Copy link
Contributor

bors commented Jan 28, 2019

💥 Test timed out

@dwijnand
Copy link
Member

@bors retry

@bors
Copy link
Contributor

bors commented Jan 28, 2019

⌛ Testing commit a23612d with merge a15852c386d9566d242a80cbb58dc94e54f9e764...

@bors
Copy link
Contributor

bors commented Jan 28, 2019

💥 Test timed out

@hugwijst
Copy link
Contributor Author

@bors retry

@bors
Copy link
Contributor

bors commented Jan 28, 2019

@hugwijst: 🔑 Insufficient privileges: not in try users

@dwijnand
Copy link
Member

@bors retry

@bors
Copy link
Contributor

bors commented Jan 29, 2019

⌛ Testing commit a23612d with merge 9f1f786...

bors added a commit that referenced this pull request Jan 29, 2019
Fix race condition in local registry crate unpacking

Copy crate and keep exclusive lock to it with local registries. Ensures
that only one instance will try to extract the source of a new package.

Fixes #6588.
@bors
Copy link
Contributor

bors commented Jan 29, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: alexcrichton
Pushing 9f1f786 to master...

@bors bors merged commit a23612d into rust-lang:master Jan 29, 2019
bors added a commit to rust-lang/rust that referenced this pull request Feb 5, 2019
Update cargo

7 commits in 245818076052dd7178f5bb7585f5aec5b6c1e03e..4e74e2fc0908524d17735c768067117d3e84ee9c
2019-01-27 15:17:26 +0000 to 2019-02-02 17:48:44 +0000
- Fix overlapping progress with stdout. (rust-lang/cargo#6618)
- Improve progress bar flickering. (rust-lang/cargo#6615)
- Add detail to multiple rename deps (rust-lang/cargo#6603)
- Fix race condition in local registry crate unpacking (rust-lang/cargo#6591)
- Revert "Make incremental compilation the default for all profiles." (rust-lang/cargo#6610)
- Fixup the docs on crate-type (rust-lang/cargo#6606)
- Document that owner --add now just invites (rust-lang/cargo#6604)
@ehuss ehuss added this to the 1.34.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build failures executing Cargo concurrently using a local registry
6 participants