
Use lockfile on incr. comp. directory #32754

Closed
nikomatsakis opened this issue Apr 5, 2016 · 14 comments
Assignees
Labels
A-incr-comp Area: Incremental compilation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Contributor

Currently we are not detecting the case where multiple rustc processes are using the same incr. comp directory. That is not good.

@nikomatsakis nikomatsakis added the A-incr-comp Area: Incremental compilation label Apr 5, 2016
@alexcrichton
Member

FWIW Cargo recently forayed into lock files, and it basically just ended up using the fs2 crate's locking methods (namely flock on Unix I believe). So far seems to be working out well at least!

@michaelwoerister
Member

I've been thinking about a different synchronization mechanism lately, one that takes more of a persistent-data-structure approach than a mutex approach. It goes roughly like this:

  1. When starting a compilation session, create a sub-directory <session-id>-working within the incr. comp. directory, where <session-id> is a newly generated UUID.
  2. Spawn a "cleanup process" that will delete this directory if the compiler process gets killed unexpectedly and consequently fails to send the periodic keep-alive signal to the cleanup process.
  3. Choose a valid cache sub-directory from the incr. comp. directory. A valid cache directory is one that was formerly a <session-id>-working directory but has been renamed to <session-id>-<timestamp> after compilation succeeded. The timestamp in the directory name allows choosing the newest one if there is more than one available.
  4. From the chosen directory, copy/hard-link everything into the new <session-id>-working directory. Now we have our own copy of everything and don't need to worry about other processes interfering.
  5. Validate the data we have just copied, purge invalidated items from the cache, etc.
  6. Compile the program.
  7. Once we have a valid, consistent cache directory again, rename it from <session-id>-working to <session-id>-<timestamp>, thus making it visible to other processes. That's safe now because the files in it won't be mutated again.
  8. Shut down the cleanup process; its services have not been needed.
  9. Delete the old cache directory. It's not needed anymore, since we just created a more recent one.
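Steps 3 and 7 of the scheme above might be sketched roughly as follows. This is a minimal, hypothetical sketch using only std: the names `select_newest_cache` and `finalize` are made up, and a real implementation would use a UUID for the session id and hard-link the chosen directory's contents as in step 4.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Step 3: pick the newest finalized cache directory, i.e. one named
/// `<session-id>-<timestamp>` rather than `<session-id>-working`.
fn select_newest_cache(incr_dir: &Path) -> Option<PathBuf> {
    let mut best: Option<(u64, PathBuf)> = None;
    for entry in fs::read_dir(incr_dir).ok()? {
        let path = entry.ok()?.path();
        let name = path.file_name()?.to_str()?.to_owned();
        if !path.is_dir() || name.ends_with("-working") {
            continue;
        }
        // The timestamp is the suffix after the last '-'.
        if let Some(ts) = name.rsplit('-').next().and_then(|s| s.parse::<u64>().ok()) {
            if best.as_ref().map_or(true, |(b, _)| ts > *b) {
                best = Some((ts, path));
            }
        }
    }
    best.map(|(_, p)| p)
}

/// Step 7: atomically publish a finished working directory by renaming
/// it, making it visible to other processes.
fn finalize(working: &Path, session_id: &str) -> std::io::Result<PathBuf> {
    let ts = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    let dest = working.with_file_name(format!("{}-{}", session_id, ts));
    fs::rename(working, &dest)?;
    Ok(dest)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("incr-demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(dir.join("abc-100"))?;
    fs::create_dir_all(dir.join("def-200"))?;
    fs::create_dir_all(dir.join("ghi-working"))?; // in progress, ignored
    let newest = select_newest_cache(&dir).expect("a finalized cache exists");
    assert!(newest.ends_with("def-200"));
    let published = finalize(&dir.join("ghi-working"), "ghi")?;
    assert!(published.exists());
    println!("newest: {}", newest.display());
    Ok(())
}
```

The rename in `finalize` is the key move: on the same filesystem a rename is atomic, so other processes either see the old name (and skip the directory) or the new, fully consistent one.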

The most important advantage of this strategy is that we are never left with a corrupt cache. If the rustc process is killed midway through compilation, it will leave behind a <session-id>-working directory that we know is unusable. And if the cleanup process does its job, not even that will be left behind. In contrast to that, with a filelock there might be half-written object files and we have to throw away everything (although that could be improved upon too with a similar "change the filename when done" strategy).
Another advantage is that two rustc processes can work concurrently. With the filelock, one will be blocked or will fail immediately.

The disadvantages are that:

  • the "cleanup process" setup is a bit ugly.
  • making the initial "copy" of the working directory might be slow if the underlying file system does not support hard-links (which is at least true for FAT32 and exFAT).
  • it needs some kind of IPC to facilitate communication between rustc and its cleanup process, although that could conceivably be done through the file system too.

Thoughts?

@alexcrichton
Member

In the last step the old cache directory is deleted, but concurrent rustc invocations means that the old cache may be in use, right? Basically step (3) needs to be resilient to the cache disappearing, but that seems reasonable?

I'm somewhat wary about spawning a cleanup process, the behavior on Unix at least for a ctrl-c is to typically kill the entire process group, which in this case includes the cleanup process. In that sense if the compiler abnormally terminates so will the child. On Windows the behavior is somewhat similar with what we do today where the parent and child process are both in the same "job object", which when closed will kill both.

I don't really have a great suggestion, but perhaps we could just leave these directories around, or maybe automatically clean up ones that are > 24hr old?
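The "> 24hr" fallback could be sketched like this. It is an assumption-laden sketch: `gc_stale_working_dirs` is a made-up name, and the `-working` suffix follows the directory scheme proposed above.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Delete `-working` directories whose mtime is older than `max_age`,
/// on the assumption that their owning process died without cleaning up.
fn gc_stale_working_dirs(incr_dir: &Path, max_age: Duration) -> std::io::Result<()> {
    let now = SystemTime::now();
    for entry in fs::read_dir(incr_dir)? {
        let entry = entry?;
        let path = entry.path();
        let is_working = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.ends_with("-working"));
        if !path.is_dir() || !is_working {
            continue;
        }
        let age = now
            .duration_since(entry.metadata()?.modified()?)
            .unwrap_or_default();
        if age >= max_age {
            fs::remove_dir_all(&path)?;
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("incr-gc-demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(dir.join("abc-working"))?;
    fs::create_dir_all(dir.join("def-123"))?; // finalized; never touched
    // A max_age of zero treats every working dir as stale (demo only).
    gc_stale_working_dirs(&dir, Duration::from_secs(0))?;
    assert!(!dir.join("abc-working").exists());
    assert!(dir.join("def-123").exists());
    println!("stale working dir removed");
    Ok(())
}
```

Note the weakness discussed in the following comments: an mtime heuristic alone can delete the working directory of a still-running long compilation, which is one reason the thread moves toward lock-based liveness detection instead.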

@michaelwoerister
Member

Regarding the concurrent access: we should be fine with missing files. If the file is actually copied and there is any chance of a halfway copied file, we'd also need to store some kind of checksum to prevent those from being used.
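A checksum side-file is one way to detect such a half-copied cache entry. In this sketch the `.sum` sidecar naming and the function names are made up, and std's `DefaultHasher` is non-cryptographic, which is enough to catch truncation but not tampering.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Non-cryptographic checksum; enough to catch a truncated copy.
fn checksum(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Write a cache entry together with a `.sum` sidecar file.
fn write_with_checksum(path: &Path, data: &[u8]) -> std::io::Result<()> {
    fs::write(path, data)?;
    fs::write(path.with_extension("sum"), checksum(data).to_string())
}

/// An entry is usable only if its data matches the recorded checksum.
fn is_intact(path: &Path) -> bool {
    let data = match fs::read(path) {
        Ok(d) => d,
        Err(_) => return false,
    };
    fs::read_to_string(path.with_extension("sum"))
        .map_or(false, |s| s.trim() == checksum(&data).to_string())
}

fn main() -> std::io::Result<()> {
    let f = std::env::temp_dir().join("cache-item.bin");
    write_with_checksum(&f, b"object code")?;
    assert!(is_intact(&f));
    // Simulate a half-copied file: truncated data, stale sidecar.
    fs::write(&f, b"object")?;
    assert!(!is_intact(&f));
    println!("truncated copy detected");
    Ok(())
}
```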

Regarding cleanup: I was thinking about the 'delete unreasonably old working dirs' approach too, as an additional backup. It would probably also be OK to do only that.

Is there no way to spawn an independent process (i.e. one that can outlive its spawner)?


@alexcrichton
Member

We can emulate spawning an independent process on Unix by changing process groups at least, but it does seem like a surprising thing for a compiler to do, so I'd be somewhat wary of doing that. On Windows with job objects at least I believe it's impossible to spawn an independent process.

@nikomatsakis
Contributor Author

@michaelwoerister sounds pretty related to #34957

@michaelwoerister michaelwoerister added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 21, 2016
@michaelwoerister
Member

We can emulate spawning an independent process on Unix by changing process groups at least, but it does seem like a surprising thing for a compiler to do, so I'd be somewhat wary of doing that. On Windows with job objects at least I believe it's impossible to spawn an independent process.

Yes, it is a bit surprising. On the other hand, I personally would be OK with the compiler starting a daemon process to support incremental compilation, especially if it's just an enhancement that keeps my disk from filling up. For example, because of the way Sublime Text build systems work, I have to manually kill long-running rustc processes all the time. I'd probably end up with a few hundred unused working directories at any point in time if they only got cleaned up after 24 hours :)

@nikomatsakis Yes, definitely related.

@michaelwoerister
Member

michaelwoerister commented Jul 21, 2016

Maybe a rustc process could use a filelock on its own working directory. Then, every time a new process starts, it tries to delete all the working directories it finds; deleting one can only succeed if the process holding the lock on it has died.

@alexcrichton
Member

File locks work pretty well minus NFS, but we could have a scheme like:

  • First, you make a scratch directory
  • Then you create a "well known" file in it, acquiring an exclusive lock
  • When cleaning out old directories (maybe those >2s old), you try to grab the lock
  • If grabbing the lock fails, the process must be running
  • If grabbing the lock succeeds, the process must have died so you clean it up. This also sorts out contention among multiple compilers cleaning up

This doesn't help for NFS, but I'm not sure there's really much we can do there?
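The lock-based protocol above depends on flock-style locking (e.g. via the fs2 crate), which std didn't expose at the time. As a stdlib-only stand-in, this sketch records the owner's PID and checks liveness through /proc; it is Linux-only and, unlike a real lock, vulnerable to PID reuse, which is one reason the flock approach is preferable.

```rust
use std::fs;
use std::path::Path;

/// Claim a working directory by recording the owner's PID in it.
fn claim(dir: &Path) -> std::io::Result<()> {
    fs::create_dir_all(dir)?;
    fs::write(dir.join("owner.pid"), std::process::id().to_string())
}

/// Linux-only liveness check via /proc. Unlike flock, this can be
/// fooled by PID reuse.
fn owner_alive(dir: &Path) -> bool {
    fs::read_to_string(dir.join("owner.pid"))
        .ok()
        .and_then(|s| s.trim().parse::<u32>().ok())
        .map_or(false, |pid| Path::new(&format!("/proc/{}", pid)).exists())
}

/// Reclaim directories whose owning process has died; directories with
/// a live owner are left alone (in the flock version these are the ones
/// whose try-lock fails).
fn gc_dead_owner_dirs(incr_dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(incr_dir)? {
        let path = entry?.path();
        if path.is_dir() && !owner_alive(&path) {
            fs::remove_dir_all(&path)?;
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let root = std::env::temp_dir().join("incr-lock-demo");
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(&root)?;
    claim(&root.join("alive-working"))?; // owned by this live process
    let dead = root.join("dead-working");
    fs::create_dir_all(&dead)?;
    fs::write(dead.join("owner.pid"), "999999999")?; // no such PID
    gc_dead_owner_dirs(&root)?;
    assert!(root.join("alive-working").exists());
    assert!(!root.join("dead-working").exists());
    println!("reclaimed only the dead session's dir");
    Ok(())
}
```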

@michaelwoerister
Member

That sounds like a pretty promising approach. NFS we won't be able to support, I'm afraid, but that seems OK to me.
What about filesystems that don't support hard links? A lot of potential data copying is involved there (which also widens the time window for two processes interfering with each other). Can we tell people to "just use a proper file-system"? :)

@alexcrichton
Member

On Windows we actually can't delete a file if someone's opened it, and on Unix if you delete a file I believe all open references still have the same underlying data, so in theory that should be roughly ok if hard linking isn't supported? I'd also be fine saying that we don't work great on things like FAT32. Or at least not to start out
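The Unix half of that claim is easy to demonstrate: an unlinked file's data stays readable through any handle that was already open. (Linux/Unix only; on Windows the remove itself would typically fail while the handle is open.)

```rust
use std::fs::{self, File};
use std::io::Read;

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("unlink-demo.txt");
    fs::write(&path, "cached object code")?;
    let mut handle = File::open(&path)?;
    // Unlink the file while the handle is still open.
    fs::remove_file(&path)?;
    assert!(!path.exists());
    // The open handle still sees the original data.
    let mut contents = String::new();
    handle.read_to_string(&mut contents)?;
    assert_eq!(contents, "cached object code");
    println!("still readable after unlink: {}", contents);
    Ok(())
}
```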

@michaelwoerister michaelwoerister added this to the Incremental compilation alpha milestone Aug 9, 2016
@michaelwoerister
Member

Since not addressing this issue might lead to a lot of unreproducible bug reports because of a corrupted cache directory, we decided to add it to the alpha milestone.

@michaelwoerister michaelwoerister self-assigned this Aug 9, 2016
@michaelwoerister
Member

@alexcrichton How far away, do you think, are we from being able to use dependencies from crates.io in rustc? The fs2 crate looks like exactly what we need. But I guess at the moment I have to re-implement file-locking.

@alexcrichton
Member

@michaelwoerister still pretty far unfortunately, I'd plan on vendoring for now.

eddyb added a commit to eddyb/rust that referenced this issue Aug 23, 2016
…ing, r=nikomatsakis

Implement synchronization scheme for incr. comp. directory

This PR implements a copy-on-write-based synchronization scheme for the incremental compilation cache directory. For technical details, see the documentation at the beginning of `rustc_incremental/persist/fs.rs`.

The PR contains unit tests for some functions, but testing whether the scheme properly handles races would need a more elaborate setup. It would probably involve a small tool that allows manipulating the incremental compilation directory in a controlled way, and then letting a compiler instance run against directories in different states. I don't know if it's worth the trouble of adding another test category to `compiletest`, but I'd be happy to do so.

Fixes rust-lang#32754
Fixes rust-lang#34957
bors added a commit that referenced this issue Aug 31, 2016
…crichton

Implement synchronization scheme for incr. comp. directory
