Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race conditions with recursive rm #50842

Merged
merged 1 commit into from
Aug 10, 2023
Merged

Conversation

simonbyrne
Copy link
Contributor

If two processes attempt to recursively delete a directory at the same time, then we can end up in a state where the initial isdir is true, but by the time it actually deletes the directory it is already gone. e.g.

I've been conservative and only applied this when force=true, but perhaps it should apply generally?

simonbyrne added a commit to simonbyrne/GPUCompiler.jl that referenced this pull request Aug 8, 2023
We are seeing occasional failures when multiple jobs attempt to invoke the function at the same time. This should be fixed by JuliaLang/julia#50842, but in the meantime we can add a pidlock.
@staticfloat
Copy link
Member

I think it is correct to throw an error if force is not set and someone else removes it before we can, as the non-racey version of that also errors.

@Seelengrab Seelengrab added the filesystem Underlying file system and functions that use it label Aug 8, 2023
@simonbyrne
Copy link
Contributor Author

Would it be possible to get this backported as well?

@staticfloat staticfloat added the backport 1.9 Change should be backported to release-1.9 label Aug 8, 2023
@KristofferC KristofferC mentioned this pull request Aug 10, 2023
35 tasks
@staticfloat staticfloat merged commit cbd3c89 into master Aug 10, 2023
@staticfloat staticfloat deleted the sb/rm-race-condition branch August 10, 2023 16:25
KristofferC added a commit that referenced this pull request Aug 18, 2023
Backported PRs:
- [x] #47782 <!-- Generalize Bool parse method to AbstractString -->
- [x] #48634 <!-- Remove unused "deps" mechanism in internal sorting
keywords [NFC] -->
- [x] #49931 <!-- Lock finalizers' lists at exit -->
- [x] #50064 <!-- Fix numbered prompt with input only with comment -->
- [x] #50474 <!-- docs: Fix a `!!! note` which was miscapitalized -->
- [x] #50516 <!-- Fix visibility of assert on GCC12/13 -->
- [x] #50635 <!-- `versioninfo()`: include build info and unofficial
warning -->
- [x] #49915 <!-- Revert "Remove number / vector (#44358)" -->
- [x] #50781 <!-- fix `bit_map!` with aliasing -->
- [x] #50845 <!-- fix #50438, use default pool for at-threads -->
- [x] #49031 <!-- Update inference.md -->
- [x] #50289 <!-- Initialize prev_nold and nold in gc_reset_page -->
- [x] #50559 <!-- Expand kwcall lowering positional default check to
vararg -->
- [x] #49582 <!-- Update HISTORY.md for `DelimitedFiles` -->
- [x] #50341 <!-- invokelatest docs should say not exported before 1.9
-->
- [x] #50525 <!-- only check that values are finite in `generic_lufact`
when `check=true` -->
- [x] #50444 <!-- Optimize getfield lowering to avoid boxing in some
cases -->
- [x] #50523 <!-- Avoid generic call in most cases for getproperty -->
- [x] #50860 <!-- Add `Base.get_extension` to docs/API -->
- [x] #50164 <!-- codegen: handle dead code with unsafe_store of FCA
pointers -->
- [x] #50568 <!-- `Array(::AbstractRange)` should return an `Array` -->
- [x] #50871 <!-- macOS: Don't inspect dead threadtls during exception
handling. -->

Need manual backport:
- [ ] #48542 <!-- Add docs on task-specific buffering using
multithreading -->
- [ ] #50591 <!-- build: fix various makefile bugs -->


Non-merged PRs with backport label:
- [ ] #50842 <!-- Avoid race conditions with recursive rm -->
- [ ] #50823 <!-- Make ranges more robust with unsigned indexes. -->
- [ ] #50663 <!-- Fix Expr(:loopinfo) codegen -->
- [ ] #49716 <!-- Update varinfo() docstring signature -->
- [ ] #49713 <!-- prevent REPL from erroring in numbered mode in some
situations -->
- [ ] #49573 <!-- Implement jl_cpu_pause on PPC64 -->
- [ ] #48726 <!-- fix macro expansion of property destructuring -->
- [ ] #48642 <!-- Use gc alloc instead of alloc typed in lowering -->
- [ ] #48183 <!-- Don't use pkgimage for package if any includes fall in
tracked path for coverage or alloc tracking -->
- [ ] #48050 <!-- improve `--heap-size-hint` arg handling -->
- [ ] #47615 <!-- Allow threadsafe access to buffer of type inference
profiling trees -->
KristofferC pushed a commit that referenced this pull request Aug 18, 2023
If two processes attempt to recursively delete a directory at the same
time, then we can end up in a state where the initial `isdir` is `true`,
but by the time it actually deletes the directory it is already gone.
e.g.
-
https://buildkite.com/clima/climacore-ci/builds/2460#0189d254-76a9-474b-ad25-e5b16440d629/140-142
which is triggered by

https://github.com/cjdoris/PackageExtensionCompat.jl/blob/636eb5a14ddf9134d004c93f598515903af26443/src/PackageExtensionCompat.jl#L59

-
https://buildkite.com/clima/climacore-ci/builds/2457#0189c7fe-8872-40c5-9106-da2e621ff55a/139-150
which is triggered by

https://github.com/JuliaGPU/GPUCompiler.jl/blob/06e670657d7ceebc1845d7c9534a8352c33490de/src/rtlib.jl#L152

I've been conservative and only applied this when `force=true`, but
perhaps it should apply generally?

(cherry picked from commit cbd3c89)
IanButterworth pushed a commit that referenced this pull request Aug 19, 2023
If two processes attempt to recursively delete a directory at the same
time, then we can end up in a state where the initial `isdir` is `true`,
but by the time it actually deletes the directory it is already gone.
e.g.
-
https://buildkite.com/clima/climacore-ci/builds/2460#0189d254-76a9-474b-ad25-e5b16440d629/140-142
which is triggered by

https://github.com/cjdoris/PackageExtensionCompat.jl/blob/636eb5a14ddf9134d004c93f598515903af26443/src/PackageExtensionCompat.jl#L59

-
https://buildkite.com/clima/climacore-ci/builds/2457#0189c7fe-8872-40c5-9106-da2e621ff55a/139-150
which is triggered by

https://github.com/JuliaGPU/GPUCompiler.jl/blob/06e670657d7ceebc1845d7c9534a8352c33490de/src/rtlib.jl#L152

I've been conservative and only applied this when `force=true`, but
perhaps it should apply generally?

(cherry picked from commit cbd3c89)
IanButterworth pushed a commit that referenced this pull request Aug 25, 2023
If two processes attempt to recursively delete a directory at the same
time, then we can end up in a state where the initial `isdir` is `true`,
but by the time it actually deletes the directory it is already gone.
e.g.
-
https://buildkite.com/clima/climacore-ci/builds/2460#0189d254-76a9-474b-ad25-e5b16440d629/140-142
which is triggered by

https://github.com/cjdoris/PackageExtensionCompat.jl/blob/636eb5a14ddf9134d004c93f598515903af26443/src/PackageExtensionCompat.jl#L59

-
https://buildkite.com/clima/climacore-ci/builds/2457#0189c7fe-8872-40c5-9106-da2e621ff55a/139-150
which is triggered by

https://github.com/JuliaGPU/GPUCompiler.jl/blob/06e670657d7ceebc1845d7c9534a8352c33490de/src/rtlib.jl#L152

I've been conservative and only applied this when `force=true`, but
perhaps it should apply generally?

(cherry picked from commit cbd3c89)
nalimilan pushed a commit that referenced this pull request Nov 5, 2023
If two processes attempt to recursively delete a directory at the same
time, then we can end up in a state where the initial `isdir` is `true`,
but by the time it actually deletes the directory it is already gone.
e.g.
-
https://buildkite.com/clima/climacore-ci/builds/2460#0189d254-76a9-474b-ad25-e5b16440d629/140-142
which is triggered by

https://github.com/cjdoris/PackageExtensionCompat.jl/blob/636eb5a14ddf9134d004c93f598515903af26443/src/PackageExtensionCompat.jl#L59

-
https://buildkite.com/clima/climacore-ci/builds/2457#0189c7fe-8872-40c5-9106-da2e621ff55a/139-150
which is triggered by

https://github.com/JuliaGPU/GPUCompiler.jl/blob/06e670657d7ceebc1845d7c9534a8352c33490de/src/rtlib.jl#L152

I've been conservative and only applied this when `force=true`, but
perhaps it should apply generally?

(cherry picked from commit cbd3c89)
KristofferC added a commit that referenced this pull request Nov 7, 2023
Backported PRs:
- [x] #49357 <!-- Fix unclosed code fence in src/manual/methods.md -->
- [x] #50842 <!-- Avoid race conditions with recursive rm -->
- [x] #50858 <!-- Add a `threadpool` parameter to `Channel` constructor
-->
- [x] #50730 <!-- Fix integer overflow in `isapprox` -->
- [x] #50823 <!-- Make ranges more robust with unsigned indexes. -->
- [x] #50915 <!-- Add note the `Task` about sticky bit -->
- [x] #50989 <!-- fix incorrect results in `expm1(::Union{Float16,
Float32})` -->
- [x] #50912 <!-- Separate foreign threads into a :foreign threadpool
-->
- [x] #51019 <!-- fix a case of potentially use of undefined variable
when handling error in distributed message processing -->
- [x] #51222 <!-- Check again if the tty is open inside the IO lock -->
- [x] #51254 <!-- Ryu: make sure adding zeros does not overwrite
trailing dot -->
- [x] #51284 <!-- Avoid infinite loop when doing SIGTRAP in arm64-apple
-->
- [x] #51491 <!-- Throw clearer ArgumentError for strip with two string
args -->
- [x] #51531 <!-- fix `_tryonce_download_from_cache` (busybox.exe
download error) -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.9 Change should be backported to release-1.9 filesystem Underlying file system and functions that use it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants