Race condition during GC #6757

Closed
roberth opened this issue Jul 4, 2022 · 9 comments
Labels
bug store Issues and pull requests concerning the Nix store

Comments

@roberth
Member

roberth commented Jul 4, 2022

Describe the bug

I've encountered a "requires non-existent output" error when building while a garbage collection is running.

Steps To Reproduce

  1. Start a GC
  2. Wait until GC is deleting paths (the phase where it prints a lot of deleting '<store path>' lines)
  3. Build something with lots of dependencies, such as cd nixpkgs; nix-build -A nixosTests.nixops-unstable
  4. The build fails with: error: derivation '/nix/store/8kwb4yi090vka3af8fi491l33y28lmlz-runtime-deps.drv' requires non-existent output 'bin' from input derivation '/nix/store/mh4dcqrvliqzn022ga737m25zn1l80yp-libidn2-2.3.2.drv'
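
The steps above could be scripted roughly as follows. This is only a sketch: the helper names (is_deleting_line, reproduce) are made up for illustration, it assumes the GC is started with nix-collect-garbage and that its progress lines begin with deleting ', and it really does run a garbage collection, so unrooted store paths will be removed. Because this is a race, a single run may well not trigger the error.

```python
import subprocess


def is_deleting_line(line: str) -> bool:
    """True for GC progress lines of the form: deleting '<store path>'."""
    return line.startswith("deleting '")


def reproduce(nixpkgs_dir: str, attr: str = "nixosTests.nixops-unstable") -> int:
    """Start a GC, then start a build once the GC is deleting paths.

    Returns the exit code of the build, or 0 if the GC never reached
    the deletion phase and no build was started.
    """
    # Step 1: start a GC. stderr is merged into stdout so the progress
    # lines are seen regardless of which stream they are written to.
    gc = subprocess.Popen(
        ["nix-collect-garbage"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    build = None
    for line in gc.stdout:
        # Step 2: wait for the first "deleting" line.
        if build is None and is_deleting_line(line):
            # Step 3: start a build with many dependencies, concurrently.
            build = subprocess.Popen(["nix-build", "-A", attr], cwd=nixpkgs_dir)
    gc.wait()
    if build is None:
        return 0
    # Step 4: a non-zero exit code here may indicate the error from the report.
    return build.wait()
```

Calling reproduce("/path/to/nixpkgs") from a nixpkgs checkout corresponds to the cd nixpkgs; nix-build ... command in step 3.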

The .drv was (still or again) present. It specifies a bin output, but the output was collected.

Expected behavior

A concurrent build can succeed.

Any new paths that become referenced during the GC process are retained. Path creation+retention and deletion on the local and daemon stores are atomic procedures.
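
To illustrate what "atomic" is meant to buy here, the following is a toy model, not Nix's implementation. ToyStore and its methods are hypothetical names, and the model ignores references between paths. The point is only that retention and deletion take the same lock, and that the collector re-checks each candidate under that lock right before deleting it.

```python
import threading


class ToyStore:
    """Toy model of a store where retain and delete exclude each other."""

    def __init__(self, paths):
        self._lock = threading.Lock()
        self.paths = set(paths)
        self.roots = set()

    def retain(self, path) -> bool:
        """Atomically mark a path as in use.

        Returns False if the path is already gone, so the caller knows
        it has to rebuild instead of assuming the path exists.
        """
        with self._lock:
            if path not in self.paths:
                return False
            self.roots.add(path)
            return True

    def collect(self):
        """Delete unretained paths, re-checking each one under the lock."""
        with self._lock:
            candidates = sorted(self.paths - self.roots)
        deleted = []
        for path in candidates:
            with self._lock:
                # A client may have retained the path since the snapshot.
                if path in self.roots:
                    continue
                self.paths.discard(path)
                deleted.append(path)
        return deleted
```

In this model a client that calls retain during a collection either keeps the path or is told it is gone; it never ends up holding a reference to a path that is deleted afterwards.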

nix-env --version output

nix-env (Nix) 2.8.1


@roberth roberth added the bug label Jul 4, 2022
@bjornfor
Contributor

bjornfor commented Jul 4, 2022

Duplicate of #6572?

@Kha
Contributor

Kha commented Jul 4, 2022

This might not be a duplicate if the GC hypothesis is true: there was no GC during my reproduction of #6572.

@roberth
Member Author

roberth commented Jul 5, 2022

In another case, a client wins:

[...]
deleting '/nix/store/cbcil1svag3z62b6fg42g39lv4skni9x-python3.10-sphinx-4.5.0.drv'
15847 store paths deleted, 17604.89 MiB freed
error: cannot delete path '/nix/store/1j51xaljr8kcj1m3hxfxkmz19yqpa4vf-python3.10-sphinxcontrib-devhelp-1.0.2.drv' because it is in use by '/nix/store/278frg8lhf0ydspbnywgcka0j1g6mkjj-python3.10-sphinx-4.5.0.drv'

The GC does not recover from the exception.
This could be seen as denial of service, considering that a client stops a more privileged process. I'd consider clients with nix-daemon access rather privileged already though, so it seems quite harmless security-wise.

@Cynerd

Cynerd commented Nov 18, 2022

I can confirm this behaviour. I configured garbage collection to run automatically, and since then, I have started encountering missing output errors like this:

error: derivation '/nix/store/a9h9gaz8w0vd5amwj67cdzjp7zjdbyj4-arm-none-eabi-stage-final-gcc-11.3.0.drv' requires non-existent output 'out' from input derivation '/nix/store/53jb3pvfsalnc2n2pq4idgbixly187cg-arm-none-eabi-binutils-wrapper-2.39.drv'

While the derivation specifies this output, it is unavailable in the store. My only explanation is that it got garbage collected. Rerunning the same build always results in the same error. I managed to fix it every time by running the build on the derivation file that is missing its output and then on the derivation that requires it. The next full build passes.
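
The manual fix described above can be sketched as a small helper that reads the two .drv paths out of the error message and returns the builds to run, in order. The function name is hypothetical, and using nix-store --realise to build each derivation is an assumption; the comment does not say which command was used.

```python
import re

# Matches the error text quoted in this thread.
ERROR_RE = re.compile(
    r"derivation '(?P<consumer>[^']+\.drv)' requires non-existent output "
    r"'(?P<output>[^']+)' from input derivation '(?P<producer>[^']+\.drv)'"
)


def workaround_commands(error_message: str):
    """Return the commands for the workaround, or [] if nothing matches.

    First build the derivation whose output is missing, then the
    derivation that requires it.
    """
    match = ERROR_RE.search(error_message)
    if match is None:
        return []
    return [
        ["nix-store", "--realise", match["producer"]],
        ["nix-store", "--realise", match["consumer"]],
    ]
```

Each returned list can be passed to subprocess.run; the function itself only parses text and runs nothing.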

@roberth
Member Author

roberth commented Nov 18, 2022

@Cynerd these symptoms match with #6572, which was caused by a build dependency handling issue rather than a GC race. The problem has existed since Nix 2.8 and it requires either a garbage collection or a prior substitution of a single output before needing another from the same derivation. It has been solved for the upcoming 2.12 release and backports are in progress.

@Cynerd

Cynerd commented Nov 18, 2022

@roberth It seems that you are right. I missed that issue. I am using Nix 2.11 at the moment, so yes it seems to match my issue.

@jkachmar

we've been running into this issue (as well as #7370) at work since upgrading to Nix 2.10?

i see that 3ade5f5 has landed on master, is there a timeline for backporting it to 2.10 and/or can i do anything to help with this?

@vcunat
Member

vcunat commented Feb 21, 2023

I wonder if the default nix-gc.service should be modified so that it doesn't enter the failed state on (some?) errors and just retries, at least after the timer fires again. Otherwise the machine's disk might fill up completely for lack of GC, even to the point that GC can no longer start (this may differ by FS), which makes manual recovery a bit harder.
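
One way to picture the retry idea, independent of how nix-gc.service is actually defined, is a wrapper that runs the collection a bounded number of times before giving up. This is a sketch with hypothetical names (retry, gc_once) and arbitrary defaults; it is not a proposed change to the service, and it assumes nix-collect-garbage is the command being run.

```python
import subprocess
import time


def retry(action, attempts: int = 3, delay_seconds: float = 60.0) -> bool:
    """Call action() until it returns exit code 0, at most `attempts` times.

    Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if action() == 0:
            return True
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


def gc_once() -> int:
    """Run one garbage collection and return its exit code."""
    return subprocess.run(["nix-collect-garbage"]).returncode
```

A caller would use retry(gc_once). A transient "cannot delete path ... because it is in use" failure would then cost one attempt rather than the whole run.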

@roberth roberth added the store Issues and pull requests concerning the Nix store label Mar 16, 2023
@roberth
Member Author

roberth commented Mar 29, 2023

Possibly resolved by

Closing this until it is reconfirmed with a Nix version that incorporates that fix, such as >= 2.12.0 or a backport.

@roberth roberth closed this as completed Mar 29, 2023

6 participants