nix: set enableParallelChecking = false #272438

Closed

@ghost ghost commented Dec 6, 2023

tests/gc-auto.sh.test and others fail nondeterministically. This is extremely frustrating when developing on staging, where every PR has to rebuild nix. Whatever minor speedup we get from the parallel checks is not worth it.

@ghost ghost requested a review from RaitoBezarius December 6, 2023 09:11
Member

@RaitoBezarius RaitoBezarius left a comment

I agree with the fix. I think it's important to nail down the source of the nondeterminism so that the Nix core team can work on it at their own pace and possibly fix it.

Author

ghost commented Dec 6, 2023

Failed build:

nix> running install tests
nix> installcheck flags: -j32 SHELL=/nix/store/q1c2flcykgr4wwg5a6h450hxbk4ch589-bash-5.2-p15/bin/bash --jobserver-style=pipe profiledir=\$\(out\)/etc/profile.d installcheck
nix>   GEN    tests/common.sh
nix>   CXX    tests/plugins/plugintest.o
nix>   LD     tests/plugins/libplugintest.so
nix> ran test tests/init.sh... [PASS]
nix> ran test tests/nix-profile.sh... [PASS]
nix> ran test tests/add.sh... [PASS]
nix> ran test tests/filter-source.sh... [PASS]
nix> ran test tests/misc.sh... [PASS]
nix> ran test tests/import-derivation.sh... [PASS]
nix> ran test tests/dump-db.sh... [PASS]
nix> ran test tests/dependencies.sh... [PASS]
nix> ran test tests/pass-as-file.sh... [PASS]
nix> ran test tests/secure-drv-outputs.sh... [PASS]
nix> ran test tests/simple.sh... [PASS]
nix> ran test tests/case-hack.sh... [PASS]
nix> ran test tests/optimise-store.sh... [PASS]
nix> ran test tests/placeholders.sh... [PASS]
nix> ran test tests/logging.sh... [PASS]
nix> ran test tests/tarball.sh... [PASS]
nix> ran test tests/fetchGit.sh... [SKIP]
nix> ran test tests/gc-runtime.sh... [PASS]
nix> ran test tests/nix-channel.sh... [PASS]
nix> ran test tests/check-refs.sh... [PASS]
nix> ran test tests/fetchMercurial.sh... [SKIP]
nix> ran test tests/build-remote.sh... [PASS]
nix> ran test tests/export.sh... [PASS]
nix> ran test tests/structured-attrs.sh... [PASS]
nix> ran test tests/export-graph.sh... [PASS]
nix> ran test tests/check-reqs.sh... [PASS]
nix> ran test tests/substitute-with-invalid-ca.sh... [PASS]
nix> ran test tests/plugins.sh... [PASS]
nix> ran test tests/repair.sh... [PASS]
nix> ran test tests/fetchurl.sh... [PASS]
nix> ran test tests/gc-auto.sh... [FAIL]
nix>     + clearStore
nix>     + echo 'clearing store...'
nix>     clearing store...
nix>     + chmod -R +w /build/nix-test/gc-auto/store
nix>     + rm -rf /build/nix-test/gc-auto/store
nix>     + mkdir /build/nix-test/gc-auto/store
nix>     + rm -rf /build/nix-test/gc-auto/var/nix
nix>     + mkdir /build/nix-test/gc-auto/var/nix
nix>     + nix-store --init
nix>     + clearProfiles
nix>     + profiles=/build/nix-test/gc-auto/var/nix/profiles
nix>     + rm -rf /build/nix-test/gc-auto/var/nix/profiles
nix>     ++ nix add-to-store --name garbage1 ./nar-access.sh
nix>     warning: you don't have Internet access; disabling some network-dependent features
nix>     + garbage1=/build/nix-test/gc-auto/store/3z0j1ivcfvixy42ify5m7qi2p9pay7g0-garbage1
nix>     ++ nix add-to-store --name garbage2 ./nar-access.sh
nix>     warning: you don't have Internet access; disabling some network-dependent features
nix>     + garbage2=/build/nix-test/gc-auto/store/0l5n06018x7iwf1hajbm3kallfhgl5jj-garbage2
nix>     ++ nix add-to-store --name garbage3 ./nar-access.sh
nix>     warning: you don't have Internet access; disabling some network-dependent features
nix>     + garbage3=/build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>     + ls -l /build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>     -r--r--r-- 1 nixbld nixbld 1853 Jan  1  1970 /build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>     + POSIXLY_CORRECT=1
nix>     + du /build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>     8   /build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>     + fake_free=/build/nix-test/gc-auto/fake-free
nix>     + export _NIX_TEST_FREE_SPACE_FILE=/build/nix-test/gc-auto/fake-free
nix>     + _NIX_TEST_FREE_SPACE_FILE=/build/nix-test/gc-auto/fake-free
nix>     + echo 1100
nix>     ++ cat
nix>     + expr='with import ./config.nix; mkDerivation {
nix>       name = "gc-A";
nix>       buildCommand = '\'''\''
nix>         set -x
nix>         [[ $(ls $NIX_STORE/*-garbage? | wc -l) = 3 ]]
nix>         mkdir $out
nix>         echo foo > $out/bar
nix>         echo 1...
nix>         sleep 2
nix>         echo 200 > /build/nix-test/gc-auto/fake-free.tmp1
nix>         mv /build/nix-test/gc-auto/fake-free.tmp1 /build/nix-test/gc-auto/fake-free
nix>         echo 2...
nix>         sleep 2
nix>         echo 3...
nix>         sleep 2
nix>         echo 4...
nix>         [[ $(ls $NIX_STORE/*-garbage? | wc -l) = 1 ]]
nix>       '\'''\'';
nix>     }'
nix>     ++ cat
nix>     + expr2='with import ./config.nix; mkDerivation {
nix>       name = "gc-B";
nix>       buildCommand = '\'''\''
nix>         set -x
nix>         mkdir $out
nix>         echo foo > $out/bar
nix>         echo 1...
nix>         sleep 2
nix>         echo 200 > /build/nix-test/gc-auto/fake-free.tmp2
nix>         mv /build/nix-test/gc-auto/fake-free.tmp2 /build/nix-test/gc-auto/fake-free
nix>         echo 2...
nix>         sleep 2
nix>         echo 3...
nix>         sleep 2
nix>         echo 4...
nix>       '\'''\'';
nix>     }'
nix>     + pid=5087
nix>     + nix build -v -o /build/nix-test/gc-auto/result-B -L '(with import ./config.nix; mkDerivation {
nix>       name = "gc-B";
nix>       buildCommand = '\'''\''
nix>         set -x
nix>         mkdir $out
nix>         echo foo > $out/bar
nix>         echo 1...
nix>         sleep 2
nix>         echo 200 > /build/nix-test/gc-auto/fake-free.tmp2
nix>         mv /build/nix-test/gc-auto/fake-free.tmp2 /build/nix-test/gc-auto/fake-free
nix>         echo 2...
nix>         sleep 2
nix>         echo 3...
nix>         sleep 2
nix>         echo 4...
nix>       '\'''\'';
nix>     })' --min-free 1000 --max-free 2000 --min-free-check-interval 1
nix>     + nix build -v -o /build/nix-test/gc-auto/result-A -L '(with import ./config.nix; mkDerivation {
nix>       name = "gc-A";
nix>       buildCommand = '\'''\''
nix>         set -x
nix>         [[ $(ls $NIX_STORE/*-garbage? | wc -l) = 3 ]]
nix>         mkdir $out
nix>         echo foo > $out/bar
nix>         echo 1...
nix>         sleep 2
nix>         echo 200 > /build/nix-test/gc-auto/fake-free.tmp1
nix>         mv /build/nix-test/gc-auto/fake-free.tmp1 /build/nix-test/gc-auto/fake-free
nix>         echo 2...
nix>         sleep 2
nix>         echo 3...
nix>         sleep 2
nix>         echo 4...
nix>         [[ $(ls $NIX_STORE/*-garbage? | wc -l) = 1 ]]
nix>       '\'''\'';
nix>     })' --min-free 1000 --max-free 2000 --min-free-check-interval 1
nix>     warning: you don't have Internet access; disabling some network-dependent features
nix>     building '/build/nix-test/gc-auto/store/1m79amxhs79rb01wiv6n0m12cc38ii1a-gc-A.drv'...
nix>     gc-A> +++ wc -l
nix>     gc-A> +++ ls /build/nix-test/gc-auto/store/0l5n06018x7iwf1hajbm3kallfhgl5jj-garbage2 /build/nix-test/gc-auto/store/3z0j1ivcfvixy42ify5m7qi2p9pay7g0-garbage1 /build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>     gc-A> ++ [[ 3 = 3 ]]
nix>     gc-A> ++ mkdir /build/nix-test/gc-auto/store/xcmbln6qf722lk997q3vfqvjvg82ll7y-gc-A
nix>     warning: you don't have Internet access; disabling some network-dependent features
nix>     gc-A> ++ echo foo
nix>     gc-A> ++ echo 1...
nix>     gc-A> 1...
nix>     gc-A> ++ sleep 2
nix>     building '/build/nix-test/gc-auto/store/bffh09p094ixasg0pas5mjzzh45x22md-gc-B.drv'...
nix>     gc-B> ++ mkdir /build/nix-test/gc-auto/store/j3y7xyc6r4hjgl05if9p4zwjh7spy352-gc-B
nix>     gc-B> ++ echo foo
nix>     gc-B> ++ echo 1...
nix>     gc-B> 1...
nix>     gc-B> ++ sleep 2
nix>     gc-A> ++ echo 200
nix>     gc-A> ++ mv /build/nix-test/gc-auto/fake-free.tmp1 /build/nix-test/gc-auto/fake-free
nix>     gc-A> ++ echo 2...
nix>     gc-A> 2...
nix>     gc-A> ++ sleep 2
nix>     gc-B> ++ echo 200
nix>     gc-B> ++ mv /build/nix-test/gc-auto/fake-free.tmp2 /build/nix-test/gc-auto/fake-free
nix>     gc-B> ++ echo 2...
nix>     gc-B> 2...
nix>     gc-B> ++ sleep 2
nix>     running auto-GC to free 1800 bytes
nix>     finding garbage collector roots...
nix>     gc-A> ++ echo 3...
nix>     gc-A> 3...
nix>     gc-A> ++ sleep 2
nix>     running auto-GC to free 1800 bytes
nix>     waiting for the big garbage collector lock...
nix>     gc-B> ++ echo 3...
nix>     gc-B> 3...
nix>     gc-B> ++ sleep 2
nix>     removing stale temporary roots file '/build/nix-test/gc-auto/var/nix/temproots/4801'
nix>     removing stale temporary roots file '/build/nix-test/gc-auto/var/nix/temproots/4652'
nix>     removing stale temporary roots file '/build/nix-test/gc-auto/var/nix/temproots/4498'
nix>     deleting garbage...
nix>     deleting '/build/nix-test/gc-auto/store/3z0j1ivcfvixy42ify5m7qi2p9pay7g0-garbage1'
nix>     deleted or invalidated more than 1800 bytes; stopping
nix>     deleting '/build/nix-test/gc-auto/store/trash'
nix>     finding garbage collector roots...
nix>     deleting unused links...
nix>     note: currently hard linking saves 0.00 MiB
nix>     gc-A> ++ echo 4...
nix>     gc-A> 4...
nix>     gc-A> +++ wc -l
nix>     gc-A> +++ ls /build/nix-test/gc-auto/store/0l5n06018x7iwf1hajbm3kallfhgl5jj-garbage2 /build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>     gc-A> ++ [[ 2 = 1 ]]
nix>     builder for '/build/nix-test/gc-auto/store/1m79amxhs79rb01wiv6n0m12cc38ii1a-gc-A.drv' failed with exit code 1; last 10 log lines:
nix>       2...
nix>       ++ sleep 2
nix>       ++ echo 3...
nix>       3...
nix>       ++ sleep 2
nix>       ++ echo 4...
nix>       4...
nix>       +++ wc -l
nix>       +++ ls /build/nix-test/gc-auto/store/0l5n06018x7iwf1hajbm3kallfhgl5jj-garbage2 /build/nix-test/gc-auto/store/b2b8z3iq42z7z78vvzq4d7pi8pd4mlqp-garbage3
nix>       ++ [[ 2 = 1 ]]
nix>     error: build of '/build/nix-test/gc-auto/store/1m79amxhs79rb01wiv6n0m12cc38ii1a-gc-A.drv' failed
nix>     gc-B> ++ echo 4...
nix>     gc-B> 4...
nix>     + wait 5087
nix> make: *** [mk/lib.mk:128: tests/gc-auto.sh.test] Error 100
nix> make: *** Waiting for unfinished jobs....
nix> ran test tests/nix-shell.sh... [PASS]
nix> ran test tests/linux-sandbox.sh... [PASS]
nix> ran test tests/timeout.sh... [PASS]
nix> ran test tests/brotli.sh... [PASS]
nix> ran test tests/nix-build.sh... [PASS]
nix> ran test tests/run.sh... [PASS]
nix> ran test tests/function-trace.sh... [PASS]
nix> ran test tests/pure-eval.sh... [PASS]
nix> ran test tests/build-dry.sh... [PASS]
nix> ran test tests/referrers.sh... [PASS]
nix> ran test tests/multiple-outputs.sh... [PASS]
nix> ran test tests/nar-access.sh... [PASS]
nix> ran test tests/fixed.sh... [PASS]
nix> ran test tests/search.sh... [PASS]
nix> ran test tests/nix-copy-ssh.sh... [PASS]
nix> ran test tests/restricted.sh... [PASS]
nix> ran test tests/post-hook.sh... [PASS]
nix> ran test tests/hash.sh... [PASS]
nix> ran test tests/check.sh... [PASS]
nix> ran test tests/signing.sh... [PASS]
nix> ran test tests/gc.sh... [PASS]
nix> ran test tests/binary-cache.sh... [PASS]
nix> ran test tests/user-envs.sh... [PASS]
nix> ran test tests/gc-concurrent.sh... [PASS]
nix> ran test tests/remote-store.sh... [PASS]
nix> ran test tests/lang.sh... [PASS]
error: build of '/nix/store/n5yzfjlijbp9vgkkd499kb5yjmga13zz-nix-2.3.17.drv' on 'ssh://root@192.168.22.105' failed: builder for '/nix/store/n5yzfjlijbp9vgkkd499kb5yjmga13zz-nix-2.3.17.drv' failed with exit code 2
error: builder for '/nix/store/n5yzfjlijbp9vgkkd499kb5yjmga13zz-nix-2.3.17.drv' failed with exit code 1;
       last 10 log lines:
       > ran test tests/post-hook.sh... [PASS]
       > ran test tests/hash.sh... [PASS]
       > ran test tests/check.sh... [PASS]
       > ran test tests/signing.sh... [PASS]
       > ran test tests/gc.sh... [PASS]
       > ran test tests/binary-cache.sh... [PASS]
       > ran test tests/user-envs.sh... [PASS]
       > ran test tests/gc-concurrent.sh... [PASS]
       > ran test tests/remote-store.sh... [PASS]
       > ran test tests/lang.sh... [PASS]
       For full logs, run 'nix log /nix/store/n5yzfjlijbp9vgkkd499kb5yjmga13zz-nix-2.3.17.drv'.
error: 1 dependencies of derivation '/nix/store/kkssy59chzdjkkgshaz4lsrd38qza17h-nix-prefetch-git.drv' failed to build
error: 1 dependencies of derivation '/nix/store/i9a30zfgazjmz9psvk782xph23dxbq5r-crate2nix-0.11.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/r7cv556yb6ab2y83a9bh1g1x1qz82pff-crate2nix-generate.drv' failed to build
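
In a log this long, the one failing test is easy to miss among the passing ones. A minimal sketch for listing the failed tests, assuming the `ran test <path>... [STATUS]` line format shown above. The function name `failed_tests` and the log file name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Print the path of every test whose status line ends in [FAIL].
# Assumes the "ran test <path>... [STATUS]" format seen in the log above.
failed_tests() {
  sed -n 's/^.*ran test \(.*\)\.\.\. \[FAIL]$/\1/p' "$1"
}

# Usage (file name is illustrative): failed_tests build.log
```

Run against the log above, this should list only tests/gc-auto.sh, since the other tests were reported as [PASS] or [SKIP].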

Author

ghost commented Dec 6, 2023

Note: I do not think this is a concurrency bug in Nix -- I think it's likely a concurrency bug in the test apparatus.

I'm fine with passing along this info to upstream, but in general I think that having our tests be deterministic is more important than helping upstream troubleshoot concurrency bugs in their test suite. If there were any sign that this was a concurrency bug in Nix itself (there is not) that would be a different story.

If a test is specifically designed to test concurrency (like the libuv test suite) it won't need enableParallelChecking and will spawn concurrent processes even with only one core available.

Frankly, our enableParallelChecking flag is an opportunistic (and impure!) speed boost that should take a back seat to predictability. It's great when it works, but when it doesn't, we should just turn it off and not waste time fighting with it.
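
The failed build above ran installcheck with -j32. As a rough sketch of what the flag controls, assume the check phase adds `-j` only when `enableParallelChecking` reaches the builder as a non-empty value (Nix passes a `false` attribute to the builder as an empty string). The `check_flags` helper is illustrative and is not the nixpkgs implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the make flags for a check phase.
# -j<cores> is emitted only when enableParallelChecking is non-empty.
check_flags() {
  local cores="${NIX_BUILD_CORES:-1}"
  local flags=()
  if [ -n "${enableParallelChecking:-}" ]; then
    flags+=("-j${cores}")
  fi
  flags+=("installcheck")
  printf '%s\n' "${flags[*]}"
}

NIX_BUILD_CORES=32 enableParallelChecking=1 check_flags   # prints "-j32 installcheck"
NIX_BUILD_CORES=32 enableParallelChecking=  check_flags   # prints "installcheck"
```

With the flag off, make runs the test targets one at a time, so the scheduling no longer depends on how many cores the builder happens to have.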

@delroth added the "12.approvals: 1" label on Dec 6, 2023
@SuperSandro2000
Member

I assume the time difference is acceptable and not astronomically large.

@delroth added the "12.approvals: 2" label and removed the "12.approvals: 1" label on Dec 6, 2023
Contributor

flokli commented Dec 6, 2023

I feel we should still open an issue upstream.

Member

roberth commented Dec 6, 2023

Thanks for the ping!

> Note: I do not think this is a concurrency bug in Nix -- I think it's likely a concurrency bug in the test apparatus.

We use a distinct local store in each test (tests/*.sh file). You can see this to some extent from the store URIs, which include the test name.
We don't have concurrency in the test scripts themselves, except for intentional concurrency, which this flag will not disable.

You may still merge this if you want to test my theory. If it succeeds consistently, then that might suggest the existence of some shared state not covered by --store, but I find this quite unlikely. Things like the gc lock and gc socket location are also tied to those unique stores via NIX_STATE_DIR.
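
The isolation described here can be sketched as follows, assuming each test derives its store and state directories from its own name, as the /build/nix-test/gc-auto/... paths in the failure log suggest. The `test_env` helper and the `TEST_ROOT_BASE` variable are hypothetical and are not taken from the Nix test suite.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print per-test locations derived from the test name,
# so two tests running in parallel never share a store or a state directory.
test_env() {
  local name="$1"
  local root="${TEST_ROOT_BASE:-/build/nix-test}/${name}"
  printf 'NIX_STORE_DIR=%s\n' "${root}/store"
  printf 'NIX_STATE_DIR=%s\n' "${root}/var/nix"
}

test_env gc-auto
test_env gc-concurrent
```

Under this layout, anything kept below NIX_STATE_DIR, such as the GC lock, is private to one test, so any remaining interference would have to come from state outside these directories.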

> open an issue upstream.

Please do.

Contributor

flokli commented Dec 10, 2023

@amjoseph-nixpkgs poke - were you able to see the test flakiness with all Nix versions, or only 2.3?

IIRC, there have been some stability improvements in the test suite, and some got backported to 2.3.17 (by @Ericson2314).

It would help to know whether this still applies to the master branch, or whether 2.3 is only missing some stability backports (in which case enableParallelChecking = false should only be set for 2.3).

@ghost ghost closed this Jan 23, 2024
@ghost ghost deleted the noParallelChecking branch January 23, 2024 06:46
This pull request was closed.