[0.7/1.0] Failed to catch SIGINT during stress.jl test #28580

Open
cdluminate opened this issue Aug 11, 2018 · 4 comments

@cdluminate

cdluminate commented Aug 11, 2018

http://debomatic-amd64.debian.net/distribution#unstable/julia/0.7.0-2/autopkgtest

Executing tests that run on node 1 only:
precompile                         (1) |    31.06 |   0.20 |  0.7 |     471.27 |   260.21
SharedArrays                       (1) |    24.59 |   1.14 |  4.6 |    1834.89 |   287.69
stress: Test Failed at /usr/share/julia/test/stress.jl:98
  Expression: begin
    ccall(:kill, Cvoid, (Cint, Cint), getpid(), 2)
    for i = 1:10
        Libc.systemsleep(0.5)
        ccall(:jl_gc_safepoint, Cvoid, ())
    end
end
    Expected: InterruptException
  No exception thrown
Stacktrace:
 [1] top-level scope at /usr/share/julia/test/stress.jl:98
 [2] include at ./boot.jl:317 [inlined]
 [3] include_relative(::Module, ::String) at ./loading.jl:1038
 [4] include at ./sysimg.jl:29 [inlined]
 [5] include(::String) at /usr/share/julia/test/testdefs.jl:13
 [6] macro expansion at /usr/share/julia/test/testdefs.jl:22 [inlined]
 [7] macro expansion at /build/julia-bF5BgK/julia-0.7.0/usr/share/julia/stdlib/v0.7/Test/src/Test.jl:1079 [inlined]
 [8] macro expansion at /usr/share/julia/test/testdefs.jl:21 [inlined]
 [9] macro expansion at ./util.jl:289 [inlined]
 [10] top-level scope at /usr/share/julia/test/testdefs.jl:19 [inlined]
 [11] top-level scope at ./none:0
Distributed                        (1) |   117.31 |   0.01 |  0.0 |      21.04 |   309.17
stress: Error During Test at none:1
  Test threw exception Some tests did not pass: 118 passed, 1 failed, 0 errored, 0 broken.
  Expression: stress
  Some tests did not pass: 118 passed, 1 failed, 0 errored, 0 broken.

Another build log:
https://buildd.debian.org/status/fetch.php?pkg=julia&arch=all&ver=0.7.0-1&stamp=1533838647&raw=0


Related issue #17706

@twhitehead

I've discovered several global-state-related errors in the test cases, which occur when multiple tests run in the same worker. The most reliable way to see them is to run all the tests serially in a single worker.

This happens automatically if you are building in an environment without network access (such as the sandboxed build environment in NixOS). If that isn't your case, I expect you can force it in your test environment by changing net_on to false in test/runtests.jl (I haven't actually tested this myself):

cd(@__DIR__) do
    n = 1
    if net_on
        n = min(Sys.CPU_THREADS, length(tests))
        n > 1 && addprocs_with_testenv(n)
        LinearAlgebra.BLAS.set_num_threads(1)
    end

I believe this particular error arises because the spawn test that runs an invalid command breaks the stress test for receiving SIGINT. You can check directly that the two are incompatible by running them back-to-back in a test.jl file:

using Test;

# spawn.jl test of running an invalid command
@test_throws Base.IOError run(`foo_is_not_a_valid_command`)

# stress.jl test for receiving a SIGINT
ccall(:jl_exit_on_sigint, Cvoid, (Cint,), 0)
@test_throws InterruptException begin
    ccall(:kill, Cvoid, (Cint, Cint,), getpid(), 2)
    for i in 1:10
        Libc.systemsleep(0.1)
        ccall(:jl_gc_safepoint, Cvoid, ()) # wait for SIGINT to arrive
    end
end
ccall(:jl_exit_on_sigint, Cvoid, (Cint,), 1)

$ julia test.jl
Test Failed at /build/source/test.jl:8
  Expression: begin
    ccall(:kill, Cvoid, (Cint, Cint), getpid(), 2)
    for i = 1:10
        Libc.systemsleep(0.1)
        ccall(:jl_gc_safepoint, Cvoid, ())
    end
end
    Expected: InterruptException
  No exception thrown
ERROR: LoadError: There was an error during testing
in expression starting at /build/source/test.jl:8

@twhitehead

I should add that I tried this using Julia 1.3.0 under both NixOS and Gentoo.

CCing @JeffBezanson as well, as you seem to be involved in the rest of the issues that running the tests serially in a single worker has revealed.

@twhitehead

Some more data points. Under 1.3.0, the SIGINT test also seems to fail periodically on its own. That is, if I put

using Test;

ccall(:jl_exit_on_sigint, Cvoid, (Cint,), 0)
@test_throws InterruptException begin
    ccall(:kill, Cvoid, (Cint, Cint,), getpid(), 2)
    for i in 1:10
        Libc.systemsleep(0.1)
        ccall(:jl_gc_safepoint, Cvoid, ()) # wait for SIGINT to arrive
    end
end
ccall(:jl_exit_on_sigint, Cvoid, (Cint,), 1)

in a test.jl file and then run it a thousand times

((n=0))
for ((i=0; i<1000; i++)); do
  julia test.jl || ((n++))
done
echo "Failures: $n"
Failures: 8

A friend tried 1.3.1 under Arch Linux and was not able to reproduce this, so it may be limited to 1.3.0.

The original test, where you precede it by trying to run an invalid command, fails around 90-99% of the time under both 1.3.0 and 1.3.1, though:

using Test;

@test_throws Base.IOError run(`foo_is_not_a_valid_command`)

ccall(:jl_exit_on_sigint, Cvoid, (Cint,), 0)
@test_throws InterruptException begin
    ccall(:kill, Cvoid, (Cint, Cint,), getpid(), 2)
    for i in 1:10
        Libc.systemsleep(0.1)
        ccall(:jl_gc_safepoint, Cvoid, ()) # wait for SIGINT to arrive
    end
end
ccall(:jl_exit_on_sigint, Cvoid, (Cint,), 1)

((n=0))
for ((i=0; i<1000; i++)); do
  julia test.jl || ((n++))
done
echo "Failures: $n"
Failures: 905

Hopefully having this (mostly) reproducible case will make it easier to track down what is going on.
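
For anyone repeating these measurements, the counting loop above can be wrapped in a small shell helper so the same harness serves both test files. This is only a sketch: the function name `count_failures` is made up for illustration and is not part of Julia or its test suite.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Run CMD N times and print how many of the runs exited with a non-zero status.
# (Hypothetical helper; the body runs in a subshell so its variables stay local.)
count_failures() (
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Only the exit status matters here, so the command's output is discarded.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
)
```

For example, `count_failures 1000 julia test.jl` would print the same count that the loops above report as `Failures: $n`.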

@vtjnash

vtjnash commented Mar 6, 2020

Should be fixed by #32599 on master
