Unreliable SIGINT delivery #17706
Comments
👍 I've noticed a few other issues with interrupting/signals, maybe related, maybe not. For example:
It would be nice if we could interrupt
Also maybe related: #17626
The segfault is expected and is how the signal is delivered. The
And the unreliability is due to multithreading support.
But why does the process exit? I'm pretty sure it didn't use to do that, even with a force throw.
So in summary, #12904 makes the signal handling unreliable for
should work) but doesn't fix the
It's likely because of missing
Does the stack trace I posted above help? It looks like it throws from the
Does this help?

```diff
diff --git a/base/client.jl b/base/client.jl
index 19cc480..04bf114 100644
--- a/base/client.jl
+++ b/base/client.jl
@@ -314,7 +314,9 @@ function _start()
     empty!(ARGS)
     append!(ARGS, Core.ARGS)
     opts = JLOptions()
+    ccall(:jl_sigatomic_begin, Void, ())
     try
+        ccall(:jl_sigatomic_end, Void, ())
         (quiet,repl,startup,color_set,history_file) = process_options(opts)
         local term
@@ -364,6 +366,7 @@ function _start()
         display_error(err,catch_backtrace())
         exit(1)
     end
+    ccall(:jl_sigatomic_end, Void, ())
     if is_interactive && have_color
         print(color_normal)
     end
```
Although it seems that you've got a throw from another inner
OTOH, making this transformation automatically is possible and will make all
In other words, if you can get the backtrace without the last
Here are stack traces from every
It looks like the mutex_unlock from codegen causes the InterruptException to be rethrown every time?
OK, so it seems that it throws in
FWIW, Jeff's issue is actually not new. The REPL code has the same issue on 0.4 too. Try the following code (warning: you might need to force-kill julia on 0.4):

```julia
for i in 1:100
    try
        sleep(0.1)
    end
end; while true
end
```

It is only more noticeable on 0.5 because there are now slightly more cases that need a force exit (to be fair, a dead loop without a safepoint is quite rare), and because there is a force exit at all. Let me try to improve the user experience just for this case without fixing all uses in the REPL code in the physicist way. (I'm not 100% sure I understand the interleaving of the REPL frontend and backend well enough to know where to put all the try-catch/sigatomic blocks...)
* Add a SIGINT dead time after a force throw

  Fixes Jeff's issue in #17706

* Make the eval and print loop sigatomic to avoid SIGINT being delivered outside `try`-`catch` blocks.
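The commit message does not say how the dead time is measured or what triggers a force throw. The sketch below assumes a policy in which a configurable number of unhandled SIGINTs forces a throw, and any SIGINT arriving within `dead_time` seconds of a force throw is ignored, so that repeated ctrl-C presses cannot immediately trigger a second throw outside a `try`-`catch` block. It is hypothetical C with invented names, and timestamps are passed in explicitly to keep it testable.

```c
/* Hypothetical SIGINT "dead time" policy; names and thresholds are invented. */
#include <assert.h>

enum sigint_action { ACT_IGNORE, ACT_REQUEST_THROW, ACT_FORCE_THROW };

struct sigint_policy {
    double dead_time;        /* seconds during which SIGINT is ignored after a force throw */
    int    force_after;      /* unhandled SIGINTs that trigger a force throw */
    int    unhandled;        /* SIGINTs received but not yet handled at a safepoint */
    double last_force_throw; /* time of the last force throw; negative if none */
};

/* Decides what to do with a SIGINT arriving at time `now` (in seconds). */
enum sigint_action classify_sigint(struct sigint_policy *p, double now) {
    assert(p->force_after > 0);
    if (p->last_force_throw >= 0 && now - p->last_force_throw < p->dead_time)
        return ACT_IGNORE;           /* still inside the dead time */
    if (++p->unhandled >= p->force_after) {
        p->unhandled = 0;
        p->last_force_throw = now;   /* a new dead time starts here */
        return ACT_FORCE_THROW;
    }
    return ACT_REQUEST_THROW;        /* let the next safepoint throw */
}

/* Called once a safepoint has actually thrown the InterruptException. */
void sigint_handled(struct sigint_policy *p) { p->unhandled = 0; }
```

For example, with `dead_time = 1.0` and `force_after = 2`, SIGINTs at 0.0 s and 0.1 s produce a request and then a force throw, one at 0.5 s is ignored, and one at 1.2 s is treated as a fresh request.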
I think this has been breaking PackageEvaluator's ability to interrupt freezing jobs with
OK, so the signal-handling SIGSEGV is innocuous. However, when letting it be handled by our handler (which ASAN supports through
... but sanitizing libunwind yields endless "stack corruption detected" errors. I'll have a closer look and spin it off into another issue if I find something tangible.
If you've compiled everything with ASAN, I wouldn't be surprised if it overflows the signal stack. A segfault in libunwind also happens from time to time, which is why we catch it and abort the backtrace collection.
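Catching a segfault in libunwind and aborting the backtrace is a recover-from-fault pattern. A hypothetical C sketch of it, assuming the common `sigsetjmp`/`siglongjmp` approach (the thread does not say exactly how Julia does it): a SIGSEGV handler jumps back to a recovery point while a guarded operation runs, and otherwise falls back to the default crash. `faulting_step` merely raises SIGSEGV to simulate a crash inside the unwinder.

```c
/* Hypothetical fault guard; NOT Julia's code. All names are invented. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t in_guarded_section = 0;

static void on_sigsegv(int sig) {
    if (in_guarded_section)
        siglongjmp(recover_point, 1);  /* abandon the guarded operation */
    /* Not ours: fall back to the default action so the process still crashes. */
    signal(sig, SIG_DFL);
    raise(sig);
}

int install_segv_guard(void) {
    struct sigaction sa;
    sa.sa_handler = on_sigsegv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Runs op(); if it faults, gives up and returns -1 instead of crashing.
 * sigsetjmp(..., 1) saves the signal mask, so siglongjmp unblocks SIGSEGV again. */
int run_guarded(void (*op)(void)) {
    assert(op != NULL);
    if (sigsetjmp(recover_point, 1) != 0) {
        in_guarded_section = 0;
        return -1;
    }
    in_guarded_section = 1;
    op();
    in_guarded_section = 0;
    return 0;
}

/* Example operations: one that succeeds, one that "crashes" like a faulty unwinder. */
void well_behaved_step(void) { }
void faulting_step(void) { raise(SIGSEGV); }
```

Jumping out of a handler is only safe when the interrupted code holds no locks or half-updated state, which is why this kind of guard is best kept to narrowly scoped operations such as backtrace collection.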
What do you mean?
PackageEvaluator has frozen multiple times in the past few days because of a failure to download something, where it's supposed to be able to interrupt the job via
What download method is it using? If it is calling a C library and somehow has
Shelling out to curl.
@yuyichao good guess! Increasing the signal stack size not only resolves the stack corruption when building libunwind with ASAN, but also resolves the nested segfault with a regular ASAN build (i.e., sanitized Julia and LLVM).
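A minimal C sketch of giving signal handlers a larger alternate stack with `sigaltstack` and `SA_ONSTACK`. The thread does not say what size fixed the ASAN build, so the 1 MiB in the usage example is arbitrary, and the helper names are hypothetical. The probe handler only checks that it really ran on the alternate stack.

```c
/* Hypothetical helpers for a larger alternate signal stack; names are invented. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

static char *alt_stack_base;
static size_t alt_stack_size;
static volatile sig_atomic_t handler_on_alt_stack = -1;

/* Allocates and installs an alternate signal stack for the calling thread.
 * `size` is a free parameter here; it is raised to at least MINSIGSTKSZ. */
int install_alt_signal_stack(size_t size) {
    if (size < (size_t)MINSIGSTKSZ)
        size = (size_t)MINSIGSTKSZ;
    stack_t ss;
    ss.ss_sp = malloc(size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    alt_stack_base = ss.ss_sp;
    alt_stack_size = size;
    return 0;
}

/* Records whether this handler's frame lies inside the alternate stack. */
static void probe_handler(int sig) {
    (void)sig;
    char local;
    uintptr_t addr = (uintptr_t)&local;
    uintptr_t lo = (uintptr_t)alt_stack_base;
    handler_on_alt_stack = (addr >= lo && addr < lo + alt_stack_size);
}

/* SA_ONSTACK asks the kernel to run the handler on the alternate stack. */
int install_probe_handler(int sig) {
    assert(alt_stack_base != NULL);
    struct sigaction sa;
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    return sigaction(sig, &sa, NULL);
}
```

For example, `install_alt_signal_stack((size_t)1 << 20)`, then `install_probe_handler(SIGUSR1)` and `raise(SIGUSR1)`, should leave `handler_on_alt_stack` equal to 1. The alternate stack is a per-thread setting, so a multithreaded runtime has to install one on every thread that may take the signal.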
Use it to make sure that `jl_rethrow` and `jl_exit` are running on the right thread and right stack when an exception/exit is caused by a signal. Fix #17706
Avoids segfault in libunwind (see #17706).
SIGSEGV can be benign, but ASAN will die so we need to be able to handle the signal ourselves (see #17706).
I believe
This is again a different issue and expected (as in, it is not allowed to interrupt compilation with
Completely locking up your terminal if you realize you'd like to interrupt bootstrap is really annoying behavior for devs/contributors. This seems like a recent regression; do any of your open PRs fix it?
No.
I have never needed that before. Why is signal handling being changed during a feature freeze?
When did it change?
Bisecting now. Apparently before the formal feature freeze, but not by much.
Looks like this is somewhat sensitive to exactly where you happen to be in bootstrap when you hit ctrl-C, and it may have had the freezing issue for longer than I thought.
Calling `kill(self, 2)` in 0.5 is unreliable:

Running this snippet results in different possible errors:

None of them properly delivers the SIGINT.

Meanwhile, on 0.4:
This is a reduced test case from `core.jl`, which in its current form only fails when running under ASAN:

I'm not sure what's happening, but there seems to be an issue with the SIGINT delivery. If I transform the above test case, ASAN sometimes spits out more information. In those cases, it traps a use-after-free on a known address, with those addresses always being freed by some trace starting at `__cxa_finalize`. So it looks like the SIGUSR2 sent by `jl_try_deliver_sigint` causes the world to collapse?

cc @yuyichao
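The report says the SIGUSR2 comes from `jl_try_deliver_sigint`. A common POSIX pattern for this kind of hand-off is a dedicated thread that receives SIGINT with `sigwait` and forwards it to one specific thread with `pthread_kill`. The sketch below shows only that general pattern; whether Julia's runtime is structured exactly like this is an assumption, and all non-POSIX names in it are invented.

```c
/* Hypothetical SIGINT -> SIGUSR2 forwarding sketch; NOT Julia's runtime code. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

static pthread_t target_thread;                  /* thread that should act on SIGINT */
static volatile sig_atomic_t interrupt_seen = 0; /* set by the SIGUSR2 handler */

static void on_sigusr2(int sig) { (void)sig; interrupt_seen = 1; }

/* Listener thread: SIGINT is blocked everywhere, so only sigwait() consumes it. */
static void *sigint_listener(void *arg) {
    (void)arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    for (;;) {
        int sig;
        if (sigwait(&set, &sig) != 0)
            continue;
        assert(sig == SIGINT);
        pthread_kill(target_thread, SIGUSR2);  /* forward to the chosen thread */
    }
    return NULL;
}

/* Call on the target thread before any other threads are created, so that
 * every thread inherits the blocked SIGINT mask. */
int start_sigint_forwarding(void) {
    struct sigaction sa;
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    target_thread = pthread_self();
    pthread_t listener;
    if (pthread_create(&listener, NULL, sigint_listener, NULL) != 0)
        return -1;
    return pthread_detach(listener);
}

/* Simulates ctrl-C by sending SIGINT to the whole process. */
int send_sigint_to_self(void) { return kill(getpid(), SIGINT); }

/* Waits up to about one second for the forwarded interrupt. */
int wait_for_interrupt(void) {
    struct timespec ms = {0, 1000000};
    for (int i = 0; i < 1000 && !interrupt_seen; i++)
        nanosleep(&ms, NULL);
    return interrupt_seen;
}
```

The forwarded signal can arrive at any point in the target thread, including while the process is already tearing itself down, so a real runtime also has to decide where the resulting throw or exit is allowed to run.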