Ignore SEGV during profiler unwind on Unix #28291

Merged (3 commits), Jul 30, 2018
Changes from 2 commits
31 changes: 28 additions & 3 deletions src/signals-unix.c
@@ -225,6 +225,18 @@ static void segv_handler(int sig, siginfo_t *info, void *context)
     jl_ptls_t ptls = jl_get_ptls_states();
     assert(sig == SIGSEGV || sig == SIGBUS);

+    // if we're profiling, this segfault is likely caused by the unwinder.
+    // ignore the signal and jump back to where we came from.
+    if (running && ptls->safe_restore) {
Contributor

Why do you need this? Does it not work with the condition below?

Member Author

Just being conservative, only affecting the case where the profiler is running. Do it unconditionally then?

Contributor

I mean you shouldn't need any code here in the segfault handler. Have you tested that it doesn't work without this but with the condition a few lines below?

Contributor

(Or, in other words, it is meant to do this unconditionally.)

Member Author

Ah ok. No, it doesn't work: looking closer, it triggers a segfault in jl_call_in_ctx (via jl_throw_in_ctx..., jl_stackovf_exception, ...).
Judging from the function names, doesn't that behave differently from the plain longjmp I do here? Would I need to catch an exception somehow?

Contributor

Hmm, segfault in jl_call_in_ctx? Did you get a NULL context?

Contributor

Ah, I guess this thread doesn't have signal_stack allocated. I believe this should fix it (you can merge this with the fallback ifdef below if you want).

diff --git a/src/signals-unix.c b/src/signals-unix.c
index 0fafe121cd..8da89b5fc4 100644
--- a/src/signals-unix.c
+++ b/src/signals-unix.c
@@ -89,6 +89,14 @@ static void jl_call_in_ctx(jl_ptls_t ptls, void (*fptr)(void), int sig, void *_c
     // checks that the syscall is made in the signal handler and that
     // the ucontext address is valid. Hopefully the value of the ucontext
     // will not be part of the validation...
+    if (!ptls->signal_stack) {
+        sigset_t sset;
+        sigemptyset(&sset);
+        sigaddset(&sset, sig);
+        sigprocmask(SIG_UNBLOCK, &sset, NULL);
+        fptr();
+        return;
+    }
     uintptr_t rsp = (uintptr_t)ptls->signal_stack + sig_stack_size;
     assert(rsp % 16 == 0);
 #if defined(_OS_LINUX_) && defined(_CPU_X86_64_)
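
For context on why the suggested fallback unblocks `sig` before calling `fptr()`: while a handler runs, the signal that triggered it is blocked by default (unless `SA_NODEFER` is set), so a handler that never returns normally has to clear it from the mask itself. The following is a minimal sketch, not Julia code. It assumes a single-threaded POSIX process, uses `SIGUSR1` instead of a real fault, and the names `is_blocked`, `probe_handler`, and `demo_unblock_in_handler` are hypothetical. It records the mask state before and after the same `sigprocmask(SIG_UNBLOCK, ...)` sequence used in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Results recorded by the handler (hypothetical names, for illustration). */
static volatile sig_atomic_t blocked_on_entry = -1;
static volatile sig_atomic_t blocked_after_unblock = -1;

/* Query the current signal mask without changing it. */
static int is_blocked(int sig)
{
    sigset_t cur;
    sigemptyset(&cur);
    sigprocmask(SIG_BLOCK, NULL, &cur); /* NULL set: read the mask only */
    return sigismember(&cur, sig);
}

static void probe_handler(int sig)
{
    /* The signal being handled is blocked on entry (no SA_NODEFER). */
    blocked_on_entry = is_blocked(sig);

    /* Same sequence as in the patch: unblock the signal being handled. */
    sigset_t sset;
    sigemptyset(&sset);
    sigaddset(&sset, sig);
    sigprocmask(SIG_UNBLOCK, &sset, NULL);

    blocked_after_unblock = is_blocked(sig);
}

/* Returns 1 if the signal was blocked on entry and unblocked afterwards. */
int demo_unblock_in_handler(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);

    int rc = sigaction(SIGUSR1, &sa, &old);
    assert(rc == 0);
    (void)rc;

    raise(SIGUSR1); /* single-threaded: handler runs before raise() returns */

    sigaction(SIGUSR1, &old, NULL);
    return blocked_on_entry == 1 && blocked_after_unblock == 0;
}
```

Calling `demo_unblock_in_handler()` from a test `main` is enough to exercise it. In a multi-threaded program `pthread_sigmask` would be the portable choice; `sigprocmask` is used here only to match the patch.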

Member Author

Yeah, that works. Thanks!
Why isn't this used for OSX btw? Mimicking profiler_segv_handler which does thread_set_state is what got me here in the first place.

Contributor

We don't handle segfaults the same way on OSX. I don't really know whether the two approaches could be used together.

+        // unblock the signal being handled
+        sigset_t sset;
+        sigemptyset(&sset);
+        sigaddset(&sset, sig);
+        sigprocmask(SIG_UNBLOCK, &sset, NULL);
+
+        jl_longjmp(*ptls->safe_restore, 1);
+    }
+
     if (jl_addr_is_safepoint((uintptr_t)info->si_addr)) {
 #ifdef JULIA_ENABLE_THREADING
         jl_set_gc_and_wait();
@@ -667,9 +679,22 @@ static void *signal_listener(void *arg)
             // do backtrace for profiler
             if (profile && running) {
                 if (bt_size_cur < bt_size_max - 1) {
-                    // Get backtrace data
-                    bt_size_cur += rec_backtrace_ctx((uintptr_t*)bt_data_prof + bt_size_cur,
-                            bt_size_max - bt_size_cur - 1, signal_context);
+                    // unwinding can fail, so keep track of the current state
+                    // and restore from the SEGV handler if anything happens.
+                    jl_ptls_t ptls = jl_get_ptls_states();
+                    jl_jmp_buf *old_buf = ptls->safe_restore;
+                    jl_jmp_buf buf;
+
+                    ptls->safe_restore = &buf;
+                    if (jl_setjmp(buf, 0)) {
+                        jl_safe_printf("WARNING: profiler attempt to access an invalid memory location\n");
+                    } else {
+                        // Get backtrace data
+                        bt_size_cur += rec_backtrace_ctx((uintptr_t*)bt_data_prof + bt_size_cur,
+                                bt_size_max - bt_size_cur - 1, signal_context);
+                    }
+                    ptls->safe_restore = old_buf;
+
                     // Mark the end of this block with 0
                     bt_data_prof[bt_size_cur++] = 0;
                 }
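
The save / set / restore pattern around `safe_restore` in the hunk above can be tried outside Julia. The following is a minimal standalone sketch under stated assumptions, not the runtime's implementation: a global `safe_restore` pointer stands in for `ptls->safe_restore`, `sigsetjmp(buf, 0)` / `siglongjmp` stand in for `jl_setjmp` / `jl_longjmp`, and a `PROT_NONE` page stands in for memory the unwinder should not have touched. The names `guarded_read`, `fault_handler`, and `demo_guarded_read` are hypothetical.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Stand-in for ptls->safe_restore (assumption: one thread only). */
static sigjmp_buf *volatile safe_restore = NULL;

static void fault_handler(int sig)
{
    if (safe_restore) {
        /* sigsetjmp(buf, 0) did not save the mask, so unblock manually. */
        sigset_t sset;
        sigemptyset(&sset);
        sigaddset(&sset, sig);
        sigprocmask(SIG_UNBLOCK, &sset, NULL);
        siglongjmp(*safe_restore, 1);
    }
    /* Not a guarded access: fall back to the default action on re-fault. */
    signal(sig, SIG_DFL);
}

/* Returns 1 if *addr could be read into *out, 0 if the read faulted. */
int guarded_read(const volatile char *addr, char *out)
{
    sigjmp_buf *old_buf = safe_restore; /* keep any outer guard */
    sigjmp_buf buf;
    int ok;

    safe_restore = &buf;
    if (sigsetjmp(buf, 0)) {
        ok = 0;          /* arrived here via siglongjmp from the handler */
    } else {
        *out = *addr;    /* may fault */
        ok = 1;
    }
    safe_restore = old_buf;
    return ok;
}

/* Returns 1 on the expected outcome, 0 otherwise, -1 if setup failed. */
int demo_guarded_read(void)
{
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    /* Some platforms report this fault as SIGBUS, so handle both. */
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    char *page = mmap(NULL, 4096, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    char c = 0, good = 'x';
    int bad1 = guarded_read(page, &c);  /* faults, recovered */
    int bad2 = guarded_read(page, &c);  /* faults again: signal was unblocked */
    int ok = guarded_read(&good, &c);   /* ordinary read */

    munmap(page, 4096);
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return bad1 == 0 && bad2 == 0 && ok == 1 && c == 'x'
        && safe_restore == NULL;
}
```

The second faulting read is the interesting case. If the handler jumped out without unblocking the signal, the next fault would arrive while it is still blocked and the process would most likely be terminated instead of recovering.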