Abort on stack overflow instead of re-raising SIGSEGV #31333
(rust_highfive has picked a reviewer for you, use r? to override) |
I've only tested this on Mac OS X, but I'm pretty sure that everything it does is kosher according to POSIX, and everything that's Unix-specific has been protected with |
let mut buf = libc::mmap(0 as *mut libc::c_void,
                         pagesize as libc::size_t,
                         libc::PROT_WRITE | libc::PROT_READ,
Perhaps just do not pass in PROT_WRITE here, to avoid the potential race condition where the pointer could become valid between the unmap and the dereference? Not that it matters much.
Yeah, I tried that, but then I was getting SIGBUS instead of SIGSEGV like with mprotect. I figured that just unmapping would be a better way to get a guaranteed SIGSEGV. I don't know of any circumstances in which the kernel would re-map that memory for us without anything going on in our process to cause it to do so, so I don't think we need to worry about such a race here; also, POSIX requires that after calling munmap, further references to those pages will produce SIGSEGV.
49cb4e1 to bba4272
// See comment above for why this function returns.
libc::raise(libc::SIGABRT);
To behave more like rtabort! this may wish to use intrinsics::abort() as it mimics other runtime-abort behavior and means we don't need to mess around with signals or anything.
Yeah, but intrinsics::abort() does so via an illegal instruction, giving us SIGILL, which doesn't seem ideal if we're trying to clear up confusion caused by the particular signal received.
Ah yeah I was just going off @brson's desire that stack overflow "should abort like any other fatal error". Our other fatal errors today use rtabort! which ends up translating to intrinsics::abort.
It would indeed do so via an illegal instruction, resulting in SIGILL, and likely resulting in a core dump as well.
Would it make more sense to change rtabort! to call libc::abort instead of intrinsics::abort, now that we have libc::abort? libc::abort is supposed to handle all of the edge cases like this, either unregistering the signal handler or killing itself another way if the signal handler returns or just infinitely looping if all else fails.
I'd be down with that
Er, to be clear, on Unix that seems fine but on Windows we're trying to avoid the CRT wherever possible, so in that sense it may be a platform-specific abort process. I guess it makes less sense in that case :(
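To make the distinction in this thread concrete, here is a minimal Unix-only sketch of how a parent process could interpret the signal that killed a child. The helper name `explain_termination` and the hard-coded signal numbers are assumptions for illustration (the numbers match Linux and macOS; real code would take them from the libc crate). Nothing here is taken from the patch itself.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

// Assumed signal numbers (Linux and macOS); real code would use the libc crate's constants.
const SIGILL: i32 = 4;
const SIGABRT: i32 = 6;
const SIGSEGV: i32 = 11;

/// Hypothetical helper: says what the terminating signal suggests,
/// following the reasoning in the discussion above.
fn explain_termination(status: ExitStatus) -> &'static str {
    match status.signal() {
        // An illegal instruction, which is how intrinsics::abort kills the process.
        Some(SIGILL) => "illegal instruction",
        // An explicit abort, which is what libc::abort raises.
        Some(SIGABRT) => "explicit abort",
        // A memory fault, the signal this change stops re-raising on stack overflow.
        Some(SIGSEGV) => "segmentation fault",
        Some(_) => "other signal",
        None => "not killed by a signal",
    }
}
```

A caller would pass the `ExitStatus` returned by `Command::status()`; `ExitStatus::from_raw` can build one from a raw wait status for experimentation.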
r? @brson You seem more opinionated than I, but both the current strategy and the proposed strategy are fine by me!
bba4272 to 563904e
My preference is for all aborts to be the same, and would prefer this to be an |
This is a breaking change, but probably one nobody is relying on. |
563904e to 557313b
Switched to |
557313b to f608abb
use std::os::unix::process::ExitStatusExt;
assert!(!status.success());
assert!(status.signal() != Some(libc::SIGSEGV));
Presumably this is also true on Windows?
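The assertions in the excerpt above can be exercised in isolation. The following is a rough Unix-only sketch, not the PR's actual test: instead of a Rust child that overflows its stack, it uses `sh` to kill itself with SIGABRT, and the helpers `run_shell` and `killed_by_segv` are hypothetical names. The signal numbers are assumed values that hold on Linux and macOS.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

// Assumed signal numbers (Linux and macOS); the real test uses libc::SIGSEGV.
const SIGABRT: i32 = 6;
const SIGSEGV: i32 = 11;

/// Hypothetical helper: runs `sh -c <script>` and returns how the shell terminated.
fn run_shell(script: &str) -> ExitStatus {
    Command::new("sh")
        .arg("-c")
        .arg(script)
        .status()
        .expect("failed to spawn sh")
}

/// True if the child was killed by SIGSEGV, the outcome the new test rules out.
fn killed_by_segv(status: &ExitStatus) -> bool {
    status.signal() == Some(SIGSEGV)
}

/// Stand-in for the aborting child: the shell sends SIGABRT to itself.
fn aborting_child() -> ExitStatus {
    run_shell("kill -s ABRT $$")
}
```

Calling `aborting_child()` and then asserting `!status.success()` and `!killed_by_segv(&status)` mirrors the two checks in the diff. On Windows there is no signal to inspect, so only the `success()` check would carry over.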
@bors r+ |
📌 Commit f608abb has been approved by |
⌛ Testing commit f608abb with merge 8a0874b... |
💔 Test failed - auto-mac-32-opt |
f608abb to ee79bfa
Whoops, some architectures throw |
💔 Test failed - auto-win-msvc-32-opt |
Failure looks spurious. |
@bors retry |
cc @rust-lang/lang In some ways this is a pretty significant change to how processes terminate on stack overflow. It makes stack overflow terminate the process with |
Use libc::abort, not intrinsics::abort, in rtabort!
intrinsics::abort compiles down to an illegal instruction, which on Unix-like platforms causes the process to be killed with SIGILL. A more appropriate way to kill the process would be SIGABRT; this indicates better that the runtime has explicitly aborted, rather than some kind of compiler bug or architecture mismatch that SIGILL might indicate.
For rtassert!, replace this with libc::abort. libc::abort raises SIGABRT, but is defined to do so in such a way that it will terminate the process even if SIGABRT is currently masked or caught by a signal handler that returns.
On non-Unix platforms, retain the existing behavior. On Windows we prefer to avoid depending on the C runtime, and we need a fallback for any other platforms that may be defined. An alternative on Windows would be to call TerminateProcess, but this seems less essential than switching to using SIGABRT on Unix-like platforms, where it is common for the process-killing signal to be printed out or logged.
This is a [breaking-change] for any code that depends on the exact signal raised to abort a process via rtabort!
cc #31273 cc #31333
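As a rough illustration of the "print a message, then abort" shape this commit message describes, here is a sketch in stable Rust. It is not the runtime's macro: `fatal_message` and `rt_abort` are hypothetical names, the message prefix is an assumption, and `std::process::abort` stands in for the direct `libc::abort` call (on Unix it terminates the process with SIGABRT).

```rust
use std::fmt;

/// Hypothetical helper: formats the line printed before aborting.
/// The "fatal runtime error" prefix is an assumption for illustration.
fn fatal_message(args: fmt::Arguments) -> String {
    format!("fatal runtime error: {}", args)
}

/// Hypothetical stand-in for the runtime's abort path: report, then abort.
/// std::process::abort is used here in place of calling libc::abort directly.
fn rt_abort(args: fmt::Arguments) -> ! {
    eprintln!("{}", fatal_message(args));
    std::process::abort()
}
```

A call such as `rt_abort(format_args!("stack overflow"))` never returns. A parent process on a Unix-like platform would then see the child killed by SIGABRT and not SIGILL.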
Abort on stack overflow instead of re-raising SIGSEGV
We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow. We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.
This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code. To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising SIGSEGV.
To test this, we need to adapt the tests for segfault to actually check
the exit status. Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all. Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.
This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.
Closes #31273
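The handler behavior described in this commit message can be sketched as a pure decision function, leaving out the signal plumbing. This is a simplified model under assumptions, not the code from the patch: the names `Action` and `classify_fault` are hypothetical, and the guard page is represented as a plain address range.

```rust
use std::ops::Range;

/// What the SIGSEGV handler does after inspecting the faulting address.
#[derive(Debug, PartialEq)]
enum Action {
    /// The fault hit the guard page: report the stack overflow and abort.
    ReportOverflowAndAbort,
    /// Any other fault: restore the default handler and return,
    /// so the signal is re-raised and kills the process as usual.
    RestoreDefaultAndReturn,
}

/// Hypothetical helper: decides the action from the guard page range
/// and the address that caused the fault.
fn classify_fault(guard_page: &Range<usize>, fault_addr: usize) -> Action {
    if guard_page.contains(&fault_addr) {
        Action::ReportOverflowAndAbort
    } else {
        Action::RestoreDefaultAndReturn
    }
}
```

With a guard page at `0x1000..0x2000`, a fault at `0x1800` is treated as a stack overflow, while a null dereference at `0x0` still ends in an ordinary SIGSEGV.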