Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pthread_exit no longer works in panic=abort #101469

Closed
programmerjake opened this issue Sep 6, 2022 · 30 comments · Fixed by #104070
Closed

pthread_exit no longer works in panic=abort #101469

programmerjake opened this issue Sep 6, 2022 · 30 comments · Fixed by #104070
Assignees
Labels
C-bug Category: This is a bug. F-c_unwind `#![feature(c_unwind)]` WG-ffi-unwind Working group: FFI unwind

Comments

@programmerjake
Copy link
Member

programmerjake commented Sep 6, 2022

I tried this code:
https://rust.godbolt.org/z/M5Pz1rjnd

#![feature(c_unwind)]
#![no_main]
use std::ffi::{c_void, c_int, c_char};
use std::ptr::null_mut;

extern "C-unwind" {
    fn pthread_exit(v: *mut c_void) -> !;
}

#[no_mangle]
pub unsafe extern "C-unwind" fn main(_argc: c_int, _argv: *mut *mut c_char) {
    unsafe {pthread_exit(null_mut())}
}

I expected to see this happen: exits with exit code 0 as documented for pthread_exit

Instead, this happened: aborts with double panic

Meta

rustc --version --verbose:

rustc 1.65.0-nightly (8c6ce6b91 2022-09-02)
binary: rustc
commit-hash: 8c6ce6b91b172f77c795a74bfeaf74b865146b3f
commit-date: 2022-09-02
host: x86_64-unknown-linux-gnu
release: 1.65.0-nightly
LLVM version: 15.0.0

sorry, i can't figure out how to get a backtrace from godbolt.org, it doesn't seem to allow setting environment variables.

This bug serms to be caused by #96959

@rust-lang/wg-ffi-unwind

@programmerjake programmerjake added the C-bug Category: This is a bug. label Sep 6, 2022
@programmerjake
Copy link
Member Author

i expect this to work because pthread_exit is supposed to do a force-unwind, which RFC 2945 (specifically referenced by #96959) states:

panic=abort will have no impact on the behavior of forced unwinding.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Sep 6, 2022

The RFC defines force unwind as undefined behaviour when it cross non-POF, which your example does.

@programmerjake
Copy link
Member Author

The RFC defines force unwind as undefined behaviour when it cross non-POF, which your example does.

show me the live variables with a Drop impl or the catch_unwind-style call...those are the two conditions in the RFC for a frame to not be a plain-old-frame. The code I wrote has neither of those, so should be a plain-old-frame.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Sep 6, 2022

The RFC says:

A "POF", or "Plain Old Frame", is defined as a frame that can be trivially deallocated: returning from or unwinding a POF cannot cause any observable effects.

Unwinding through "C-unwind" in -Cpanic=abort is defined to cause an abort according to my reading of the RFC (not saying that it's correct, but it's my reading of the RFC), so the frame is not trivially deallocated and thus not POF.

@bjorn3
Copy link
Member

bjorn3 commented Sep 6, 2022

As I understand it a POF is any frame without pending destructors or catch_unwind calls. Neither is the case in this example.

@BatmanAoD
Copy link
Member

@nbdd0121 That would make every use of "C-unwind" a non-POF. I'm pretty sure that wasn't the intent... but you're right that the verbiage implies the opposite.

@BatmanAoD
Copy link
Member

This paragraph below makes explicit that "C-unwind" frames can be POFs:

This RFC specifies that, regardless of the platform or the ABI string ("C" or "C-unwind"), any platform features that may rely on forced unwinding will always be considered undefined behavior if they cross non-POFs. Crossing only POFs is necessary but not sufficient, however, to make forced unwinding safe, and for now we do not specify any safe form of forced unwinding; we will specify this in a future RFC.

@programmerjake
Copy link
Member Author

The RFC says:

A "POF", or "Plain Old Frame", is defined as a frame that can be trivially deallocated: returning from or unwinding a POF cannot cause any observable effects.

Unwinding through "C-unwind" in -Cpanic=abort is defined to cause an abort according to my reading of the RFC (not saying that it's correct, but it's my reading of the RFC), so the frame is not trivially deallocated and thus not POF.

it's defined to cause an abort only for non-forced unwinds, forced unwinds are specifically specified to work despite panic=abort:
https://github.com/rust-lang/rfcs/blob/master/text/2945-c-unwind-abi.md#interaction-with-panicabort

If a non-forced foreign unwind would enter a Rust frame via an extern "C-unwind" ABI boundary, but the Rust code is compiled with panic=abort, the
unwind will be caught and the process aborted.
...
panic=abort will have no impact on the behavior of forced unwinding.

@BatmanAoD
Copy link
Member

BatmanAoD commented Sep 6, 2022

I think @nbdd0121 is right that the definition of POF is somewhat confusing here. But certainly this should be considered a POF. "POF-ness" should never depend on the panic mode or the ABI string. These are worth calling out in the Reference, or possibly in a future RFC, presumably one where we define how longjmp works. I've posted about this before on the Inside Rust blog, complete with an alternate definition of POFs:

These are frames that can be trivially deallocated, i.e., they do no "cleanup" (such as running Drop destructors) before returning.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Sep 8, 2022

I discussed this yesterday with @bjorn3 about this. The problem is that, for a nounwind function that calls a "C-unwind"` function, with otherwise no destructors, will have to:

  • Abort for an unforced unwind (Rust panic or foreign exception);
  • Do nothing for a forced unwind.

In order to satisfy the first requirement, we have to insert landing pads to "C-unwind" call-sites to abort the program in case it unwinds. It should be noted that LLVM don't have any mechanisms for generating custom LSDA data, and all exception handling info emitted by LLVM follows the GCC except table format. So we don't have a way to annotate a call-site to let the personality function know that these landing pads should be ignored for forced unwind.

The most reasonable way would be to tweak the personality function so that all landing pads are ignored for a forced unwind. This means that the destructors will not be ran at all for a forced unwind, but that should be okay as we define unwinding across non-POF frames as undefined behaviour. This method however is not very portable because there are platforms that we don't use our own personality function, such as EMCC and MSVC.

For MSVC we have #48572 which should avoid the aborting landing pad (catchpad) from being called. I am not sure about EMCC though (is forced unwind a thing for emscripten?)


Overall, I think allowing forced unwind through a nounwind function is quite problematic on platforms that does not require unwind table to always exist. With -C force-unwind-tables=no and -C force-frame-pointers=no, how would the unwind runtime be able to correctly unwind through a nounwind function?

@BatmanAoD
Copy link
Member

The most reasonable way would be to tweak the personality function so that all landing pads are ignored for a forced unwind. This means that the destructors will not be ran at all for a forced unwind, but that should be okay as we define unwinding across non-POF frames as undefined behaviour.

Yes, and that's exactly why we defined it that way! 😁

This method however is not very portable because there are platforms that we don't use our own personality function, such as EMCC and MSVC.

...even after #92845? Dang.

With -C force-unwind-tables=no and -C force-frame-pointers=no, how would the unwind runtime be able to correctly unwind through a nounwind function?

Maybe I'm not understanding the question, but forced unwind is, effectively, exactly like longjmp. As long as the stack frames are deallocated, the unwinding is "correct".

@Amanieu @nbdd0121 @bjorn3

@bjorn3
Copy link
Member

bjorn3 commented Sep 8, 2022

Maybe I'm not understanding the question, but forced unwind is, effectively, exactly like longjmp. As long as the stack frames are deallocated, the unwinding is "correct".

With longjmp you know the exact place to jump to because it is passed in as argument. With forced unwind, you unwind the stack to find the right location and depending in the platform and language destructors will still run. As for windows where longjmp is implemented using forced unwind, I believe unwind tables are mandatory.

@bjorn3 bjorn3 added F-c_unwind `#![feature(c_unwind)]` WG-ffi-unwind Working group: FFI unwind labels Sep 8, 2022
@BatmanAoD
Copy link
Member

@bjorn3 I'm not sure that's correct. Forced unwinds can't be "caught" (by definition), so either the "exact place to jump to" is the thread boundary (in the case of pthread_cancel et al.) or the unwind is in fact a longjmp (MSVC).

...depending in the platform and language destructors will still run.

Regardless of the language, running destructors for a forced unwind is UB. This is why we require frames that can be force-unwound to be POFs.

@bjorn3
Copy link
Member

bjorn3 commented Sep 8, 2022

Forced unwinds can't be "caught" (by definition)

AFAIK the personality function can catch forced unwinds. You just aren't supposed to do this in user code. I think libc does catch it to end unwinding at the end of the thread stack, but I haven't checked this. To find this catch at the end you need to unwind the stack using unwind tables.

@BatmanAoD
Copy link
Member

Let me rewind back to @nbdd0121's final statement above:

Overall, I think allowing forced unwind through a nounwind function is quite problematic on platforms that does not require unwind table to always exist. With -C force-unwind-tables=no and -C force-frame-pointers=no, how would the unwind runtime be able to correctly unwind through a nounwind function?

As I mentioned, "correctly" here just means that the correct frames are deallocated. So the question is essentially "how does the runtime know where to jump to," correct?

It sounds like the only platforms where we think this is an issue are the ones that use glibc, since other libc implementations don't use forced unwinding to implement pthread_cancel, and as @bjorn3 mentioned above, SEH-based longjmp (we believe) always requires unwind tables. (Can we confirm this just be making sure there's no way to invoke cl.exe/link.exe with an option equivalent to -C force-unwind-tables=no?)

So if I understand the problem correctly, on glibc, in the case of pthread_cancel, the runtime needs unwind-tables to know where to stop unwinding; but the user is able to tell the compiler/linker not to include the tables in the binary.

If that's the case, I don't understand why it would matter what Rust does. It sounds like pthread_cancel simply won't work in such a scenario, whether or not there are Rust frames involved. Am I misunderstanding?

@BatmanAoD
Copy link
Member

BatmanAoD commented Sep 12, 2022

Or, to put it another way: in the case of a forced unwind, the runtime must solve the problem of "where to stop unwinding" somehow, and since the place to stop unwinding should never be a Rust frame (because we don't abort on forced unwind), I think it simply doesn't matter whether or not there are any Rust frames in a thread being unwound by pthread_cancel.

@nbdd0121
Copy link
Contributor

As I mentioned, "correctly" here just means that the correct frames are deallocated. So the question is essentially "how does the runtime know where to jump to," correct?

Correctly means that the unwinding runtime can actually perform unwinding at all. Some unwinding runtimes require unwinding information to exist to perform unwinding at all; some unwinding runtimes try to unwind frames without any unwinding information at all by using the frame pointers alone. For the latter if force-frame-pointers is off, it can clobber the stack pointer and cause memory safety issues.

SEH-based longjmp (we believe) always requires unwind tables. (Can we confirm this just be making sure there's no way to invoke cl.exe/link.exe with an option equivalent to -C force-unwind-tables=no?)

On Windows platforms unwinding info is required: https://doc.rust-lang.org/1.63.0/nightly-rustc/src/rustc_target/spec/windows_msvc_base.rs.html#19

So if I understand the problem correctly, on glibc, in the case of pthread_cancel, the runtime needs unwind-tables to know where to stop unwinding; but the user is able to tell the compiler/linker not to include the tables in the binary.

If that's the case, I don't understand why it would matter what Rust does. It sounds like pthread_cancel simply won't work in such a scenario, whether or not there are Rust frames involved. Am I misunderstanding?

It needs unwind tables to unwind through intermediatary frames. It can't even see the catching frame without going through all the intermediatary frames.

@nbdd0121
Copy link
Contributor

Actually, just went into a deep dive into how libgcc and libunwind's unwinder was implemented, it turns out none of them are performing unwinding of frames with no unwinding information by using frame pointers. Both of them just bail out and say "this is the end of stack" when such function is found. Perhaps my memory was off recalling similar things from gdb or Linux kernel. This is good since we don't have to worry about memory safety issues.

I also had a deep dive into how pthread_exit is implemented in glibc, and turns out whenever it fails to unwind further, it will just do a longjmp and exits. So if the Rust program is compiled without unwind tables, pthread_exit still work, just behave differently (instead of unwinding through Rust POFs it just skip unwinding entirely).

@BatmanAoD
Copy link
Member

@nbdd0121 So is there not actually a problem in that case, then?

@nbdd0121
Copy link
Contributor

nbdd0121 commented Sep 12, 2022

Well, without unwind tables, it essentially makes pthread_exit behave like longjmp, so it's okay over Rust POFs. If there are some C++ frames or C frames with __cleanup__ then there will be behavioural difference for them, but probably that's not something for us to worry about.

I am not sure about other uses (are there any?) of forced unwind, though.

@BatmanAoD
Copy link
Member

@nbdd0121 I believe there are not; but in any case, it still seems to me that if there are, it's not on the Rust compiler to ensure that unwind tables are used or available.

@BatmanAoD
Copy link
Member

So we just need to ensure, somehow, that forced unwinds don't cause an automatic abort due to "C-unwind".

@programmerjake
Copy link
Member Author

here's a c++ equivalent that works: https://gcc.godbolt.org/z/8aWxr45jP maybe have rustc mostly duplicate that?

@nbdd0121
Copy link
Contributor

Interesting, I was under the false impression that clang will abort when forced unwind crosses a noexcept frame (because that's why #48572 existed, so our behaviour is different from clang's). But apparently that's only on MSVC and not on Linux.

Clang generates catch ptr null instead of cleanup for landing pads when the function is noexcept -- that's like a catch (...) in C++ which catches all C++ and foreign unwinds but not forced ones. We could just take the same approach. I still need to figure out the exact algorithm that Clang used to generate one or another, but this problem can indeed be fixed in codegen level.

Thanks @programmerjake for pointing this out. I was focusing on things more complicated but oblivious to the simple solution.

@rustbot claim

@Amanieu
Copy link
Member

Amanieu commented Sep 12, 2022

here's a c++ equivalent that works: gcc.godbolt.org/z/8aWxr45jP maybe have rustc mostly duplicate that?

This aborts with terminate called without an active exception on both GCC and Clang. So our behavior is the same as C++ in this regards.

@nbdd0121
Copy link
Contributor

here's a c++ equivalent that works: gcc.godbolt.org/z/8aWxr45jP maybe have rustc mostly duplicate that?

This aborts with terminate called without an active exception on both GCC and Clang. So our behavior is the same as C++ in this regards.

Now this is getting even more interesting. On my machine (x86_64, GCC 10.2.1, Clang 11.0.1, Linux 5.10.74, glibc 2.31) both GCC and Clang produced binaries complete without issue. What's your environment?

@Amanieu
Copy link
Member

Amanieu commented Sep 12, 2022

x86_64, GCC 12.2.0, clang 14.0.6, Linux 5.17.8, glibc 2.36.

With clang -stdlib=libc++ this still fails, but with a libc++abi: terminating with uncaught foreign exception message instead.

I think the key difference might be in the glibc version, specifically in the way it is throwing the forced unwind.

@Amanieu
Copy link
Member

Amanieu commented Sep 12, 2022

Actually it could also be a difference in the libgcc unwinder (it's still used even with -stdlib=libc++).

@nbdd0121
Copy link
Contributor

Okay, I am able to replicate @Amanieu's behaviour. So there are two versions of pthread_exit, one provided in libc and one provided by libpthread. If I just compile the file with g++, then it gets linked to the libc version, and if I compile with addition -pthread, it gets linked to the libpthread version. Only the latter will perform unwinding.

I was attempting to try to understand the behaviour with glibc and libstdc++ by plugging in my own unwinder using LD_PRELOAD but it seems that the libpthread somehow just calls into libgcc_s directly for unwinding while the personality calls the preloaded symbol, causing all sorts of issued because my unwinder is not ABI-compatible with libgcc.

So I decided to look directly at the libstdc++ personality code instead. It turns out it's sort of a deliberate decision to abort when noexcept function is encountered upon forced unwind in C++. https://github.com/gcc-mirror/gcc/blob/03381beccb52c0e2c15da3b8b8dfa3bb6eb71df9/libstdc%2B%2B-v3/libsupc%2B%2B/eh_personality.cc#L669-L673

@BatmanAoD
Copy link
Member

sort of a deliberate decision...

For what it's worth, that phrasing seems very apt to me. I don't believe the C++ standard makes any attempt to define "forced unwinding" or characterize its behavior, so it's unsurprising that the behavior isn't consistent.

@nbdd0121 Do you still feel like you know how to proceed here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. F-c_unwind `#![feature(c_unwind)]` WG-ffi-unwind Working group: FFI unwind
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants