LLVM 17 regression: tests/ui/backtrace.rs can't double-panic anymore #109672

durin42 · 2023-03-27T17:57:16Z

LLVM change 0d4a709 (https://reviews.llvm.org/D145210) caused LLVM to have enough smarts to figure out that double panics "can't" happen - we already adjusted some tests for this (eg e4a4064 where we fixed tests/codegen/vec-shrink-panik.rs) but this seems to be a bit of an actual regression, as the code asserts the double-panic happens, but instead the double panic fails like:

thread 'main' panicked at 'once', fake-test-src-base/backtrace.rs:33:5
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/FAKE_PREFIX/library/std/src/panicking.rs:611:12
   1: backtrace::double
             at fake-test-src-base/backtrace.rs:33:5
   2: backtrace::main
             at fake-test-src-base/backtrace.rs:127:9
   3: core::ops::function::FnOnce::call_once
             at /rustc/FAKE_PREFIX/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 3

so it seems like maybe LLVM is incorrect about this double-panic not being reachable or something.

Filing this bug and disabling the test in our Rust/LLVM HEADs CI for now so we see other (more urgent) breakage reliably.

The text was updated successfully, but these errors were encountered:

tests/ui/abi/stack-protector.rs: rust-lang/rust#109671 the test is being optimized in newer LLVM which breaks its expectations. tests/ui/backtrace.rs: rust-lang/rust#109672 the second panic in a double-panic is being optimized out (reasonably correctly) by newer LLVM. Bug: b/274802621, 1401042 Change-Id: I2be621f38dd1630c13eaf9128c76e0d17d684367 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4375277 Reviewed-by: Hans Wennborg <hans@chromium.org> Commit-Queue: danakj <danakj@chromium.org> Cr-Commit-Position: refs/heads/main@{#1122545}

clubby789 · 2023-03-27T19:10:27Z

@rustbot label +A-llvm

jyn514 · 2023-04-03T09:06:32Z

so it seems like maybe LLVM is incorrect about this double-panic not being reachable or something.

this seems like a bug in LLVM itself? why did you file it against rust-lang/rust?

durin42 · 2023-04-03T14:36:19Z

It certainly sounds like one, but the IR from the Rust test case is big enough that I feel bad just whining upstream, when I don't really understand what's going on.

Maybe @nikic has an idea, given they reviewed the upstream change?

nikic · 2023-04-03T15:52:29Z

Hm, the IR looks fine to me -- double() still has the check for double panics. Not sure why it doesn't work.

nikic · 2023-04-04T12:45:11Z

Looks like error code 3 is _URC_FATAL_PHASE1_ERROR.

nikic · 2023-04-04T13:07:05Z

rust_eh_personality returns _URC_CONTINUE_UNWIND for the backtrace::double frame and _URC_FATAL_PHASE1_ERROR for the bracktrace::main frame.

I think the problem here is that LLVM now realizes that double() can't unwind because it will always end up aborting due to double panic. This means that the invoke to double() in main() will be converted into a call and won't have an LSD entry. This then causes rust_eh_personality to signal an error.

This seems correct from an LLVM perspective, but not sure who/what is actually at fault here. Maybe @Amanieu knows.

Amanieu · 2023-04-04T13:32:50Z

_URC_FATAL_PHASE1_ERROR means that either:

Exception handling metadata is invalid, and libgcc (which provides the unwind implementation on Linux) is complaining.
The personality routine returned an error (anything except _URC_CONTINUE_UNWIND or _URC_HANDLER_FOUND).

Based on my reading of the personality function (library/std/src/peronality/gcc.rs), an error only occurs if exception handling metadata is invalid, or if there is an exception from a call site that doesn't have a corresponding entry in the LSDA call site table. This might happen if we are throwing an exception from a call site that LLVM assumes is noexcept.

Note that this is not something that LLVM should assume on its own: throwing an exception from a landing pad is well-defined and should propagate that exception up to the caller. If we (or LLVM) are something incorrectly marking a call as noexcept then this should be fixed.

Does this still reproduce if the test case is reduced to just the double() function? If so, could you post the IR and generated assembly code so I can have a look?

nikic · 2023-04-04T14:55:34Z

Hrm, just reducing it to double() makes it stop reproducing. Here's the smallest I was able to produce:

use std::env;
use std::process::Command;

#[inline(never)]
fn double() {
    struct Double;

    impl Drop for Double {
        fn drop(&mut self) { panic!("twice") }
    }

    let _d = Double;

    panic!("once");
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() >= 2 && args[1] == "fail" {
    } else if args.len() >= 2 && args[1] == "double-fail" {
        double();
    } else {
        Command::new(&args[0]);
    }
}

The IR/asm for that file looks like this: https://gist.github.com/nikic/a3a4322442c8f63b507cbe5a4a32b0db

nikic · 2023-04-04T15:26:07Z

Based on my reading of the personality function (library/std/src/peronality/gcc.rs), an error only occurs if exception handling metadata is invalid, or if there is an exception from a call site that doesn't have a corresponding entry in the LSDA call site table. This might happen if we are throwing an exception from a call site that LLVM assumes is noexcept.

As I understand the problem, this is what happens. The call site is noexcept because the function is known to ultimately abort due to double panic.

Cleanup handlers get skipped during the phase 1 walk. Normally, we will always hit a catch handler eventually (in lang_start_internal if nothing else). In this case this doesn't happen because it is known that no exception can unwind that far up (once cleanup handlers have been executed).

Amanieu · 2023-04-04T16:02:54Z

Clang uses a catch handler for noexcept functions (https://godbolt.org/z/aYKGGdanq) which avoids this problem.

I remember we had a good reason for using cleanup instead of catch for TerminatorKind::Abort, but I can't remember what it was.

cc @nbdd0121

nbdd0121 · 2023-04-04T16:48:04Z

We just don't have a reliable way to tell whether we'll be aborting or not, because TerminatorKind::Abort is a terminator that is part of cleanup blocks. I do plan to move away from cleanup (#104070, but to filter instead of catch, so our personality function can tell it apart from the try intrinsic), but that PR is currently blocked on having the MIR unwind refactor #102906 merged (which will make aborting action explicit).

nbdd0121 · 2023-04-05T09:27:20Z

@nikic what will happen if you change the panic!("twice") to an abort()? Would LLVM still infer that double is indeed nounwind and thus omit the LSDA entry?

If so, then we have a bigger problem then just TerminatorKind::Abort.

nikic · 2023-04-05T09:27:23Z

I don't think that changing how we handle the double panic abort will fully fix, because we get the same behavior if we just explicitly abort inside the cleanup:

#[inline(never)]
fn double() {
    struct Double;

    impl Drop for Double {
        fn drop(&mut self) {
            std::process::abort();
        }
    }

    let _d = Double;

    panic!("once");
}

This wouldn't be affected by the proposed changes, right?

nbdd0121 · 2023-04-05T09:27:49Z

Hah, a race condition there :)

nbdd0121 · 2023-04-05T09:32:35Z

I think we should just return _URC_HANDLER_FOUND for a cleanup, instead of _URC_CONTINUE_UNWIND

EDIT: Actually this isn't quite right because we won't be able to resume if we returned _URC_HANDLER_FOUND

nbdd0121 · 2023-04-05T09:42:04Z

One possible approach is to return _URC_HANDLER_FOUND when we can't find a LSDA entry in phase 1. This should work if the caller frame is a Rust frame. However, this can be problematic if the function being transformed is a extern "C" function that's called from FFI, since the personality function will be out of our control.

nikic · 2023-04-05T09:45:03Z

Hm, it looks like this also affects C++. For the following snippet:

#include <cstdlib>
#include <iostream>

struct Abort {
  ~Abort() {
    abort();
  }
};

__attribute__((noinline))
static void abort_in_dtor() {
  Abort abort;
  throw "test";
}

int main() {
  try {
    abort_in_dtor();
  } catch (...) {
    std::cout << "caught" << std::endl;
  }
}

I get:

terminate called after throwing an instance of 'char const*'
Aborted (core dumped)

Rather than just an abort.

So this does also affect the __gxx_personality_v0 personality function as well.

nikic · 2023-04-05T09:52:23Z

I've filed llvm/llvm-project#61945 on the LLVM side with the Clang reproducer.

nikic · 2023-04-17T20:41:42Z

@durin42 This should be fixed now, can you please try re-enabling the test?

nbdd0121 · 2023-04-17T21:47:29Z

Would we still have issue for the abort case (the double-panic case should be fixed)?

nbdd0121 · 2023-04-17T21:48:47Z

test.rs:

#[no_mangle]
extern "C" fn d() {
    struct Double;

    impl Drop for Double {
        fn drop(&mut self) {
            std::process::abort();
        }
    }

    let _d = Double;

    panic!("once");
}

test.c:

void d();

int main(void) {
    d();
}

compile with:

rustc -O test.rs --crate-type staticlib
gcc test.c libtest.a
./a.out

and I get:

thread '<unnamed>' panicked at 'once', test.rs:13:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Aborted

nikic · 2023-04-18T08:46:01Z

@nbdd0121 Is that with LLVM 17? The abort case should be fixed as well. We consider all cleanup landingpads as unwinding now, including those that just abort.

durin42 · 2023-04-18T13:57:02Z

@nikic Confirmed this is now fixed (but #109671 is still broken)

nbdd0121 · 2023-04-18T18:22:43Z

@nbdd0121 Is that with LLVM 17? The abort case should be fixed as well. We consider all cleanup landingpads as unwinding now, including those that just abort.

I tested this on LLVM 15 and 16 and it could be reproduced in both cases. I'll file a new issue.

rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Mar 27, 2023

Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023

nbdd0121 mentioned this issue Apr 9, 2023

Prevent aborting guard from aborting the process in a forced unwind #104070

Merged

durin42 closed this as completed Apr 18, 2023

nbdd0121 mentioned this issue Apr 18, 2023

Phase 1 unwind failure when a Rust frame containing an aborting drop guard is called from code without unwinding info #110499

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM 17 regression: tests/ui/backtrace.rs can't double-panic anymore #109672

LLVM 17 regression: tests/ui/backtrace.rs can't double-panic anymore #109672

durin42 commented Mar 27, 2023

clubby789 commented Mar 27, 2023

jyn514 commented Apr 3, 2023

durin42 commented Apr 3, 2023

nikic commented Apr 3, 2023

nikic commented Apr 4, 2023

nikic commented Apr 4, 2023 •

edited

Loading

Amanieu commented Apr 4, 2023

nikic commented Apr 4, 2023 •

edited

Loading

nikic commented Apr 4, 2023 •

edited

Loading

Amanieu commented Apr 4, 2023

nbdd0121 commented Apr 4, 2023

nbdd0121 commented Apr 5, 2023

nikic commented Apr 5, 2023

nbdd0121 commented Apr 5, 2023

nbdd0121 commented Apr 5, 2023 •

edited

Loading

nbdd0121 commented Apr 5, 2023

nikic commented Apr 5, 2023

nikic commented Apr 5, 2023

nikic commented Apr 17, 2023

nbdd0121 commented Apr 17, 2023

nbdd0121 commented Apr 17, 2023 •

edited

Loading

nikic commented Apr 18, 2023

durin42 commented Apr 18, 2023

nbdd0121 commented Apr 18, 2023

LLVM 17 regression: tests/ui/backtrace.rs can't double-panic anymore #109672

LLVM 17 regression: tests/ui/backtrace.rs can't double-panic anymore #109672

Comments

durin42 commented Mar 27, 2023

clubby789 commented Mar 27, 2023

jyn514 commented Apr 3, 2023

durin42 commented Apr 3, 2023

nikic commented Apr 3, 2023

nikic commented Apr 4, 2023

nikic commented Apr 4, 2023 • edited Loading

Amanieu commented Apr 4, 2023

nikic commented Apr 4, 2023 • edited Loading

nikic commented Apr 4, 2023 • edited Loading

Amanieu commented Apr 4, 2023

nbdd0121 commented Apr 4, 2023

nbdd0121 commented Apr 5, 2023

nikic commented Apr 5, 2023

nbdd0121 commented Apr 5, 2023

nbdd0121 commented Apr 5, 2023 • edited Loading

nbdd0121 commented Apr 5, 2023

nikic commented Apr 5, 2023

nikic commented Apr 5, 2023

nikic commented Apr 17, 2023

nbdd0121 commented Apr 17, 2023

nbdd0121 commented Apr 17, 2023 • edited Loading

nikic commented Apr 18, 2023

durin42 commented Apr 18, 2023

nbdd0121 commented Apr 18, 2023

nikic commented Apr 4, 2023 •

edited

Loading

nikic commented Apr 4, 2023 •

edited

Loading

nikic commented Apr 4, 2023 •

edited

Loading

nbdd0121 commented Apr 5, 2023 •

edited

Loading

nbdd0121 commented Apr 17, 2023 •

edited

Loading