Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM 17 regression: tests/ui/backtrace.rs can't double-panic anymore #109672

Closed
durin42 opened this issue Mar 27, 2023 · 24 comments
Closed

LLVM 17 regression: tests/ui/backtrace.rs can't double-panic anymore #109672

durin42 opened this issue Mar 27, 2023 · 24 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@durin42
Copy link
Contributor

durin42 commented Mar 27, 2023

LLVM change 0d4a709 (https://reviews.llvm.org/D145210) caused LLVM to have enough smarts to figure out that double panics "can't" happen - we already adjusted some tests for this (eg e4a4064 where we fixed tests/codegen/vec-shrink-panik.rs) but this seems to be a bit of an actual regression, as the code asserts the double-panic happens, but instead the double panic fails like:

thread 'main' panicked at 'once', fake-test-src-base/backtrace.rs:33:5
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/FAKE_PREFIX/library/std/src/panicking.rs:611:12
   1: backtrace::double
             at fake-test-src-base/backtrace.rs:33:5
   2: backtrace::main
             at fake-test-src-base/backtrace.rs:127:9
   3: core::ops::function::FnOnce::call_once
             at /rustc/FAKE_PREFIX/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 3

so it seems like maybe LLVM is incorrect about this double-panic not being reachable or something.

Filing this bug and disabling the test in our Rust/LLVM HEADs CI for now so we see other (more urgent) breakage reliably.

aarongable pushed a commit to chromium/chromium that referenced this issue Mar 27, 2023
tests/ui/abi/stack-protector.rs:
rust-lang/rust#109671 the test is being
optimized in newer LLVM which breaks its expectations.

tests/ui/backtrace.rs:
rust-lang/rust#109672 the second panic in a
double-panic is being optimized out (reasonably correctly) by newer
LLVM.

Bug: b/274802621, 1401042
Change-Id: I2be621f38dd1630c13eaf9128c76e0d17d684367
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4375277
Reviewed-by: Hans Wennborg <hans@chromium.org>
Commit-Queue: danakj <danakj@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1122545}
@clubby789
Copy link
Contributor

@rustbot label +A-llvm

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Mar 27, 2023
@jyn514
Copy link
Member

jyn514 commented Apr 3, 2023

so it seems like maybe LLVM is incorrect about this double-panic not being reachable or something.

this seems like a bug in LLVM itself? why did you file it against rust-lang/rust?

@durin42
Copy link
Contributor Author

durin42 commented Apr 3, 2023

It certainly sounds like one, but the IR from the Rust test case is big enough that I feel bad just whining upstream, when I don't really understand what's going on.

Maybe @nikic has an idea, given they reviewed the upstream change?

@nikic
Copy link
Contributor

nikic commented Apr 3, 2023

Hm, the IR looks fine to me -- double() still has the check for double panics. Not sure why it doesn't work.

@nikic
Copy link
Contributor

nikic commented Apr 4, 2023

Looks like error code 3 is _URC_FATAL_PHASE1_ERROR.

@nikic
Copy link
Contributor

nikic commented Apr 4, 2023

rust_eh_personality returns _URC_CONTINUE_UNWIND for the backtrace::double frame and _URC_FATAL_PHASE1_ERROR for the bracktrace::main frame.

I think the problem here is that LLVM now realizes that double() can't unwind because it will always end up aborting due to double panic. This means that the invoke to double() in main() will be converted into a call and won't have an LSD entry. This then causes rust_eh_personality to signal an error.

This seems correct from an LLVM perspective, but not sure who/what is actually at fault here. Maybe @Amanieu knows.

@Amanieu
Copy link
Member

Amanieu commented Apr 4, 2023

_URC_FATAL_PHASE1_ERROR means that either:

  • Exception handling metadata is invalid, and libgcc (which provides the unwind implementation on Linux) is complaining.
  • The personality routine returned an error (anything except _URC_CONTINUE_UNWIND or _URC_HANDLER_FOUND).

Based on my reading of the personality function (library/std/src/peronality/gcc.rs), an error only occurs if exception handling metadata is invalid, or if there is an exception from a call site that doesn't have a corresponding entry in the LSDA call site table. This might happen if we are throwing an exception from a call site that LLVM assumes is noexcept.

Note that this is not something that LLVM should assume on its own: throwing an exception from a landing pad is well-defined and should propagate that exception up to the caller. If we (or LLVM) are something incorrectly marking a call as noexcept then this should be fixed.

Does this still reproduce if the test case is reduced to just the double() function? If so, could you post the IR and generated assembly code so I can have a look?

@nikic
Copy link
Contributor

nikic commented Apr 4, 2023

Hrm, just reducing it to double() makes it stop reproducing. Here's the smallest I was able to produce:

use std::env;
use std::process::Command;

#[inline(never)]
fn double() {
    struct Double;

    impl Drop for Double {
        fn drop(&mut self) { panic!("twice") }
    }

    let _d = Double;

    panic!("once");
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() >= 2 && args[1] == "fail" {
    } else if args.len() >= 2 && args[1] == "double-fail" {
        double();
    } else {
        Command::new(&args[0]);
    }
}

The IR/asm for that file looks like this: https://gist.github.com/nikic/a3a4322442c8f63b507cbe5a4a32b0db

@nikic
Copy link
Contributor

nikic commented Apr 4, 2023

Based on my reading of the personality function (library/std/src/peronality/gcc.rs), an error only occurs if exception handling metadata is invalid, or if there is an exception from a call site that doesn't have a corresponding entry in the LSDA call site table. This might happen if we are throwing an exception from a call site that LLVM assumes is noexcept.

As I understand the problem, this is what happens. The call site is noexcept because the function is known to ultimately abort due to double panic.

Cleanup handlers get skipped during the phase 1 walk. Normally, we will always hit a catch handler eventually (in lang_start_internal if nothing else). In this case this doesn't happen because it is known that no exception can unwind that far up (once cleanup handlers have been executed).

@Amanieu
Copy link
Member

Amanieu commented Apr 4, 2023

Clang uses a catch handler for noexcept functions (https://godbolt.org/z/aYKGGdanq) which avoids this problem.

I remember we had a good reason for using cleanup instead of catch for TerminatorKind::Abort, but I can't remember what it was.

cc @nbdd0121

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 4, 2023

We just don't have a reliable way to tell whether we'll be aborting or not, because TerminatorKind::Abort is a terminator that is part of cleanup blocks. I do plan to move away from cleanup (#104070, but to filter instead of catch, so our personality function can tell it apart from the try intrinsic), but that PR is currently blocked on having the MIR unwind refactor #102906 merged (which will make aborting action explicit).

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 5, 2023

@nikic what will happen if you change the panic!("twice") to an abort()? Would LLVM still infer that double is indeed nounwind and thus omit the LSDA entry?

If so, then we have a bigger problem then just TerminatorKind::Abort.

@nikic
Copy link
Contributor

nikic commented Apr 5, 2023

I don't think that changing how we handle the double panic abort will fully fix, because we get the same behavior if we just explicitly abort inside the cleanup:

#[inline(never)]
fn double() {
    struct Double;

    impl Drop for Double {
        fn drop(&mut self) {
            std::process::abort();
        }
    }

    let _d = Double;

    panic!("once");
}

This wouldn't be affected by the proposed changes, right?

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 5, 2023

Hah, a race condition there :)

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 5, 2023

I think we should just return _URC_HANDLER_FOUND for a cleanup, instead of _URC_CONTINUE_UNWIND

EDIT: Actually this isn't quite right because we won't be able to resume if we returned _URC_HANDLER_FOUND

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 5, 2023

One possible approach is to return _URC_HANDLER_FOUND when we can't find a LSDA entry in phase 1. This should work if the caller frame is a Rust frame. However, this can be problematic if the function being transformed is a extern "C" function that's called from FFI, since the personality function will be out of our control.

@nikic
Copy link
Contributor

nikic commented Apr 5, 2023

Hm, it looks like this also affects C++. For the following snippet:

#include <cstdlib>
#include <iostream>

struct Abort {
  ~Abort() {
    abort();
  }
};

__attribute__((noinline))
static void abort_in_dtor() {
  Abort abort;
  throw "test";
}

int main() {
  try {
    abort_in_dtor();
  } catch (...) {
    std::cout << "caught" << std::endl;
  }
}

I get:

terminate called after throwing an instance of 'char const*'
Aborted (core dumped)

Rather than just an abort.

So this does also affect the __gxx_personality_v0 personality function as well.

@nikic
Copy link
Contributor

nikic commented Apr 5, 2023

I've filed llvm/llvm-project#61945 on the LLVM side with the Clang reproducer.

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@nikic
Copy link
Contributor

nikic commented Apr 17, 2023

@durin42 This should be fixed now, can you please try re-enabling the test?

@nbdd0121
Copy link
Contributor

Would we still have issue for the abort case (the double-panic case should be fixed)?

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 17, 2023

test.rs:

#[no_mangle]
extern "C" fn d() {
    struct Double;

    impl Drop for Double {
        fn drop(&mut self) {
            std::process::abort();
        }
    }

    let _d = Double;

    panic!("once");
}

test.c:

void d();

int main(void) {
    d();
}

compile with:

rustc -O test.rs --crate-type staticlib
gcc test.c libtest.a
./a.out

and I get:

thread '<unnamed>' panicked at 'once', test.rs:13:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Aborted

@nikic
Copy link
Contributor

nikic commented Apr 18, 2023

@nbdd0121 Is that with LLVM 17? The abort case should be fixed as well. We consider all cleanup landingpads as unwinding now, including those that just abort.

@durin42
Copy link
Contributor Author

durin42 commented Apr 18, 2023

@nikic Confirmed this is now fixed (but #109671 is still broken)

@durin42 durin42 closed this as completed Apr 18, 2023
@nbdd0121
Copy link
Contributor

@nbdd0121 Is that with LLVM 17? The abort case should be fixed as well. We consider all cleanup landingpads as unwinding now, including those that just abort.

I tested this on LLVM 15 and 16 and it could be reproduced in both cases. I'll file a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants