Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock when capturing a backtrace from allocator during panic with test output capturing enabled #130187

Open
Nemo157 opened this issue Sep 10, 2024 · 2 comments
Labels
A-backtrace Area: backtraces C-bug Category: This is a bug. I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. regression-from-stable-to-stable Performance or correctness regression from one stable version to another.

Comments

@Nemo157
Copy link
Member

Nemo157 commented Sep 10, 2024

I tried this code (minimized from a larger testcase):

use std::alloc::{GlobalAlloc, Layout};
use std::cell::Cell;
use std::backtrace::Backtrace;
use std::thread_local;

thread_local! {
    static CAN_ALLOCATE: Cell<bool> = const { Cell::new(true) };
}

#[derive(Debug)]
pub struct NoAllocate(std::alloc::System);

unsafe impl GlobalAlloc for NoAllocate {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if !CAN_ALLOCATE.replace(true) {
            let _ =  Backtrace::force_capture();
        }
        self.0.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.0.dealloc(ptr, layout);
    }
}

#[global_allocator]
static GLOBAL: NoAllocate = NoAllocate(std::alloc::System);

#[test]
#[should_panic]
fn main() {
    CAN_ALLOCATE.set(false);
    panic!();
}

Compiled with rustc --test main.rs.

I expected to see this happen: the test successfully panics and exits.

Instead, this happened: the test deadlocks.

Relevant section of backtrace:

...
#4  std::sync::mutex::Mutex::lock<()> () at std/src/sync/mutex.rs:317
#5  std::sys::backtrace::lock () at std/src/sys/backtrace.rs:18
#6  std::backtrace::Backtrace::create () at std/src/backtrace.rs:326
#7  0x0000555555585b25 in std::backtrace::Backtrace::force_capture () at std/src/backtrace.rs:312
#8  0x000055555556ab52 in <main::NoAllocate as core::alloc::global::GlobalAlloc>::alloc ()
#9  0x000055555556ac45 in __rust_alloc ()
...
#15 alloc::vec::Vec::reserve<u8, alloc::alloc::Global> () at alloc/src/vec/mod.rs:973
...
#22 0x00005555555879a3 in std::io::Write::write_fmt<alloc::vec::Vec<u8, alloc::alloc::Global>> () at std/src/io/mod.rs:1823
#23 0x000055555558aeb7 in std::panicking::default_hook::{closure#1} () at std/src/panicking.rs:256

This is specifically related to output capturing from the test runner, running the same code as a non-test binary or with --nocapture works perfectly.

Meta

This worked in 1.80.1 and nightly-2024-07-13, it started failing in 1.81.0 and nightly-2024-07-14.

The deadlock was introduced by #127397 (cc @jyn514).

The lock is first taken when starting to print from the default panic hook:

let mut lock = backtrace::lock();

The first print to the output then happens:

let _ = writeln!(err, "thread '{name}' panicked at {location}:\n{msg}");

For captured output this requires then reallocating the Vec storing the capture, so it calls into the allocator, hitting the Backtrace::force_capture because this is the first allocation in the test. This attempts to re-entrantly acquire the lock a second time, deadlocking:

let _lock = lock();

@Nemo157 Nemo157 added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. C-bug Category: This is a bug. A-backtrace Area: backtraces labels Sep 10, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 10, 2024
@jyn514
Copy link
Member

jyn514 commented Sep 10, 2024

i see three reasonable ways to fix this:

  1. document that alloc handlers aren't allowed to capture backtraces while panicking
  2. separate the panic lock from the backtrace capturing lock, so it's valid again to hold both locks at once. this would be pretty easy to do, but have some small amount of overhead.
  3. remove the need for the backtrace capturing lock: do the _unsynchronized functions need to exist? backtrace-rs#637. this needs a bit of the design for the impl, but i think it's very doable.

2 would be the short-term fix. i would like to do 3 eventually, it seems very silly to have so many nested locks around the backtrace crate.

@workingjubilee
Copy link
Member

happy to accept any PRs to backtrace-rs needed to make this better (assuming they are, uh, sound, obviously).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backtrace Area: backtraces C-bug Category: This is a bug. I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. regression-from-stable-to-stable Performance or correctness regression from one stable version to another.
Projects
None yet
Development

No branches or pull requests

4 participants