Crash, invalid free in monterey #107929

Closed
kali opened this issue Feb 11, 2023 · 12 comments
Labels
C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-macos Operating system: macOS

Comments

@kali
Contributor

kali commented Feb 11, 2023

The following code, compiled with optimizations on macOS Monterey (x86-64), crashes with an invalid free message.

Save it as monterey-crasher.rs.

#![allow(dead_code)]
#[derive(Copy, Clone)]
enum BinOp {
    Min,
}
#[derive(Clone, Copy)]
enum OutputStoreSpec {
    View(usize),
    Strides([isize; 5])
}
#[derive(Clone)]
enum AttrOrInput {
    Attr(Box<()>),
    Input(usize),
}
#[derive(Clone)]
enum ProtoFusedSpec {
    BinScalar(AttrOrInput, BinOp),
    BinPerRow(AttrOrInput, BinOp),
    BinPerCol(AttrOrInput, BinOp),
    AddRowColProducts(AttrOrInput, AttrOrInput),
    AddUnicast(OutputStoreSpec, AttrOrInput),
    Store,
}
fn main() {
    let mut stuff = vec!(vec!(1));
    for i in 0..50000 {
        let len = (stuff[i].len() * 134775813) % 4096;
        stuff.push((1234123414u32..).take(len).collect());
    }
    std::mem::drop(stuff);
    let _ = vec!((Box::new(()), vec![ProtoFusedSpec::Store])).as_slice().to_owned();
}
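
The allocation loop in the reproducer above derives each inner vector's length from the previous one. A minimal sketch of that sequence on its own may help to see what the loop writes into memory before the drop. The function name `allocation_lengths` is introduced here for illustration and is not part of the original report.

```rust
/// Returns the inner-vector lengths produced by the reproducer's loop.
/// Each length is derived from the previous one, starting from the
/// single-element `vec!(1)`. (Illustrative helper, not from the report.)
fn allocation_lengths(count: usize) -> Vec<u64> {
    let mut lens = Vec::with_capacity(count);
    let mut prev: u64 = 1; // length of the initial `vec!(1)`
    for _ in 0..count {
        // Same arithmetic as `(stuff[i].len() * 134775813) % 4096`.
        prev = (prev * 134_775_813) % 4096;
        lens.push(prev);
    }
    lens
}

fn main() {
    let lens = allocation_lengths(50_000);
    // Each element is a u32, so every length unit is four bytes of payload.
    let payload_bytes: u64 = lens.iter().map(|len| len * 4).sum();
    println!("first lengths: {:?}", &lens[..5]);
    println!("min = {:?}, max = {:?}", lens.iter().min(), lens.iter().max());
    println!("payload bytes written before the drop: {payload_bytes}");
}
```

Because the multiplier is odd and the modulus is a power of two, every length in the sequence is odd, so no inner vector is empty and each one is filled with non-zero `u32` values starting at 1234123414.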

This shell script compiles the program and runs the generated executable up to 100 times. It will likely crash within the first couple of runs.

#!/bin/sh

set -e

rustc -C opt-level=3 monterey-crasher.rs -o monterey-crasher
for i in `seq 1 100`
do
    echo $i
    ./monterey-crasher
done

Meta

Reproducible with any stable version since 1.65. As far as we can tell, it is a regression that first appeared with 512bd84.

Output

[...]
1
monterey-crasher(80086,0x113e0d600) malloc: *** error for object 0x600001850d80: pointer being freed was not allocated
monterey-crasher(80086,0x113e0d600) malloc: *** set a breakpoint in malloc_error_break to debug

Crash stack trace in LLDB:

* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007ff80c60f00e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007ff80c6451ff libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007ff80c590d24 libsystem_c.dylib`abort + 123
    frame #3: 0x00007ff80c46e357 libsystem_malloc.dylib`malloc_vreport + 551
    frame #4: 0x00007ff80c47152b libsystem_malloc.dylib`malloc_report + 151
    frame #5: 0x00000001000024c6 monterey-crasher`monterey_crasher::main::h6d331bef051cd6b6 + 742
    frame #6: 0x0000000100002046 monterey-crasher`std::sys_common::backtrace::__rust_begin_short_backtrace::h1f33f2adc2752b09 + 6
    frame #7: 0x000000010000201c monterey-crasher`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h05acf7e35f5cbc7f + 12
    frame #8: 0x000000010001c2a4 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h2302f1d25ef2ca9b at function.rs:606:13 [opt]
    frame #9: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::do_call::h6695e32a593de2cc at panicking.rs:483:40 [opt]
    frame #10: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::hd4a93095627721a9 at panicking.rs:447:19 [opt]
    frame #11: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panic::catch_unwind::he41b3dba63feca94 at panic.rs:137:14 [opt]
    frame #12: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::hbf45583011495a61 at rt.rs:148:48 [opt]
    frame #13: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::do_call::ha3e6b3edab7da449 at panicking.rs:483:40 [opt]
    frame #14: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::hd4e0f354bf7022b9 at panicking.rs:447:19 [opt]
    frame #15: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panic::catch_unwind::h1035b163871a4269 at panic.rs:137:14 [opt]
    frame #16: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 at rt.rs:148:20 [opt]
    frame #17: 0x000000010000260c monterey-crasher`main + 44
    frame #18: 0x000000010007952e dyld`start + 462

Notes

  • The bug was discovered in tract, in conjunction with a big pile of code from ndarray. It took a huge amount of effort to reduce it, partly manually and partly semi-automatically, to a small test case without unsafe code (which turned out to be innocent). tract's contribution to the test case is the ProtoFusedSpec enumeration. ndarray's main contribution is the .as_slice().to_owned() bit. One of the breakthroughs was realizing that most of the remaining code was actually just putting non-zero bits in memory. At that point we could remove most of it and replace it with the pseudo-random allocation at the beginning of main.
  • We could only reproduce on x86-64 Monterey, not on arm64 Monterey and not on Ventura.
  • The rustc commit obtained by bisecting (512bd84) points to something related to enumerations and the niche discriminant optimisation. We checked the layout and dumped the ProtoFusedSpec structure without finding anything suspect. @lqd also dumped the rustc internal structure for the enumeration representation ( https://gist.github.com/lqd/bb93888ee24540072141afd6b93df6f3 ). The gist comes with the test-case variant at the time of dumping. We have been able to reduce it further since then (including alterations to the actual problematic enumeration).
  • Various sanitizing tools have failed us. Valgrind and AddressSanitizer, as well as Xcode's Guard Malloc, were unable to come up with anything more interesting than the LLDB stack trace.
  • We also tried instrumenting the global allocator to figure out whether the address was actually invalid, but the bug is very elusive: we could not reproduce it with the instrumentation in place.
  • What's specific to Monterey? Apparently, macOS Monterey has two variants of the system allocator, and the MallocNanoZone environment variable seems to control which variant is used. Many applications have run into problems with the default choice and set MallocNanoZone=0 in the environment as a workaround. VS Code actually does this: the bug does not appear in its terminal (unless running with env -i to discard the setting).
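
To compare behaviour with the nano zone switched off, the repeated-run loop can be driven from Rust with a controlled environment. This is only a sketch under assumptions: `without_nano_zone` is a hypothetical helper, `./monterey-crasher` is the binary produced by the script above, and whether `MallocNanoZone=0` reliably avoids the crash is an observation from the VS Code terminal, not something established here.

```rust
use std::process::Command;

/// Builds a command that runs `binary` with only `MallocNanoZone=0` set.
/// (Hypothetical helper for illustration.)
fn without_nano_zone(binary: &str) -> Command {
    let mut cmd = Command::new(binary);
    // Start from an empty environment, like `env -i`, so that nothing
    // inherited from the parent shell interferes with the setting.
    cmd.env_clear();
    cmd.env("MallocNanoZone", "0");
    cmd
}

fn main() {
    let mut cmd = without_nano_zone("./monterey-crasher");
    for run in 1..=100 {
        match cmd.status() {
            Ok(status) if status.success() => continue,
            Ok(status) => {
                // An abort from malloc shows up as a signal in the status.
                println!("run {run} failed: {status}");
                return;
            }
            Err(err) => {
                eprintln!("could not start the binary: {err}");
                return;
            }
        }
    }
    println!("100 runs completed without a crash");
}
```

Dropping the `cmd.env("MallocNanoZone", "0")` line gives the equivalent of running under `env -i`, which is the configuration in which the crash was reported to appear.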
@kali kali added the C-bug Category: This is a bug. label Feb 11, 2023
@atsuzaki
Contributor

atsuzaki commented Feb 11, 2023

@kali I have an arm64 Monterey. Happy to work with you to repro this on my machine if that's still worth doing at this point!

@kali
Contributor Author

kali commented Feb 12, 2023

@atsuzaki Well I guess it may be an extra clue. Just try the program and tell us how it goes :)

@lqd
Member

lqd commented Feb 12, 2023

I did try on an M1 on Monterey and it didn’t reproduce there.

@lqd
Member

lqd commented Feb 12, 2023

While it looks like a system allocator bug, it seemed interesting to have an issue here:

It's not clear, however, how to move forward with this investigation or how to fix it, so I'd like to ask for the macos group's help: whether they see something obvious we're missing here, have ever encountered a similar issue, or know what procedure to follow (maybe filing a Radar on Apple's bug tracker).

@rustbot ping macos

Thanks in advance.

@rustbot rustbot added the O-macos Operating system: macOS label Feb 12, 2023
@rustbot
Collaborator

rustbot commented Feb 12, 2023

Hey MacOS Group! This issue or PR could use some MacOS-specific guidance. Could one
of you weigh in? Thanks <3

cc @hkratz @inflation @nvzqz @shepmaster @thomcc

@Noratrieb
Member

Can you try the following example?

fn main() {
    let mut stuff = vec!(vec!(1));
    for i in 0..50000 {
        let len = (stuff[i].len() * 134775813) % 4096;
        stuff.push((1234123414u32..).take(len).collect());
    }
    std::mem::drop(stuff);
    let _ = vec!((Box::new(()), vec![[0u64; 8]])).as_slice().to_owned();
}

I just minimized the enums away into a simple array with the same size and alignment. It should behave exactly the same, and would let us establish that the enum optimization is just an accidental trigger for the issue in this code and not the actual cause.
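
The size and alignment claim can be checked with a short sketch that prints both layouts side by side. The enum definitions are copied from the reproducer without their derives (which do not affect layout), and `layout_of` is a helper introduced here for illustration. Enum layout is unspecified and varies with the compiler version, so treat the printed numbers as specific to your toolchain.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
enum BinOp {
    Min,
}
#[allow(dead_code)]
enum OutputStoreSpec {
    View(usize),
    Strides([isize; 5]),
}
#[allow(dead_code)]
enum AttrOrInput {
    Attr(Box<()>),
    Input(usize),
}
#[allow(dead_code)]
enum ProtoFusedSpec {
    BinScalar(AttrOrInput, BinOp),
    BinPerRow(AttrOrInput, BinOp),
    BinPerCol(AttrOrInput, BinOp),
    AddRowColProducts(AttrOrInput, AttrOrInput),
    AddUnicast(OutputStoreSpec, AttrOrInput),
    Store,
}

/// Returns `(size, alignment)` in bytes for `T`. (Illustrative helper.)
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    println!("OutputStoreSpec: {:?}", layout_of::<OutputStoreSpec>());
    println!("AttrOrInput:     {:?}", layout_of::<AttrOrInput>());
    println!("ProtoFusedSpec:  {:?}", layout_of::<ProtoFusedSpec>());
    println!("[u64; 8]:        {:?}", layout_of::<[u64; 8]>());
}
```

If the two last lines print the same pair, the array is a faithful stand-in for the enum as far as the allocator is concerned, since the allocator only sees the size and alignment of each request.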

@kali
Contributor Author

kali commented Feb 12, 2023

@Nilstrieb This is one I had tried before. I've just given it another shot, but no such luck: without the enumeration, there is no repro.

@BiffBish

I just ran into this naturally in my codebase, in my repo here. This is far from a minimal example, but I have the stack trace in llvm, and the crash was caused by resizing a Vector.
[image: screenshot of the stack trace]

@workingjubilee workingjubilee added the I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. label Mar 11, 2023
@kali
Contributor Author

kali commented Apr 22, 2023

I was made aware of the fix in #110128 on nightly. I tested again, and the Monterey crash seems to be gone with nightly, whether or not that fix was the cause.

@kali
Contributor Author

kali commented Apr 22, 2023

Bisected the fix to nightly-2023-03-26

@kali
Contributor Author

kali commented Apr 22, 2023

The bug disappears with commit 0c61c7a, which is an LLVM bump. I can't say whether the bug is fixed or the repro simply does not work anymore.

@workingjubilee
Member

Presumptively declaring victory, I guess?
