Unloading a Rust dylib with TLS used segfaults on OSX #28794

alexcrichton · 2015-10-01T17:47:47Z

The problem here is that we register a TLS destructor via _tlv_atexit the first time TLS is referenced (e.g. when the dylib's function is called), but once dlclose happens the destructor function is no longer there, so a fault happens when the thread exits and tries to run its destructors.

I'm not entirely sure how we might handle this. Perhaps there's a way to compile dylibs such that the TLS access is OK? Perhaps we should hook an "unload" event and deregister (i.e. leak) TLS destructors? Either way, this seems like a good thing to track!

ranma42 · 2015-10-02T00:27:35Z

_dyld_register_func_for_remove_image might be the hook we need.
Manpage available at https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/dyld.3.html
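
For illustration, registering such a hook might look like the sketch below. It only counts removed images. The counter and the names install_unload_hook, on_image_removed and removed_image_count are hypothetical, and whether anything useful can be deregistered from inside the callback is not established here. The callback itself would have to live in an image that is never unloaded.

```c
#include <stdint.h>

#ifdef __APPLE__
#include <mach-o/dyld.h>
#endif

/* Hypothetical counter; a real hook would update runtime bookkeeping. */
static volatile int g_removed_images = 0;

#ifdef __APPLE__
/* dyld invokes this callback when an image is removed from the process. */
static void on_image_removed(const struct mach_header *mh, intptr_t vmaddr_slide) {
    (void)mh;
    (void)vmaddr_slide;
    g_removed_images++;
}
#endif

/* Install the hook. Returns 1 if it was installed, 0 on platforms
   without dyld, where this is a no-op. */
int install_unload_hook(void) {
#ifdef __APPLE__
    _dyld_register_func_for_remove_image(on_image_removed);
    return 1;
#else
    return 0;
#endif
}

/* Number of image removals observed since the hook was installed. */
int removed_image_count(void) {
    return g_removed_images;
}
```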

emoon · 2016-02-23T05:48:09Z

cc @emoon

emoon · 2016-03-05T22:48:01Z

Has there been any progress on this issue? It can be worked around by not releasing the lib, but it would be nice to have a proper solution.

noeleont · 2016-09-01T18:03:42Z

I had the same issue, one workaround was loading the lib with RTLD_NODELETE.
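
A minimal sketch of that workaround in C, assuming the host program calls dlopen directly (open_pinned is a hypothetical helper name). With RTLD_NODELETE the library stays mapped after dlclose, so a destructor registered from it still points at valid code, at the cost of never actually unloading the library.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Open a library so that a later dlclose() does not unmap it.
   Returns NULL on failure; call dlerror() for the reason. */
void *open_pinned(const char *path) {
    return dlopen(path, RTLD_LAZY | RTLD_NODELETE);
}
```

A handle obtained this way can still be passed to dlsym and dlclose as usual; dlclose then drops the reference without removing the code from the process.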

solarretrace · 2016-12-22T03:31:09Z

I think I'm having the same issue? But the RTLD_NODELETE option doesn't seem to help. It's also probably not appropriate for my use-case, which requires unloading and reloading the library repeatedly.

Stack looks like this:

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000115ab5cb0

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_malloc.dylib 0x00007fff8e8c5059 free + 58
1 dyld 0x00007fff6542cec1 ImageLoaderMachOCompressed::~ImageLoaderMachOCompressed() + 33
2 dyld 0x00007fff6541afc4 dyld::garbageCollectImages() + 831
3 dyld 0x00007fff65422428 dlclose + 134
4 libdyld.dylib 0x00007fff91143808 dlclose + 61
5 brightlab 0x000000010c54cfdb libloading::os::unix::{{impl}}::drop::{{closure}} + 43 (mod.rs:38)
6 brightlab 0x000000010c54cd56 libloading::os::unix::with_dlerror<(),closure> + 134 (mod.rs:38)
7 brightlab 0x000000010c54cf7d _$LT$libloading..os..unix..Library$u20$as$u20$core..ops..Drop$GT$::drop::hd8137c4da21c7d9d + 45 (mod.rs:38)
8 brightlab 0x000000010c4908e1 drop::h9ed5b642e5309eab + 17
9 brightlab 0x000000010c48e409 drop::h2bbca3905b6b58a1 + 9
10 brightlab 0x000000010c490e28 drop::haca7bed875860903 + 72
11 brightlab 0x000000010c48dee1 drop::h1e1054b6f31067c0 + 17
12 brightlab 0x000000010c48f385 drop::h529da6d106e4933f + 149
13 brightlab 0x000000010c4b2313 brightlab::main + 835 (main.rs:37)
14 brightlab 0x000000010c55997b __rust_maybe_catch_panic + 27 (lib.rs:106)
15 brightlab 0x000000010c558ec7 std::rt::lang_start::hefd96b70277e8a4a + 391 (rt.rs:57)
16 brightlab 0x000000010c4b245a main + 42
17 libdyld.dylib 0x00007fff911445c9 start + 1

nagisa · 2016-12-22T03:40:14Z

The lack of _tlv_atexit in the backtrace seems to suggest that yours is a different issue. This comment might help to give some pointers.

solarretrace · 2016-12-23T00:17:38Z

Hmm, I got the impression from that comment that it would be fixed by default... I'll look into it, thanks.

Mark-Simulacrum · 2017-05-18T12:14:20Z

Copying code into here so it doesn't get lost; nagisa/rust_libloading#5 is potentially relevant.

test.rs:

#[no_mangle]
pub extern "system" fn test_fn() -> i32 {
    // Removing this line prevents the segfault.
    // I've tried flushing stdout as well but it doesn't change anything
    println!("In library!");
    123456
}

main.c:

#include <dlfcn.h>
#include <stdio.h>

int main() {
    printf("running\n");
    void* handle = dlopen("./libtest.dylib", RTLD_LAZY);
    printf("opened: %p\n", handle);

    int (*test_fn)() = dlsym(handle, "test_fn");
    printf("test_fn: %d\n", test_fn());

    printf("Closing...\n");
    int code = dlclose(handle); // Removing this line prevents the segfault upon exit?
    printf("Closed: %d.\n", code);
}

$ rustc --crate-type=dylib test.rs
$ gcc main.c
$ ./a.out
running
opened: 0x7f8a93c02640
In library!
test_fn: 123456
Closing...
Closed: 0.
Segmentation fault: 11

aidanhs · 2017-11-14T19:11:56Z

Mention of RTLD_NODELETE on reddit - https://www.reddit.com/r/rust/comments/7cxknc/evolving_our_rust_with_milksnake/dptgnng/?context=3

mitsuhiko · 2017-11-14T20:54:34Z

I ran into this and looked at ways to work around it. It comes up with Python extension modules and so far we just decided to leak the module. The reason this cannot really be fixed, to the best of my knowledge, is that _tlv_atexit (which is somewhat of an undocumented API as far as I can tell) does not have a way to unregister the callback.

Since the only callback that Rust can reasonably place here is from the dylib, we can't really register something here that does not crash if the dylib goes away. One would need to find a trampoline that can be used and does not unload. Unsure what the fix here is. This seems like a bug in macOS, albeit one that has low chances of being fixed.

BurntPizza · 2018-02-03T02:20:53Z

I'm getting something similar on Arch: https://github.com/BurntPizza/dylib_tls_crash

$ cargo build && valgrind target/debug/dylib_crash
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
==27114== Memcheck, a memory error detector
==27114== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27114== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==27114== Command: target/debug/dylib_crash
==27114== 
Dropping lib
Lib is dropped
Success: No thread
Dropping lib
Lib is dropped
==27114== Thread 2:
==27114== Jump to the invalid address stated on the next line
==27114==    at 0x6EC23D0: ???
==27114==    by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==27114==    by 0x524C1DC: start_thread (in /usr/lib/libpthread-2.26.so)
==27114==    by 0x577042E: clone (in /usr/lib/libc-2.26.so)
==27114==  Address 0x6ec23d0 is not stack'd, malloc'd or (recently) free'd
==27114== 
==27114== Can't extend stack to 0x402a138 during signal delivery for thread 2:
==27114==   no stack segment
==27114== 
==27114== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==27114==  Access not within mapped region at address 0x402A138
==27114==    at 0x6EC23D0: ???
==27114==    by 0x524B1B7: __nptl_deallocate_tsd.part.5 (in /usr/lib/libpthread-2.26.so)
==27114==    by 0x524C1DC: start_thread (in /usr/lib/libpthread-2.26.so)
==27114==    by 0x577042E: clone (in /usr/lib/libc-2.26.so)
==27114==  If you believe this happened as a result of a stack
==27114==  overflow in your program's main thread (unlikely but
==27114==  possible), you can try to increase the size of the
==27114==  main thread stack using the --main-stacksize= flag.
==27114==  The main thread stack size used in this run was 8388608.
==27114== Invalid write of size 8
==27114==    at 0x4A27630: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==27114==  Address 0x402aff8 is on thread 2's stack
==27114== 
==27114== 
==27114== Process terminating with default action of signal 11 (SIGSEGV)
==27114==  Access not within mapped region at address 0x402AFF8
==27114==    at 0x4A27630: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==27114==  If you believe this happened as a result of a stack
==27114==  overflow in your program's main thread (unlikely but
==27114==  possible), you can try to increase the size of the
==27114==  main thread stack using the --main-stacksize= flag.
==27114== 
==27114== HEAP SUMMARY:
==27114==     in use at exit: 384 bytes in 4 blocks
==27114==   total heap usage: 33 allocs, 29 frees, 9,316 bytes allocated
==27114== 
==27114== LEAK SUMMARY:
==27114==    definitely lost: 0 bytes in 0 blocks
==27114==    indirectly lost: 0 bytes in 0 blocks
==27114==      possibly lost: 288 bytes in 1 blocks
==27114==    still reachable: 96 bytes in 3 blocks
==27114==         suppressed: 0 bytes in 0 blocks
==27114== Rerun with --leak-check=full to see details of leaked memory
==27114== 
==27114== For counts of detected and suppressed errors, rerun with: -v
==27114== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
[1]    27114 segmentation fault (core dumped)  valgrind target/debug/dylib_crash

The difference here is that the segfault happens when a (non-main) thread exits, if there has been a dylib dropped in that thread, even if the lib doesn't contain anything (source-wise). mem::forget "works", but interestingly so does setting the lib's crate-type to cdylib. What are the differences between dylib and cdylib in regards to TLS destructors, at least with an empty rust library?

Various things can be found by searching for __nptl_deallocate_tsd but this seems above my pay grade. The most promising thing was this comment and the patch after it: https://bugzilla.redhat.com/show_bug.cgi?id=1065695#c12

Info:

$ rustc -V
rustc 1.23.0 (766bd11c8 2018-01-01)

$ uname -srv 
Linux 4.14.11-1-ARCH #1 SMP PREEMPT Wed Jan 3 07:02:42 UTC 2018

Should I make a separate issue?

ubolonton · 2018-02-26T11:58:22Z

This seems to have been fixed by some dyld changes in High Sierra.
Looks like dyld now marks a dylib as "never unload" if it has MH_HAS_TLV_DESCRIPTORS flag in the header.
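
To check whether a given dylib carries that flag, something like the following sketch could be used on the first bytes of the file. It assumes a thin (non-fat), little-endian, 64-bit Mach-O file. The constant values are the ones from <mach-o/loader.h>, repeated under hypothetical SKETCH_ names so the code builds without the macOS SDK, and has_tlv_descriptors is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as MH_MAGIC_64 and MH_HAS_TLV_DESCRIPTORS in <mach-o/loader.h>. */
#define SKETCH_MH_MAGIC_64 0xfeedfacfu
#define SKETCH_MH_HAS_TLV_DESCRIPTORS 0x00800000u

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the header has the TLV flag set, 0 if it does not,
   and -1 if the buffer is not a thin little-endian 64-bit Mach-O header. */
int has_tlv_descriptors(const unsigned char *hdr, size_t len) {
    if (len < 32 || read_le32(hdr) != SKETCH_MH_MAGIC_64)
        return -1;
    /* mach_header_64 fields: magic, cputype, cpusubtype, filetype,
       ncmds, sizeofcmds, flags, reserved. flags is at byte offset 24. */
    return (read_le32(hdr + 24) & SKETCH_MH_HAS_TLV_DESCRIPTORS) != 0;
}
```

Reading the first 32 bytes of the dylib into a buffer and passing it to has_tlv_descriptors is enough; fat binaries would need their per-architecture slices located first, which this sketch does not attempt.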

On the other hand, this means that Rust dylibs will never be unloaded on OS X?

scottjmaddox · 2018-02-26T18:30:35Z

Are we 100% sure the issue is with TLS? I've got TLS working without segfault through a dlclose (and reopen) on macOS Sierra (10.12.6), but only if the dylib statically links to libstd. If the dylib is dynamically linked to libstd, then I get a segfault on dlclose regardless of whether or not I'm using TLS.

This is on stable rustc 1.24.0.

This is good enough for my use case (code hot reloading during development), but it would be nice to remove the requirement to statically link libstd into the dylib.

mitsuhiko · 2018-02-26T18:43:46Z

@scottjmaddox this particular crash is from what's registered to _tlv_atexit which is exclusively used by thread locals. If you have a different crash that might be interesting as well.

nanotech · 2018-02-26T23:49:16Z

WWDC 2017's Session 413 @ 29:36 mentions that using TLS prevents a dylib from being unloaded:

There are also a number of features on our platforms that prevent dylibs from unloading, and I'd like to go through a few of those because maybe you do them.

You can have Objective-C classes in your dylib. That will make it not unloadable.

You could have Swift classes. That will also make it not unloadable.

And you can have C __thread or C++ thread local variables, all of which make it impossible to unload a dylib.

So on macOS, where there's a number of existing Unix apps, obviously we will keep this working, but because almost every dylib on all of our other platforms does one of these things, effectively it hasn't really worked on any of them ever.

So we are considering making it just a straight up no-op, that will not do anything on any of those platforms. If there's a reason why that's a problem, please, we want to hear about it.

nagisa · 2018-03-28T07:15:05Z

@alexcrichton seems like MacOS people have fixed this. Should we close?

alexcrichton · 2018-03-28T14:21:54Z

Sure!

alexcrichton added the O-macos Operating system: macOS label Oct 1, 2015

MasonRemaley mentioned this issue Feb 23, 2016

TLS destructors are not run on Library::drop resulting in illegal instruction on OS X nagisa/rust_libloading#5

Open

emoon mentioned this issue Feb 23, 2016

ProDBG crashes on exit on Mac emoon/ProDBG#102

Closed

emoon added a commit to emoon/ProDBG that referenced this issue Jun 17, 2016

Fixed reloading of plugins

a3956b1

ProDBG crashes on exit but that is due to rust-lang/rust#28794 Closes #177

alexcrichton mentioned this issue Dec 14, 2016

Dynamic library, OSX, Segmentation fault #38370

Closed

Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 24, 2017

alexcrichton added the A-thread-locals Area: Thread local storage (TLS) label Aug 25, 2017

joshlf mentioned this issue Nov 14, 2017

Test dlopen for dynamically-loaded elfmalloc ezrosent/allocators-rs#102

Open

aroden-crowdstrike added a commit to aroden-crowdstrike/rure-python that referenced this issue Dec 4, 2017

Fixes OSX segfault on unload

066954a

* Refs rust-lang/rust#28794

aroden-crowdstrike mentioned this issue Dec 4, 2017

Fixes OSX segfault on unload davidblewett/rure-python#11

Merged

nagisa mentioned this issue Feb 3, 2018

dlclose() does not behave properly on Mac #47974

Open

emoon mentioned this issue Feb 19, 2018

Not working on mac emoon/dynamic_reload#13

Closed

alexcrichton closed this as completed Mar 28, 2018

nagisa mentioned this issue Jul 8, 2018

Unloading Rust-made SOs can lead to segfaults in host program #52138

Closed

This was referenced Sep 22, 2018

SEGFAULT sometimes occurs when testing libloading code nagisa/rust_libloading#41

Open

Thread safety issue when loading dynamic modules on Linux solana-labs/solana#1314

Closed

mmastrac mentioned this issue Dec 30, 2018

Figure out and document how registration works when plugins are loaded dynamically by dlopen dtolnay/inventory#1

Open

zicklag added a commit to zicklag/rust-dlopen that referenced this issue May 25, 2019

Remove Outdated Note About Mac Bug

2428dd5

The bug was fixed, no need for the note now: rust-lang/rust#28794

kommen mentioned this issue Sep 7, 2019

macOS live reloading documentation outdated ubolonton/emacs-module-rs#22

Closed

kinke mentioned this issue Oct 25, 2019

macOS: v10.13+ doesn't unload .dylibs with TLS ldc-developers/ldc#3002

Open

follower mentioned this issue Dec 16, 2021

Segmentation fault when thread using dynamically loaded Rust library exits #91979

Open

matthiasblaesing mentioned this issue Jan 12, 2023

com.sun.jna.NativeLibraryTest fails on macOS 12 (Monterey) java-native-access/jna#1423

Closed

MaulingMonkey mentioned this issue Dec 5, 2023

Question regarding loading libraries on Unix systems MaulingMonkey/minidl#8

Closed

ThiagoIze mentioned this issue Apr 15, 2024

Remove boost thread_specific_ptr AcademySoftwareFoundation/OpenImageIO#4221

Merged

5 tasks

psionic-k mentioned this issue Jul 4, 2024

What are the pitfalls of reloading? ubolonton/emacs-module-rs#57

Open

bjorn3 mentioned this issue Jul 5, 2024

std::process::exit is not thread-safe in combination with C code calling exit #126600

Open

sunnycase mentioned this issue Nov 14, 2024

Refactor CPU module kendryte/nncase#1268

Merged

4 tasks
