-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIGSEGV in phtread_exit after switching from gnustl_static to libc++_static #360
Comments
do you have a reproduceable example we can actually build? a stack trace of a real crash? |
Here is stack trace:
Unfortunately I have no reproducable example yet, there are tons of proprietary code with many dependencies (like boost libs).. The only definite fact that it works normally with gnustl. Another thing may be important, inside jni call the rest of code is loaded on-demand by dlopen (libcore.so). So we have a pair of dlopen/dlclose inside the call. (preventing keeping in memory huge lib when it is not actually needed) And std::string& argument is passing through libjni.so -> libcore.so |
Here is complete stack trace (whithout ndk-trace shortening):
|
What is the meaning of 'Abort message' here? Looks like it's a string from our logging engine... |
Both libjni and libcore are your libraries, right? You don't mean libjavacore.so from the system? If you're passing STL objects between your code and the system you're gonna have a bad time since there isn't a stable C++ ABI for the platform. |
Sorry for unclear naming. Yes, libjni and libcore have nothing to system libraries. And passing STL objects only inside my code, not to the system. I'll try to make an example project when I have time... |
Thanks, I'll take a look once we have a test case. Unfortunately not much I can do until then. |
Here is a test case: I've played some time to get the problem within the above test case, but failed to reproduce all the same. But I've faced very similar problem concerning both libc++ and gnustl. When I introduced thread_local object, got the same behavior as I described originally. But original code does not contain any explicit thread_locals (though thread locals might be there since the code is multithreaded and pthread is widely used in boost, mutexes, lockers, semaphores, etc..). |
I am facing the same problems with thread_local variables with both libc++ and gnustl (ndk r15). |
Unfortunately, it breaks even C++11 support (when thread_local appeared), currently in the middle of 2017... Is everything clear with the issue and you (NDK Developers) can reproduce it? |
Sorry, haven't had a chance to look back at this yet, but yes, now that we have source there should be something actionable here. I'll let you know if for some reason it doesn't repro. |
So, for one, the platform's implementation of foo_impl.cpp, creates shared library #include <stdio.h>
class Foo {
public:
~Foo() {
fprintf(stderr, "dtor\n");
}
};
extern "C" void myfunc() {
thread_local Foo test;
} foo.cpp, creates executable #include <dlfcn.h>
#include <stdio.h>
int main(int, char**) {
void* lib = dlopen("libfoo_impl.so", RTLD_LAZY);
auto myfunc = reinterpret_cast<void (*)()>(dlsym(lib, "myfunc"));
myfunc();
dlclose(lib);
return 0;
} This doesn't print "dtor" unless I remove the If I modify libc++abi to never use the platform's I'm going to file a platform bug for the first issue, and I'll work on a fix for libc++abi. Going to try to hit this for r17 since the plan is for gnustl to be removed in r18. btw, from my experimentation none of these problems exist if you don't call |
Looks like this is broken for gnustl_shared as well, fwiw. Their builtin implementation has the same problem that libc++ does: the destructor is in a static object in libgnustl_shared.so, not in the library that actually has the |
Also, never mind, the platform also crashes. The device I'm using just has a broken adbd that isn't reporting its exit status and was returning before the segfault message was sent to the host. According to the glibc docs (it seems Another workaround you could use is adding |
Thanks for your investigation! |
The fix will only be able to fix devices running P or newer.
|
Thank you for comprehensive explanation! |
Definitely. Thank you for the reminder. |
Bug: android/ndk#360 Test: N/A Change-Id: I964a6c9abd1ae65d74c6aee1c842b2f3d24b5556
Introduce new flag to mark soinfo as TLS_NODELETE when there are thread_local dtors associated with dso_handle belonging to it. Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Thanks, @dimitry-! |
Bug: android/ndk#360 Test: N/A Change-Id: Ib9807045bd3206fa3cd300ab70ebed93c73a58e4
In my tests, a .so statically linked with tensorflow v1.6 (NDK r15c, gcc, gnustl_static, armeabi-v7a) also breaks down on native Android 5.1.1 system. In fact, it breaks down on both 5.1.1, 6.0.1 and 7.1.1 ( Added: on Android6.1.1, |
@gdh1995: This bug was about libc++, not gnustl. gnustl did not have the issue discussed in the bug, and is no longer supported. |
Test: None, markdown only Bug: android/ndk#360 Change-Id: I1b1ecce8f988c9d4949079f0296518c0604e82ec
NDK doesn't support thread_local well, so on Android it should use __thread instead. Google Protocol Buffer and other libraries have kept away from thread_local on Android. In Tensorflow, there's a "thread_local" in code about CUDA, which should be safe enough. More discussions are on android/ndk#360 .
NDK doesn't support thread_local variables which require destructors, so on Android it should use __thread instead. Observations: * ProtoBuf and other libraries are not using thread_local on Android. * In Tensorflow, there's a "thread_local" in code about CUDA, which should be safe enough. More discussions are on android/ndk#360 .
Test: bionic unit tests Bug: android/ndk#360 Change-Id: I733fb303379828c0ac83b1b64f4cb85878e88909
Bug: android/ndk#360 Test: N/A Change-Id: I964a6c9abd1ae65d74c6aee1c842b2f3d24b5556
Mark library NODELETE once first pthread_atexit handler is registered. According to the test runs it seems that glibc loader agrees on the fact that it is too dangerous to unload a library on pthread_exit(). It only does it on dlclose if there are no active thread local dtors left associated with the dso. (At this point we just have tests to demonstrate how it works) Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Mark library NODELETE once first pthread_atexit handler is registered. According to the test runs it seems that glibc loader agrees on the fact that it is too dangerous to unload a library on pthread_exit(). It only does it on dlclose if there are no active thread local dtors left associated with the dso. (At this point we just have tests to demonstrate how it works) Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Mark library NODELETE once first pthread_atexit handler is registered. According to the test runs it seems that glibc loader agrees on the fact that it is too dangerous to unload a library on pthread_exit(). It only does it on dlclose if there are no active thread local dtors left associated with the dso. (At this point we just have tests to demonstrate how it works) Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Mark library NODELETE once first pthread_atexit handler is registered. According to the test runs it seems that glibc loader agrees on the fact that it is too dangerous to unload a library on pthread_exit(). It only does it on dlclose if there are no active thread local dtors left associated with the dso. (At this point we just have tests to demonstrate how it works) Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Introduce new flag to mark soinfo as TLS_NODELETE when there are thread_local dtors associated with dso_handle belonging to it. Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Mark library NODELETE once first pthread_atexit handler is registered. According to the test runs it seems that glibc loader agrees on the fact that it is too dangerous to unload a library on pthread_exit(). It only does it on dlclose if there are no active thread local dtors left associated with the dso. (At this point we just have tests to demonstrate how it works) Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Introduce new flag to mark soinfo as TLS_NODELETE when there are thread_local dtors associated with dso_handle belonging to it. Test: bionic-unit-tests --gtest_filter=dl* Test: bionic-unit-tests-glibc --gtest_filter=dl* Bug: android/ndk#360 Change-Id: I724ef89fc899788f95c47e6372c38b3313f18fed
Description
We have a project which is for a long time compiled gnustl_static in c++ code.
Jni tasks are running async in separate thread like this (simplified for readability):
So, everything is fine when all code is linked with gnustl_static, but when I've tried libc++_static then I got SIGSEGV when finishing java thread (actual stacktrace points to releasing threadlocal data in pthread_exit).
Can anyone please help me figuring out what is wrong with libc++?
Environment Details
The text was updated successfully, but these errors were encountered: