jax/tests/lax_scipy_sparse_test.py segfaults on GPU; other GPU test failures #5713
I built
This is probably an instance of OpenBLAS (used by NumPy) misbehaving in a process that also
OpenMathLib/OpenBLAS#3111 should fix the underlying OpenBLAS problem, I think. I can't confirm this with certainty, because I was unable to reproduce the original issue with a self-built OpenBLAS, only with the one bundled with NumPy. However, since it will take some time for any OpenBLAS fix to make it into a NumPy release and then reach users, I'll also look into avoiding calling
With an upcoming fix to TensorFlow to avoid calling
The former is related to NumPy 1.20 on my machine and is not specific to GPU. I'm unsure about the latter: it doesn't appear when I run that one file in isolation, so I'm guessing it must have something to do with
…/execvp() on non-Android POSIX platforms.

The goal of this change is to avoid calling pthread_atfork() handlers. Some libraries, in particular the version of OpenBLAS included in NumPy, have buggy pthread_atfork() handlers. See OpenMathLib/OpenBLAS#3111 and jax-ml/jax#5713 for details.

While we can fix, and have fixed, the buggy atfork handlers, it will take some time for the fix to be deployed in a NumPy release and for users to update to that release. So we also take an additional step: avoid running atfork handlers in Subprocess.

My copy of the glibc documentation says:

"According to POSIX, it [is] unspecified whether fork handlers established with pthread_atfork(3) are called when posix_spawn() is invoked. On glibc, fork handlers are called only if the child is created using fork(2)."

It appears that glibc 2.24 and newer do not call pthread_atfork() handlers from posix_spawn(). Using posix_spawn() should be no worse than an explicit fork()/execvp() pair, and on glibc it should do the right thing.

PiperOrigin-RevId: 358317859
Change-Id: Ic1d95446706efa7c0db4e79bf8281f14b2bd99df
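A minimal Python sketch of the behavior the commit relies on, assuming Linux and CPython 3.8 or newer. It uses os.register_at_fork() hooks as a stand-in for C-level pthread_atfork() handlers. This is an analog, not the same mechanism: CPython runs these hooks around os.fork(), while os.posix_spawnp() creates the child without calling them. Whether libc's own pthread_atfork() handlers run under posix_spawn() is the platform-specific part addressed by the glibc quote above, and this sketch does not exercise it. The helper names (run_with_fork_exec, run_with_posix_spawn, demo) are hypothetical.

```python
import os
import sys

# Counts how often the Python-level "before fork" hook has run in this process.
# NOTE: os.register_at_fork() is an analog of a pthread_atfork() prepare
# handler, not the same mechanism; it is used here only for illustration.
calls = {"before": 0}


def _before_fork():
    calls["before"] += 1


os.register_at_fork(before=_before_fork)


def run_with_fork_exec(argv):
    """Launch argv with an explicit fork()/execvp() pair; return its exit code."""
    pid = os.fork()  # CPython runs the registered "before" hook here, in the parent
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if execvp() failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def run_with_posix_spawn(argv):
    """Launch argv with posix_spawnp(); no os.fork() hooks are run."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def demo():
    """Return how many times the hook ran for each launch method."""
    cmd = [sys.executable, "-c", "pass"]
    start = calls["before"]
    run_with_fork_exec(cmd)
    after_fork = calls["before"] - start
    run_with_posix_spawn(cmd)
    after_spawn = calls["before"] - start - after_fork
    return after_fork, after_spawn
```

On Linux with CPython 3.8 or newer, demo() should return (1, 0). The hook fires once for the fork()/execvp() path and not at all for posix_spawnp(). This is the same idea, by analogy, that motivates switching Subprocess to posix_spawn().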
The
I think all the issues identified here are already fixed at head.
I'm unable to run all unit tests with jaxlib==0.1.60+cuda111. I suspect this is an issue for all GPU builds. It looks like there are other test failures too, but they weren't printed because of the segfault.
cc @hawkinsp