sklearn/gaussian_process/tests/test_gpr.py:test_sample_statistics segfaults with libopenblas 0.3.10 #17980

Closed
ogrisel opened this issue Jul 24, 2020 · 13 comments

Comments

@ogrisel
Member

ogrisel commented Jul 24, 2020

Steps to reproduce:

conda create -n cf -y -c conda-forge cython pillow numpy scipy pytest joblib threadpoolctl
conda activate cf
pip install -e . --no-build-isolation

Then:

$ pytest -vlk test_sample_statistics sklearn/gaussian_process
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.8.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 -- /home/ogrisel/miniconda3/envs/cf/bin/python
cachedir: .pytest_cache
rootdir: /home/ogrisel/code/scikit-learn, inifile: setup.cfg
collected 422 items / 416 deselected / 6 selected                                                                                                                                                               

sklearn/gaussian_process/tests/test_gpr.py::test_sample_statistics[kernel0] Fatal Python error: Segmentation fault

Current thread 0x00007fe741e14740 (most recent call first):
  File "<__array_function__ internals>", line 5 in dot
  File "/home/ogrisel/code/scikit-learn/sklearn/gaussian_process/_gpr.py", line 410 in sample_y
  File "/home/ogrisel/code/scikit-learn/sklearn/gaussian_process/tests/test_gpr.py", line 171 in test_sample_statistics
  File "/home/ogrisel/miniconda3/envs/cf/lib/python3.8/site-packages/_pytest/python.py", line 182 in pytest_pyfunc_call
Segmentation fault (core dumped)
$ conda list
# packages in environment at /home/ogrisel/miniconda3/envs/cf:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       0_gnu    conda-forge
attrs                     19.3.0                     py_0    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2020.6.20        py38h32f6830_0    conda-forge
cython                    0.29.21          py38h950e882_0    conda-forge
freetype                  2.10.2               he06d7ca_0    conda-forge
joblib                    0.16.0                     py_0    conda-forge
jpeg                      9d                   h516909a_0    conda-forge
lcms2                     2.11                 hbd6801e_0    conda-forge
ld_impl_linux-64          2.34                 h53a641e_7    conda-forge
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
libgomp                   9.2.0                h24d8f2e_2    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libopenblas               0.3.10          pthreads_hb3c22a3_3    conda-forge
libpng                    1.6.37               hed695b0_1    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libtiff                   4.1.0                hc7e4089_6    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
lz4-c                     1.9.2                he1b5a44_1    conda-forge
more-itertools            8.4.0                      py_0    conda-forge
ncurses                   6.2                  he1b5a44_1    conda-forge
numpy                     1.19.1           py38h8854b6b_0    conda-forge
olefile                   0.46                       py_0    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
packaging                 20.4               pyh9f0ad1d_0    conda-forge
pillow                    7.2.0            py38h9776b28_1    conda-forge
pip                       20.1.1                     py_1    conda-forge
pluggy                    0.13.1           py38h32f6830_2    conda-forge
py                        1.9.0              pyh9f0ad1d_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pytest                    5.4.3            py38h32f6830_0    conda-forge
python                    3.8.5           h425cb1d_1_cpython    conda-forge
python_abi                3.8                      1_cp38    conda-forge
readline                  8.0                  he28a2e2_2    conda-forge
scikit-learn              0.24.dev0                 dev_0    <develop>
scipy                     1.5.2            py38h8c5af15_0    conda-forge
setuptools                49.2.0           py38h32f6830_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.32.3               hcee41ef_1    conda-forge
threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
tk                        8.6.10               hed695b0_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
zstd                      1.4.5                h6597ccf_1    conda-forge
@ogrisel ogrisel changed the title sklearn/gaussian_process/tests/test_gpr.py:test_sample_statistics segfaults with conda-forge dependencies sklearn/gaussian_process/tests/test_gpr.py:test_sample_statistics segfaults with conda-forge's libopenblas Jul 24, 2020
@ogrisel
Member Author

ogrisel commented Jul 24, 2020

This can be worked around by switching the env to use MKL instead of OpenBLAS:

conda install -c conda-forge libblas=*=*mkl

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

I can also reproduce the segfault with openblas 0.3.10 from the main channel:

conda create -n tmp -y  cython pillow numpy scipy pytest joblib threadpoolctl blas=*=*openblas
$ conda list
# packages in environment at /home/ogrisel/miniconda3/envs/tmp:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
attrs                     19.3.0                     py_0  
blas                      1.0                    openblas  
ca-certificates           2020.6.24                     0  
certifi                   2020.6.20                py38_0  
cython                    0.29.21          py38he6710b0_0  
freetype                  2.10.2               h5ab3b9f_0  
joblib                    0.16.0                     py_0  
jpeg                      9b                   h024ee3a_2  
lcms2                     2.11                 h396b838_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libopenblas               0.3.10               h5a2b251_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_1  
lz4-c                     1.9.2                he6710b0_0  
more-itertools            8.4.0                      py_0  
ncurses                   6.2                  he6710b0_1  
numpy                     1.18.5           py38h7130bb8_0  
numpy-base                1.18.5           py38h2f8d375_0  
olefile                   0.46                       py_0  
openssl                   1.1.1g               h7b6447c_0  
packaging                 20.4                       py_0  
pillow                    7.2.0            py38hb39fc2d_0  
pip                       20.1.1                   py38_1  
pluggy                    0.13.1                   py38_0  
py                        1.9.0                      py_0  
pyparsing                 2.4.7                      py_0  
pytest                    5.4.3                    py38_0  
python                    3.8.3                hcff3b4d_2  
readline                  8.0                  h7b6447c_0  
scipy                     1.5.0            py38habc2bb6_0  
setuptools                49.2.0                   py38_0  
six                       1.15.0                     py_0  
sqlite                    3.32.3               h62c20be_0  
threadpoolctl             2.1.0              pyh5ca1d4c_0  
tk                        8.6.10               hbc83047_0  
wcwidth                   0.2.5                      py_0  
wheel                     0.34.2                   py38_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.4.5                h0b5b093_0

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

Pinning libopenblas to 0.3.9 avoids the segfault. I tested with conda-forge using this env:

conda create -n cf -y -c conda-forge cython pillow numpy scipy pytest joblib threadpoolctl libopenblas=0.3.9
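
To check from Python which OpenBLAS build an environment actually loads, something like the following could be used. This is only a sketch: it assumes threadpoolctl (already installed in the environments above) and that the dictionaries returned by its `threadpool_info()` carry `internal_api` and `version` keys. The helper name `find_affected_openblas` is made up for illustration, and treating exactly 0.3.10 as affected is based only on the observations in this thread.

```python
# Assumption: only the version reported in this thread is treated as affected.
AFFECTED_VERSIONS = {"0.3.10"}


def find_affected_openblas(infos):
    """Return the entries that describe an OpenBLAS build with an affected version.

    `infos` is a list of dicts shaped like the output of
    threadpoolctl.threadpool_info().
    """
    return [
        info
        for info in infos
        if info.get("internal_api") == "openblas"
        and info.get("version") in AFFECTED_VERSIONS
    ]


if __name__ == "__main__":
    try:
        import numpy  # noqa: F401  (imported so that its BLAS library gets loaded)
        from threadpoolctl import threadpool_info
    except ImportError:
        print("numpy and threadpoolctl are needed for the live check")
    else:
        for info in find_affected_openblas(threadpool_info()):
            print("affected OpenBLAS:", info.get("filepath"), info.get("version"))
```

If nothing is printed, no loaded library matched; that does not by itself prove the environment is safe.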

@ogrisel ogrisel changed the title sklearn/gaussian_process/tests/test_gpr.py:test_sample_statistics segfaults with conda-forge's libopenblas sklearn/gaussian_process/tests/test_gpr.py:test_sample_statistics segfaults with conda-forge'slibopenblas Jul 24, 2020
@ogrisel ogrisel changed the title sklearn/gaussian_process/tests/test_gpr.py:test_sample_statistics segfaults with conda-forge'slibopenblas sklearn/gaussian_process/tests/test_gpr.py:test_sample_statistics segfaults with libopenblas 0.3.10 Jul 24, 2020
@thomasjpfan
Member

thomasjpfan commented Jul 24, 2020

The segfault happens in multivariate_normal:

import numpy as np

y_mean = np.ones((5))
y_cov = np.ones((5, 5))
rng = np.random.RandomState(0)

# segfaults
rng.multivariate_normal(y_mean, y_cov, 300000)

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

Thanks @thomasjpfan, I was trying to slowly narrow it down. Not sure how this is related to openblas. Will use a debugger to run step by step.

@thomasjpfan
Member

In sklearn, the call to multivariate_normal is made here:

y_samples = rng.multivariate_normal(y_mean, y_cov, n_samples).T

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

I used your code snippet to get a backtrace:

$ gdb python
(gdb) r /tmp/debug.py
Starting program: /home/ogrisel/miniconda3/envs/cf/bin/python /tmp/debug.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5335700 (LWP 229282)]
[New Thread 0x7ffff4b34700 (LWP 229283)]
[New Thread 0x7ffff2333700 (LWP 229284)]

Thread 2 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5335700 (LWP 229282)]
0x00007ffff633cac3 in dgemm_oncopy_HASWELL () from /home/ogrisel/miniconda3/envs/cf/lib/python3.8/site-packages/numpy/core/../../../../libcblas.so.3
(gdb) bt
#0  0x00007ffff633cac3 in dgemm_oncopy_HASWELL () from /home/ogrisel/miniconda3/envs/cf/lib/python3.8/site-packages/numpy/core/../../../../libcblas.so.3
#1  0x00007ffff56a736a in inner_thread () from /home/ogrisel/miniconda3/envs/cf/lib/python3.8/site-packages/numpy/core/../../../../libcblas.so.3
#2  0x00007ffff57d65dd in blas_thread_server () from /home/ogrisel/miniconda3/envs/cf/lib/python3.8/site-packages/numpy/core/../../../../libcblas.so.3
#3  0x00007ffff7f8d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007ffff7eb4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

So it's a double-precision matrix-matrix multiplication (dgemm) that's crashing...

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

If you disable OpenBLAS threading, the crash goes away (both for your script and the original test):

OPENBLAS_NUM_THREADS=1 pytest -vlk test_sample_statistics sklearn/gaussian_process
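
The same workaround can be applied when launching a command from Python. A minimal sketch using only the standard library; `run_with_single_blas_thread` is a hypothetical helper name, not part of any project mentioned here. The variable is set for a child process on the assumption that OpenBLAS reads it when the library is initialised, so changing it after numpy has been imported in the current process may have no effect.

```python
import os
import subprocess
import sys


def run_with_single_blas_thread(cmd):
    """Run cmd in a child process with OPENBLAS_NUM_THREADS=1 in its environment."""
    env = dict(os.environ, OPENBLAS_NUM_THREADS="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Show that the child process really sees the variable.
    result = run_with_single_blas_thread(
        [sys.executable, "-c", "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"]
    )
    print(result.stdout.strip())
```

To run the failing test this way, the command list could be `[sys.executable, "-m", "pytest", "-vlk", "test_sample_statistics", "sklearn/gaussian_process"]`.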

@thomasjpfan
Member

Looks like this only segfaults when size is large enough:

import numpy as np

y_mean = np.ones((5))
y_cov = np.ones((5, 5))
rng = np.random.RandomState(0)

# segfaults
rng.multivariate_normal(y_mean, y_cov, size=249033)

# does not segfault
rng.multivariate_normal(y_mean, y_cov, size=249032)

I think we have enough context to raise an issue on the numpy issue tracker.
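
A threshold like the one above can be located by bisection, running each attempt in a child process so that a segfault does not take down the search itself. The following is a sketch under stated assumptions, not necessarily how the numbers above were found: the helper names are hypothetical, the signal check relies on POSIX behaviour (a negative return code from `subprocess.run` means the child was killed by a signal), and the search assumes every size above the threshold crashes.

```python
import subprocess
import sys

# Same reproducer as above, parametrised by the sample size.
SNIPPET = (
    "import numpy as np\n"
    "rng = np.random.RandomState(0)\n"
    "rng.multivariate_normal(np.ones(5), np.ones((5, 5)), size={size})\n"
)


def crashes_in_subprocess(size):
    """Return True if the reproducer is killed by a signal (POSIX only)."""
    proc = subprocess.run([sys.executable, "-c", SNIPPET.format(size=size)])
    return proc.returncode < 0  # negative return code: terminated by a signal


def smallest_crashing_size(lo, hi, crashes):
    """Bisect for the smallest size in (lo, hi] for which crashes(size) is True.

    Assumes crashes(lo) is False, crashes(hi) is True, and that the
    behaviour is monotonic in between.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

On an affected environment, `smallest_crashing_size(1, 300000, crashes_in_subprocess)` would be the call to try; the exact threshold it finds may depend on the machine and the number of threads.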

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

I simplified it down to:

import numpy as np

np.ones(shape=(300000, 5)) @ np.ones(shape=(5, 5))

@ogrisel
Member Author

ogrisel commented Jul 24, 2020

I will try to write a minimal C program to report it to the OpenBLAS developers.

@ogrisel
Member Author

ogrisel commented Jul 25, 2020

This was fixed upstream in OpenMathLib/OpenBLAS#2729 and the conda-forge package has already been updated with the fix. Closing.
