AVX2 STRSM kernels #2516
Thanks for accepting my code. The kernels passed basic tests using the program in |
Unfortunately, this appears to have caused a segfault in the scipy test suite (conda-forge/scipy-feedstock#130), although it passed all the usual BLAS and LAPACK tests.
A simpler reproduction
|
Can we revert this PR until it is fixed?
Not reproduced with the simple reproducer so far (but using Python 3.6.10 rather than 3.7; is that expected to make a difference?)
@martin-frbg which architecture are you using? We have collected some more information here: conda-forge/openblas-feedstock#101. It's also reproducible on Travis for me.
SKX with OpenBLAS built for Haswell, and Kaby Lake (but using a VirtualBox VM) so far. (Valgrind on the latter was also unhelpful: lots of screaming about python and numpy, but no backtrace leading into OpenBLAS.) Would love to see the source line in the assembly where it crashes to get some kind of handle on this problem.
Still cannot reproduce this on i7-8700K either :-(
Just saw one crash now, but only with a DYNAMIC_ARCH build, and the valgrind log started with a free of an unallocated region by/in the python parser. This was then followed by a segfault caused by "bad permissions for mapped region" in the DGEMM ONCOPY kernel (which would suggest it has nothing to do with this particular PR).
Created issue #2728 to track the numpy/scipy segfaults, as this PR is most likely unrelated.
Thanks for checking this! If you have anything I can test, please let me know. If we run the same code with OpenBLAS 0.3.9, it works for us. That's why we thought it was an OpenBLAS issue.
Also, turn gfortran-diff into proper patch & put patches into folder. Remove now-unnecessary patch to revert OpenMathLib/OpenBLAS/pull/2516.
The performance of the AVX2 STRSM kernels on HSW-EP and SKX CPUs is now comparable to MKL 2019.
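For context, STRSM is the single-precision BLAS routine that solves a triangular system with multiple right-hand sides. The snippet below is only a minimal pure-Python sketch of one of its variants (left side, lower triangular, no transpose, non-unit diagonal); it is not the AVX2 kernel from this PR, and the function name `trsm_lower_left` is hypothetical. A reference like this could be used to sanity-check results from an optimized kernel on small inputs.

```python
def trsm_lower_left(alpha, A, B):
    """Solve A @ X = alpha * B for X by forward substitution.

    A is an n x n lower-triangular matrix with a non-unit diagonal,
    B is n x m; both are lists of lists. Returns X as a new n x m list.
    Illustrative reference only (hypothetical helper, not OpenBLAS code).
    """
    n = len(A)
    m = len(B[0])
    # Start from the scaled right-hand side.
    X = [[alpha * B[i][j] for j in range(m)] for i in range(n)]
    for j in range(m):          # each right-hand-side column is independent
        for i in range(n):      # rows are solved top to bottom
            s = X[i][j]
            for k in range(i):  # subtract contributions of already-solved rows
                s -= A[i][k] * X[k][j]
            X[i][j] = s / A[i][i]
    return X


A = [[2.0, 0.0],
     [1.0, 4.0]]
B = [[2.0],
     [9.0]]
X = trsm_lower_left(1.0, A, B)
```

Multiplying `A` by the returned `X` should reproduce `alpha * B` up to rounding, which gives a simple correctness check independent of any particular BLAS build.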