Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVX2 STRSM kernels #2516

Merged
merged 2 commits into from
Mar 16, 2020
Merged

AVX2 STRSM kernels #2516

merged 2 commits into from
Mar 16, 2020

Conversation

wjc404
Copy link
Contributor

@wjc404 wjc404 commented Mar 16, 2020

The performances of AVX2 STRSM on HSW-EP & SKX CPUs are comparable to MKL2019 now.

@martin-frbg martin-frbg added this to the 0.3.10 milestone Mar 16, 2020
@martin-frbg martin-frbg merged commit a1b181c into OpenMathLib:develop Mar 16, 2020
@wjc404
Copy link
Contributor Author

wjc404 commented Mar 17, 2020

Thanks for accepting my code. The kernels passed basic tests using the program in
strsm_kernel_tester.tar.gz

@martin-frbg
Copy link
Collaborator

This appears to have caused a segfault in the scipy testsuite unfortunately conda-forge/scipy-feedstock#130 although it passed all the usual BLAS and LAPACK tests

@isuruf
Copy link
Contributor

isuruf commented Jul 18, 2020

A simpler reproduction

$ python
Python 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import numpy as np
t = np.array([[1, 0.5], [-1, 0.5]])
matrix_a = np.dot([(1,2) for i in range(1000000)], t)

@bgruening
Copy link

Can we revert this PR until it is fixed?

@martin-frbg
Copy link
Collaborator

martin-frbg commented Jul 20, 2020

Not reproduced with the simple reproducer so far (but using python 3.6.10 rather than 3.7 - is that expected to make a difference ?)

@bgruening
Copy link

@martin-frbg which architecture are you using. We have collected a few more informations here: conda-forge/openblas-feedstock#101

Its also reproducible on travis for me.

@martin-frbg
Copy link
Collaborator

SKX with OpenBLAS built for Haswell, and Kaby Lake (but using a Virtualbox VM) so far. (Valgrind on the latter also unhelpful, lots of screaming about python and numpy but no backtrace leading into OpenBLAS). Would love to see the source line in the assembly where it crashes to get some kind of handle on this problem.

@martin-frbg
Copy link
Collaborator

Still cannot reproduce this on i7-8700K either :-(
Does your default conda build use any other compilation options in addition to what was mentioned in conda-forge/openblas-feedstock#101 ?

@martin-frbg
Copy link
Collaborator

Just saw one crash now, but only with a DYNAMIC_ARCH build and the valgrind log started with a free of an unallocated region by/in the python parser. This was then followed by a segfault caused by "bad permissions for mapped region" in the DGEMM ONCOPY kernel. (Which would suggest it has nothing to do with this particular PR)

@martin-frbg
Copy link
Collaborator

Created issue #2728 to track the numpy/scipy segfaults as this PR is most likely unrelated

@bgruening
Copy link

Thanks for checking this! If you have anything I can test please let me know. If we run the same code with openblas 0.3.9 it works for us. Thats why we thought its an openblas issue.

h-vetinari added a commit to h-vetinari/openblas-feedstock that referenced this pull request Jul 24, 2020
Also, turn gfortran-diff into proper patch & put patches into folder.
Remove now-unnecessary patch to revert OpenMathLib/OpenBLAS/pull/2516.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants