Use generic C for (D/Z)NRM2 on Windows x86_64 #2882
Is there a reason to prefer the |
Performance? This is nothing but a workaround for the convenience of users of a particularly buggy patch level of Windows, and I do not see why it should be forced on other use cases that do not need it. There is no reason to expect the plain (and simple) "generic" C version to suddenly outperform the .S across several generations of hardware and compilers, and I lack the time to benchmark it now. If anything, a "modern" replacement would probably need to be coded with SSE or AVX (the present nrm2_sse.S does not handle double precision), but I am not an assembler wizard who could whip this up overnight.
Thanks for the explanation. I agree that this whole issue has been quite a time sink, and appreciate your work in trying to fix what should be fixed elsewhere.
I am still getting failures with this and NumPy. I guess we still haven't worked out what is causing these:
|
My assumption was that NumPy would be unlikely to use dsum/zsum, as these are BLAS extensions.
Not sure I follow. In KERNEL.generic, DSUM and ZSUM are already the C variants. Is there some way I can test the hypothesis that these kernels are involved?
KERNEL.generic will only be used when building with TARGET=GENERIC, or when using MSVC. For the mingw builds, KERNEL (plus the respective KERNEL.target) is relevant.
#2887 adds these to kernel/x86_64/KERNEL (the initial defaults to override are given in kernel/Makefile.L1).
to work around recent fpu exception bug in Win10 since build 19041
tentative fix for #2709