argon2: optimize with AVX2 SIMD #440
OK, I was able to verify that I ported all the macros correctly (at least as far as I can tell), and I fixed a few other things. I'm still not sure where I'm going wrong, though, because the tests are still failing.
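When the macros look right but the tests still fail, one way to narrow things down is to diff-test each SIMD macro against a scalar version of the same round. Below is a minimal scalar Rust sketch of the BLAKE2 round Argon2 uses, with the multiplication-hardened BlaMka variant of G as in the reference implementation's portable code. The names `f_bla_mka`, `g` and `blake2_round_nomsg` are hypothetical, not taken from this PR.

```rust
/// BlaMka addition: x + y + 2 * lo32(x) * lo32(y), all mod 2^64.
fn f_bla_mka(x: u64, y: u64) -> u64 {
    let m = 0xFFFF_FFFFu64;
    let xy = (x & m).wrapping_mul(y & m);
    x.wrapping_add(y).wrapping_add(xy.wrapping_mul(2))
}

/// The G mixing function applied to four of the 16 words.
fn g(v: &mut [u64; 16], a: usize, b: usize, c: usize, d: usize) {
    v[a] = f_bla_mka(v[a], v[b]);
    v[d] = (v[d] ^ v[a]).rotate_right(32);
    v[c] = f_bla_mka(v[c], v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(24);
    v[a] = f_bla_mka(v[a], v[b]);
    v[d] = (v[d] ^ v[a]).rotate_right(16);
    v[c] = f_bla_mka(v[c], v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(63);
}

/// One BLAKE2 round without message words (the permutation step used by P).
fn blake2_round_nomsg(v: &mut [u64; 16]) {
    // Column step: mix the four columns of the 4x4 word matrix.
    g(v, 0, 4, 8, 12);
    g(v, 1, 5, 9, 13);
    g(v, 2, 6, 10, 14);
    g(v, 3, 7, 11, 15);
    // Diagonal step: mix the four diagonals.
    g(v, 0, 5, 10, 15);
    g(v, 1, 6, 11, 12);
    g(v, 2, 7, 8, 13);
    g(v, 3, 4, 9, 14);
}
```

A differential test would fill the SIMD registers with the same 16 words, run one ported macro, store the registers back, and compare word-by-word with `blake2_round_nomsg`; the first mismatching word points at the faulty shuffle or rotation.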
I found out there is a way to tell the compiler to generate code based on a target feature being enabled, using the […]. https://godbolt.org/z/843z7eGYb demonstrates the codegen with this method, which is implemented in 70e34de. Running the benchmarks on my Ryzen 9 3900X shows roughly a 20-30% performance improvement. Would this solution be acceptable? I can clean up the commit history here if it matters.
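The exact mechanism used in 70e34de isn't spelled out here. A common Rust pattern for this, offered as an assumption rather than a description of the PR, is to write the body once as an `#[inline(always)]` function and call it from a thin wrapper marked `#[target_feature(enable = "avx2")]`. The wrapper is then selected at runtime with `is_x86_feature_detected!`. Because the body is inlined into the wrapper, it is typically compiled (and autovectorized) with AVX2 enabled, which fits the two `compress` implementations being identical. `Block`, `xor_blocks` and its helpers are hypothetical stand-ins for one step of the compression function, not code from the PR.

```rust
/// A 1 KiB Argon2 block viewed as 128 64-bit words.
pub type Block = [u64; 128];

/// Portable body, written once. `#[inline(always)]` asks the compiler to inline it
/// into each caller, so it is compiled with that caller's target features.
#[inline(always)]
fn xor_blocks_generic(dst: &mut Block, a: &Block, b: &Block) {
    for ((d, x), y) in dst.iter_mut().zip(a.iter()).zip(b.iter()) {
        *d = x ^ y;
    }
}

/// The same body compiled with AVX2 enabled. Calling it is `unsafe` because the
/// running CPU must actually support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn xor_blocks_avx2(dst: &mut Block, a: &Block, b: &Block) {
    xor_blocks_generic(dst, a, b)
}

/// Public entry point: uses the AVX2 build when the CPU supports it,
/// otherwise falls back to the portable build.
pub fn xor_blocks(dst: &mut Block, a: &Block, b: &Block) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            unsafe { xor_blocks_avx2(dst, a, b) };
            return;
        }
    }
    xor_blocks_generic(dst, a, b);
}
```

The design keeps a single source of truth for the algorithm; only the dispatch differs, so there is no hand-written intrinsics code to keep in sync with the portable version.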
So the `compress` function implementations are identical, and the only difference is
I will wait for @tarcieri to give his feedback on the `compress` comment before merging.
I've tried my hardest to get this to work, but I think I'm a little in over my head. The most I've managed is to port the macros from the optimized reference version.
My main problem is that I have no idea how these indexes into `state` work: https://github.com/P-H-C/phc-winner-argon2/blob/92cd2e1/src/opt.c#L94-L102

If someone could give me a hand, I'd be happy to fix/finish this. This is the first time I've ever tried to do anything with SIMD, so bear with me as I'm still learning.
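Per the Argon2 specification, the compression function treats the 1 KiB block as an 8x8 matrix of 16-byte registers, that is, 128 `u64` words. The permutation P is applied first to each row (16 contiguous words) and then to each column, which gathers the word pair (2j, 2j+1) from every row. The sketch below computes those index sets. It also assumes, as a simplification not confirmed against the linked code, that an AVX2 `state` array stores the words in order, four per 256-bit vector, and maps each pass onto vector indices. The helper names `row_words`, `column_words` and `vectors_for` are hypothetical.

```rust
/// Words per 256-bit vector, assuming the block is packed in order.
const WORDS_PER_VEC: usize = 4;

/// Words fed to P for row `i` (0..8): 16 contiguous words.
fn row_words(i: usize) -> [usize; 16] {
    let mut out = [0; 16];
    for k in 0..16 {
        out[k] = 16 * i + k;
    }
    out
}

/// Words fed to P for column `j` (0..8): the pair (2j, 2j+1) from each of the 8 rows.
fn column_words(j: usize) -> [usize; 16] {
    let mut out = [0; 16];
    for r in 0..8 {
        out[2 * r] = 16 * r + 2 * j;
        out[2 * r + 1] = 16 * r + 2 * j + 1;
    }
    out
}

/// Vector indices holding the given (ascending) words; consecutive duplicates removed.
fn vectors_for(words: &[usize]) -> Vec<usize> {
    let mut v: Vec<usize> = words.iter().map(|w| w / WORDS_PER_VEC).collect();
    v.dedup();
    v
}
```

Under that packing assumption, row r lives in vectors 4r..4r+3, so two rows fill eight consecutive vectors, while columns 2i and 2i+1 share vectors i, i+4, ..., i+28. A column pass therefore has to shuffle words within each vector rather than just pick whole vectors.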
Related to #104.