Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argon2: optimize with AVX2 SIMD #440

Merged
merged 12 commits into from
Jul 12, 2023
Merged

argon2: optimize with AVX2 SIMD #440

merged 12 commits into from
Jul 12, 2023

Conversation

dyc3
Copy link
Contributor

@dyc3 dyc3 commented Jul 5, 2023

I've tried my hardest to get this to work, but I think I'm in a little over my head. The most I've managed to do is port the macros from the optimized reference version.

My main problem is that I have no idea how these indexes into state work:

https://github.com/P-H-C/phc-winner-argon2/blob/92cd2e1/src/opt.c#L94-L102

If someone could give me a hand, I'd be happy to fix/finish this. This is the first time I've ever tried to do stuff with SIMD, so bear with me as I'm still learning.

related to #104

@dyc3
Copy link
Contributor Author

dyc3 commented Jul 9, 2023

Ok, I was able to verify that I ported all the macros correctly (at least as far as I could tell), and I fixed some stuff. I'm still not entirely sure where I'm going wrong here, as the tests are still failing.

@dyc3
Copy link
Contributor Author

dyc3 commented Jul 10, 2023

I found out there was a way to tell the compiler to generate code based on a target feature being enabled with the #[target_feature(enable = "avx2")] attribute. With this method, it likely be safe to do the same thing with avx512, but I don't have a CPU that supports it to make sure. It also completely sidesteps the code gen problems I was having before (demonstrated in https://godbolt.org/z/TM94EbjKf).

https://godbolt.org/z/843z7eGYb demonstrates the code gen using this method, which is implemented in 70e34de.

Running benchmarks on my Ryzen 9 3900X, it shows about a 20-30% performance improvement. Would this solution be acceptable? I can clean up the commit history here if it matters.

@dyc3 dyc3 marked this pull request as ready for review July 10, 2023 12:51
argon2/src/block.rs Outdated Show resolved Hide resolved
argon2/src/block.rs Outdated Show resolved Hide resolved
argon2/Cargo.toml Outdated Show resolved Hide resolved
argon2/src/block.rs Outdated Show resolved Hide resolved
argon2/src/block.rs Outdated Show resolved Hide resolved
argon2/src/block.rs Outdated Show resolved Hide resolved
argon2/src/block.rs Outdated Show resolved Hide resolved
argon2/src/block.rs Outdated Show resolved Hide resolved
@newpavlov
Copy link
Member

So the compress function implementations are identical and the only difference is #[target_feature(enable = "avx2")], correct? I don't think we need to duplicate the function implementations. You could try to mark the compress function as #[inline(always)] and then introduce an AVX2 wrapper around it.

@dyc3 dyc3 changed the title WIP: argon2: optimize with AVX2 SIMD argon2: optimize with AVX2 SIMD Jul 10, 2023
Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will wait for @tarcieri to give his feedback on the compress comment before merging.

@tarcieri tarcieri merged commit 19e0cdf into RustCrypto:master Jul 12, 2023
tarcieri added a commit that referenced this pull request Aug 7, 2023
This reverts commit 5e3f574.

These changes are incompatible with #440, which performs runtime CPU
feature detection.
tarcieri added a commit that referenced this pull request Aug 7, 2023
This reverts commit 5e3f574.

These changes are incompatible with #440, which performs runtime CPU
feature detection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants