argon2: optimize with AVX2 SIMD #440

dyc3 · 2023-07-05T21:19:01Z

I've tried my hardest to get this to work, but I think I'm in a little over my head. The most I've managed to do is port the macros from the optimized reference version.
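For context when porting: the SIMD macros in the reference's optimized build (`src/blake2/blamka-round-opt.h`) vectorize the GB function that RFC 9106 uses inside its permutation P. Roughly, the four column calls (or the four diagonal calls) run at once, one 64-bit lane per call. A scalar version is a handy oracle to check a port against. This is a sketch following RFC 9106; `f_blamka`, `gb`, and `permute` are illustrative names, not this crate's API.

```rust
/// BlaMka multiply-add: x + y + 2 * lo32(x) * lo32(y), all modulo 2^64.
fn f_blamka(x: u64, y: u64) -> u64 {
    // (2^32 - 1)^2 fits in a u64, so this product cannot overflow.
    let lo_product = (x & 0xFFFF_FFFF) * (y & 0xFFFF_FFFF);
    x.wrapping_add(y).wrapping_add(lo_product.wrapping_mul(2))
}

/// GB (RFC 9106) applied to words a, b, c, d of a 16-word state.
fn gb(v: &mut [u64; 16], a: usize, b: usize, c: usize, d: usize) {
    v[a] = f_blamka(v[a], v[b]);
    v[d] = (v[d] ^ v[a]).rotate_right(32);
    v[c] = f_blamka(v[c], v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(24);
    v[a] = f_blamka(v[a], v[b]);
    v[d] = (v[d] ^ v[a]).rotate_right(16);
    v[c] = f_blamka(v[c], v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(63);
}

/// One application of permutation P: four column GBs, then four diagonal GBs.
/// A SIMD port keeps rows (v0..v3), (v4..v7), ... in vector registers, so the
/// column step maps directly onto lanes and the diagonal step needs lane shuffles.
fn permute(v: &mut [u64; 16]) {
    gb(v, 0, 4, 8, 12);
    gb(v, 1, 5, 9, 13);
    gb(v, 2, 6, 10, 14);
    gb(v, 3, 7, 11, 15);
    gb(v, 0, 5, 10, 15);
    gb(v, 1, 6, 11, 12);
    gb(v, 2, 7, 8, 13);
    gb(v, 3, 4, 9, 14);
}
```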

~~My main problem is that I have no idea how these indexes into state work:~~

~~https://github.com/P-H-C/phc-winner-argon2/blob/92cd2e1/src/opt.c#L94-L102~~

If someone could give me a hand, I'd be happy to fix/finish this. This is the first time I've ever tried to do stuff with SIMD, so bear with me as I'm still learning.

related to #104

dyc3 · 2023-07-09T16:33:22Z

OK, I was able to verify that I ported all the macros correctly (at least as far as I can tell), and I fixed a few things. I'm still not sure where I'm going wrong, though, since the tests are still failing.
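When tests fail after a SIMD port, a differential test that runs the scalar and SIMD versions on the same input helps narrow down which step diverges (the commit list below includes such a test, comparing `compress_safe` with the AVX2 path). A toy sketch of the idea, assuming x86_64: `xor_soft` and `xor_avx2` are hypothetical helpers that XOR four 64-bit lanes, the second via the `_mm256_*` intrinsics from `std::arch::x86_64`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256, _mm256_xor_si256};

/// Scalar reference: XOR four 64-bit lanes.
fn xor_soft(a: &[u64; 4], b: &[u64; 4]) -> [u64; 4] {
    [a[0] ^ b[0], a[1] ^ b[1], a[2] ^ b[2], a[3] ^ b[3]]
}

/// AVX2 version of the same operation using one 256-bit register per input.
/// Unsafe because it must only run on CPUs with AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn xor_avx2(a: &[u64; 4], b: &[u64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    unsafe {
        // Unaligned loads/stores, so no alignment requirement on the arrays.
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_xor_si256(va, vb));
    }
    out
}

/// Differential check: the AVX2 path must agree with the scalar path.
fn avx2_matches_soft(a: &[u64; 4], b: &[u64; 4]) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            return unsafe { xor_avx2(a, b) } == xor_soft(a, b);
        }
    }
    // No AVX2 path to compare against on this CPU or target.
    true
}
```

In a real port, the same comparison would be run over many random blocks rather than one hand-picked input.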

dyc3 · 2023-07-10T12:03:36Z

I found out there is a way to tell the compiler to generate code for a specific target feature: the `#[target_feature(enable = "avx2")]` attribute. The same approach would likely work for AVX-512, but I don't have a CPU that supports it, so I can't confirm. It also completely sidesteps the codegen problems I was having before (demonstrated in https://godbolt.org/z/TM94EbjKf).

https://godbolt.org/z/843z7eGYb demonstrates the code gen using this method, which is implemented in 70e34de.
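A minimal sketch of this pattern, assuming an x86/x86_64 target: the same scalar loop is compiled once normally and once inside a `#[target_feature(enable = "avx2")]` function, and the AVX2 copy is called only after runtime detection confirms support. The PR appears to use the `cpufeatures` crate for detection; std's `is_x86_feature_detected!` is swapped in here to keep the example dependency-free, and the `xor_blocks*` names are hypothetical.

```rust
/// Portable path, compiled for the baseline target.
fn xor_blocks_soft(dst: &mut [u64; 128], src: &[u64; 128]) {
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d ^= *s;
    }
}

/// Same loop, compiled with AVX2 enabled so the compiler may vectorize it with
/// 256-bit instructions. Only sound to call on CPUs with AVX2, hence `unsafe`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn xor_blocks_avx2(dst: &mut [u64; 128], src: &[u64; 128]) {
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d ^= *s;
    }
}

/// Runtime dispatch between the two copies.
fn xor_blocks(dst: &mut [u64; 128], src: &[u64; 128]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            unsafe { xor_blocks_avx2(dst, src) };
            return;
        }
    }
    xor_blocks_soft(dst, src);
}
```

No intrinsics are written by hand here; the attribute alone lets the optimizer pick AVX2 instructions for the loop.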

Benchmarks on my Ryzen 9 3900X show about a 20-30% performance improvement. Would this solution be acceptable? I can clean up the commit history if it matters.

argon2/src/block.rs

argon2/Cargo.toml

argon2/src/block.rs

newpavlov · 2023-07-10T16:49:22Z

So the `compress` function implementations are identical and the only difference is `#[target_feature(enable = "avx2")]`, correct? I don't think we need to duplicate the implementation. You could try marking `compress` as `#[inline(always)]` and then introducing an AVX2 wrapper around it.
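One way to read this suggestion, as a sketch: keep a single `#[inline(always)]` body and wrap it in a thin `#[target_feature(enable = "avx2")]` function. Because the body is force-inlined into the wrapper, it should be compiled there with AVX2 enabled, and with the baseline feature set in the portable caller, so the source is not duplicated. `compress_body` is a hypothetical stand-in for the real block compression, not the crate's code, and detection uses std's `is_x86_feature_detected!` as an assumption.

```rust
/// Hypothetical stand-in for the real compression body. `#[inline(always)]`
/// compiles it into each caller, where it inherits that caller's target features.
#[inline(always)]
fn compress_body(dst: &mut [u64; 128], x: &[u64; 128], y: &[u64; 128]) {
    for i in 0..128 {
        dst[i] = (x[i] ^ y[i]).rotate_right(32);
    }
}

/// Portable path: the body is compiled for the baseline target.
fn compress_soft(dst: &mut [u64; 128], x: &[u64; 128], y: &[u64; 128]) {
    compress_body(dst, x, y);
}

/// AVX2 path: the same body, inlined into a function with AVX2 enabled.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn compress_avx2(dst: &mut [u64; 128], x: &[u64; 128], y: &[u64; 128]) {
    compress_body(dst, x, y);
}

/// Entry point: takes the AVX2 path when the CPU supports it.
pub fn compress(dst: &mut [u64; 128], x: &[u64; 128], y: &[u64; 128]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            unsafe { compress_avx2(dst, x, y) };
            return;
        }
    }
    compress_soft(dst, x, y);
}
```

Both paths share one definition, so a fix to the body automatically applies to the scalar and AVX2 builds alike.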

argon2/src/block.rs

newpavlov

I will wait for @tarcieri to give his feedback on the compress comment before merging.

dyc3 added 2 commits July 5, 2023 17:04

add flamegraph to benches

dfefd6f

argon2: WIP: optimize with AVX2 SIMD

f0c0f03

dyc3 force-pushed the argon2-opt branch from 7c780fb to f0c0f03 July 5, 2023 21:21

dyc3 added 4 commits July 9, 2023 10:38

add a unit test for comparing compress_safe and compress_av2

87320dd

misc clean up

04f9187

fix some blatently wrong math, whoops

3a681ef

roll up a for loop

814a367

dyc3 added 2 commits July 9, 2023 13:56

fix more wrong math

a7a6321

use an alternate method to get the compiler to emit simd code

70e34de

argon2: adjust which target_arch uses cpufeatures

7f5515a

dyc3 marked this pull request as ready for review July 10, 2023 12:51

tarcieri reviewed Jul 10, 2023

argon2/src/block.rs (outdated, resolved)

tarcieri reviewed Jul 10, 2023

argon2/src/block.rs (outdated, resolved)

newpavlov reviewed Jul 10, 2023

rename compress_safe to compress_soft

e204222

dyc3 changed the title ~~WIP: argon2: optimize with AVX2 SIMD~~ argon2: optimize with AVX2 SIMD Jul 10, 2023

dyc3 force-pushed the argon2-opt branch from 21c5bc6 to 55cd132 July 10, 2023 17:28

fix minor requested code changes

3faaff1

dyc3 force-pushed the argon2-opt branch from 55cd132 to 3faaff1 July 10, 2023 18:04

dyc3 requested review from tarcieri and newpavlov July 10, 2023 18:20

dyc3 force-pushed the argon2-opt branch from 22ead03 to a5dd345 July 10, 2023 18:22

tarcieri reviewed Jul 10, 2023

argon2/src/block.rs (resolved)

refactor to have the argon2 struct hold InitToken

48f3c0a

dyc3 force-pushed the argon2-opt branch from a5dd345 to 48f3c0a July 10, 2023 18:44

newpavlov approved these changes Jul 12, 2023

tarcieri merged commit 19e0cdf into RustCrypto:master Jul 12, 2023

This was referenced Jul 12, 2023

argon2: fold compress_avx2 into an inner function #444

Merged

argon2 v0.5.1 #445

Merged

tarcieri mentioned this pull request Jul 26, 2023

polyval: detect VPCLMULQDQ at runtime RustCrypto/universal-hashes#184

Open

tarcieri mentioned this pull request Aug 7, 2023

argon2: Add const compatibility #438

Merged

tarcieri added a commit that referenced this pull request Aug 7, 2023

Revert "argon2: improve const compatibility (#438)"

bb63e6c

This reverts commit 5e3f574. These changes are incompatible with #440, which performs runtime CPU feature detection.

tarcieri mentioned this pull request Aug 7, 2023

Revert "argon2: improve const compatibility (#438)" #447

Merged

tarcieri added a commit that referenced this pull request Aug 7, 2023

Revert "argon2: improve const compatibility (#438)" (#447)

e19aabd

This reverts commit 5e3f574. These changes are incompatible with #440, which performs runtime CPU feature detection.

C0D3-M4513R mentioned this pull request Aug 11, 2023

argon2: more const-ness #450

Merged

Palladinium mentioned this pull request Dec 26, 2023

argon2: optimized implementation #104

Open