blake2: integrate `blake2b_simd`/`blake2s_simd` crates #228

tarcieri · 2021-02-04T15:30:58Z

Integrates original sources from these crates, which provide AVX2-accelerated SIMD backends:

https://github.com/oconnor663/blake2_simd

Taken from this commit:

Hash: 7bf791e67245bb84132d1ee0e6a893bb8c85c093
Author: Jack O'Connor jack.oconnor@zoom.us
Date: Fri Nov 13 15:50:16 2020 -0500
Title: AES-CTR benchmarks

Next steps

Determine if we want to proceed with a combined blake2 crate or if we can get the blake2b crate and use blake2b/blake2s
Define Blake2b/Blake2s structs which provide impls of the digest traits. These can also be used to replace the blake2::blake2b::blake2b/blake2::blake2s::blake2s static functions e.g. with Blake2b::digest(...)
Also define VarBlake2b/VarBlake2s(?)
Address compile errors/panics on no_std targets
Figure out solution for handling KATs
~~Address clippy errors (presently TODOs) in the PR~~ (caused by arrayref, can circle back on that)

tarcieri · 2021-02-04T15:39:05Z

blake2/src/blake2b/guts.rs

+            Platform::AVX2 | Platform::SSE41 => unsafe {
+                sse41::compress2_loop(jobs, finalize, stride)
+            },
+            _ => panic!("unsupported"),


@oconnor663 noticing compile failures (well, more like warnings) for this and the other cases below when testing on a thumbv7em-none-eabi target (which is SOP for us).

Is there some sort of fallback on these platforms that will avoid this panic? If not, these should probably be changed to compile_error! (and separately, we should address the issue before merging if so)

@tarcieri are you sure you can trigger this panic in practice? If you look at how these functions are used in compress_many, imp.degree() is checked first, which should prevent this panic. In particular, I expect the Portable platform should get selected on thumbv7em-none-eabi, which should have a degree() of 1.

tarcieri · 2021-02-04T15:49:20Z

As discussed in #88, this PR merges the blake2b_simd and blake2s_simd crates into the blake2 crate, placing each algorithm under blake2b and blake2s modules respectively, and promoting the blake2bp and blake2sp modules to the toplevel of the crate.

Alternatively it might be possible to retain the two-crate structure, however unfortunately while we own the blake2s crate, we don't own the blake2b crate. I have contacted its current owner (as presently all releases are yanked so it's unused) to see if we can obtain it, in which case we can do a more straightforward import of blake2b_simd => blake2b and blake2s_simd => blake2s.

This PR is presently about 170 klocs(!) owing to the vendored C implementations in blake2/benches/. We should probably decide if we want to retain all that code or figure out a different solution. It'd be nice if there were *-sys crates of the C code sourced from elsewhere which we could use for comparison instead. (Edit: removed the benches for now. We can follow up on them separately)

tarcieri · 2021-02-04T15:50:11Z

Also note: I haven't tried to impl any of the digest traits. I'd like to sort out whether we'd like to proceed with a combined single blake2 crate with submodules for blake2b and blake2s or if we can get the blake2b crate first.

@newpavlov any thoughts on this?

tarcieri · 2021-02-04T16:04:55Z

Oof, the KATs are enormous:

$ ls -lh tests/blake2-kat.json
-rw-r--r--  1 bascule  staff   1.8M Feb  4 05:15 tests/blake2-kat.json

It seems like https://github.com/oconnor663/blake2_simd was able to keep them conveniently out-of-band by virtue of having a workspace and keeping it outside the crates themselves. ~~We should definitely do something similar to keep the crate size small.~~

Edit: moved them to .kat/blake2-kat.json in the toplevel of the repo. This unfortunately means that these tests will fail if anyone tries to test the packaged crate. Perhaps we could have e.g. a special cfg directive (e.g. --cfg kat) to enable these tests so they're ignored when testing from a released crate?

tarcieri · 2021-02-04T16:11:03Z

The good news: it compiles and all of the original tests are now passing! 🎉

I'll update the toplevel description with some next steps.

tarcieri · 2021-02-05T14:07:53Z

@oconnor663 @newpavlov so it turns out I was able to obtain ownership of the blake2b crate after all.

So the question is: should we proceed with a combined blake2 crate as this PR currently implements, or deprecate the blake2 crate and split it into blake2b and blake2s like blake2_simd did?

newpavlov · 2021-02-06T08:04:13Z

Assuming that amount of code reuse between b and s modules is minimal, I am fine with the crate split. But it could be worth to postpone it until digest v0.10 release.

Oof, the KATs are enormous:

This is why we have blobby.

tarcieri · 2021-02-06T14:11:34Z

Assuming that amount of code reuse between b and s modules is minimal...

When I was making this PR I did notice considerable overlap/duplication between the two crates. There are large swaths of the code that are completely identical. For example, the file many.rs is nearly identical, with some small changes to constants (e.g. 2/4/8), the blake2b vs blake2s names, and some small differences in comments.

I think there's quite a bit of opportunity to extract a common module shared between them which eliminates such duplication.

This is why we have blobby.

I did a quick ballparking exercise, ignoring all framing and just decoding and concatenating the hex strings of the 1.8MB blake2-kat.json file.

This still results in 827kB of binary data, which I would consider enormous and probably a bad idea to include in a crate that would otherwise be ~60kB otherwise.

tarcieri · 2021-02-06T15:16:44Z

But it could be worth to postpone it until digest v0.10 release.

I definitely think we should land #217 first and do another release of the current blake2 crate before moving forward with this.

oconnor663 · 2021-02-08T17:44:40Z

Yes there is quite a lot of duplicated code between 2b and 2s. I experimented a couple times with refactoring it to share more, but I wound up getting frustrated with having everything in a giant macro, and I gave up and decided copy-pasting was a better result (even as a maintainer, but especially for anyone who has to read the code). Some scattered thoughts:

There are some quirky differences in the way the parameter blocks are laid out between 2b and 2s, particularly the "node offset", which needs different code and not just different constants. This sort of thing is annoying to model in a shared macro.
I don't think it's possible to share SIMD code between 2b and 2s. Of course they use different intrinsics, though maybe that could be abstracted out. But more fundamentally, the word loading in the single-block compression functions is all optimized by hand (originally by Samuel upstream), and the two versions don't look similar.
In contrast, most of the code that might be easier to share is the relatively straightforward parts like buffer management and the public parameters API. I'm not thrilled to maintain two copies of this code, but so far I've preferred to do that rather than make the SIMD code any harder to reason about.
The many.rs code is in between. It's duplicated, but it's definitely tricky code that's not easy to reason about. That might be the highest-value opportunity for finding a way to share it. But on the other hand, again, the fact that it's so tricky makes me even less excited about making it a big macro.

Maybe there are some code sharing strategies besides "giant macro" that could work here, that I haven't thought about? Maybe some interesting possibilities open up when we can use const generics?

oconnor663 · 2021-02-08T17:57:59Z

Another scattered thought: The many APIs were super valuable to me back when I was working on BLAKE3 prototypes. Now that BLAKE3 is its own thing and doesn't depend on this code, it might be that the only existing callers of the many APIs are the BLAKE2bp and BLAKE2sp implementations also in these crates. Having a totally generic "hash many" capability is neat, but it might be that in practice no one is ever going to use it. It's possible that we could simplify the API, if it was just a private implementation detail of BLAKE2bp and BLAKE2sp. Something to think about, if many ends up getting in the way of code sharing.

newpavlov · 2021-02-09T10:33:10Z

@oconnor663
Thank you for your comments!

In digest v0.10 we have buffer management included behind a feature (see the core_api module). So implementation crates only have to define state initialization, blocks compression, and finalization parts.

One possible alternative to giant macros is to be generic over a private trait with necessary associated constants and types. Since Rust monomorphizes generics by default, it should be identical to the macro approach performance-wise. I have successfully refactored some macros using this approach (though const generics would have been a better fit in some cases), so it may apply here as well.

tarcieri · 2021-02-09T14:30:45Z

I'm weakly in favor of merging the crates for the following reasons:

It will provide the simplest transition path for existing users of the blake2 crate
It leaves the door open for DRYing out some of the duplicated code

I think the core implementation of update_many could be converted into a sealed trait as @newpavlov suggested, with the constant value differences hoisted into associated consts. That could probably be made easier/more futureproof by making the many modules private and re-exporting the functions that comprise the public APIs from the blake2b/blake2s modules.

oconnor663 · 2021-09-05T22:33:56Z

Apologies for the delay in getting to this. I notice that we have Blake2b, VarBlake2b, and State in the same module, with overlapping purposes. Similarly we have Hash and Params, but those are only used by State, and perhaps confusingly the Blake2b::with_params constructor doesn't involve Params. What do you think the long-term strategy will be with these API details?

oconnor663 · 2021-09-05T22:41:30Z

Pie-in-the-sky, this has me wondering about an alternative strategy: What if we separated out the blake2_simd backends into a couple standalone crates (e.g. blake2b_backend and blake2s_backend) that provided a very low-level API, like the raw compression function and a minimal parallel compression function for BLAKE2bp/sp? That could be maintained here in the hashes repo, and then depended on by all the user-facing crates. That might let us kick the can down the road on API questions. How ridiculous would this be?

tarcieri · 2021-09-07T13:32:32Z

I notice that we have Blake2b, VarBlake2b, and State in the same module, with overlapping purposes. Similarly we have Hash and Params, but those are only used by State, and perhaps confusingly the Blake2b::with_params constructor doesn't involve Params. What do you think the long-term strategy will be with these API details?

There's definitely a lot of work and consolidation to be done there.

One thing that can definitely go away is Blake2* vs VarBlake2*, which is already addressed in #217.

From there the rest is largely up in the air. I'd probably suggest getting rid of State or at least making it non-pub. Hash should probably go away in favor of digest::Output and crypto_mac::Output (which would also eliminate the arrayvec dependency)

What if we separated out theblake2_simd backends into a couple standalone crates (e.g. blake2b_backend and blake2s_backend) that provided a very low-level API, like the raw compression function and a minimal parallel compression function for BLAKE2bp/sp? That could be maintained here in the hashes repo, and then depended on by all the user-facing crates. That might let us kick the can down the road on API questions. How ridiculous would this be?

Personally I'm still in favor of trying to bring everything under one roof. There is a lot of duplication right now.

The current blake2 crate does this using macros, which isn't great. An alternative to that which is worth considering IMO is writing the internals more using (sealed) traits/generics, and then wrapping that up in a higher-level API which hides the internal implementation strategy.

I don't think it's a good idea to explore that sort of thing as part of this PR, but merging everything into a single crate at least leaves the door open to such refactorings.

Replaces the current implementation of `blake2` by integrating the original sources from these crates, which provide AVX2-accelerated SIMD backends: https://github.com/oconnor663/blake2_simd Taken from this commit: Hash: 7bf791e67245bb84132d1ee0e6a893bb8c85c093 Author: Jack O'Connor <jack.oconnor@zoom.us> Date: Fri Nov 13 15:50:16 2020 -0500 Title: AES-CTR benchmarks

Renames the `guts` module to `backend`, and refactors the various implementations to be submodules thereof.

Clearing out the `blake2b` and `blake2s` modules in order to add the primary user-facing types there.

This is more consistent with our other crates

Adds a set of `Blake2b`/`VarBlake2b` and `Blake2s`/`VarBlake2s` which are API-compatible with the ones in the current `blake2` v0.9.x release. This commit does not yet include proper tests besides the rustdoc tests.

These are the original tests from: https://github.com/RustCrypto/hashes/tree/4a2845c226ba6748babdb2e704713fa9103d09f0/blake2/tests They show that the new `blake2_simd` crate is API-compatible with the old one.

These clippy warnings all originate in macros from the `arrayref` crate.

Many of the warnings were occurring because the parallel backend requires x86 SIMD intrinsics (AVX2 or SSE41) and therefore the parallel code is dead on other targets. This commit feature gates all such code, ensuring that everything will build warning-free on other targets, and that users who attempt to use the parallel APIs on unsupported targets get compile errors rather than warnings.

tarcieri requested a review from newpavlov February 4, 2021 15:30

tarcieri mentioned this pull request Feb 4, 2021

switch the BLAKE2 implementation to blake2b_simd/blake2s_simd #88

Closed

tarcieri commented Feb 4, 2021

View reviewed changes

tarcieri force-pushed the blake2_simd branch 2 times, most recently from e086b47 to 40180b5 Compare February 4, 2021 15:43

tarcieri force-pushed the blake2_simd branch 4 times, most recently from 5f449a9 to 704dde6 Compare February 4, 2021 16:01

tarcieri force-pushed the blake2_simd branch from 704dde6 to 5a64949 Compare February 4, 2021 16:07

newpavlov mentioned this pull request Feb 23, 2021

blake2.rs: remove unnecessary unsafe #239

Closed

tarcieri mentioned this pull request May 5, 2021

Upstream this crate to RustCrypto? mitsuhiko/sha1-smol#38

Closed

tarcieri force-pushed the blake2_simd branch 6 times, most recently from 0586497 to 6568bc8 Compare July 18, 2021 22:49

tarcieri mentioned this pull request Oct 7, 2021

blake2 code size very high (overuse of#[inline(always)]?) #322

Open

vmx mentioned this pull request Jan 27, 2022

Using RustCrypto Digest trait directly multiformats/rust-multihash#179

Open

newpavlov mentioned this pull request Feb 16, 2022

blake2b gives a wrong value on powerpc-unknown-linux-gnu #356

Closed

tarcieri mentioned this pull request Apr 11, 2022

Update workspace feature resolver version to 2 #375

Merged

newpavlov mentioned this pull request Sep 15, 2022

blake2 simd benches produce unexpected results #407

Open

tarcieri added 14 commits January 13, 2023 08:57

blake2: fix clippy nits

974a482

blake2: rustdoc improvements

f386c6d

blake2: fix fuzz

02dd382

blake2: factors AVX2/SSE41/portable impls under backend module

d8c8716

Renames the `guts` module to `backend`, and refactors the various implementations to be submodules thereof.

blake2: extract Params types into modules

49ed198

Clearing out the `blake2b` and `blake2s` modules in order to add the primary user-facing types there.

blake2: extract State types into modules

236d6e0

Clearing out the `blake2b` and `blake2s` modules in order to add the primary user-facing types there.

blake2: extract Hash types into modules

0ce9be6

Clearing out the `blake2b` and `blake2s` modules in order to add the primary user-facing types there.

blake2: switch from constant_time_eq to subtle

6b5baa8

This is more consistent with our other crates

blake2: impl crypto_mac and digest traits

8f34bce

Adds a set of `Blake2b`/`VarBlake2b` and `Blake2s`/`VarBlake2s` which are API-compatible with the ones in the current `blake2` v0.9.x release. This commit does not yet include proper tests besides the rustdoc tests.

blake2: re-add tests from upstream crate

647fb62

These are the original tests from: https://github.com/RustCrypto/hashes/tree/4a2845c226ba6748babdb2e704713fa9103d09f0/blake2/tests They show that the new `blake2_simd` crate is API-compatible with the old one.

blake2: update allow(clippy::ptr_offset_with_cast) comment

3889d22

These clippy warnings all originate in macros from the `arrayref` crate.

blake2: disable tests in mod.rs for now

78292ec

tarcieri force-pushed the blake2_simd branch from 2f45194 to 78292ec Compare January 13, 2023 16:11

This was referenced Jan 16, 2023

Implementations of BLAKE2sp and BLAKE2bp #31

Open

Port blake2 on coresimd #56

Open

tarcieri mentioned this pull request Jan 26, 2023

Adding an optional zeroize feature that enables ZeroizeOnDrop for blake2 hashers #448

Closed

tarcieri mentioned this pull request Nov 3, 2023

blake2: Refuse empty keys in keyed hash construction #510

Merged

This was referenced Jan 10, 2024

Use README as crate docs #537

Merged

Add zeroize support #545

Merged

Added zeroize to blake2_simd #449

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

blake2: integrate `blake2b_simd`/`blake2s_simd` crates #228

blake2: integrate `blake2b_simd`/`blake2s_simd` crates #228

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri Feb 4, 2021 •

edited

Loading

oconnor663 Aug 31, 2021

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri commented Feb 4, 2021

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri commented Feb 5, 2021 •

edited

Loading

newpavlov commented Feb 6, 2021

tarcieri commented Feb 6, 2021 •

edited

Loading

tarcieri commented Feb 6, 2021

oconnor663 commented Feb 8, 2021 •

edited

Loading

oconnor663 commented Feb 8, 2021

newpavlov commented Feb 9, 2021 •

edited

Loading

tarcieri commented Feb 9, 2021

oconnor663 commented Sep 5, 2021

oconnor663 commented Sep 5, 2021

tarcieri commented Sep 7, 2021 •

edited

Loading

blake2: integrate blake2b_simd/blake2s_simd crates #228

Are you sure you want to change the base?

blake2: integrate blake2b_simd/blake2s_simd crates #228

Conversation

tarcieri commented Feb 4, 2021 • edited Loading

Next steps

tarcieri Feb 4, 2021 • edited Loading

Choose a reason for hiding this comment

oconnor663 Aug 31, 2021

Choose a reason for hiding this comment

tarcieri commented Feb 4, 2021 • edited Loading

tarcieri commented Feb 4, 2021

tarcieri commented Feb 4, 2021 • edited Loading

tarcieri commented Feb 4, 2021 • edited Loading

tarcieri commented Feb 5, 2021 • edited Loading

newpavlov commented Feb 6, 2021

tarcieri commented Feb 6, 2021 • edited Loading

tarcieri commented Feb 6, 2021

oconnor663 commented Feb 8, 2021 • edited Loading

oconnor663 commented Feb 8, 2021

newpavlov commented Feb 9, 2021 • edited Loading

tarcieri commented Feb 9, 2021

oconnor663 commented Sep 5, 2021

oconnor663 commented Sep 5, 2021

tarcieri commented Sep 7, 2021 • edited Loading

blake2: integrate `blake2b_simd`/`blake2s_simd` crates #228

blake2: integrate `blake2b_simd`/`blake2s_simd` crates #228

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri Feb 4, 2021 •

edited

Loading

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri commented Feb 4, 2021 •

edited

Loading

tarcieri commented Feb 5, 2021 •

edited

Loading

tarcieri commented Feb 6, 2021 •

edited

Loading

oconnor663 commented Feb 8, 2021 •

edited

Loading

newpavlov commented Feb 9, 2021 •

edited

Loading

tarcieri commented Sep 7, 2021 •

edited

Loading