blake2: integrate `blake2b_simd`/`blake2s_simd` crates #228
base: master
blake2/src/blake2b/guts.rs
Outdated
Platform::AVX2 | Platform::SSE41 => unsafe {
    sse41::compress2_loop(jobs, finalize, stride)
},
_ => panic!("unsupported"),
@oconnor663 noticing compile failures (well, more like warnings) for this and the other cases below when testing on a `thumbv7em-none-eabi` target (which is SOP for us).
Is there some sort of fallback on these platforms that will avoid this panic? If not, these should probably be changed to `compile_error!` (and separately, we should address the issue before merging if so)
@tarcieri are you sure you can trigger this panic in practice? If you look at how these functions are used in `compress_many`, `imp.degree()` is checked first, which should prevent this panic. In particular, I expect the `Portable` platform should get selected on `thumbv7em-none-eabi`, which should have a `degree()` of 1.
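The guard described in this reply can be sketched roughly as follows. This is a simplified illustration, not the actual `guts.rs` code: the `Platform` variants and `degree()` are named in the discussion, while `compress_many_sketch`, the degree values assigned to the SIMD variants, and the string return values are hypothetical stand-ins.

```rust
/// Simplified stand-in for the implementation selector discussed above.
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum Platform {
    Portable,
    SSE41,
    AVX2,
}

impl Platform {
    /// Number of inputs hashed in parallel. The value 1 for `Portable`
    /// comes from the discussion; the SIMD values are assumptions.
    fn degree(self) -> usize {
        match self {
            Platform::Portable => 1,
            Platform::SSE41 => 2,
            Platform::AVX2 => 4,
        }
    }

    /// Two-way loop: has no portable arm, mirroring the reviewed snippet.
    fn compress2_loop(self) -> &'static str {
        match self {
            Platform::AVX2 | Platform::SSE41 => "two-way SIMD loop",
            _ => panic!("unsupported"),
        }
    }

    /// Serial loop: available on every platform.
    fn compress1_loop(self) -> &'static str {
        "serial loop"
    }
}

/// Hypothetical caller: checks `degree()` before picking a loop, so the
/// panicking arm is never reached when `Portable` is selected.
fn compress_many_sketch(imp: Platform) -> &'static str {
    if imp.degree() >= 2 {
        imp.compress2_loop()
    } else {
        imp.compress1_loop()
    }
}
```

With this shape, `compress_many_sketch(Platform::Portable)` takes the serial path, so the `panic!` arm only documents an invariant and is not a reachable failure.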
e086b47 to 40180b5
As discussed in #88, this PR merges the Alternatively it might be possible to retain the two-crate structure, however unfortunately while we own the
Also note: I haven't tried to impl any of the @newpavlov any thoughts on this?
5f449a9 to 704dde6
Oof, the KATs are enormous:
It seems like https://github.com/oconnor663/blake2_simd was able to keep them conveniently out-of-band by virtue of having a workspace and keeping it outside the crates themselves. Edit: moved them to
The good news: it compiles and all of the original tests are now passing! 🎉 I'll update the toplevel description with some next steps.
@oconnor663 @newpavlov so it turns out I was able to obtain ownership of the So the question is: should we proceed with a combined
Assuming that amount of code reuse between
This is why we have
When I was making this PR I did notice considerable overlap/duplication between the two crates. There are large swaths of the code that are completely identical. For example, the file I think there's quite a bit of opportunity to extract a
I did a quick ballparking exercise, ignoring all framing and just decoding and concatenating the hex strings of the 1.8MB This still results in 827kB of binary data, which I would consider enormous and probably a bad idea to include in a crate that would otherwise be ~60kB.
I definitely think we should land #217 first and do another release of the current
Yes there is quite a lot of duplicated code between 2b and 2s. I experimented a couple times with refactoring it to share more, but I wound up getting frustrated with having everything in a giant macro, and I gave up and decided copy-pasting was a better result (even as a maintainer, but especially for anyone who has to read the code). Some scattered thoughts:
Maybe there are some code sharing strategies besides "giant macro" that could work here, that I haven't thought about? Maybe some interesting possibilities open up when we can use const generics?
Another scattered thought: The
@oconnor663 In One possible alternative to giant macros is to be generic over a private trait with necessary associated constants and types. Since Rust monomorphizes generics by default, it should be identical to the macro approach performance-wise. I have successfully refactored some macros using this approach (though const generics would have been a better fit in some cases), so it may apply here as well.
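A minimal sketch of the private-trait idea, for illustration only. The trait `Variant`, the marker types `B2b` and `B2s`, and the helper functions are hypothetical names and not code from either crate; the block sizes, round counts and first IV words are the standard BLAKE2b/BLAKE2s parameters.

```rust
/// Private parameter trait (hypothetical): one impl per BLAKE2 variant.
trait Variant {
    /// Word type of the variant (u64 for BLAKE2b, u32 for BLAKE2s).
    type Word: Copy + Into<u64>;
    const BLOCK_BYTES: usize;
    const ROUNDS: usize;
    /// First word of the initialization vector.
    const IV0: Self::Word;
}

/// Marker type standing in for the BLAKE2b parameter set.
struct B2b;
/// Marker type standing in for the BLAKE2s parameter set.
struct B2s;

impl Variant for B2b {
    type Word = u64;
    const BLOCK_BYTES: usize = 128;
    const ROUNDS: usize = 12;
    const IV0: u64 = 0x6a09e667f3bcc908;
}

impl Variant for B2s {
    type Word = u32;
    const BLOCK_BYTES: usize = 64;
    const ROUNDS: usize = 10;
    const IV0: u32 = 0x6a09e667;
}

/// Written once, monomorphized per variant: number of compression calls
/// for an input. BLAKE2 compresses one block even for empty input.
fn blocks_needed<V: Variant>(input_len: usize) -> usize {
    ((input_len + V::BLOCK_BYTES - 1) / V::BLOCK_BYTES).max(1)
}

/// Returns (block size, rounds, first IV word widened to u64).
fn describe<V: Variant>() -> (usize, usize, u64) {
    (V::BLOCK_BYTES, V::ROUNDS, V::IV0.into())
}
```

Because each call such as `blocks_needed::<B2s>(len)` is compiled separately with the constants substituted, the generated code should match what a macro expansion would produce, while the shared logic stays readable as ordinary Rust.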
I'm weakly in favor of merging the crates for the following reasons:
I think the core implementation of
0586497 to 6568bc8
Apologies for the delay in getting to this. I notice that we have
Pie-in-the-sky, this has me wondering about an alternative strategy: What if we separated out the
There's definitely a lot of work and consolidation to be done there. One thing that can definitely go away is From there the rest is largely up in the air. I'd probably suggest getting rid of
Personally I'm still in favor of trying to bring everything under one roof. There is a lot of duplication right now. The current I don't think it's a good idea to explore that sort of thing as part of this PR, but merging everything into a single crate at least leaves the door open to such refactorings.
Replaces the current implementation of `blake2` by integrating the original sources from these crates, which provide AVX2-accelerated SIMD backends:
https://github.com/oconnor663/blake2_simd
Taken from this commit:
Hash: 7bf791e67245bb84132d1ee0e6a893bb8c85c093
Author: Jack O'Connor <jack.oconnor@zoom.us>
Date: Fri Nov 13 15:50:16 2020 -0500
Title: AES-CTR benchmarks
Renames the `guts` module to `backend`, and refactors the various implementations to be submodules thereof.
Clearing out the `blake2b` and `blake2s` modules in order to add the primary user-facing types there.
This is more consistent with our other crates
Adds a set of `Blake2b`/`VarBlake2b` and `Blake2s`/`VarBlake2s` which are API-compatible with the ones in the current `blake2` v0.9.x release. This commit does not yet include proper tests besides the rustdoc tests.
These are the original tests from: https://github.com/RustCrypto/hashes/tree/4a2845c226ba6748babdb2e704713fa9103d09f0/blake2/tests They show that the new `blake2_simd` crate is API-compatible with the old one.
These clippy warnings all originate in macros from the `arrayref` crate.
Many of the warnings were occurring because the parallel backend requires x86 SIMD intrinsics (AVX2 or SSE41) and therefore the parallel code is dead on other targets. This commit feature gates all such code, ensuring that everything will build warning-free on other targets, and that users who attempt to use the parallel APIs on unsupported targets get compile errors rather than warnings.
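The feature gating described in this commit message could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's code: the module name `parallel`, the constant `LANES` and its value, and the function `lanes()` are all hypothetical, and only the `cfg` predicates themselves are standard Rust.

```rust
/// Parallel backend: compiled only where x86 SIMD intrinsics exist, so it
/// is never dead code on other targets. (Hypothetical module name.)
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod parallel {
    /// Illustrative lane count, not a value taken from the PR.
    pub const LANES: usize = 4;
}

/// On x86/x86_64 the gated module is visible and can be used.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn lanes() -> usize {
    parallel::LANES
}

/// On every other target only the serial path exists. Code that names
/// `parallel::LANES` there fails to compile (unresolved module) instead
/// of building with dead-code warnings.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn lanes() -> usize {
    1
}
```

The sketch adds a serial `lanes()` only so that it builds everywhere; the commit message itself describes the stricter choice of giving users of the parallel APIs a compile error on unsupported targets.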
Integrates original sources from these crates, which provide AVX2-accelerated SIMD backends:
https://github.com/oconnor663/blake2_simd
Taken from this commit:
Next steps
`blake2` crate or if we can get the `blake2b` crate and use `blake2b`/`blake2s`
`Blake2b`/`Blake2s` structs which provide impls of the `digest` traits. These can also be used to replace the `blake2::blake2b::blake2b`/`blake2::blake2s::blake2s` static functions e.g. with `Blake2b::digest(...)`
`VarBlake2b`/`VarBlake2s` (?)
`no_std` targets
Address clippy errors (presently TODOs) in the PR (caused by `arrayref`, can circle back on that)