Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/sha3: add SHA3 assembly implementation for ARMv7 #28148

Open
anonymouse64 opened this issue Oct 11, 2018 · 17 comments
Open

x/crypto/sha3: add SHA3 assembly implementation for ARMv7 #28148

anonymouse64 opened this issue Oct 11, 2018 · 17 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@anonymouse64
Copy link

Currently, there's no assembly implementation for SHA3 hashing for ARM platforms (specifically ARMv7). On ARMv7+ there are vector assembly instructions (known as NEON) available which greatly speed up the speed of SHA3 hashing. There is an upstream reference implementation (here: https://github.com/KeccakTeam/KeccakCodePackage/blob/master/lib/low/KeccakP-1600-times2/OptimizedAsmARM/KeccakP-1600-inplace-pl2-armv7a-neon-le-gcc.s) available that implements SHA3 hashing using these vector instructions and so I have ported this to Go.

Unfortunately, there is no support in the Go assembler/dissassembler for ARMv7 vector instructions, and so I wrote a small tool (available: https://github.com/anonymouse64/asm2go) which translates native assembly code for ARM into Go's plan9 based assembly unsupported opcode syntax in order to integrate the upstream implementation in Go.

I see approximately 3-4 time speedup in SHA3 hashing on a reference Raspberry Pi 3 Model B Revision 1.2 board:

benchmark                          old ns/op     new ns/op     delta
BenchmarkPermutationFunction-4     19033         6054          -68.19%
BenchmarkSha3_512_MTU-4            388484        137001        -64.73%
BenchmarkSha3_384_MTU-4            279054        100177        -64.10%
BenchmarkSha3_256_MTU-4            224595        81443         -63.74%
BenchmarkSha3_224_MTU-4            210459        79196         -62.37%
BenchmarkShake128_MTU-4            181445        66482         -63.36%
BenchmarkShake256_MTU-4            199495        71572         -64.12%
BenchmarkShake256_16x-4            2704755       1094933       -59.52%
BenchmarkShake256_1MiB-4           151870003     53953383      -64.47%
BenchmarkSha3_512_1MiB-4           283838790     97048578      -65.81%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkPermutationFunction-4     10.51        33.03        3.14x
BenchmarkSha3_512_MTU-4            3.48         9.85         2.83x
BenchmarkSha3_384_MTU-4            4.84         13.48        2.79x
BenchmarkSha3_256_MTU-4            6.01         16.58        2.76x
BenchmarkSha3_224_MTU-4            6.41         17.05        2.66x
BenchmarkShake128_MTU-4            7.44         20.31        2.73x
BenchmarkShake256_MTU-4            6.77         18.86        2.79x
BenchmarkShake256_16x-4            6.06         14.96        2.47x
BenchmarkShake256_1MiB-4           6.90         19.43        2.82x

I opened a CL providing this implementation here: https://go-review.googlesource.com/c/crypto/+/119255 however I have not received any feedback on the CL, so here I am opening this issue to hopefully get more visibility on this.

@ALTree ALTree changed the title Add SHA3 assembly implementation on ARM x/crypto/sha3: add SHA3 assembly implementation for ARMv7 Oct 11, 2018
@gopherbot gopherbot added this to the Unreleased milestone Oct 11, 2018
@ALTree ALTree added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 11, 2018
@aead
Copy link
Contributor

aead commented Oct 12, 2018

Ref: #28126

@anonymouse64
Copy link
Author

As I wasn't aware of the mentioned Assembly Policy mentioned in the issue, I will provide responses to some of the points mentioned there that should be discussed when considering assembly contributions:

  • The change I have only implements the minimal KeccakF1600 hashing function, which is the minimal part that ought to be implemented in assembly.

This function is on the "fast path" when performing SHA3 hashing of large files, etc. I don't have specific benchmarks of implementing other parts of the hashing algorithm in assembly to compare to, but it should be clear from both the speedup I got and the fact that the amd64 implementation only implements this same function in assembly that the function I'm providing is the "minimal" amount to implement in assembly.

  • While my CL includes a large amount of generated assembly, I think because it's generated it's okay that there's a lot more of it than say in the equivalent amd64 assembly implementation.

I would argue this is okay because the generated assembly is generated from a relatively small amount of hand-written native assembly. The reason why I didn't write the native assembly in plan9 assembly is because Go's assembler doesn't support the ARM vector instructions needed for this implementation (and using vector instructions is why the implementation is much faster than the native Go implementation).
Additionally, this small hand-written assembly comes from the upstream reference implementation for SHA3, and is in fact the same place where the original amd64 assembly implementation comes from.

  • In the future, this assembly code could be simplified with either support for vector instructions in plan9 assembly code, or if the go compiler gained the ability to emit vector instructions when compiling specifically for ARMv7+ processors.

  • SHA3 is likely not going to become a legacy primitive anytime soon, considering it was specifically chosen in a public, open manner to replace older hashing mechanisms

  • Specifically for our use case at Canonical Ltd., our package management system snapd uses SHA3 hashing for almost all (possibly all not 100% sure) hashing of files.

One particular place this lack of performance on ARM hits us is during the first boot of a newly flashed Ubuntu Core ARM device, where the system when installing snaps (and other tasks) takes up a good portion of the time performing SHA3 hashes. An example difference for a customer device I can't unfortunately release more details on, performs the first boot in a factory environment about 1-2 minutes faster (approx 10% faster) when using this assembly implementation, and this time/percentage increases with the number of snaps the customer wants to configure their device with initially.

@andybons andybons changed the title x/crypto/sha3: add SHA3 assembly implementation for ARMv7 proposal: x/crypto/sha3: add SHA3 assembly implementation for ARMv7 Oct 17, 2018
@andybons andybons added Proposal-Crypto Proposal related to crypto packages or other security issues and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Oct 17, 2018
@andybons
Copy link
Member

/cc @FiloSottile

@anonymouse64
Copy link
Author

Hi folks, any update on this? Is it now too late for this to be considered for go 1.12?

@andybons
Copy link
Member

andybons commented Dec 4, 2018

@anonymouse64 hi there. It is too late for this to make it into 1.12 since we are deep into the freeze period where only bugs are being fixed. More info is here: https://golang.org/wiki/Go-Release-Cycle

@anonymouse64
Copy link
Author

Okay, so can I at least get a decision about whether an assembly implementation of SHA3 for ARM is acceptable or not (disregarding what specific go release it goes into)?
My CL has been open for almost 6 months and this issue has been open for almost 2 months and I have not yet heard any feedback on whether it is acceptable or not. I would like to at least know if it is acceptable, as I was told that I should attempt to upstream this implementation rather than try to maintain our own fork of the SHA3 crypto library. If it's not acceptable, I would need to prepare for making the case for us to maintain a fork which would be highly unfortunate so it would be good for me to know.

@andybons
Copy link
Member

andybons commented Dec 4, 2018

I'm sorry you've been waiting so long for a response. Hopefully this finds itself at the top of @FiloSottile's notifications and can be looked at more closely soon.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/119255 mentions this issue: sha3: implement assembly implementation on ARM

@anonymouse64
Copy link
Author

Hi @FiloSottile any comment on this?

@kubiko
Copy link

kubiko commented Oct 11, 2019

This provides significant improvement and should be considered

@anonymouse64
Copy link
Author

Any update on this, even an ACK/NACK on the generic approach would be great at this point, it's getting close to two years with no response on whether the approach is okay :-/

@rsc
Copy link
Contributor

rsc commented Feb 24, 2021

I commented on the CL - a file full of WORD instructions is not really assembly and not something I want to see more of in the Go tree. And when we say "Use higher level programs to generate non-trivial amounts of assembly" we mean programs that can be read to make the structure of the code clear, such as crypto/md5/gen.go (using other tools like avo is OK too).

The overall goal is long-term maintainability. The CL does not achieve that.

That said, the proposal process is not the place to decide about specific CLs.
The Assembly Policy itself went through the proposal process, so the CL should just follow that.
(But again today it does not.)

I'm going to remove this from the proposal process.

The way forward on that CL would be to write something that is more reviewable and possible to maintain in the long term, like turning the original assembly's macros into something like crypto/md5/gen.go instead.

@rsc rsc changed the title proposal: x/crypto/sha3: add SHA3 assembly implementation for ARMv7 x/crypto/sha3: add SHA3 assembly implementation for ARMv7 Feb 24, 2021
@rsc rsc removed Proposal Proposal-Crypto Proposal related to crypto packages or other security issues labels Feb 24, 2021
@howjmay
Copy link
Contributor

howjmay commented Apr 24, 2021

@rsc Would you mind I am taking this with cgo implementation which conducts ARM support with NEON intrinsics?

@andrius4669
Copy link
Contributor

I suspect cgo would be no go either.

@howjmay
Copy link
Contributor

howjmay commented May 11, 2021

Hi @andybons @rsc I have opened a PR for SHA3 assembly on ARMv8.
The PR takes advantage on the new SIMD instructions for accelerating SHA3. If possible, please take a look. Thank you so much.

@rsc
Copy link
Contributor

rsc commented May 11, 2021

@howjmay, thanks. Filippo is already marked as a reviewer on https://go-review.googlesource.com/c/crypto/+/318869, and I expect he will review it.

@howjmay
Copy link
Contributor

howjmay commented May 12, 2021

thank you so much

@seankhliao seankhliao added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

10 participants