x/crypto/sha3: add SHA3 assembly implementation for ARMv7 #28148
Ref: #28126
As I wasn't aware of the Assembly Policy referenced in that issue, here are my responses to the points it says should be discussed when considering assembly contributions:
This function is on the "fast path" when performing SHA3 hashing of large files, etc. I don't have benchmarks comparing against versions that move other parts of the hashing algorithm into assembly, but it should be clear from both the speedup I measured and the fact that the amd64 implementation puts only this same function in assembly that the function I'm providing is the "minimal" amount to implement in assembly.
I would argue this is acceptable because the Go assembly is generated from a relatively small amount of hand-written native assembly. I didn't write the native assembly in plan9 assembly because Go's assembler doesn't support the ARM vector instructions this implementation needs (and using vector instructions is why the implementation is much faster than the pure Go implementation).
One place where this lack of performance on ARM particularly hurts us is the first boot of a newly flashed Ubuntu Core ARM device: while installing snaps (and performing other tasks), the system spends a good portion of its time computing SHA3 hashes. For example, a customer device (about which I unfortunately can't release more details) completes its first boot in a factory environment about 1-2 minutes faster (approximately 10% faster) when using this assembly implementation, and the time saved grows with the number of snaps the customer initially configures the device with.
/cc @FiloSottile
Hi folks, any update on this? Is it now too late for this to be considered for Go 1.12?
@anonymouse64 hi there. It is too late for this to make it into 1.12, since we are deep into the freeze period, where only bugs are being fixed. More info is here: https://golang.org/wiki/Go-Release-Cycle
Okay, so can I at least get a decision about whether an assembly implementation of SHA3 for ARM is acceptable or not (disregarding what specific go release it goes into)? |
I'm sorry you've been waiting so long for a response. Hopefully this finds itself at the top of @FiloSottile's notifications and can be looked at more closely soon. |
Change https://golang.org/cl/119255 mentions this issue: |
Hi @FiloSottile, any comment on this?
This provides a significant improvement and should be considered.
Any update on this? Even an ACK/NACK on the general approach would be great at this point; it's been close to two years with no response on whether the approach is okay :-/
I commented on the CL - a file full of WORD instructions is not really assembly and not something I want to see more of in the Go tree. And when we say "Use higher level programs to generate non-trivial amounts of assembly" we mean programs that can be read to make the structure of the code clear, such as crypto/md5/gen.go (using other tools like avo is OK too). The overall goal is long-term maintainability. The CL does not achieve that. That said, the proposal process is not the place to decide about specific CLs. I'm going to remove this from the proposal process. The way forward on that CL would be to write something that is more reviewable and possible to maintain in the long term, like turning the original assembly's macros into something like crypto/md5/gen.go instead. |
@rsc Would you mind if I take this on with a cgo implementation that provides ARM support using NEON intrinsics?
I suspect cgo would be a no-go as well.
@howjmay, thanks. Filippo is already marked as a reviewer on https://go-review.googlesource.com/c/crypto/+/318869, and I expect he will review it. |
thank you so much |
Currently, there's no assembly implementation of SHA3 hashing for ARM platforms (specifically ARMv7). ARMv7 and later have vector instructions (known as NEON) available that greatly speed up SHA3 hashing. There is an upstream reference implementation (here: https://github.com/KeccakTeam/KeccakCodePackage/blob/master/lib/low/KeccakP-1600-times2/OptimizedAsmARM/KeccakP-1600-inplace-pl2-armv7a-neon-le-gcc.s) that implements SHA3 hashing using these vector instructions, and I have ported it to Go.
Unfortunately, the Go assembler/disassembler has no support for ARMv7 vector instructions, so I wrote a small tool (available at: https://github.com/anonymouse64/asm2go) that translates native ARM assembly code into Go's plan9-based assembly, using the syntax for unsupported opcodes (raw WORD directives), in order to integrate the upstream implementation into Go.
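The core of that translation step can be sketched as follows. This is not asm2go's code: it assumes a native ARM assembler has already produced the 32-bit A32 instruction words, and it only shows how each word becomes a Go assembler WORD directive with the original mnemonic kept as a comment. The insn type and emitWords function are hypothetical, and the two VEOR encodings were worked out by hand from the VEOR (A1) format in the ARM architecture manual, so verify them with a real assembler before relying on them.

```go
package main

import (
	"fmt"
	"strings"
)

// insn pairs a 32-bit A32 instruction encoding, as produced by a native
// assembler, with the source line it came from. (Hypothetical type.)
type insn struct {
	word   uint32
	source string
}

// emitWords renders each instruction as a Go assembler WORD directive,
// keeping the native mnemonic as a trailing comment.
func emitWords(insns []insn) string {
	var b strings.Builder
	for _, in := range insns {
		fmt.Fprintf(&b, "\tWORD $0x%08x\t// %s\n", in.word, in.source)
	}
	return b.String()
}

func main() {
	// Hand-computed VEOR (quadword) encodings; illustrative only.
	fmt.Print(emitWords([]insn{
		{0xf3000150, "veor q0, q0, q0"},
		{0xf3042156, "veor q1, q2, q3"},
	}))
}
```

The output assembles, but the Go toolchain sees only opaque constants, which is exactly the maintainability concern raised in the review of the CL.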
I see an approximately 3-4x speedup in SHA3 hashing on a reference Raspberry Pi 3 Model B Revision 1.2 board:
I opened a CL providing this implementation here: https://go-review.googlesource.com/c/crypto/+/119255; however, I have not received any feedback on it, so I am opening this issue to hopefully give it more visibility.