Add bignum_mont{sqr,mul}_p256_neon for Arm #118
TODO: add instructions for reproducing the rescheduling optimization.
@aqjune Thanks a lot! Yes, it would be good to have
Which uArchs benefit from this new version?
Actually, I was updating slothy/targets/aarch64/cortex_a55.py on my local branch to add instruction latencies/throughputs for Neoverse N1, because curve25519 was optimized with the Cortex-A55 model, IIRC. It seems the Cortex-A55 instruction cost model is not equivalent to Neoverse N1's, though. Would it be a good idea to backport the Neoverse N1 instruction cost models to the official SLOTHY? If so, which file should we update (neoverse_n1_experimental.py?)? Rerunning SLOTHY should not be a problem apart from its running time, and the proofs can be updated mechanically.
@aqjune Yes, we should be aiming to build & use the Neoverse N1 model, I think. If it turns out that the A55 model performs better on N1, then this is something to be investigated -- let's see! Could you move your changes from the A55 model (which, as I understand, use N1 SWOG data?) over to the N1 model, and rerun your optimization scripts using the N1 model? And, once everything works, open a PR on the SLOTHY repository with the model enhancements (nothing else)?
Branch updated from 458c6af to 8aaa240.
Okay, CI checks are running. Fingers crossed...
This patch adds the `bignum_mont{sqr,mul}_p256_neon` functions. These are vectorized and instruction-rescheduled versions of `bignum_mont{sqr,mul}_p256`, and they are verified using the equivalence-checking tactics. A new bash script, `tools/external/slothy.sh`, is added to help reproduce the optimized output. The 'intermediate' versions of the two functions are included as comments in the two assembly files.

Additionally:
- A new instruction, `umull2`, is formalized and added to the simulator in order to verify the new functions.
- The proofs of the existing `*_neon` functions are refactored a bit.
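For readers less familiar with the operation being vectorized, here is a minimal Python reference model, assuming the usual s2n-bignum semantics z := (x * y / 2^256) mod p_256 for `bignum_montmul_p256` and z := (x^2 / 2^256) mod p_256 for `bignum_montsqr_p256`. It is only a sketch of *what* the functions compute (via word-serial Montgomery reduction), not the assembly, its NEON scheduling, or its formal specification. All helper names (`redc_p256`, `montmul_p256`, `montsqr_p256`, `to_mont`, `umull2_2d_4s`) are hypothetical. It also includes a lane-level model of the `UMULL2 Vd.2D, Vn.4S, Vm.4S` form, which multiplies the upper two unsigned 32-bit lanes of each source into two 64-bit results.

```python
# Hypothetical reference model for illustration; not the s2n-bignum
# assembly, nor its formal specification.

MASK64 = (1 << 64) - 1
# NIST P-256 field prime.
P_256 = 2**256 - 2**224 + 2**192 + 2**96 - 1


def redc_p256(t: int) -> int:
    """Word-serial Montgomery reduction: return t * 2^-256 mod p_256.

    Assumes 0 <= t < p_256 * 2^256 (true for t = x * y with x, y < p_256).
    """
    assert 0 <= t < (P_256 << 256)
    for _ in range(4):  # one step per 64-bit limb
        # In general m = t * (-p^-1) mod 2^64. Because p_256 = -1 (mod 2^64),
        # -p^-1 = 1 (mod 2^64), so m is simply the low limb of t.
        m = t & MASK64
        # Adding m * p_256 zeroes the low 64 bits, so the shift is exact.
        t = (t + m * P_256) >> 64
    # Now t < 2 * p_256, so one conditional subtraction gives the result.
    return t - P_256 if t >= P_256 else t


def montmul_p256(x: int, y: int) -> int:
    """z = (x * y / 2^256) mod p_256, as assumed for bignum_montmul_p256."""
    return redc_p256(x * y)


def montsqr_p256(x: int) -> int:
    """z = (x^2 / 2^256) mod p_256, as assumed for bignum_montsqr_p256."""
    return redc_p256(x * x)


def to_mont(a: int) -> int:
    """Map a into Montgomery form: a * 2^256 mod p_256."""
    return (a << 256) % P_256


def umull2_2d_4s(vn: list[int], vm: list[int]) -> list[int]:
    """Lane model of UMULL2 Vd.2D, Vn.4S, Vm.4S.

    vn and vm hold four unsigned 32-bit lanes (index 0 = lowest lane).
    UMULL2 multiplies the upper two lanes (2 and 3) pairwise into two
    64-bit lanes; plain UMULL would use lanes 0 and 1 instead.
    """
    assert all(0 <= v < 2**32 for v in vn + vm)
    return [vn[i + 2] * vm[i + 2] for i in range(2)]


# Usage: multiply 3 * 5 in Montgomery form and convert back.
prod = montmul_p256(to_mont(3), to_mont(5))
print(redc_p256(prod))  # 15
```

The model also shows why p_256 is convenient for Montgomery arithmetic: since its low 64 bits are all ones, the per-limb quotient needs no multiplication by a precomputed inverse.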
This now looks great to me, thank you!