
Add bignum_mont{sqr,mul}_p256_neon for Arm #118

Merged · 1 commit into awslabs:main on Apr 20, 2024

Conversation

@aqjune-aws (Collaborator) commented Mar 28, 2024

This patch adds the bignum_mont{sqr,mul}_p256_neon functions.

These are vectorized and instruction-rescheduled versions of bignum_mont{sqr,mul}_p256. They are verified using the equivalence-checking tactics.
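
For reference, a minimal sketch of what these routines compute, assuming the usual Montgomery convention with radix R = 2^256 and inputs below p_256 (a specification-level model only, not the implementation):

```python
# Reference model: montmul(a, b) = a * b * 2^(-256) mod p_256, and the
# squaring routine is the b = a case. Names here are illustrative.
P_256 = 2**256 - 2**224 + 2**192 + 2**96 - 1

def montmul_p256_ref(a: int, b: int) -> int:
    r_inv = pow(2**256, -1, P_256)  # inverse of the Montgomery radix R = 2^256
    return (a * b * r_inv) % P_256

def montsqr_p256_ref(a: int) -> int:
    return montmul_p256_ref(a, a)
```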

A new bash script, tools/external/slothy.sh, is added to help reproduce the optimized output.
The 'intermediate' functions of the two routines are written as comments in the two assembly files.

Additionally,

  • A new instruction, umull2, is formalized and added to the simulator in order to verify the new functions (a sketch of its semantics follows this list).
  • Old *_neon functions' proofs are refactored a bit.
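
For context, UMULL2 is the widening unsigned multiply on the upper vector halves. A minimal Python sketch of the UMULL2 Vd.2D, Vn.4S, Vm.4S form (an illustrative reference model, not the simulator's actual formalization):

```python
# UMULL2 Vd.2D, Vn.4S, Vm.4S: multiply the upper two unsigned 32-bit lanes of
# Vn and Vm and write the two 64-bit products to Vd. Illustrative model only.
def umull2_4s(vn: list, vm: list) -> list:
    """vn, vm: four unsigned 32-bit lanes each, index 0 = lowest lane."""
    assert len(vn) == 4 and len(vm) == 4
    return [(vn[i] & 0xFFFFFFFF) * (vm[i] & 0xFFFFFFFF) for i in (2, 3)]
```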

@aqjune (Contributor) commented Apr 1, 2024

TODO: add instructions for reconstructing the rescheduling optimization

@hanno-becker (Contributor) commented Apr 2, 2024

@aqjune Thanks a lot! Yes, it would be good to have

  • the hybrid code that was used as the input to the SLOTHY optimization
  • the exact command you used to optimize the code
  • the commit hash of the SLOTHY version you were using (unfortunately, I haven't yet started versioning SLOTHY properly)

Which uArchs benefit from this new version?

@aqjune-aws (Collaborator, Author) commented:

  • the commit hash of the SLOTHY version you were using (unfortunately, I haven't yet started versioning SLOTHY properly)

Actually, I was updating slothy/targets/aarch64/cortex_a55.py on my local repo branch to add instruction latencies/throughputs for Neoverse N1, because curve25519 was optimized with the Cortex A55 model IIRC. It seems Cortex A55's instruction cost model is not equivalent to Neoverse N1's, though.
The local repo branch is https://github.com/aqjune-aws/slothy/tree/customize .

Would it be a good idea to backport the Neoverse N1 instruction cost models to the official SLOTHY? If so, which file should we update (neoverse_n1_experimental.py?)?
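
For illustration, this is the kind of per-instruction information such a backport would carry (the structure and names below are hypothetical; SLOTHY's actual model files, e.g. neoverse_n1_experimental.py, may organize this differently):

```python
# Hypothetical sketch of a per-uArch cost table. Values are deliberately left
# unset: the real numbers should come from the Neoverse N1 Software
# Optimization Guide (SWOG), not invented here.
from typing import Dict, Optional

NEOVERSE_N1_COSTS: Dict[str, Dict[str, Optional[int]]] = {
    "umull":  {"latency": None, "inverse_throughput": None},
    "umull2": {"latency": None, "inverse_throughput": None},
}
```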

Rerunning SLOTHY will not be a big problem, aside from its running time. The proof can be updated mechanically.

@hanno-becker (Contributor) commented Apr 2, 2024

@aqjune Yes, we should be aiming to build & use the Neoverse N1 model, I think. If it turns out that the A55 model performs better on N1, then this is something to be investigated -- let's see!

Could you move your changes from the A55 model (which, as I understand, use N1 SWOG data?) over to the N1 model, and rerun your optimization scripts using the N1 model? And, once everything works, open a PR on the SLOTHY repository with the model enhancements (nothing else)?

@aqjune-aws force-pushed the equiv-muls2 branch 3 times, most recently from 458c6af to 8aaa240, on April 9, 2024 20:55
@aqjune-aws changed the title from "Add bignum_mont{sqr,mul}_p256_neon" to "Add bignum_mont{sqr,mul}_p256_neon for Arm" on Apr 10, 2024
@aqjune-aws (Collaborator, Author) commented:

Okay, CI checks are running. Fingers crossed...

@aqjune-aws marked this pull request as ready for review on April 11, 2024 03:26
@jargh (Contributor) left a comment:

This now looks great to me, thank you!

@jargh merged commit 0a3b3f3 into awslabs:main on Apr 20, 2024
6 checks passed