Detect all Apple M* CPUs and enable the wide multiplier assembly implementations #1901

andrewhop · 2024-10-02T18:22:04Z

Description of changes:

Looking at the benchmarks on https://github.com/ctz/graviola/ I noticed AWS-LC was slower than expected. Those benchmarks were performed on an M2 Mac and previously AWS-LC was only checking for M1 CPUs. This change opts all Apple M* CPUs in the alt/wide multiplier implementations. A future Apple M CPU might benefit from the non-alt implementation, but for now it seems like all M CPUs benefit from the alt implementation and this is a sane default.

Call-outs:

I think this is safe because both implementations use the same instructions, it's just a reordering. So if a future M CPU doesn't support the wide multiplier it will still work it will just be slower than if it was using the non-alt implementation.

Testing:

Tested on an M1 and M3 that the alt implementation is picked and is faster. M3 before:

Did 29631 EVP ECDH P-224 operations in 1018316us (29098.0 ops/sec)
Did 30000 EVP ECDH P-256 operations in 1020237us (29404.9 ops/sec)
Did 5840 EVP ECDH P-384 operations in 1000133us (5839.2 ops/sec)
Did 3630 EVP ECDH P-521 operations in 1096572us (3310.3 ops/sec)
Did 5430 EVP ECDH secp256k1 operations in 1000506us (5427.3 ops/sec)
Did 41000 EVP ECDH X25519 operations in 1009621us (40609.3 ops/sec)

M3 After

Did 30233 EVP ECDH P-224 operations in 1013635us (29826.3 ops/sec)
Did 31000 EVP ECDH P-256 operations in 1022186us (30327.2 ops/sec)
Did 7227 EVP ECDH P-384 operations in 1076336us (6714.4 ops/sec)
Did 4690 EVP ECDH P-521 operations in 1042402us (4499.2 ops/sec)
Did 5710 EVP ECDH secp256k1 operations in 1051837us (5428.6 ops/sec)
Did 52000 EVP ECDH X25519 operations in 1008149us (51579.7 ops/sec)

As expected P-224 and P-256 are unaffected, while 384/521/X25519 which do use s2n-bignum are much faster.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

…ementations

codecov-commenter · 2024-10-02T18:43:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.48%. Comparing base (ad93747) to head (17028e8).
Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1901      +/-   ##
==========================================
+ Coverage   78.45%   78.48%   +0.02%     
==========================================
  Files         585      585              
  Lines       99516    99517       +1     
  Branches    14244    14243       -1     
==========================================
+ Hits        78080    78105      +25     
+ Misses      20796    20777      -19     
+ Partials      640      635       -5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ctz · 2024-10-02T18:52:43Z

Nice! I had assumed this was coming from shipping the fastest implementation on Graviton; I didn't realise you were switching between them at runtime :)

Detect all Apple M* CPUs and enable the wide multiplier assembly impl…

17028e8

…ementations

andrewhop requested a review from a team as a code owner October 2, 2024 18:22

andrewhop requested review from torben-hansen and nebeid October 2, 2024 18:22

jakemas self-requested a review October 2, 2024 18:32

torben-hansen approved these changes Oct 2, 2024

View reviewed changes

jakemas approved these changes Oct 2, 2024

View reviewed changes

nebeid approved these changes Oct 2, 2024

View reviewed changes

andrewhop merged commit ca5f197 into aws:main Oct 2, 2024
107 of 110 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect all Apple M* CPUs and enable the wide multiplier assembly implementations #1901

Detect all Apple M* CPUs and enable the wide multiplier assembly implementations #1901

andrewhop commented Oct 2, 2024

codecov-commenter commented Oct 2, 2024

ctz commented Oct 2, 2024

Detect all Apple M* CPUs and enable the wide multiplier assembly implementations #1901

Detect all Apple M* CPUs and enable the wide multiplier assembly implementations #1901

Conversation

andrewhop commented Oct 2, 2024

Description of changes:

Call-outs:

Testing:

codecov-commenter commented Oct 2, 2024

Codecov Report

ctz commented Oct 2, 2024