AWS-LC s2n-bignum update 2024-07-22 #1718

Merged
merged 52 commits into from
Jul 24, 2024

Conversation

dkostic
@dkostic dkostic commented Jul 22, 2024

Issues:

N/A

Description of changes:

Latest update from s2n-bignum.

Call-outs:

Point out areas that need special attention or support during the review process. Discuss architecture or design changes.

Testing:

How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

aqjune-aws and others added 30 commits September 15, 2023 15:32
This patch adds constant-time table-lookup functions
(`bignum_copy_row_from_table*`) and their proofs.
This patch contains only the AArch64 version; the x86 version will
follow later.
The failure to prove the x86 version seems to be related to the handling
of negative offsets, and (if this is right) it can be avoided by
proving only positive offsets.
I will record this as a GitHub issue with a promise that the x86 scalar
version will be provided after the RSA-related work is finished.

This patch contains four table-lookup functions:
1. `bignum_copy_row_from_table`: a lookup for a generic table size
2. `bignum_copy_row_from_table_8n_neon`: a Neon version for a table
   whose width is a multiple of 8
3. `bignum_copy_row_from_table_16_neon`: Neon implementation of a table
   whose width is 16*64=1024 bits
4. `bignum_copy_row_from_table_32_neon`: Neon implementation of a table
   whose width is 32*64=2048 bits
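
As a rough illustration of what a constant-time table lookup does, here is a minimal Python model of the data flow. It is a sketch only: the function name, the argument order and the row-major table layout are assumptions made for the example, and Python itself gives no timing guarantees. The point is that every row is read and combined under a mask, so the memory access pattern does not depend on `idx`.

```python
MASK64 = (1 << 64) - 1

def copy_row_from_table_sketch(table, height, width, idx):
    """Return row `idx` of a height x width table of 64-bit words.

    The table is assumed to be a flat, row-major list. Every row is
    visited, and only the selected one survives the masking.
    """
    z = [0] * width
    for i in range(height):
        # All-ones when i == idx, all-zeros otherwise.
        mask = -int(i == idx) & MASK64
        for j in range(width):
            z[j] |= table[i * width + j] & mask
    return z
```

In this model an out-of-range `idx` simply yields an all-zero row, because no mask is ever set.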

The last two versions were initially written by Hanno Becker.

To successfully compile and run `test` and `benchmark` on x86, the
scalar `bignum_copy_row_from_table` function is handled in a way
similar to the Neon functions.

s2n-bignum original commit: awslabs/s2n-bignum@f1ad23c
Add bignum_copy_row_from_table and its Neon-variants for AArch64
s2n-bignum original commit: awslabs/s2n-bignum@50aa85b
This implements the point compression encoding to a byte array from
https://datatracker.ietf.org/doc/html/rfc8032#section-5.1.2
as function "edwards25519_encode". It assumes the input is a point
(x,y) on the edwards25519 curve, with coordinates reduced mod
p_25519 = 2^255 - 19, and does not check any of that.
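
For reference, the encoding rule of RFC 8032 section 5.1.2 can be modelled in a few lines of Python. This is a sketch of the rule itself, not of the assembly; the name `edwards25519_encode_sketch` is made up for the example.

```python
P_25519 = 2**255 - 19

def edwards25519_encode_sketch(x, y):
    """Encode (x, y) as 32 bytes following RFC 8032 section 5.1.2.

    As with the function described above, x and y are assumed to be
    reduced mod P_25519 already; nothing is checked here.
    """
    # y occupies bits 0..254; the least significant bit of x goes into bit 255.
    return (y | ((x & 1) << 255)).to_bytes(32, "little")
```

For example, with y = 4/5 mod p_25519 (the y coordinate of the standard basepoint) and an even x, the result is the byte 0x58 followed by 31 bytes of 0x66.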

s2n-bignum original commit: awslabs/s2n-bignum@67430be
This implements point decoding from a 256-bit little-endian byte
sequence to a point (x,y) on the edwards25519 curve as specified in
https://datatracker.ietf.org/doc/html/rfc8032#section-5.1.3
The function returns 0 for success and 1 for failure, the latter
meaning that the input is not the encoding of any edwards25519 point.
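
The decoding procedure of RFC 8032 section 5.1.3 can be sketched in Python as follows. This is a plain reference model following the RFC's own steps, not a description of how the assembly computes the square root; the function name and the `(status, point)` return shape are assumptions for the example, chosen to mirror the 0/1 return convention.

```python
P_25519 = 2**255 - 19
# Curve constant d = -121665/121666 mod p, and a square root of -1 mod p.
D = (-121665 * pow(121666, P_25519 - 2, P_25519)) % P_25519
SQRT_M1 = pow(2, (P_25519 - 1) // 4, P_25519)

def edwards25519_decode_sketch(data):
    """Return (0, (x, y)) on success or (1, None) on failure."""
    if len(data) != 32:
        return 1, None
    value = int.from_bytes(data, "little")
    sign = value >> 255
    y = value & ((1 << 255) - 1)
    if y >= P_25519:
        return 1, None
    # Solve x^2 = (y^2 - 1) / (d*y^2 + 1) mod p.
    u = (y * y - 1) % P_25519
    v = (D * y * y + 1) % P_25519
    x2 = u * pow(v, P_25519 - 2, P_25519) % P_25519
    if x2 == 0:
        # x = 0 is only valid with a clear sign bit.
        return (1, None) if sign else (0, (0, y))
    # Candidate square root, corrected by sqrt(-1) if needed.
    x = pow(x2, (P_25519 + 3) // 8, P_25519)
    if (x * x - x2) % P_25519 != 0:
        x = x * SQRT_M1 % P_25519
    if (x * x - x2) % P_25519 != 0:
        return 1, None  # x2 is not a square: not a curve point
    if (x & 1) != sign:
        x = P_25519 - x
    return 0, (x, y)
```

An input such as y = 1 with the sign bit set is rejected, since the only matching x is 0 and its low bit cannot be 1.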

s2n-bignum original commit: awslabs/s2n-bignum@97f7493
The function bignum_mod_n25519 performs reduction of an input of any
size (k digits) modulo the order of the curve25519/edwards25519
basepoint, n_25519 = 2^252 + 27742317777372353535851937790883648493.
It generalizes bignum_mod_n25519_4, which is the special case of
4-digit (256-bit) inputs.
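
In terms of its input/output behaviour, the reduction can be modelled like this in Python. The sketch assumes little-endian 64-bit digits and a 4-digit result; the function name is hypothetical and the model says nothing about how the assembly performs the reduction.

```python
N_25519 = 2**252 + 27742317777372353535851937790883648493

def mod_n25519_sketch(digits):
    """Reduce a little-endian list of 64-bit digits modulo N_25519.

    The result is returned as four 64-bit digits, little-endian.
    """
    value = sum(d << (64 * i) for i, d in enumerate(digits))
    r = value % N_25519
    return [(r >> (64 * i)) & (2**64 - 1) for i in range(4)]
```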

s2n-bignum original commit: awslabs/s2n-bignum@e23fd30
This replaces the inlined variant of "bignum_modinv" with code from
"bignum_inv_p25519" in all "curve25519_" functions returning an affine
point and hence using modular inverse. There are also a few
consequential changes related to the slightly different amount of
temporary storage needed by this function.

s2n-bignum original commit: awslabs/s2n-bignum@777d574
…ck_no

Document that x25519 function does not implement zero-check
s2n-bignum original commit: awslabs/s2n-bignum@5c4b15a
This replaces the inlined variant of "bignum_modinv" with code from
"bignum_inv_p25519" in all "edwards25519_scalarmul*" functions.
Again, there are consequential changes related to the slightly
different amount of temporary storage needed by bignum_inv_p25519.

s2n-bignum original commit: awslabs/s2n-bignum@7e7b18e
Ed25519 support and related updates
s2n-bignum original commit: awslabs/s2n-bignum@db8409d
Add BFM, BIC, FCSEL, INS, SUB, TRN1, TRN2, USHR, ZIP2 to ARM model
s2n-bignum original commit: awslabs/s2n-bignum@f1caaf1
In general, BOUNDER_RULE now directly handles operations over Z and N,
assuming an outer real_of_int / real_of_num cast into R (this is also
automated in the tactic form BOUNDER_TAC). In particular, this change
can greatly improve bounds for terms involving integer or natural
number division and remainder (DIV, div, MOD and rem) as well as
cutoff subtraction over N. There is also now support for conditionals,
though the condition is not used as extra context, simply being the
basis for a case split.

This update rolls in various trivial typographic fixes in comments.

s2n-bignum original commit: awslabs/s2n-bignum@ccefa2a
…5519

Avoid duplicate labels in ed25519 x86 implementation
s2n-bignum original commit: awslabs/s2n-bignum@f629458
64-bit SIMD regs in ARM model, better BOUNDER_RULE, slow-ARM field optimizations
s2n-bignum original commit: awslabs/s2n-bignum@06781d2
…_input_const

Make _input_ parameter to ed25519 decode function const
s2n-bignum original commit: awslabs/s2n-bignum@4097178
* Allow MIT-0 license as well as Apache-2.0 and ISC

* Add appropriate year range to MIT-0 license
s2n-bignum original commit: awslabs/s2n-bignum@48fb153
This completely changes the implementation of ARM curve25519_x25519
and curve25519_x25519_byte (not the _alt forms, which remain faster
on their target microarchitectures) to a base-25.5 unsaturated version
with interleaved integer and SIMD operations, the inner loop closely
following Emil Lenngren's implementation described in the paper

  https://github.com/Emill/X25519-AArch64/blob/master/X25519_AArch64.pdf

and available here:

  https://github.com/Emill/X25519-AArch64

A version of this code was generated by SLOTHY from the reorganized
implementation by Abdulrahman, Becker, Kannwischer and Klein here:

 https://github.com/slothy-optimizer/slothy/blob/main/paper/clean/neon/X25519-AArch64-simple.s

as described in the associated paper

  https://eprint.iacr.org/2022/1303.pdf

with some additional annotations for use in the formal proof. The
final modular inverse computation reverts to the usual saturated
representation and s2n-bignum's divstep-based inverse function.
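
To make "base-25.5 unsaturated" concrete, here is a small Python sketch of such a representation. It assumes the common layout of ten limbs with alternating nominal widths of 26 and 25 bits (255 bits in total); the exact limb layout used in the assembly is not stated here, so treat the widths as an assumption.

```python
# Nominal bit widths of the ten limbs: 26, 25, 26, 25, ... (255 bits total).
LIMB_BITS = [26, 25] * 5

def to_limbs(value):
    """Split a 255-bit integer into ten limbs of alternating width."""
    limbs = []
    for bits in LIMB_BITS:
        limbs.append(value & ((1 << bits) - 1))
        value >>= bits
    return limbs

def from_limbs(limbs):
    """Recombine limbs; limbs may exceed their nominal width (unsaturated)."""
    value, shift = 0, 0
    for limb, bits in zip(limbs, LIMB_BITS):
        value += limb << shift
        shift += bits
    return value
```

Because each limb sits in a wider machine word than its nominal width, several additions can be accumulated before carries have to be propagated, which is what makes this form convenient for SIMD code.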

s2n-bignum original commit: awslabs/s2n-bignum@fc0b9bf
Lenngren-based X25519 for non-alt ARM code
s2n-bignum original commit: awslabs/s2n-bignum@57eb68a
Enable testing bignum_copy_row_from_table on x86
s2n-bignum original commit: awslabs/s2n-bignum@26bfe44
This simplifies the remaining Montgomery ladder implementations by
avoiding the special code for zero handling, since it is not actually
necessary given the behavior of the modular inverse in this case. In
addition, the proofs have been tidied up a bit, factoring out the
basic mathematics so that the loop invariant becomes simpler.
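
The reasoning can be illustrated with a tiny Python model. It assumes that the modular inverse maps 0 to 0, which is what a Fermat-style exponentiation does; that the routine used in the assembly behaves the same way is taken from the description above, not checked here. The function names are made up for the example.

```python
P_25519 = 2**255 - 19

def inverse_or_zero(z):
    """Fermat-style inverse mod P_25519; note that it maps 0 to 0."""
    return pow(z, P_25519 - 2, P_25519)

def ladder_output(xn, zn):
    """Affine x coordinate xn / zn from a projective ladder result."""
    return xn * inverse_or_zero(zn) % P_25519
```

With zn = 0 the product is 0 without any separate branch, so no dedicated zero case is needed.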

The update also fixes a README typo pointed out by Dan Bernstein
and removes a couple of stray comments arising from SLOTHY output
in the Lenngren-derived X25519 code.

s2n-bignum original commit: awslabs/s2n-bignum@e14394d
This patch performs a few syntactic updates to make AWS-LC's delocator
work.

s2n-bignum original commit: awslabs/s2n-bignum@3b4f73c
Update curve25519_x25519{_byte} to make AWS-LC's delocator work
s2n-bignum original commit: awslabs/s2n-bignum@88324d8
jargh and others added 14 commits April 20, 2024 00:02
The code now handles specially the case where P1 = (x,y,z) is the
point at infinity, i.e. has z = 0. It then returns the other point P2
augmented (since that is in affine coordinates, this being mixed
addition) with z = 1 or its Montgomery equivalent to give the more
desirable result 0 + P2 = P2. The selection is constant-time as
usual with a single code path.

s2n-bignum original commit: awslabs/s2n-bignum@72ccfda
This is analogous to the earlier changes for mixed addition. In a
point addition operation P1 + P2, the cases where P1 = 0 or P2 = 0 are
handled specially (though of course using constant-time selection) as
0 + P2 = P2 and P1 + 0 = P1. More precisely, writing P1 = (x1,y1,z1)
and P2 = (x2,y2,z2), the special-case logic is triggered when
precisely *one* of z1 = 0 or z2 = 0 holds; in the case that both
z1 = 0 and z2 = 0 the standard computation is followed and yields the
"right" result (one with its z coordinate also zero).
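
The selection logic described above can be sketched in Python as follows. This models only the final choice among P1, P2 and the generically computed sum (called `p3` here); the tuple representation and the function name are assumptions for the example, and arithmetic selection stands in for the constant-time selects of the real code.

```python
def select_sum(p1, p2, p3):
    """Pick the result of P1 + P2 given the generic sum p3.

    Points are (x, y, z) tuples. The special case fires only when
    exactly one of z1 = 0 or z2 = 0 holds.
    """
    z1_zero = int(p1[2] == 0)
    z2_zero = int(p2[2] == 0)
    use_p2 = z1_zero & (1 - z2_zero)   # 0 + P2 = P2
    use_p1 = z2_zero & (1 - z1_zero)   # P1 + 0 = P1
    use_p3 = 1 - (use_p1 | use_p2)     # otherwise keep the computed sum
    # Exactly one flag is 1, so this is a branch-free selection.
    return tuple(use_p1 * a + use_p2 * b + use_p3 * c
                 for a, b, c in zip(p1, p2, p3))
```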

s2n-bignum original commit: awslabs/s2n-bignum@061ea51
As with the earlier update for doublings, the Jacobian point addition
and mixed addition operations for the curves P-256, P-384, P-521,
secp256k1 and SM2 now all have the usual two versions targeting
different microarchitectures, one of them called "_alt", following the
general s2n-bignum convention.

The "_alt" forms for ARM now present are just renamed versions of the
originals (which were based on "_alt" field operations), with the new
code taking over the old non-alt name. For x86 the non-alt ones are
the same as before and the "_alt" forms are new.

s2n-bignum original commit: awslabs/s2n-bignum@acd4fd3
Complete improvements to Weierstrass point additions
s2n-bignum original commit: awslabs/s2n-bignum@b9266e7
…in tactics

This patch adds `bignum_mont{mul,sqr}_p384_neon` which are slightly faster than
`bignum_mont{mul,sqr}_p384`.
They use SIMD instructions and better scheduling found with SLOTHY.
Their correctness is verified using equivalence check w.r.t. specifications of their scalar ops.
The new SUBROUTINE lemmas are added to the specification list using
```
./tools/collect-specs.sh arm >arm/proofs/specifications.txt
```
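
For readers unfamiliar with the operation, here is a minimal Python reference model of Montgomery multiplication and squaring modulo p_384 with R = 2^384. It is a mathematical sketch, assuming inputs already reduced mod p_384; the helper names `to_mont` and `from_mont` are made up for the example, and nothing here reflects the instruction scheduling of the Neon code.

```python
P_384 = 2**384 - 2**128 - 2**96 + 2**32 - 1
R = 2**384
R_INV = pow(R, P_384 - 2, P_384)  # inverse of R modulo the prime P_384

def to_mont(a):
    """Map a to its Montgomery form a * R mod P_384."""
    return a * R % P_384

def from_mont(x):
    """Map a Montgomery-form value back to the ordinary residue."""
    return x * R_INV % P_384

def montmul(x, y):
    """Montgomery product x * y / R mod P_384."""
    return x * y * R_INV % P_384

def montsqr(x):
    """Montgomery square x^2 / R mod P_384."""
    return montmul(x, x)
```

Multiplying two values in Montgomery form gives the Montgomery form of their product, which is why the extra factor of R never has to be removed between operations.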

Benchmark results on Graviton2:
```
bignum_montsqr_p384             :    58.6 ns each (var  0.3%, corr  0.06) =   17053295 ops/sec
bignum_montsqr_p384_neon        :    52.6 ns each (var  0.4%, corr -0.04) =   19017192 ops/sec
bignum_montmul_p384             :    72.9 ns each (var  0.2%, corr -0.02) =   13726633 ops/sec
bignum_montmul_p384_neon        :    68.1 ns each (var  0.3%, corr  0.02) =   14680905 ops/sec
```

Test and benchmark were updated to include these & fix incorrect naming bugs
in my previous p256_neon patch.

Also, some speedups in tactics are made:

1. `ARM_STEPS'_AND_ABBREV_TAC` and `ARM_STEPS'_AND_REWRITE_TAC`.

They are tactics for symbolic execution when showing equivalence of two programs
after reordering instructions.
`ARM_STEPS'_AND_ABBREV_TAC` does symbolic execution of the 'left' program and
abbreviates every RHS of new `read comp s = RHS`s,
meaning that after the tactic is done there are a bunch of equality assumptions whose
number grows linearly with the number of instructions.
`ARM_STEPS'_AND_REWRITE_TAC` then does symbolic execution of the 'right' program
and rewrites the results using the assumptions.
This means the overall complexity of `ARM_STEPS'_AND_REWRITE_TAC` was quadratic
in the number of instructions (# assum * # insts = (# insts)^2).
This is fixed to be (close to) linear, by separately maintaining the
abbreviations as a list of theorems internally rather than assumptions.
This doesn't mean that the theoretical time complexity is now linear,
but many tactics inside `ARM_STEPS'_AND_REWRITE_TAC` that inspect assumptions
now run in linear time.

2. `FIND_HOLE_TAC`

The `FIND_HOLE_TAC` tactic finds a 'hole' in the memory space where the
machine code used in program equivalence can be placed. This is done by inspecting
`nonoverlapping` assumptions, properly segmenting the memory with fixed-width
ranges and doing case analysis. Previously the number of split cases was something
like 2^((# segments)^2), but now it is reduced to (# segments)^(# segments).
Comparing these two numbers is easier after taking logarithms.
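
Spelled out, writing n for the number of segments (the symbol n is introduced here only for the comparison):

```latex
\log_2\!\left(2^{n^2}\right) = n^2,
\qquad
\log_2\!\left(n^{n}\right) = n \log_2 n,
\qquad
n \log_2 n < n^2 \quad \text{for all } n \ge 1 .
```

For instance, with n = 8 the old bound is 2^64 cases while the new one is 8^8 = 2^24 cases.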

Finally, some lemmas in existing `_neon.ml` proofs are updated so that
they do not mix usage of '*_mc' and '*_core_mc'. '*_core_mc' is the machine
code obtained as a sub-list of '*_mc' by stripping the callee-save register
stores/loads as well as the ret instruction.
If possible, a lemma is updated to use only '*_core_mc' because this
makes modular use of the lemma possible in bigger theorems.

s2n-bignum original commit: awslabs/s2n-bignum@d3a7b19
Add `bignum_mont{mul,sqr}_p384_neon`, speed improvements/refactoring in tactics
s2n-bignum original commit: awslabs/s2n-bignum@cbef866
This patch adds `bignum_mont{sqr,mul}_p521_neon`.

```
bignum_montsqr_p521             :   114.7 ns each (var  0.2%, corr  0.06) =    8720010 ops/sec
bignum_montsqr_p521_neon        :    83.8 ns each (var  0.4%, corr -0.04) =   11926387 ops/sec
bignum_montmul_p521             :   130.8 ns each (var  0.2%, corr -0.00) =    7644702 ops/sec
bignum_montmul_p521_neon        :   111.4 ns each (var  0.2%, corr  0.04) =    8978421 ops/sec
```

The new subroutine specs are added to specification.txt, and test as well as benchmark are updated.

Modular squaring/multiplication functions are not included in this patch.

This patch also contains the following updates:

- A tactic for showing equivalence of loops is added (the tactic is not used yet).
- Definitions for input state equivalence are canonicalized as `.. /\ (?a. read c1 s = a /\ read c1 s' = a /\ (?b. read c2 s = b /\ read c2 s' = b /\ ( ... )))`
- Minor buggy behaviors in equiv tactics are fixed and performance improvements are made

s2n-bignum original commit: awslabs/s2n-bignum@65f046e
Add `bignum_mont{sqr,mul}_p521_neon`
s2n-bignum original commit: awslabs/s2n-bignum@e6ac9bd
This adds `bignum_{sqr,mul}_p521_neon` and their proofs.

The new subroutine specs are added to specification.txt, and test as well as benchmark are updated.

Benchmark results on GV2 are:

```
bignum_mul_p521                 :   135.1 ns each (var  0.2%, corr -0.01) =    7404184 ops/sec
bignum_mul_p521_neon            :   115.5 ns each (var  0.3%, corr  0.00) =    8660108 ops/sec
bignum_sqr_p521                 :   108.9 ns each (var  0.2%, corr  0.08) =    9184994 ops/sec
bignum_sqr_p521_neon            :    78.7 ns each (var  0.3%, corr  0.06) =   12708368 ops/sec
```

s2n-bignum original commit: awslabs/s2n-bignum@02df8e4
P-256 scalar multiplication and related tweaks
s2n-bignum original commit: awslabs/s2n-bignum@2237fe8
@dkostic dkostic requested a review from a team as a code owner July 22, 2024 21:31
@codecov-commenter

codecov-commenter commented Jul 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.35%. Comparing base (98ccf4a) to head (a4f3e5a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1718      +/-   ##
==========================================
- Coverage   78.35%   78.35%   -0.01%     
==========================================
  Files         573      573              
  Lines       96065    96059       -6     
  Branches    13764    13771       +7     
==========================================
- Hits        75270    75263       -7     
+ Misses      20195    20194       -1     
- Partials      600      602       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

skmcgrail
skmcgrail previously approved these changes Jul 23, 2024

@andrewhop andrewhop left a comment

I got two differences with this PR when following the runbook:

  • third_party/s2n-bignum/arm/generic/bignum_copy_row_from_table_16_neon.S
  • third_party/s2n-bignum/arm/generic/bignum_copy_row_from_table_32_neon.S

@dkostic dkostic force-pushed the aws-lc-s2n-bignum-update-2024-07-22 branch from 7ccbc44 to 8c5996d Compare July 24, 2024 16:02
@dkostic dkostic merged commit b7d7a99 into aws:main Jul 24, 2024
103 checks passed
@dkostic dkostic deleted the aws-lc-s2n-bignum-update-2024-07-22 branch July 24, 2024 23:05
skmcgrail added a commit that referenced this pull request Aug 1, 2024
## What's Changed
* Added options to x509 tool by @ecdeye in
#1696
* Add support to detect Neoverse V2 cores by @andrewhop in
#1706
* Move OCSP functions for Ruby out of internal.h by @samuel40791765 in
#1704
* Add aes-256-xts to EVP_get_cipherbyname by @torben-hansen in
#1707
* Match using CMAKE_SYSTEM_PROCESSOR_LOWER by @justsmth in
#1709
* Update MySQL to 9.0.0 by @skmcgrail in
#1685
* [EC] Unify scalar multiplication for P-256/384/521 by @dkostic in
#1693
* Adds const qualifier to ciphertext parameter in EVP_PKEY_decapsulate
by @maddeleine in #1713
* Upstream merge 2024 06 24 by @nebeid in
#1661
* NIST SP 800-108r1-upd1: KDF Counter Implementation by @skmcgrail in
#1644
* Upstream merge 2024 07 09 by @nebeid in
#1694
* Design for support of HMAC precomputed keys by @fabrice102 in
#1574
* Fix for select point from table in ec_nistp scalar_mul by @dkostic in
#1719
* X509toolcomparison by @ecdeye in
#1714
* AWS-LC s2n-bignum update 2024-07-22 by @dkostic in
#1718
* Add OpenVPN to CI by @smittals2 in
#1705
* Lower required Go version, add CI test for specific version by
@andrewhop in #1717
* ec2-test-framework enhancements and graviton 4 testing by
@samuel40791765 in #1715
* sha + chacha: Move AArch64/X86-64 dispatching to C. by @justsmth in
#1625
* Show number of pruned ec2 instances in dashboard by @samuel40791765 in
#1728
* rsa and md5 tools by @ecdeye in
#1722
* FIPS 203 IPD update: ML-KEM-IPD-768 and ML-KEM-IPD-1024 by @jakemas in
#1724
* bump mysql CI to 9.0.1 by @samuel40791765 in
#1727
* Support utility OCSP request functions by @samuel40791765 in
#1708
* add support for OCSP_SINGLERESP functions by @samuel40791765 in
#1703

7 participants