Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dead tail code from (non-SHA3) AES-GCM AArch64 kernel #1639

Merged
merged 4 commits into from
Jul 8, 2024

Conversation

hanno-becker
Copy link
Contributor

On AArch64 systems without support for EOR3, assembly kernels aes_gcm_enc_kernel and aes_gcm_dec_kernel from aesv8-gcm-armv8.pl are used for the bulk of AES-GCM processing. These kernels have dedicated tail code for handling inputs whose size is not a multiple of the block size (16 bytes).

However, the unique call-sites for aes_gcm_enc_kernel and aes_gcm_dec_kernel in gcm.c only invoke them with data of size a multiple of 16 bytes: See the masking here here and here. This renders the tail code in aesv8-gcm-armv8.pl dead.

Simply removing the truncation to 16-byte aligned data in gcm.c -- that is, attempting to let aes_gcm_{dec,enc}_kernel process the entire data -- leads to tests failing. It is not clear to me why that is, and in particular the tail code could be faulty. OpenSSL seems to behave similarly and call the AArch64 AES-GCM kernels for block-sized data only.

This PR removes the dead tail code from the non-SHA3 AES-GCM kernels aes_gcm_enc_kernel and aes_gcm_dec_kernel. In a first commit, the code is annotated to explain the effect of the tail code in case of block-aligned data. In the second commit, the tail code is removed.

It seems that a similar change can be made for the AES-GCM kernels leveraging SHA3 instructions, but is not attempted here. (The primary motivation for this change is to facilitate @pennyannn's verification work, which focuses on the non-SHA3 kernels).

@hanno-becker hanno-becker requested a review from a team as a code owner June 17, 2024 14:08
@codecov-commenter
Copy link

codecov-commenter commented Jun 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.20%. Comparing base (4368aaa) to head (1ed3d56).
Report is 18 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1639      +/-   ##
==========================================
- Coverage   78.22%   78.20%   -0.02%     
==========================================
  Files         566      571       +5     
  Lines       95165    95458     +293     
  Branches    13661    13706      +45     
==========================================
+ Hits        74442    74657     +215     
- Misses      20127    20191      +64     
- Partials      596      610      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hanno-becker
Copy link
Contributor Author

@torben-hansen Did I understand correctly that the CI failures are unrelated to this PR?

@hanno-becker
Copy link
Contributor Author

Review note: Make sure to review commit 12ac5a2 separately as it provides some explanation of why the lines that are removed are indeed dispensable.

On AArch64 systems without support for EOR3, assembly kernels
`aes_gcm_enc_kernel` and `aes_gcm_dec_kernel` from `aesv8-gcm-armv8.pl`
are used for the bulk of AES-GCM processing. These kernels have dedicated
tail code for handling inputs whose size is not a multiple of the
block size (16 bytes).

However, the unique call-site for `aes_gcm_enc_kernel` and
`aes_gcm_dec_kernel` in gcm.c only invokes them with data of
size a multiple of 16 bytes. This renders the tail code in
`aesv8-gcm-armv8.pl` dead.

Moreover, simply removing the truncation to 16-byte aligned data
in `gcm.c` -- that is, attempting to let `aes_gcm_{dec,enc}_kernel`
process the entire data -- leads to tests failing, which begs the
question whether the assembly is correct in this case.

This commit is a first step towards removing the tail code from
`aes_gcm_enc_kernel` and `aes_gcm_dec_kernel`, by reading it carefully
and checking how instructions simplify in the case of block-sized
data.
@hanno-becker hanno-becker force-pushed the aes_gcm_non_sha3_dead_tail_code branch from 4bc1aae to 19018fa Compare June 22, 2024 03:32
dkostic
dkostic previously approved these changes Jul 2, 2024
Copy link
Contributor

@dkostic dkostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not material for this change, but I'm really curious why are the tests failing when you try to process non-block aligned data with the removed code.

Copy link
Contributor

@nebeid nebeid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @hanno-becker, for this contribution. Just a nit comment.

crypto/fipsmodule/modes/asm/aesv8-gcm-armv8.pl Outdated Show resolved Hide resolved
@hanno-becker hanno-becker force-pushed the aes_gcm_non_sha3_dead_tail_code branch from bef1489 to 1ed3d56 Compare July 6, 2024 04:28
@hanno-becker
Copy link
Contributor Author

hanno-becker commented Jul 7, 2024

@dkostic not material for this change, but I'm really curious why are the tests failing when you try to process non-block aligned data with the removed code.

It turns out that the tail code isn't buggy, it just presents a different interface to AES: For non-block aligned data, the tail code pads the data with 0 in the last block and computes AES + GHASH. That's different from the incremental behaviour of functions like CRYPTO_gcm128_encrypt_ctr32: When those find tail data smaller than a block, they just stash it in the AES context, until (a) further data completes the block, or (b) the entire computation is concluded, forcing the padding + final GHASH.

Rewriting the tail code assembly to suit the incremental AES interface is doable but not trivial (the tail code computes (((t3*H) + t2)*H + t1)*H=t3 * H^3 + t2 * H^2 + t1*H, so if you instead want to compute ((t3*H) + t2)*H + t1, you have to change precomputed H-factors in earlier parts of the tail code, or switch to a simpler tail code which does not use H^2, H^3), so I'd suggest to revisit this alongside the integration of the clean and SLOTHY-optimized AES-GCM at a later stage.

@hanno-becker
Copy link
Contributor Author

hanno-becker commented Jul 8, 2024

Tested on Graviton2 to replace unrelated CI failure:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make -j8
./crypto/crypto_test
./ssl/ssl_test

All successful.

@nebeid nebeid merged commit 00fcba4 into aws:main Jul 8, 2024
97 of 99 checks passed
pennyannn added a commit to pennyannn/LNSym-public that referenced this pull request Aug 1, 2024
shigoel added a commit to leanprover/LNSym that referenced this pull request Aug 5, 2024
### Description:

The AES-GCM programs are updated in the following two PRs,
aws/aws-lc#1403 and PR
aws/aws-lc#1639. Updating them in LNSym as well.

### Testing:

Make all succeeds and conformance testing is successful. 

### License:

By submitting this pull request, I confirm that my contribution is
made under the terms of the Apache 2.0 license.

Co-authored-by: Shilpi Goel <shigoel@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants