Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support changing OPENSSL_armcap with environment variable on Apple 64-bit ARM systems #1045

Merged
merged 8 commits into from
Jul 31, 2023

Conversation

billbo-yang
Copy link
Contributor

@billbo-yang billbo-yang commented Jun 5, 2023

Issues:

Resolves CryptoAlg-1873

Description of changes:

AWS-LC currently uses a static ARM capabilities variable for 64 bit ARM Apple devices, as we know that certain hardware features are always present on them. This change gives Apple ARM users an option to define at runtime the CPU capabilities through an environment variable to determine whether to run the hardware vs. non-hardware implementation of an algorithm is used.

Call-outs:

Refactored handle_cpu_env() function in cpu_aarch64_linux.c to a general cpu_aarch64.c file to avoid duplicating the code in cpu_aarch64_apple.c. Modified CMakeFIle accordingly.

Testing:

Tests pass locally, confirmed that non-hardware APIs are used when OPENSSL_armcap=0:0 and that the correct hardware APIs are used when it is left unset. Behavior for testing this already exists in all_tests.json, used by all_tests.go.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@billbo-yang billbo-yang requested review from dkostic June 5, 2023 21:48
@billbo-yang billbo-yang marked this pull request as draft June 5, 2023 21:50
@billbo-yang billbo-yang marked this pull request as ready for review June 5, 2023 23:45
@billbo-yang billbo-yang changed the title Expose non-HW APIs on Apple by adding support for processing OPENSSL_armcap Support changing OPENSSL_armcap with environment variable on Apple ARM systems Jun 6, 2023
@skmcgrail skmcgrail requested a review from nebeid June 6, 2023 17:42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In internal.h:L137
https://github.com/aws/aws-lc/pull/1045/files#diff-f86d797a597d637dd551bbb942147bbb4184579acc54cf0a01bc5831b3a0e0a6R137,
OPENSSL_STATIC_ARMCAP is define for Apple. Should it be removed?

How were the changes in this PR tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this needs to be removed. I've tested the changes on both a m1 macbook as well as the iOS device we are concerned about by stepping through the code with a debugger and confirming that the non-hw implementation is used when OPENSSL_armcap is set to 0:0. I think this line is setting it for 32-bit Apple processors, which at this point is limited to very old (pre-2007 iirc) machines. I'll update the title and related tickets to reflect that we're only interested in aarch64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that it's for ARM (32-bit). Thank you for the clarification.

crypto/fipsmodule/cpucap/cpu_aarch64.c Outdated Show resolved Hide resolved
@billbo-yang billbo-yang changed the title Support changing OPENSSL_armcap with environment variable on Apple ARM systems Support changing OPENSSL_armcap with environment variable on Apple 64-bit ARM systems Jun 9, 2023
@billbo-yang billbo-yang force-pushed the expose_apple_nonhw_apis branch from 802056d to c1b91bc Compare June 9, 2023 22:36
@billbo-yang billbo-yang requested a review from nebeid June 9, 2023 23:20
@@ -26,7 +26,7 @@ void handle_cpu_env(uint32_t *out, const char *in) {

// Detect if the user is trying to use the environment variable to set
// a capability that is _not_ available on the CPU:
// If getauxval() returned a non-zero hwcap in `armcap` (out)
// If the runtime capability check (e.g via getauxval() on Linux) returned a non-zero hwcap in `armcap` (out)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, I misunderstood the previous commit suggestion for this line! Will add 👍

@@ -167,7 +167,7 @@ extern uint32_t OPENSSL_armcap_P;
// |CRYPTO_is_ARMv8_AES_capable| and |CRYPTO_is_ARMv8_PMULL_capable|
// for checking the support for AES and PMULL instructions, respectively.
OPENSSL_INLINE int CRYPTO_is_NEON_capable(void) {
#if defined(OPENSSL_STATIC_ARMCAP_NEON) || defined(__ARM_NEON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were the second parts of the checks removed? Aren't these possible compiler flags that can set the capability statically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to ACLE, __ARM_NEON is always set to 1 for AARCH64 platforms. As such, if we want OPENSSL_armcap to determine use of the C implementation, we can't check for this flag since it will always be set. We could use the -march flag when compiling to disable NEON, but since OPENSSL_armcap is supposed to determine processor capability at runtime I decided on this approach to reduce confusion.

cpu_aarch64_apple.c (as well as the counterpart files for other CPUs) still set OPENSSL_armcap to be the correct value for the processor in OPENSSL_cpuid_setup(), so removing the check doesn't disable NEON entirely, either.

This applies to __ARM_FEATURE_AES as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't these values be used on ARM processors (32-bit)? We have users who build AWS-LC on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpucap.c detects if LC is built on 32-bit ARM and sets OPENSSL_armcap accordingly, so when we call a function like CRYPTO_is_NEON_capable(), even without a check for these values, it will correctly detect the CPU capabilities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but I think if OPENSSL_STATIC_ARMCAP is defined but not OPENSSL_STATIC_ARMCAP_NEON due to an old compiler that defines __ARM_NEON, it will not reach the check for OPENSSL_armcap and will return 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the functions to no longer include the ifdef statements, similar to how the capability functions for x86 processors in the same file do it. Since OPENSSL_armcap_P is set correctly even when OPENSSL_STATIC_ARMCAP is defined, the functions should still return the right result.

Either way, a BORINGSSL_DISPATCH_TEST to ensure that this doesn't break anything will be introduced in a separate PR (which we should merge before this one)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd also need to measure the performance in FIPS build with and without the ifdefs for static build (possibly for dynamic as well) when the STATIC_ARMCAPs are defined. Fetching the OPENSSL_armcap_P variable had some implications on the static FIPS build in x86 #856.

Copy link
Contributor Author

@billbo-yang billbo-yang Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performed a benchmark on both the dynamic and static builds comparing performance of AES on main to the PR branch:

  • FIPS mode set
  • OPENSSL_STATIC_ARMCAP and other static macros set
  • Graviton2 on AL2

No performance differences between main & PR for either the dynamic nor static builds.

Full results can be seen here:
aws-lc-dynamic-staticarmcap-aes_main_vs_pr1045_bm.csv
aws-lc-static-staticarmcap-aes_main_vs_pr1045_bm.csv

@dkostic
Copy link
Contributor

dkostic commented Jun 12, 2023

Could we add the same BORINGSLL_DISPATCH_TEST as we do for x86?

@billbo-yang
Copy link
Contributor Author

Could we add the same BORINGSLL_DISPATCH_TEST as we do for x86?

Discussed offline, will submit another PR with these changes as the scope is larger than the context of this one.

dkostic
dkostic previously approved these changes Jun 20, 2023
nebeid
nebeid previously approved these changes Jul 25, 2023
return (OPENSSL_armcap_P & ARMV8_PMULL) != 0;
#endif
}

OPENSSL_INLINE int CRYPTO_is_ARMv8_GCM_8x_capable(void) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this and the next function? Should we remove the #if defined(OPENSSL_STATIC_ARMCAP) branch in those too?

@billbo-yang billbo-yang requested a review from dkostic July 26, 2023 19:30
@billbo-yang billbo-yang merged commit 4d89fae into aws:main Jul 31, 2023
billbo-yang added a commit to billbo-yang/aws-lc that referenced this pull request Jul 31, 2023
…-bit ARM systems (aws#1045)

* use handle_cpu_env from cpu_aarch64_linux.c to cpu_aarch64_apple.c

* always use OPENSSL_armcap() when detecting hardware capabilities on arm

* refactor duplicated into cpu_aarch64.h/c

* add ifdefs to cpu_aarch64.h/c

* Update comments

Co-authored-by: Nevine Ebeid <66388554+nebeid@users.noreply.github.com>

* adjust line length of comment

* set arm capabilities functions to fillow format of x86 ones

* make rest of the arm capability functions in line with x86 ones

---------

Co-authored-by: Nevine Ebeid <66388554+nebeid@users.noreply.github.com>
Co-authored-by: Bill Yang <wbyang@amazon.com>
billbo-yang added a commit to billbo-yang/aws-lc that referenced this pull request Jul 31, 2023
…-bit ARM systems (aws#1045)

* use handle_cpu_env from cpu_aarch64_linux.c to cpu_aarch64_apple.c

* always use OPENSSL_armcap() when detecting hardware capabilities on arm

* refactor duplicated into cpu_aarch64.h/c

* add ifdefs to cpu_aarch64.h/c

* Update comments

Co-authored-by: Nevine Ebeid <66388554+nebeid@users.noreply.github.com>

* adjust line length of comment

* set arm capabilities functions to fillow format of x86 ones

* make rest of the arm capability functions in line with x86 ones

---------

Co-authored-by: Nevine Ebeid <66388554+nebeid@users.noreply.github.com>
Co-authored-by: Bill Yang <wbyang@amazon.com>
@skmcgrail skmcgrail mentioned this pull request Aug 1, 2023
billbo-yang added a commit that referenced this pull request Aug 1, 2023
* Add BoringSSL Dispatch Test for aarch64

* Make dispatch tests use corruptible registers on aarch64. (#1118)

* Support changing OPENSSL_armcap with environment variable on Apple 64-bit ARM systems (#1045)

* use handle_cpu_env from cpu_aarch64_linux.c to cpu_aarch64_apple.c

* always use OPENSSL_armcap() when detecting hardware capabilities on arm

* fix device farm CI build with boto3 (#990)

---------

Co-authored-by: Bill Yang <wbyang@amazon.com>
Co-authored-by: Nevine Ebeid <66388554+nebeid@users.noreply.github.com>
Co-authored-by: Samuel Chiang <sachiang@amazon.com>
@billbo-yang billbo-yang deleted the expose_apple_nonhw_apis branch January 11, 2024 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants