Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Universal abi used in amm functions, no need for OS differentiation #1869

Merged
merged 5 commits into from
Dec 10, 2024

Conversation

pittma
Copy link
Contributor

@pittma pittma commented Sep 20, 2024

During the review process, I mistakenly believed that the @_6_arg_universal_ABI did not account for Windows, as it conspicuously matches the Linux calling convention. Additionally, Windows does not have 6 registers in its calling convention, only 4—args 5 and 6 would be pushed onto the stack, so what I had here was wrong in two ways.

While debugging, I could see that this was segfaulting during a read of a pointer in %r11; this was what tipped me off.

Reading symbols from .\crypto\crypto_test.exe...
(gdb) run
Starting program: C:\Users\sdp\src\aws-lc\build\crypto\crypto_test.exe "--gtest_filter=BNTest.ModExp2TestVectors"
[New Thread 29840.0x700c]
[New Thread 29840.0x1b54]
[New Thread 29840.0x64b8]
Debugger detected. Disabling unwind tests.
Note: Google Test filter = BNTest.ModExp2TestVectors
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BNTest
[ RUN      ] BNTest.ModExp2TestVectors
[New Thread 29840.0x64b4]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ff6cb416280 in L$loop5 ()
(gdb) disassemble 0x00007ff6cb416280
Dump of assembler code for function L$loop5:
=> 0x00007ff6cb416280 <+0>:     mov    (%r11),%r13

Running crypto_test was successful locally on:

Microsoft Windows Server
Version 21H2 (OS Build 20348.405)

On an SPR-based system. I have not tested this change on Linux locally yet, but I will in the next few hours.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@pittma pittma requested a review from a team as a code owner September 20, 2024 22:28
@dkostic
Copy link
Contributor

dkostic commented Sep 20, 2024

Hey Dan, thanks for looking into this! Could you please re-enable the code path (https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/cpucap/internal.h#L144) to test the change in the CI?

@pittma
Copy link
Contributor Author

pittma commented Sep 20, 2024

Hey Dan, thanks for looking into this! Could you please re-enable the code path (https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/cpucap/internal.h#L144) to test the change in the CI?

2e8d424 does include that change!

@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2024

Codecov Report

Attention: Patch coverage is 0% with 21 lines in your changes missing coverage. Please review.

Project coverage is 78.67%. Comparing base (b77a698) to head (f9d008d).

Files with missing lines Patch % Lines
crypto/fipsmodule/bn/bn_test.cc 0.00% 21 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1869      +/-   ##
==========================================
- Coverage   78.68%   78.67%   -0.02%     
==========================================
  Files         585      585              
  Lines      100854   100861       +7     
  Branches    14298    14298              
==========================================
- Hits        79357    79351       -6     
- Misses      20863    20875      +12     
- Partials      634      635       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

dkostic
dkostic previously approved these changes Sep 23, 2024
justsmth added a commit that referenced this pull request Sep 25, 2024
### Description of changes: 
This extends the work done on #1869.
* Updates to latest version of Intel SDE;
* Fixes minor bug in Go test script.
* Disables use of AVX512 IFMA on Windows.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.

---------

Co-authored-by: Dan Pittman <dan@dpitt.me>
Co-authored-by: Justin Smith <justsmth@amazon.com>
@justsmth justsmth requested a review from nebeid October 16, 2024 13:46
@pittma
Copy link
Contributor Author

pittma commented Oct 28, 2024

I have been able to reproduce the issue, but it remains inconsistent. Sometimes it even happens with MSVC 2022, but not always; I’ve never seen it happen with MinGW. I have stepped through the code with Windbg and gdb, depending on which I compiler I use, and when using a MSVC compiler, these lines in bn_test.cc cause the tables’ addresses to get mangled[1][2]:

CHECK_ABI(rsaz_amm52x20_x2_ifma256, &res, &a, &b, &m, k2);
CHECK_ABI(extract_multiplier_2x20_win5, &red_Y, red_table2k, idx1, idx2);

When I hit the first line I copied above, this is what red_table2k’s address and content look like:

0x27328334898 : 0xcdcdcdcdcdcdcdcd

What I would expect for un-zeroed malloc’d memory. When I step over that line, despite it not using red_table2k, this is what I see for that variable:

0xa8ea26dd20138 : Unable to read memory at Address 0xa8ea26dd20138

The address is garbage now, and when we hit the first access[3], SEH c0..5 is raised—an access fault. Through the help of a data breakpoint, I’ve found that, with MSVC compilers, the RSAZABI test is trashing the stack because the variables on the stack do not match the sizes that the assembly routines expect. GCC must be protecting me from shooting myself in the foot in some way, but MSVC is perfectly happy to let me do the dumb thing.

  1. One of the things I’ve changed here is to use heap memory instead of allocating these arrays on the stack.
  2. CHECK_ABI(rsaz_amm52x20_x1_ifma256, &res, &a, &b, &m, k0);
    CHECK_ABI(rsaz_amm52x20_x2_ifma256, &res, &a, &b, &m, k2);
    CHECK_ABI(extract_multiplier_2x20_win5, &red_Y, red_table2k, idx1, idx2);
  3. $code.="vmovdqa64 $t0, $t[$_] \n";

TL;DR

The SDE tests that hung were because I blew the stack and that specific type of failure was not caught by the Go automation. When I moved to using heap memory for the tables, but nothing else, I no longer blew the stack, but the assembly routines were wrecking it near the res and red_Y variables. The new change allocates the correct sizes for the tables, and each res and red_y variable on the heap, similar to the manner memory is handled in rsaz_exp_x2.c.

@justsmth justsmth requested a review from dkostic October 29, 2024 12:56
@torben-hansen
Copy link
Contributor

On a c7i instance with Amazon Linux 2023 (meaning GCC 11.4.1 tooling and ifma instructions), we ran into this issue! However, on an Ubuntu box with GCC 11.4.0 tooling couldn't reproduce...

@dkostic dkostic merged commit 302e539 into aws:main Dec 10, 2024
115 of 116 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants