Implement SSL_MODE_AUTO_RETRY #1333

WillChilds-Klein · 2023-12-01T19:36:43Z

Issues:

Addresses CryptoAlg-2136

Description of changes:

Notes

This change implements SSL_MODE_AUTO_RETRY, which was previously ignored. When this option (off by default) is set, SSL_get_errors behavior hews a little more closely to OpenSSL when processing a return code of 0 (i.e. empty I/O). With this option set, we allow error processing for empty reads if we're in a retryable state. With this option unset, we treat (most) 0-valued return codes as SSL_ERROR_SYSCALL, using it as a sort-of alias for EOF in the underlying transport. Internal calls to SSL_get_error (such as those in bio_ssl.cc) then observe the SSL_ERROR_SYSCALL and fail to process potential retries.

OpenSSL implements this mode differently, only checking its value in its main read loop. Neither OpenSSL 1.1.1 nor OpenSSL 3.x return SSL_ERROR_SYSCALL early for empty reads, instead allowing them to be processed for retryable state. BoringSSL appears to have diverged from this behavior in 2015, but that commit was just a revert of a regression committed ~1 month prior. Their previous implementation closely resembles that of OpenSSL 1.0.2, guarding retryable error processing with i < 0 checks. SSL_get_error in OpenSSL 1.1.1 and OpenSSL 3.0, however, have no such guards. This change resembles the latter two.

Why is it important that we process 0-valued return codes more fully? The recommended SSL IO functions (SSL_read, SSL_write, etc.) are, by contract, allowed to return negative numbers to indicate an error or non-success condition. However, their size_t-clean counterparts (SSL_read_ex, SSL_write_ex, etc.) must, by contract, only return 0 or 1. This means that passing the return code from the various _ex functions to SSL_get_error loses error reporting information, resulting in inconsistent handling of otherwise identical error cases between the two function families.

Another solution to the _ex functions' loss of error reporting information would be to change the contract of those functions to allow negative return values, directly returning the return code from internal calls to SSL_read or SSL_write. I decided not to take this approach as it would break callers who do not account for negative return values in their error handling.

Relevant Links

Call-outs:

n/a

Testing:

CI tests
PR Add cpython integration test #1359 runs relevant CPython module tests against this change

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

codecov-commenter · 2023-12-01T19:59:17Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (d9caaca) 76.85% compared to head (85c669f) 76.84%.

Files	Patch %	Lines
ssl/ssl_lib.cc	60.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1333      +/-   ##
==========================================
- Coverage   76.85%   76.84%   -0.02%     
==========================================
  Files         425      425              
  Lines       71498    71502       +4     
==========================================
- Hits        54948    54943       -5     
- Misses      16550    16559       +9

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

samuel40791765 · 2023-12-14T23:31:54Z

This seemed relevant to the test failures I'm running into with Bind9, so I tried applying the change to see if it would resolve my issues. It results in the tests from Bind9 hanging forever though (the tests used to directly fail) :(.
I'm pasting this code block from bind9 here, it seems they were dependent on detecting the original behaviour:

https://gitlab.isc.org/isc-projects/bind9/-/blob/4330014fce5ab2ac1a6e58cd330762318a34f8d9/lib/isc/netmgr/tlsstream.c#L731-750

ssl/ssl_test.cc

andrewhop · 2024-01-09T23:13:06Z

ssl/ssl_test.cc

+  ret = SSL_read(client.get(), buf, sizeof(buf));
+  EXPECT_EQ(ret, (int) sizeof(buf));
+  EXPECT_EQ(SSL_get_error(client.get(), ret), SSL_ERROR_NONE);
+}


If you tried to read another byte here what do you get?

we get SSL_ERROR_WANT_READ from SSL_get_error, but this time with a ret val of -1 (as opposed to 0 retval on ssl_test.cc L10502). SSL_ERROR_WANT_READ is retryable, that doesn't seem right....

moving PR back to draft pending further investigation.

well, I confirmed that even before this change, trying to read additional bytes after exhausting the rbio resulted in SSL_ERROR_WANT_READ. so, it looks like this change doesn't affect that specific behavior.

This reverts commit 5626fe8.

This reverts commit 0ddfd60.

This reverts commit d54629b.

ssl/ssl_lib.cc

WillChilds-Klein · 2024-01-17T18:10:27Z

This seemed relevant to the test failures I'm running into with Bind9, so I tried applying the change to see if it would resolve my issues. It results in the tests from Bind9 hanging forever though (the tests used to directly fail) :(. I'm pasting this code block from bind9 here, it seems they were dependent on detecting the original behaviour:
* https://gitlab.isc.org/isc-projects/bind9/-/blob/4330014fce5ab2ac1a6e58cd330762318a34f8d9/lib/isc/netmgr/tlsstream.c#L731-750

@samuel40791765 -- from what I can tell, bind9 does not set SSL_MODE_AUTO_RETRY, so this change should not affect that integration.

@davidben

This change implements `SSL_MODE_AUTO_RETRY`, which was previously ignored. When this option (off by default) is set, `SSL_get_error`s behavior hews a little more closely to OpenSSL when processing a return code of 0 (i.e. empty I/O). With this option set, we allow error processing for empty reads if we're in a retryable state. With this option unset, we treat (most) 0-valued return codes as `SSL_ERROR_SYSCALL`, using it as a sort-of alias for EOF in the underlying transport. Internal calls to `SSL_get_error` (such as [those in `bio_ssl.cc`](https://github.com/aws/aws-lc/blob/main/ssl/bio_ssl.cc#L29)) then observe the `SSL_ERROR_SYSCALL` and fail to process potential retries. OpenSSL implements this mode differently, only checking its value in its [main read loop](https://github.com/openssl/openssl/blob/398011848468c7e8e481b295f7904afc30934217/ssl/record/rec_layer_s3.c#L992). Neither OpenSSL [1.1.1][6] nor [OpenSSL 3.x][10] return `SSL_ERROR_SYSCALL` early for empty reads, instead allowing them to be processed for retryable state. BoringSSL [appears to have diverged][7] from this behavior in 2015, but that commit was just a revert of a regression committed ~1 month prior. Their [previous implementation][18] closely resembles [that of OpenSSL 1.0.2][13], guarding retryable error processing with `i < 0` checks. `SSL_get_error` in [OpenSSL 1.1.1][12] and [OpenSSL 3.0][11], however, have no such guards. This change resembles the latter two. Why is it important that we process 0-valued return codes more fully? The recommended SSL IO functions (`SSL_read`, `SSL_write`, etc.) are, by contract, allowed to return negative numbers to indicate an error or non-success condition. However, their `size_t`-clean counterparts (`SSL_read_ex`, `SSL_write_ex`, etc.) must, by contract, only return 0 or 1. This means that passing the return code from the various `_ex` functions to `SSL_get_error` loses error reporting information, resulting in inconsistent handling of otherwise identical error cases between the two function families. Another solution to the `_ex` functions' loss of error reporting information would be to change the contract of those functions to allow negative return values, directly returning the return code from internal calls to `SSL_read` or `SSL_write`. I decided not to take this approach as it would break callers who do not account for negative return values in their error handling. - [OpenSSL 1.0.2 documentation for `SSL_get_error`][3] - [OpenSSL 1.1.1 documentation for `SSL_get_error`][4] - [OpenSSL 3.0 documentation for `SSL_get_error`][5] - [OpenSSL 1.0.2 implementation for `SSL_get_error`][13] - [OpenSSL 1.1.1 implementation for `SSL_get_error`][12] - [OpenSSL 3.0 implementation for `SSL_get_error`][11] - [OpenSSL master (3.2) implementation for `SSL_get_error`][10] - [OpenSSL 1.1.1 commit dropping `ret < 0` guards for hint processing][6] - [Unresolved OpenSSL issue for distinguishing EOF from failure in `BIO_read_ex`][1] - [BoringSSL commit introducing `SSL_ERROR_SYSCALL` for `ret == 0` case][7] - [BoringSSL Issue 503: Stop preserving arbitrary BIO error return values][2] - [BoringSSL Issue 24: Sanitize SSL and BIO return codes.][15] - [OpenSSL 3.0 commit returning `SSL_ERROR_SSL`, add reason to err stack on EOF][14] - [OpenSSL 3.0 commit introducing `SSL_OP_IGNORE_UNEXPECTED_EOF` option][8] - [CPython commit from @davidben removing CPython's use of `SSL_OP_IGNORE_UNEXPECTED_EOF`][9] - [CPython commit switching from `SSL_read` and `SSL_write` to `SSL_read_ex` and `SSL_write_ex`][16] - [OpenSSL 1.1.1 documentation for `SSL_MODE_AUTO_RETRY`][17] [1]: openssl/openssl#8208 [2]: https://bugs.chromium.org/p/boringssl/issues/detail?id=503 [3]: https://www.openssl.org/docs/man1.0.2/man3/SSL_get_error.html [4]: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html [5]: https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html [6]: openssl/openssl@8051ab2 [7]: google/boringssl@9a38e92 [8]: openssl/openssl@09b90e0 [9]: python/cpython#95495 [10]: https://github.com/openssl/openssl/blob/master/ssl/ssl_lib.c#L4601 [11]: https://github.com/openssl/openssl/blob/openssl-3.0/ssl/ssl_lib.c#L3839 [12]: https://github.com/openssl/openssl/blob/OpenSSL_1_1_1-stable/ssl/ssl_lib.c#L3617 [13]: https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/ssl/ssl_lib.c#L2709 [14]: openssl/openssl@d924dbf [15]: https://bugs.chromium.org/p/boringssl/issues/detail?id=24 [16]: python/cpython@89d1550 [17]: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_mode.html [18]: google/boringssl@fcf2583#diff-38e1eecb6013c2fe8bf194e5020ce1afd6e31c371d7849e041464ad822fe745bL2271

@davidben

# Notes This change implements `SSL_MODE_AUTO_RETRY`, which was previously ignored. When this option (off by default) is set, `SSL_get_error`s behavior hews a little more closely to OpenSSL when processing a return code of 0 (i.e. empty I/O). With this option set, we allow error processing for empty reads if we're in a retryable state. With this option unset, we treat (most) 0-valued return codes as `SSL_ERROR_SYSCALL`, using it as a sort-of alias for EOF in the underlying transport. Internal calls to `SSL_get_error` (such as [those in `bio_ssl.cc`](https://github.com/aws/aws-lc/blob/main/ssl/bio_ssl.cc#L29)) then observe the `SSL_ERROR_SYSCALL` and fail to process potential retries. OpenSSL implements this mode differently, only checking its value in its [main read loop](https://github.com/openssl/openssl/blob/398011848468c7e8e481b295f7904afc30934217/ssl/record/rec_layer_s3.c#L992). Neither OpenSSL [1.1.1][6] nor [OpenSSL 3.x][10] return `SSL_ERROR_SYSCALL` early for empty reads, instead allowing them to be processed for retryable state. BoringSSL [appears to have diverged][7] from this behavior in 2015, but that commit was just a revert of a regression committed ~1 month prior. Their [previous implementation][18] closely resembles [that of OpenSSL 1.0.2][13], guarding retryable error processing with `i < 0` checks. `SSL_get_error` in [OpenSSL 1.1.1][12] and [OpenSSL 3.0][11], however, have no such guards. This change resembles the latter two. Why is it important that we process 0-valued return codes more fully? The recommended SSL IO functions (`SSL_read`, `SSL_write`, etc.) are, by contract, allowed to return negative numbers to indicate an error or non-success condition. However, their `size_t`-clean counterparts (`SSL_read_ex`, `SSL_write_ex`, etc.) must, by contract, only return 0 or 1. This means that passing the return code from the various `_ex` functions to `SSL_get_error` loses error reporting information, resulting in inconsistent handling of otherwise identical error cases between the two function families. Another solution to the `_ex` functions' loss of error reporting information would be to change the contract of those functions to allow negative return values, directly returning the return code from internal calls to `SSL_read` or `SSL_write`. I decided not to take this approach as it would break callers who do not account for negative return values in their error handling. # Relevant Links - [OpenSSL 1.0.2 documentation for `SSL_get_error`][3] - [OpenSSL 1.1.1 documentation for `SSL_get_error`][4] - [OpenSSL 3.0 documentation for `SSL_get_error`][5] - [OpenSSL 1.0.2 implementation for `SSL_get_error`][13] - [OpenSSL 1.1.1 implementation for `SSL_get_error`][12] - [OpenSSL 3.0 implementation for `SSL_get_error`][11] - [OpenSSL master (3.2) implementation for `SSL_get_error`][10] - [OpenSSL 1.1.1 commit dropping `ret < 0` guards for hint processing][6] - [Unresolved OpenSSL issue for distinguishing EOF from failure in `BIO_read_ex`][1] - [BoringSSL commit introducing `SSL_ERROR_SYSCALL` for `ret == 0` case][7] - [BoringSSL Issue 503: Stop preserving arbitrary BIO error return values][2] - [BoringSSL Issue 24: Sanitize SSL and BIO return codes.][15] - [OpenSSL 3.0 commit returning `SSL_ERROR_SSL`, add reason to err stack on EOF][14] - [OpenSSL 3.0 commit introducing `SSL_OP_IGNORE_UNEXPECTED_EOF` option][8] - [CPython commit from @davidben removing CPython's use of `SSL_OP_IGNORE_UNEXPECTED_EOF`][9] - [CPython commit switching from `SSL_read` and `SSL_write` to `SSL_read_ex` and `SSL_write_ex`][16] - [OpenSSL 1.1.1 documentation for `SSL_MODE_AUTO_RETRY`][17] [1]: openssl/openssl#8208 [2]: https://bugs.chromium.org/p/boringssl/issues/detail?id=503 [3]: https://www.openssl.org/docs/man1.0.2/man3/SSL_get_error.html [4]: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html [5]: https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html [6]: openssl/openssl@8051ab2 [7]: google/boringssl@9a38e92 [8]: openssl/openssl@09b90e0 [9]: python/cpython#95495 [10]: https://github.com/openssl/openssl/blob/master/ssl/ssl_lib.c#L4601 [11]: https://github.com/openssl/openssl/blob/openssl-3.0/ssl/ssl_lib.c#L3839 [12]: https://github.com/openssl/openssl/blob/OpenSSL_1_1_1-stable/ssl/ssl_lib.c#L3617 [13]: https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/ssl/ssl_lib.c#L2709 [14]: openssl/openssl@d924dbf [15]: https://bugs.chromium.org/p/boringssl/issues/detail?id=24 [16]: python/cpython@89d1550 [17]: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_mode.html [18]: google/boringssl@fcf2583#diff-38e1eecb6013c2fe8bf194e5020ce1afd6e31c371d7849e041464ad822fe745bL2271

1. This adds an integration CI dimension for Bind9 2. Resolved "cmocka unit tests" for Bind9 * Additional <openssl/asn1.h> import in <openssl/objects.h>: Bind depends on some ASN1 functions, but does not directly import the corresponding header. OpenSSL imports the asn1 header file in objects.h (which Bind is pulling these symbols from), so I've added the header file reference to objects.h. * SSL_get_error error anticipation fixing: There were several failures discovered to be related this, thanks to research done in Implement SSL_MODE_AUTO_RETRY #1333. The issue was pinned down the check implemented in google/boringssl@9a38e92. This check used to exist before the final return of SSL_get_error in OpenSSL. Upstream moved this earlier in the function with google/boringssl@fcf2583. However, much of the functions guards for i < 0 checks have been removed since OpenSSL 1.1.1, so the early logic no longer applies. This check has evolved into SSL_ERROR_ZERO_RETURN in our code. Moving the check further down helps us gain better parity with OpenSSL 1.1.1. Doing so passes the bind test failures for proxystream_test, tls_test, and doh_test. This also happens to help our integration with CPython, so I've reconfigured that patch. We actually already use SSL_AUTO_RETRY by default in AWS-LC. The recent change mentioned in the point above surrounding the flag (208327e) was just to make some of the errors consistent in CPython when the flag was used. I've reverted the special behavior surrounding it since it should no longer be needed. * Assertion for SSL_set_shutdown: The assertion was added in 63006a9, where it’s stated that we didn’t want SSL_set_shutdown messing up the state machine. This assertion is causing failures in tlsdns_test for Bind9, so it appears that we'll have to remove this to gain better OpenSSL parity. 3. Patch file needed for Bind seems to be slight bug in their build configuration. This was from a fairly recent commit. We can look to contribute this sometime soon.

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch 2 times, most recently from 9607aca to 06f9cc4 Compare December 6, 2023 20:43

WillChilds-Klein changed the title ~~Handle EAGAIN in SSL_get_error~~ Don't return SSL_ERROR_SYSCALL for potential EOF Dec 6, 2023

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch 5 times, most recently from 82eae99 to 6065eed Compare December 13, 2023 17:26

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch 4 times, most recently from 0ddfd60 to 6dd64f7 Compare December 20, 2023 18:24

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch 3 times, most recently from e335d43 to 3531382 Compare January 4, 2024 15:44

WillChilds-Klein marked this pull request as ready for review January 4, 2024 16:29

WillChilds-Klein requested a review from a team as a code owner January 4, 2024 16:29

WillChilds-Klein enabled auto-merge (squash) January 4, 2024 16:30

justsmth self-requested a review January 4, 2024 16:34

WillChilds-Klein changed the title ~~Don't return SSL_ERROR_SYSCALL for potential EOF~~ Guard EOF "error" return in SSL_get_error Jan 5, 2024

WillChilds-Klein requested a review from andrewhop January 8, 2024 19:00

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch from 3531382 to 5cbb984 Compare January 9, 2024 23:29

andrewhop reviewed Jan 9, 2024

View reviewed changes

WillChilds-Klein disabled auto-merge January 10, 2024 00:39

WillChilds-Klein marked this pull request as draft January 10, 2024 00:40

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch from b5772a5 to d3c39a9 Compare January 10, 2024 21:38

WillChilds-Klein added the automation-exempt This issue or pull request is exempt from automatic closure. label Jan 11, 2024

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch from d3c39a9 to 98736bd Compare January 12, 2024 00:47

WillChilds-Klein requested a review from andrewhop January 12, 2024 17:23

WillChilds-Klein and others added 13 commits January 16, 2024 16:13

Don't return SSL_ERROR_SYSCALL for EOF

cfc19c3

Try returning SSL_ERROR_SSL and set reason

c7a55aa

Revert "Try returning SSL_ERROR_SSL and set reason"

a06c57b

This reverts commit 5626fe8.

Move SSL_ERROR_ZERO_RETURN check to match OSSL 1.1.1

8d9b2fb

Revert "Move SSL_ERROR_ZERO_RETURN check to match OSSL 1.1.1"

871c8f1

This reverts commit 0ddfd60.

Revert "Don't return SSL_ERROR_SYSCALL for EOF"

97e0230

This reverts commit d54629b.

Set out len before returning on err

a8829dc

Return raw retval on error

19d26a0

Revert other changes, guard ERR_SYSCALL return

0aa0834

Revert guard, impl. SSL_MODE_AUTO_RETRY

8b160bf

Reinstate guard, abandon SSL_MODE_AUTO_RETRY

1683627

Condition SSL_ERROR_SYSCALL suppression on SSL_MODE_AUTO_RETRY

a866c11

Add assertions around BIO method setters

6f4c696

WillChilds-Klein force-pushed the libssl-handle-EAGAIN branch from 6b4891e to 6f4c696 Compare January 16, 2024 21:13

Consolidate test logic

0457256

justsmth previously approved these changes Jan 16, 2024

View reviewed changes

ssl/ssl_lib.cc Outdated Show resolved Hide resolved

Make source change more readable

85c669f

WillChilds-Klein dismissed justsmth’s stale review via 85c669f January 16, 2024 23:40

justsmth approved these changes Jan 16, 2024

View reviewed changes

andrewhop approved these changes Jan 18, 2024

View reviewed changes

WillChilds-Klein enabled auto-merge (squash) January 18, 2024 22:27

WillChilds-Klein removed the automation-exempt This issue or pull request is exempt from automatic closure. label Jan 18, 2024

WillChilds-Klein merged commit 208327e into aws:main Jan 19, 2024
35 of 36 checks passed

samuel40791765 mentioned this pull request Jan 26, 2024

Slight tweaks and integration CI to support Bind9 #1423

Merged

andrewhop mentioned this pull request May 3, 2024

Add support for BIO_read/write_ex; Update MySQL CI to 8.4; #1568

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement SSL_MODE_AUTO_RETRY #1333

Implement SSL_MODE_AUTO_RETRY #1333

WillChilds-Klein commented Dec 1, 2023 •

edited

Loading

codecov-commenter commented Dec 1, 2023 •

edited

Loading

samuel40791765 commented Dec 14, 2023 •

edited

Loading

andrewhop Jan 9, 2024

WillChilds-Klein Jan 10, 2024

WillChilds-Klein Jan 12, 2024 •

edited

Loading

WillChilds-Klein commented Jan 17, 2024

Implement SSL_MODE_AUTO_RETRY #1333

Implement SSL_MODE_AUTO_RETRY #1333

Conversation

WillChilds-Klein commented Dec 1, 2023 • edited Loading

Issues:

Description of changes:

Notes

Relevant Links

Call-outs:

Testing:

codecov-commenter commented Dec 1, 2023 • edited Loading

Codecov Report

samuel40791765 commented Dec 14, 2023 • edited Loading

andrewhop Jan 9, 2024

Choose a reason for hiding this comment

WillChilds-Klein Jan 10, 2024

Choose a reason for hiding this comment

WillChilds-Klein Jan 12, 2024 • edited Loading

Choose a reason for hiding this comment

WillChilds-Klein commented Jan 17, 2024

WillChilds-Klein commented Dec 1, 2023 •

edited

Loading

codecov-commenter commented Dec 1, 2023 •

edited

Loading

samuel40791765 commented Dec 14, 2023 •

edited

Loading

WillChilds-Klein Jan 12, 2024 •

edited

Loading