-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace built-in hashlib with verified implementations from HACL* #99108
Comments
(missing context: there was a private email thread cc'ing ~7 of us including myself, @tiran, @alex, @msprotz & others that led to this issue and PR) in response to #98517 CVE happening with our old XKCP sha3 vendored code. One reason I might support this work across all of our required always supported hashes is that if these perform well, I'd be happy to see the If this sounds like the opposite of my statements over buried in the blake3 rejection thread - it's because I did not seek an alternative to the variety of vendored fallback hash algorithm implementations we carry there. So the goal in those statements was to lean towards simplicity instead of complexity. (which has already paid off via our XKCP removal in favor of tiny_sha3 by not having Python 3.11+ suffer from the sha3 XKCP CVE that the larger complex third party code in 3.6-3.10 did) We provide and ship native implementations for the hash algorithms anyways as we support builds without OpenSSL present, and support building on systems with libssl versions or variants that do not provide everything. Q: Would adopting this code increase our maintenance burden? A: Probably, yes. We'd presumably want to update it with more recent versions from hacl-star from time to time. Though the theory is that there should never be a security need to do so? In the past we've been asked to update vendored hash algorithm code from time to time but have hesitated due to the complexity and perceived risk, and assumed lack of reward given people who care about performance would presumably have a modern OpenSSL to get that from. Though if we did get rid of the OpenSSL wrapper code that'd also be a reduced burden. Q: What about performance? A: Good question. That needs continual measuring on our Tiered supported platforms. Don't focus solely on thruput. OpenSSL's EVP APIs are known to have silly-slow setup time. So something hashing a pile of 234 byte objects may be slow no matter how good OpenSSL's bulk data thruput is. Benchmark small message thruput as well as bulk data thruput. Parting thought: There is a broader desire to wean us off of our OpenSSL dependency. I realize that's a hard thing so long as the |
Thanks for your comment, Greg. Some related thoughts.
None of this is infeasible -- it's just a matter of deciding what is good for a Python-in-the-future where hashlib no longer depends on OpenSSL. For instance, you might say "let's maintain a portable C version, and one single ASM version that uses SHAEXT when available (fastest)", and this is something we could make happen within HACL*. Just to give you a sense of what we have right now, allow me to go into greater detail:
Hope this helps. As you said, this kind of change is a first step in the right direction. |
replacing hashlib primitives (for the non-OpenSSL case) with verified implementations from HACL*. This is the first PR in the series, and focuses specifically on SHA2-256 and SHA2-224. This PR imports Hacl_Streaming_SHA2 into the Python tree. This is the HACL* implementation of SHA2, which combines a core implementation of SHA2 along with a layer of buffer management that allows updating the digest with any number of bytes. This supersedes the previous implementation in the tree. @franziskuskiefer was kind enough to benchmark the changes: in addition to being verified (thus providing significant safety and security improvements), this implementation also provides a sizeable performance boost! ``` --------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------- Sha2_256_Streaming 3163 ns 3160 ns 219353 // this PR LibTomCrypt_Sha2_256 5057 ns 5056 ns 136234 // library used by Python currently ``` The changes in this PR are as follows: - import the subset of HACL* that covers SHA2-256/224 into `Modules/_hacl` - rewire sha256module.c to use the HACL* implementation Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Alright, the first PR is in. Checkbox time:
|
Great! sha384/sha512 is a no-brainer and I'll send a followup PR very soon. We have SHA3 as well, so I can send that in, too, with the caveat that we have a few performance optimizations for sha3 in the works, so we can either wait for those to land in HACL* then submit the Python PR, or first land SHA3 in Python, then send a followup PR to refresh the HACL* code once the performance tweaks have landed. Happy to hear your thoughts on this. |
Replace the builtin hashlib implementations of SHA2-384 and SHA2-512 originally from LibTomCrypt with formally verified, side-channel resistant code from the [HACL*](https://github.com/hacl-star/hacl-star/) project. The builtins remain a fallback only used when OpenSSL does not provide them.
Given that the SHA3 implementation we currently ship is a tiny poor performing one, no need to wait. I expect we'll pull in updates to the HACL* implementations as needed in the future when there is a motivating reason. |
This builds HACL* as a library in one place. A followup to #101707 which broke some WASM builds. This fixes 2/4 of them, but the enscripten toolchain in the others don't deduplicate linker arguments and error out. A follow-on PR will address those.
This merges their code. They're backed by the same single HACL* star static library, having them be a single module simplifies maintenance. This should unbreak the wasm enscripten builds that currently fail due to linking in --whole-archive mode and the HACL* library appearing twice. Long unnoticed error fixed: `_sha512.SHA384Type` was doubly assigned and was actually SHA512Type. Nobody depends on those internal names.
This merges their code. They're backed by the same single HACL* static library, having them be a single module simplifies maintenance. This should unbreak the wasm enscripten builds that currently fail due to linking in --whole-archive mode and the HACL* library appearing twice. Long unnoticed error fixed: _sha512.SHA384Type was doubly assigned and was actually SHA512Type. Nobody depends on those internal names. Also rename LIBHACL_ make vars to LIBHACL_SHA2_ in preperation for other future HACL things.
Replaces our fallback non-OpenSSL MD5 and SHA1 implementations with those from HACL* as we've already done with SHA2.
* main: (76 commits) Fix syntax error in struct doc example (python#102160) pythongh-99108: Import MD5 and SHA1 from HACL* (python#102089) pythonGH-101777: `queue.rst`: use 2 spaces after a period to be consistent. (python#102143) Few coverage nitpicks for the cmath module (python#102067) pythonGH-100982: Restrict `FOR_ITER_RANGE` to a single instruction to allow instrumentation. (pythonGH-101985) pythongh-102135: Update turtle docs to rename wikipedia demo to rosette (python#102137) pythongh-99942: python.pc on android/cygwin should link to libpython per configure.ac (pythonGH-100356) pythongh-95672 fix typo SkitTest to SkipTest (pythongh-102119) pythongh-101936: Update the default value of fp from io.StringIO to io.BytesIO (pythongh-102100) pythongh-102008: simplify test_except_star by using sys.exception() instead of sys.exc_info() (python#102009) pythongh-101903: Remove obsolete undefs for previously removed macros Py_EnterRecursiveCall and Py_LeaveRecursiveCall (python#101923) pythongh-100556: Improve clarity of `or` docs (python#100589) pythongh-101777: Make `PriorityQueue` docs slightly clearer (python#102026) pythongh-101965: Fix usage of Py_EnterRecursiveCall return value in _bisectmodule.c (pythonGH-101966) pythongh-101578: Amend exception docs (python#102057) pythongh-101961 fileinput.hookcompressed should not set the encoding value for the binary mode (pythongh-102068) pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078) pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012) pythongh-101566: Sync with zipp 3.14. (pythonGH-102018) pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589) ...
Automerge-Triggered-By: GH:erlend-aasland
@gpshead I'll let you check MD5/SHA1 in the checklist above... quick question: what is the timeline for landing sha3? it requires a little bit of work on my side and I'll be traveling all March, so ideally I would submit the SHA3 PR in April. But if it's a dealbreaker because of e.g. the Python release schedule, I can scramble to try to submit it before then. Let me know. |
So long as we can get it committed before 3.12beta1 at the beginning of May (see https://peps.python.org/pep-0693/) we're good. So April for that is fine by me. No need to scramble. |
* main: (67 commits) pythongh-99108: Add missing md5/sha1 defines to Modules/Setup (python#102308) pythongh-100227: Move _str_replace_inf to PyInterpreterState (pythongh-102333) pythongh-100227: Move the dtoa State to PyInterpreterState (pythongh-102331) pythonGH-102305: Expand some macros in generated_cases.c.h (python#102309) Migrate to new PSF mailgun account (python#102284) pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Python/) (python#102193) pythonGH-90744: Fix erroneous doc links in the sys module (python#101319) pythongh-87092: Make jump target label equal to the offset of the target in the instructions sequence (python#102093) pythongh-101101: Unstable C API tier (PEP 689) (pythonGH-101102) IDLE: Simplify DynOptionsMenu __init__code (python#101371) pythongh-101561: Add typing.override decorator (python#101564) pythongh-101825: Clarify that as_integer_ratio() output is always normalized (python#101843) pythongh-101773: Optimize creation of Fractions in private methods (python#101780) pythongh-102251: Updates to test_imp Toward Fixing Some Refleaks (pythongh-102254) pythongh-102296 Document that inspect.Parameter kinds support ordering (pythonGH-102297) pythongh-102250: Fix double-decref in COMPARE_AND_BRANCH error case (pythonGH-102287) pythongh-101100: Fix sphinx warnings in `types` module (python#102274) pythongh-91038: Change default argument value to `False` instead of `0` (python#31621) pythongh-101765: unicodeobject: use Py_XDECREF correctly (python#102283) [doc] Improve grammar/fix missing word (pythonGH-102060) ...
…119316) This replaces the existing hashlib Blake2 module with a single implementation that uses HACL\*'s Blake2b/Blake2s implementations. We added support for all the modes exposed by the Python API, including tree hashing, leaf nodes, and so on. We ported and merged all of these changes upstream in HACL\*, added test vectors based on Python's existing implementation, and exposed everything needed for hashlib. This was joint work done with @R1kM. See the PR for much discussion and benchmarking details. TL;DR: On many systems, 8-50% faster (!) than `libb2`, on some systems it appeared 10-20% slower than `libb2`.
…nGH-123027) Inform HACL whether explicit_bzero is available
Replaces our fallback non-OpenSSL MD5 and SHA1 implementations with those from HACL* as we've already done with SHA2.
Automerge-Triggered-By: GH:erlend-aasland
Disable HACL SIMD code on older versions of Android
CPython now fails to build on x86 Debian, see: https://buildbot.python.org/#/builders/1244/builds/3176 |
Ok, I'm surprised this wasn't caught earlier. Thanks for reporting. We could either disable vectorized versions on x86 builds (i.e. edit autoconf), or fix the vectorized headers so that they also work for an x86 toolchain (i.e. relax this ifdef to also work on x86: https://github.com/python/cpython/blob/main/Modules/_hacl/libintvector.h#L22) Maybe you could try the latter and see if that would fix the Debian build? (I don't have a Debian x86 handy.) Thanks. |
I don't have it handy either, but if you make a PR, I can ask Buildbot to test it. |
Not sure how we missed this before; I believe 32-bit x86 should have access to at least SSE4.2. I've never looked to see if more recent additions like AVX2 or SHA NI are available in 32-bit mode or not. Test changing that build conditional in a PR and we'll find out. It is a legacy platform so if we wind up just leaving the accelerated instructions turned off there, no big deal. |
…125244) Accomplished by updating HACL* vendored code from hacl-star/hacl-star@a6a0949 to hacl-star/hacl-star@315a9e4 Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Zachary Ware <zach@python.org>
FYI: I'm currently working on adding HACL* HMAC implementation for the known hash functions (SHA1/2 + blake2b/2s). There are certain limitations on the input data (the length of the inputs is expected to fit on The portage is quite straightforward since everything operates on |
I'd say go ahead and do it, no need to wait for others. |
Also there's a macro we added for the discrepancy between uint32_t and ssize_t so you might be able to reuse that. Which hash algorithms are you missing? CC @R1kM |
I expected SHA-3 to be supported as well since we have a |
Which variants do you need? Is there a link towards the expected API? |
Python's HMAC expects any algorithm that can be passed to
I'm using the Technically, all the names that are in |
It's perfectly fine for Python to fall back to the existing implementation when accelerated HMAC does not exist. It's a nice to have for the most widely used variants. |
Great! I'll probably have something by the end of next week (I don't have much time in the next few days). |
This was fairly easy to add and @R1kM and I just have a PR (see above) that adds all of the required algorithms. We'll merge as soon as the CI comes back green and then you'll be able to pull from that directly. Could you confirm this completely eliminates the need for fallback implementations? Thanks! |
I think it should completely eliminates those needs. I can confirm it tomorrow or when I'll submit the PR but I think this should be good. Thank you for your quick reply! |
FTR: As mentioned in #72570, not supporting HMAC+SHAKE was deliberate but we'll probably need to improve the docs for that as well (since it only says "any name accepted by hashlib.new()"). (well, the current implementation works for sha3 family although keccak-based hash functions are already protected against LE-attacks). |
Feature or enhancement
We propose to replace the non-OpenSSL cryptographic primitives in hashlib with high-assurance, verified versions from the HACL* project.
Pitch
As evidenced by the recent SHA3 buffer overflow, cryptographic primitives are tricky to implement correctly. There might be issues with memory management, exceeding lengths, incorrect buffer management, or worse, incorrect implementations in corner cases.
The HACL* project https://github.com/hacl-star/hacl-star provides verified implementations of cryptographic primitives. These implementations are mathematically shown to be:
See https://hacl-star.github.io/Overview.html#what-is-verified-software for a longer description of how formal methods can help write high-assurance software and rule out entire classes of bugs.
The performance of HACL* is competitive with, and sometimes exceeds, that of OpenSSL. HACL* is distributed as pure C, and therefore is portable. Parts of HACL* have been adopted in Mozilla, Linux, the Tezos blockchain, and many more, thereby demonstrating that formally verified code is ready for production-time.
Previous discussion
Tagging @alex with whom I've informally discussed this.
Modules/_blake2
. #126270The text was updated successfully, but these errors were encountered: