Hash of concatenated SHA2-256 digests of 8*2^n MiB source chunks, where n = ceil(log2(source_size / (10^4 * 8 MiB)))
This variant of sha2-256 is designed to let large files be uploaded and hashed efficiently in parallel using fixed-size chunks (8 MiB to start with), with the final result being a "top hash" that doesn't depend on the upload order.
The algorithm caps the number of chunks at 10,000. If the file is larger than 80,000 MiB (10,000 × 8 MiB), the chunk size is doubled until the chunk count falls within that limit:
n = ceil(log2(source_size / (10_000 * 8 MiB)))
chunk_size = 8 MiB * 2^n
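For illustration, here is a minimal single-threaded Python sketch of the chunk-size rule and the top-hash computation (the helper names are hypothetical, and a real implementation would hash the chunks in parallel):

```python
import hashlib
import math
import os

CHUNK_BASE = 8 * 1024 * 1024   # starting chunk size: 8 MiB
MAX_CHUNKS = 10_000            # upper limit on the number of chunks


def chunk_size_for(source_size: int) -> int:
    """chunk_size = 8 MiB * 2^n, with n = ceil(log2(source_size / (10_000 * 8 MiB)))."""
    if source_size <= MAX_CHUNKS * CHUNK_BASE:
        return CHUNK_BASE  # n = 0 for anything up to 80,000 MiB
    n = math.ceil(math.log2(source_size / (MAX_CHUNKS * CHUNK_BASE)))
    return CHUNK_BASE * 2 ** n


def top_hash(path: str) -> str:
    """SHA2-256 of the concatenated SHA2-256 digests of each chunk (read sequentially here)."""
    chunk_size = chunk_size_for(os.path.getsize(path))
    outer = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            outer.update(hashlib.sha256(chunk).digest())
    return outer.hexdigest()
```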
This algorithm is inspired by Amazon's S3 Checksums implementation.
The key differences are:
- Fixing the chunk size, starting at 8 MiB.
- Always hashing the concatenated digests, even when source_size < 8 MiB and there is only a single chunk.
It can reuse the hashes generated by create_multipart_upload (as of at least boto3 1.34.44, 2024-02-16); the only extra step is rehashing the returned value when source_size < 8 MiB.
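A small sketch of that reuse, assuming the base64-encoded ChecksumSHA256 values have already been collected from the part-upload responses (for an object under 8 MiB the list holds the single whole-object checksum, so hashing it again is exactly the rehashing step); the helper name is hypothetical:

```python
import base64
import hashlib


def top_hash_from_s3_checksums(checksums_b64: list[str]) -> str:
    """Recompute the top hash from S3-reported SHA-256 checksums.

    Multipart upload: one base64 digest per 8 MiB-aligned part, so hashing the
    concatenated raw digests yields the top hash directly.
    Single object < 8 MiB: the list holds one digest of the raw bytes, and
    hashing it once more yields the same top hash.
    """
    raw = b"".join(base64.b64decode(c) for c in checksums_b64)
    return hashlib.sha256(raw).hexdigest()
```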
It has been submitted for draft multiformats registration under the name sha2-256-chunked, using the prefix 0xb510.
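If accepted, a digest could be framed as a multihash by prefixing it with the varint-encoded code and the digest length; a sketch of that framing, noting that 0xb510 is the proposed (not yet registered) code and the helper names are hypothetical:

```python
def encode_varint(value: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats prefixes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def as_multihash(digest: bytes, code: int = 0xB510) -> bytes:
    """Frame a raw digest as <varint code><varint digest length><digest bytes>."""
    return encode_varint(code) + encode_varint(len(digest)) + digest
```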