sha2-256-chunked

DRAFT multihash codec 0xb510

Hash of the concatenated SHA2-256 digests of 8 * 2^n MiB source chunks, where n = ceil(log2(source_size / (10^4 * 8 MiB)))

This variant of sha2-256 is designed to enable large files to be efficiently uploaded and hashed in parallel using fixed size chunks (8 MiB to start with), with the final result being a "top hash" that doesn't depend on the upload order.

The algorithm has an upper limit of 10,000 chunks. Files of up to 80,000 MiB keep the initial 8 MiB chunk size (n = 0). If the file is larger than 80,000 MiB, the chunk size is doubled until the number of chunks no longer exceeds that limit.

n = ceil(log2(source_size / (10_000 * 8 MiB)))
chunk_size = 8 MiB * 2^n
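
As a minimal sketch of the chunk-size rule above, the following Python computes the chunk size by repeated doubling rather than with a floating-point `log2`. The function and constant names are hypothetical, and the clamp to 8 MiB for small files is an assumption drawn from the "8 MiB to start with" wording rather than from the formula itself.

```python
MIB = 1024 * 1024
BASE_CHUNK_SIZE = 8 * MIB   # initial chunk size from the text
MAX_CHUNKS = 10_000         # upper limit on the number of chunks


def chunk_size_for(source_size: int) -> int:
    """Return the chunk size in bytes for a source of source_size bytes.

    Doubling until MAX_CHUNKS chunks cover the source is equivalent to
    chunk_size = 8 MiB * 2^n with n = ceil(log2(source_size / (10_000 * 8 MiB))),
    with n clamped to 0 for small sources (an assumption of this sketch).
    """
    chunk_size = BASE_CHUNK_SIZE
    while chunk_size * MAX_CHUNKS < source_size:
        chunk_size *= 2
    return chunk_size
```

Under these assumptions, a source of exactly 80,000 MiB still uses 8 MiB chunks, while one byte more moves to 16 MiB chunks.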

Inspiration

This algorithm is inspired by Amazon's S3 Checksums implementation.

The key differences are:

  • Fixing the chunk size at 8 MiB, doubling it only when the 10,000-chunk limit would otherwise be exceeded.
  • Always hashing the result (even if source_size < 8 MiB).

It can reuse hashes generated by create_multipart_upload as of at least boto3 1.34.44 (2024-02-16), simply by rehashing the value if source_size < 8 MiB.
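
The "top hash" described in this document (a SHA2-256 over the concatenated SHA2-256 digests of the chunks, applied even when there is only one chunk) could be sketched in Python as below. This is an illustration, not a reference implementation: the function names are hypothetical, chunks are hashed sequentially for brevity although each chunk digest could be computed in parallel, and treating an empty source as a single empty chunk is an assumption the text does not confirm. The raw 32-byte digest is returned without any multihash prefix.

```python
import hashlib
from typing import BinaryIO

MIB = 1024 * 1024
BASE_CHUNK_SIZE = 8 * MIB
MAX_CHUNKS = 10_000


def chunk_size_for(source_size: int) -> int:
    # Double the 8 MiB base size until at most MAX_CHUNKS chunks are needed.
    chunk_size = BASE_CHUNK_SIZE
    while chunk_size * MAX_CHUNKS < source_size:
        chunk_size *= 2
    return chunk_size


def sha2_256_chunked(stream: BinaryIO, source_size: int) -> bytes:
    """Return the top hash of stream, which must hold source_size bytes.

    Assumes stream.read(n) returns n bytes unless the end of the data is
    reached (true for io.BytesIO and ordinary buffered files).
    """
    chunk_size = chunk_size_for(source_size)
    digests = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk and digests:
            break  # end of data on an exact chunk boundary
        # Assumption: an empty source contributes one digest, of b"".
        digests.append(hashlib.sha256(chunk).digest())
        if len(chunk) < chunk_size:
            break  # short read: this was the final chunk
    # The digests are concatenated in chunk order and always hashed again,
    # even when the source fits in a single chunk.
    return hashlib.sha256(b"".join(digests)).digest()
```

A small usage example: `sha2_256_chunked(io.BytesIO(data), len(data)).hex()` yields the hex form of the top hash for an in-memory `data` value, after `import io`.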

Status 2024-02-21

It has been submitted for draft multiformats registration under the name sha2-256-chunked using the prefix 0xb510.