BIP: Ecmh Title: Elliptic Curve Multiset definition Author: Tomas van der Wansem <tomas@bitcrust.org> Created: 2018-02-01 License: PD
This BIP describes the construction of a multiset hash using elliptic curves, called ECMH. The ECMH is a 32-byte value that is uniquely and determinstically defined for a set of data elements, regardless of their order.
This spec does not in itself propose changes but can be referenced by other specification that rely on ECMH.
We find the 32-byte ECMH value of a multiset of elements. The multiset can be of any size and the elements are binary sequences of any size. The order of the elements of the set does not matter. Duplicate elements are possible, such that {a} is distinct from {a,a}.
We will use the secp256k1 elliptic curve [1], and further refer to this as the curve, and its field as the field. The point of a multiset, P(A), is a point on the curve uniquelly defined for the multiset A as follows:
The point for an empty multiset, P({}), is defined to be the infinity point of the curve.
The point for a multiset with a single element P({d}) is calculated using the algorithm:
- Let n = 0
- Let x be SHA256 of the concatenation of 64-bits little-endian encoding of n and SHA256(d), thus SHA256(n,SHA256(d))
- If x is an element of the field and x3+7 is a quadratic residue, then P({d}) = (x, (x3+7)1/2)
- Otherwise, increment n and continue from 2
P(A ∪ B) = P(A) * P(B)
The ECMH of an empty multiset is the 32-byte value of zero. The ECMH of a non-empty set is SHA256 value of the 64-byte value composed of the 32-byte big-endian x-coordinate followed by the 32-byte big-endian y-coordinate of its point on the curve.
- The verification step 2 of the algorithm succeeds ~50% of the time, so the algorithm loops on average e times.
- The trivial implementation of the algorithm has no fixed runtime, and therefore does not attempt to hide the underlying data. It only cryptographically secures against finding sets a,b such that EMCH(a) == ECMH(b) and a != b.
- Faster algorithms have been proposed but only for different (char-2) curves which have weaker security properties.
- The recommended implementation uses non-normalized intermediate 96-byte values. Normalization of these values takes about twice as long as the ECMH operations itself. Hence, finding the ECMH of 2 values is relatively slow compared to finding the ECMH of a large set.
An implementation of the algoritm can be found here:
https://github.com/tomasvdw/secp256k1/tree/multiset/src/modules/multiset
// ** Test-vectors d1,d2,d3. These are the serialized UTXO's from block 1,2,3 of mainnet: d1=982051fd1e4ba744bbbe680e1fee14677ba1a3c3540bf7b1cdb606e857233e0e00000000010000000100f2052a0100000043410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac d2=d5fdcc541e25de1c7a5addedf24858b8bb665c9f36ef744ee42c316022c90f9b00000000020000000100f2052a010000004341047211a824f55b505228e4c3d5194c1fcfaa15a456abdf37f9b9d97a4040afc073dee6c89064984f03385237d92167c13e236446b417ab79a0fcae412ae3316b77ac d3=44f672226090d85db9a9f2fbfe5f0f9609b387af7be5b7fbb7a1767c831c9e9900000000030000000100f2052a0100000043410494b9d3e76c5b1629ecf97fff95d7a4bbdac87cc26099ada28066c6ff1eb9191223cd897194a08d0c2726c5747f1db49e8cf90e75dc3e3550ae9b30086f3cd5aaac // ** finding EC-point EC(d1) and Multiset Hash M(d1): SHA(d1) = d63b1784e706370a88f0e110526a18f7bb567f50763856f4b19f094cfb89dd45 // Trial 0 x = SHA(00000000,SHA(d1)) = 45582bd9b9ed8df5fb3fb30babfa424d1e9ad20bbcf90c691203b74523b941b4 // x^3+7 is not a quadratic residue // Trial 1 x = SHA(01000000,SHA(d1)) = abdef482c7910bbfa428f81d136573698160883cc092569c5a5a57d64402abbf // x^3+7 is not a quadratic residue // Trial 2 x = SHA(02000000,SHA(d1)) = 4f9a5dce69067bf28603e73a7af4c3650b16539b95bad05eee95dfc94d1efe2c // x^3+7 is a quadratic residue EC(d1) = (x,y) = (4f9a5dce69067bf28603e73a7af4c3650b16539b95bad05eee95dfc94d1efe2c,346d5b777881f2729e7f89b2de4e8e79c7f2f42d1a0b25a8f10becb66e2d0f98) M(d1) = SHA(EC(d1)) = f883195933a687170c34fa1adec66fe2861889279fb12c03a3fb0ca68ad87893 // ** finding EC-point EC(d2) and Multiset Hash M(d2): SHA(d2) = 626f788a3ecd1f0e966bc4bb37ca7b74424f98687a748bdcfa7cd51bc926f3b4 // Trial 0 x = SHA(00000000,SHA(d2)) = 68cf91eb2388a0287c13d46011c73fb8efb6be89c0867a47feccb2d11c390d2d // x^3+7 is a quadratic residue EC(d2) = (x,y) = (68cf91eb2388a0287c13d46011c73fb8efb6be89c0867a47feccb2d11c390d2d,f42ba72b1079d3d941881836f88b5dcd7c207a6a4839f129272c77ebb7194d42) M(d2) = SHA(EC(d2)) = ef85d123a15da95d8aff92623ad1e1c9fcda3baa801bd40bc567a83a6fdcf3e2 // ** finding EC-point EC(d3) and Multiset Hash M(d3): SHA(d3) = dcdc8ef900b933854f7302a92e3d8ce05eff332425c736d9169b014cca31bc79 // Trial 0 x = SHA(00000000,SHA(d3)) = 359c6f59859d1d5af8e7081905cb6bb734c010be8680c14b5a89ee315694fc2b // x^3+7 is a quadratic residue EC(d3) = (x,y) (359c6f59859d1d5af8e7081905cb6bb734c010be8680c14b5a89ee315694fc2b,fb6ba531d4bd83b14c970ad1bec332a8ae9a05706cd5df7fd91a2f2cc32482fe) M(d3) = SHA(EC(d3)) = cfadf40fc017faff5e04ccc0a2fae0fd616e4226dd7c03b1334a7a610468edff // ** Finding combination M(d1,d2) = SHA(EC(d1,d2)) = SHA(EC(d1)+EC(d2)) EC(d1,d2) = EC(d1)+EC(d2) = (x,y) = (e6d4318be2f5b4ca8a024d87228fe4da14f9a5d7f94cd0a3a468dc0a3d0f1f5b,622ed4e52333a25e704e03ab0b50d7bfb78f98803eae97b1145a50f825bd5bae) M(d1,d2) = SHA(EC(d1,d2)) = fabafd38d07370982a34547daf5b57b8a4398696d6fd2294788abda07b1faaaf // ** Finding combination M(d1,d2,d3) = SHA(EC(d1,d2,d3)) = SHA(EC(d1,d2)+EC(d3)) EC(d1,d2,d3) = EC(d1,d2)+EC(d3) = (x,y) = (c11d50cd42ef5dd8dcd9a3d721e8155424b09cd3af313a4f99400e4e0adcae28,60f46f0b64a100694b2661eb279b29c54e459e0738a97ab5ca753678f95536ca) M(d1,d2,d3) = SHA(EC(d1,d2,d3)) = 1cbccda23d7ce8c5a8b008008e1738e6bf9cffb1d5b86a92a4e62b5394a636e2 Generate with https://github.com/tomasvdw/bitcoin-abc/commits/ecmh_dumpvectors
This document is placed in the public domain.
[1] https://en.bitcoin.it/wiki/Secp256k1
[2] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html
[3] https://lists.linuxfoundation.org/pipermail/bitcoin-ml/2017-September/000240.html