blockchain, integration, mining, main: Rolling merkle root calculation #1979

Merged
merged 5 commits into from
Aug 10, 2023

Conversation

kcalvinalvin
Collaborator

I saw #1421 and noticed that it was slower than the current function for calculating merkle roots. Just looking to see if the direction I took here is ok.

I replaced everything in rolling_merkle.go with Utreexo structs/functions. The comments and struct names should probably change. FWIW, the main function add is using the same algorithm as the one found in blake3 hash.

Here's the benchstat results. It's about the same/slightly faster than the current merkle root function and allocates even less memory than the function in #1421. The added code is a lot smaller as well.

[I] calvin@bitcoin ~/b/g/u/g/s/g/b/b/blockchain (merkle-calc-fast)> benchstat merkle.txt rolling_merkle.txt
goos: linux
goarch: amd64
pkg: github.com/btcsuite/btcd/blockchain
cpu: AMD Ryzen 5 3600 6-Core Processor              
                │  merkle.txt  │         rolling_merkle.txt          │
                │    sec/op    │    sec/op     vs base               │
Merkle/1000-10    539.0µ ±  2%   542.6µ ±  6%       ~ (p=0.165 n=10)
Merkle/2000-10    1.081m ±  1%   1.078m ±  1%       ~ (p=0.436 n=10)
Merkle/4000-10    2.189m ± 29%   2.180m ±  2%       ~ (p=0.315 n=10)
Merkle/8000-10    4.410m ±  6%   4.352m ±  1%       ~ (p=0.105 n=10)
Merkle/16000-10   8.764m ±  2%   8.859m ± 34%       ~ (p=0.247 n=10)
Merkle/32000-10   17.60m ±  5%   17.29m ±  2%  -1.78% (p=0.007 n=10)
geomean           3.088m         3.078m        -0.34%

                │   merkle.txt   │         rolling_merkle.txt         │
                │      B/op      │    B/op     vs base                │
Merkle/1000-10      48416.0 ± 0%   320.0 ± 0%  -99.34% (p=0.000 n=10)
Merkle/2000-10      96800.0 ± 0%   352.0 ± 0%  -99.64% (p=0.000 n=10)
Merkle/4000-10     193568.0 ± 0%   384.0 ± 0%  -99.80% (p=0.000 n=10)
Merkle/8000-10     387104.0 ± 0%   416.0 ± 0%  -99.89% (p=0.000 n=10)
Merkle/16000-10    774176.0 ± 0%   448.0 ± 0%  -99.94% (p=0.000 n=10)
Merkle/32000-10   1548320.0 ± 0%   480.0 ± 0%  -99.97% (p=0.000 n=10)
geomean             267.3Ki        396.2       -99.86%

                │   merkle.txt   │         rolling_merkle.txt          │
                │   allocs/op    │ allocs/op   vs base                 │
Merkle/1000-10     1002.000 ± 0%   1.000 ± 0%   -99.90% (p=0.000 n=10)
Merkle/2000-10     2002.000 ± 0%   1.000 ± 0%   -99.95% (p=0.000 n=10)
Merkle/4000-10     4002.000 ± 0%   1.000 ± 0%   -99.98% (p=0.000 n=10)
Merkle/8000-10     8002.000 ± 0%   1.000 ± 0%   -99.99% (p=0.000 n=10)
Merkle/16000-10   16002.000 ± 0%   1.000 ± 0%   -99.99% (p=0.000 n=10)
Merkle/32000-10   32002.000 ± 0%   1.000 ± 0%  -100.00% (p=0.000 n=10)
geomean              5.661k        1.000        -99.98%
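The rolling approach described above can be illustrated with a minimal sketch. This is not the code from rolling_merkle.go: the names `rollingStore`, `hashPair`, and `hash` are hypothetical, and `hashPair` is only assumed to mirror the double SHA-256 used for Bitcoin merkle nodes. The idea shown is that only the roots of completed subtrees are kept, and the set low bits of the leaf count say how many merges a new leaf triggers.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hash is a hypothetical stand-in for chainhash.Hash.
type hash [32]byte

// hashPair returns the double SHA-256 of l || r (assumed to match how
// Bitcoin merkle parents are computed).
func hashPair(l, r hash) hash {
	var buf [64]byte
	copy(buf[:32], l[:])
	copy(buf[32:], r[:])
	first := sha256.Sum256(buf[:])
	return sha256.Sum256(first[:])
}

// rollingStore keeps only the roots of the perfect subtrees built so far,
// tallest first.
type rollingStore struct {
	roots     []hash
	numLeaves uint64
}

// add pushes a leaf, then merges with the previous root once for every
// consecutive set low bit of numLeaves (each set bit marks a completed
// subtree of that height).
func (s *rollingStore) add(leaf hash) {
	node := leaf
	for h := uint(0); (s.numLeaves>>h)&1 == 1; h++ {
		top := s.roots[len(s.roots)-1]
		s.roots = s.roots[:len(s.roots)-1]
		node = hashPair(top, node)
	}
	s.roots = append(s.roots, node)
	s.numLeaves++
}

func main() {
	leaves := []hash{{1}, {2}, {3}, {4}}
	var s rollingStore
	for _, l := range leaves {
		s.add(l)
	}
	// Compare against the tree built level by level.
	naive := hashPair(hashPair(leaves[0], leaves[1]), hashPair(leaves[2], leaves[3]))
	fmt.Println("single root:", len(s.roots) == 1, "matches naive:", s.roots[0] == naive)
}
```

The sketch only covers a power-of-two leaf count; handling odd counts by re-adding the last element is a separate step discussed later in the review.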

@coveralls

coveralls commented May 9, 2023

Pull Request Test Coverage Report for Build 5612329239

  • 87 of 132 (65.91%) changed or added relevant lines in 6 files are covered.
  • 9 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.02%) to 55.258%

Changes Missing Coverage        Covered Lines   Changed/Added Lines   %
blockchain/merkle.go            4               5                     80.0%
rpcserver.go                    0               2                     0.0%
blockchain/rolling_merkle.go    81              88                    92.05%
mining/mining.go                0               35                    0.0%

Files with Coverage Reduction   New Missed Lines   %
mining/mining.go                2                  8.08%
blockchain/merkle.go            7                  31.34%

Totals (Coverage Status)
Change from base Build 5525483102: 0.02%
Covered Lines: 26767
Relevant Lines: 48440

💛 - Coveralls

@kcalvinalvin
Collaborator Author

kcalvinalvin commented May 10, 2023

There was still some Utreexo specific stuff that made it calculate wrong roots. I fixed it in the latest force push.

Still the same results from the benchmarks so the benchstat results are valid.

EDIT: Later push was for witness merkle root calculation.

@kcalvinalvin kcalvinalvin force-pushed the merkle-calc-fast branch 3 times, most recently from f6c011a to 99e6b94 Compare May 11, 2023 11:49
@Roasbeef
Member

Concept ACK, great work building on the other optimization with some of the utreexo derived fine tuning!

@kcalvinalvin kcalvinalvin changed the title from "(WIP: looking for concept ACKs) Rolling merkle root calculation" to "blockchain, integration, mining, main: Rolling merkle root calculation" Jun 3, 2023
@kcalvinalvin
Collaborator Author

Ready for reviews now!

Collaborator

@yyforyongyu yyforyongyu left a comment

Dope🤙 Excited to see this performance gain! Left some comments, mostly about the new algo.

)

// parentHash returns the hash of the left and right hashes passed in.
func parentHash(l, r chainhash.Hash) chainhash.Hash {
Collaborator

So this is the same as HashMerkleBranches?

Collaborator Author

Oh yeah you're right, it serves the same functionality.

Though the parentHash function is faster because there's no allocation being done vs HashMerkleBranches which does a single allocation. I've tested and verified that this is because HashMerkleBranches is returning a pointer.

Would it be ok if I were to change the HashMerkleBranches function to return a chainhash.Hash instead of a pointer? Then parentHash and HashMerkleBranches would be equal.

BenchmarkHashMerkleBranches-10           2279146               526.5 ns/op            32 B/op          1 allocs/op
BenchmarkParentHash-10                   2392340               497.1 ns/op             0 B/op          0 allocs/op

I'll modify the

Code used for the above benchmark:

func BenchmarkHashMerkleBranches(b *testing.B) {
       var aHash, bHash chainhash.Hash
       for i := 0; i < b.N; i++ {
               HashMerkleBranches(&aHash, &bHash)
       }
}

func BenchmarkParentHash(b *testing.B) {       
       var aHash, bHash chainhash.Hash
       for i := 0; i < b.N; i++ {
               parentHash(aHash, bHash)
       }
}
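The allocation difference reported above can be reproduced outside the btcd tree with a small sketch. The functions `hashPtr` and `hashVal` are hypothetical stand-ins (single SHA-256 rather than btcd's hashing), and `//go:noinline` is used only to keep the comparison independent of inlining decisions. The point shown is that returning `&h` forces `h` to escape to the heap, while returning the array by value does not require that.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// hash is a hypothetical stand-in for chainhash.Hash.
type hash [32]byte

// Package-level sinks keep the compiler from discarding the results.
var (
	sinkPtr *hash
	sinkVal hash
)

// hashPtr returns a pointer, so the result must live on the heap.
//
//go:noinline
func hashPtr(l, r *hash) *hash {
	var buf [64]byte
	copy(buf[:32], l[:])
	copy(buf[32:], r[:])
	h := hash(sha256.Sum256(buf[:]))
	return &h
}

// hashVal returns the 32-byte array by value; the caller provides the
// storage, so no heap allocation is needed for the result itself.
//
//go:noinline
func hashVal(l, r hash) hash {
	var buf [64]byte
	copy(buf[:32], l[:])
	copy(buf[32:], r[:])
	return sha256.Sum256(buf[:])
}

func main() {
	var a, b hash
	ptrAllocs := testing.AllocsPerRun(1000, func() { sinkPtr = hashPtr(&a, &b) })
	valAllocs := testing.AllocsPerRun(1000, func() { sinkVal = hashVal(a, b) })
	fmt.Printf("pointer return: %.0f allocs/op, value return: %.0f allocs/op\n",
		ptrAllocs, valAllocs)
}
```

Running `go build -gcflags=-m` on such a file should report which variable is moved to the heap, which is a quick way to confirm the cause without a benchmark.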

Collaborator

Yeah why not🤓 Guess pointer stuff is tricky as it puts data on the heap, and the performance gain is better in all aspects so I don't see why we shouldn't.

Collaborator Author

Added a commit to change the HashMerkleBranches to return a chainhash.Hash. Removed parentHash.

// allocated based on the passed in size.
func newRollingMerkleTreeStore(size int) rollingMerkleTreeStore {
var alloc int
if size != 0 {
Collaborator

Can the size be 0?

Collaborator Author

It can't be 0 since this is only called in CalcMerkleRoot and the passed in size is len(transactions) in CalcMerkleRoot.

I put this check here since if the code were to be called elsewhere with 0, there'd be an overflow. It's not too bad since you'd be allocating 64 for []chainhash.Hash but I thought it'd be better not to allow for that.

I can remove the branch if need be.
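The overflow mentioned above can be sketched as follows. This assumes the capacity is derived from the bit length of `size - 1`; the helper names `guardedCapacity` and `unguardedCapacity` are hypothetical and the unsigned parameter type is chosen for illustration. With an unsigned size of 0, `size - 1` wraps around to the maximum value, whose bit length is 64.

```go
package main

import (
	"fmt"
	"math/bits"
)

// guardedCapacity returns how many roots to preallocate for size leaves,
// skipping the computation when size is 0.
func guardedCapacity(size uint64) int {
	if size == 0 {
		return 0
	}
	return bits.Len64(size - 1)
}

// unguardedCapacity omits the zero check. For size == 0 the subtraction
// wraps to math.MaxUint64, so the result is 64.
func unguardedCapacity(size uint64) int {
	return bits.Len64(size - 1)
}

func main() {
	fmt.Println(guardedCapacity(0), unguardedCapacity(0)) // 0 64
	fmt.Println(guardedCapacity(1), guardedCapacity(2), guardedCapacity(5)) // 0 1 3
}
```

Allocating room for 64 hashes is harmless in practice, which is consistent with the check being defensive rather than required for correctness.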

Collaborator

No I like this check. Asked that question to gain more understanding and possibly seek ways to make this more defensive. Since this won't result in any error state I think it's ok. But maybe we could add an info or warning log here since the performance won't be as good?

Collaborator Author

I added a comment on the function in the latest push.

I think it'd be sorta weird to tell the end user why an unexported function is throwing a warning or an info in the logs.

}

// Add on the last tx again if there's an odd number of txs.
if len(adds) > 0 && len(adds)&1 == 1 {
Collaborator

I wonder what's the difference between 2&1 == 1 and 2%2 != 0 performance-wise🧐

Collaborator Author

@kcalvinalvin kcalvinalvin Jun 26, 2023

Seems like bitcompare is slightly faster. It's almost the same. Is the modulo more readable? I could change it here if so.

Benchstat results:

goos: linux
goarch: amd64
pkg: github.com/btcsuite/btcd/blockchain
cpu: AMD Ryzen 5 3600 6-Core Processor              
       │ modulo.txt  │           bitcompare.txt           │
       │   sec/op    │   sec/op     vs base               │
Odd-10   2.478µ ± 3%   2.464µ ± 2%  -0.59% (p=0.041 n=10)

Code used:

+var varbool bool
+
+func BenchmarkOdd(b *testing.B) {
+       for i := 0; i < b.N; i++ {
+               benchmarkOdd(10_000)
+       }
+}
+
+func benchmarkOdd(n int) {
+       for i := 0; i < n; i++ {
+               // varbool = 2&1 == 1
+               varbool = 2%2 != 0
+       }
+}

Collaborator

Cool thanks for the test! If the performance is the same I'd say we always prefer the more readable approach. Non-blocking tho.

Collaborator Author

Changed to %2 != 0 in the latest push
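For a non-negative value such as a slice length, the bit-mask and modulo parity checks discussed in this thread are interchangeable. The small sketch below (helper names are hypothetical) also shows why comparing the remainder against 0 is the safer modulo form in Go: the remainder of a negative odd number is -1, so an `== 1` comparison would miss it.

```go
package main

import "fmt"

// oddByMask tests the lowest bit.
func oddByMask(n int) bool { return n&1 == 1 }

// oddByModulo treats any non-zero remainder as odd.
func oddByModulo(n int) bool { return n%2 != 0 }

// oddByModuloEqOne is correct only for non-negative n, because Go's %
// keeps the sign of the dividend (-3 % 2 == -1).
func oddByModuloEqOne(n int) bool { return n%2 == 1 }

func main() {
	for _, n := range []int{0, 1, 2, 3, -3} {
		fmt.Println(n, oddByMask(n), oddByModulo(n), oddByModuloEqOne(n))
	}
}
```

Since `len` never returns a negative number, all three forms agree for the check in this PR; the choice is purely about readability.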


// newRollingMerkleTreeStore returns a rollingMerkleTreeStore with the roots
// allocated based on the passed in size.
func newRollingMerkleTreeStore(size int) rollingMerkleTreeStore {
Collaborator

Use uint64 here?

Collaborator Author

addressed in the latest push

merkleStoreTree := BuildMerkleTreeStore(block.Transactions(), false)
merkleStoreRoot := merkleStoreTree[len(merkleStoreTree)-1]

if calcMerkleRoot != *merkleStoreRoot {
Collaborator

use require.Equal instead?

Collaborator Author

addressed in the latest push

@kcalvinalvin kcalvinalvin force-pushed the merkle-calc-fast branch 2 times, most recently from 55da398 to f4eedd1 Compare June 26, 2023 08:49
@kcalvinalvin kcalvinalvin requested a review from yyforyongyu June 26, 2023 23:45
Contributor

@ProofOfKeags ProofOfKeags left a comment

Overall I really like this PR. I do think there are certain API changes that are necessary for the RMTS to be broadly useful, rather than solely tailored to the block tree verification process. As I understand it, this PR is ultimately in service of this which is for optimizing sync times, so I am well aware that API ergonomics may not have been an original design goal. I do think that we can take this opportunity to have a broadly useful piece of machinery, though.

The specific changes I am requesting are to clean up the calcMerkleRoot function so that it doesn't have special provisioning for Bitcoin specific structures, which correspondingly allows you to clean up the internal implementation logic and simplify redundant checks. Everything else I've left comments wise is non-blocking or adds to the documentation to help future readers of this code.

@kcalvinalvin
Collaborator Author

I do think there are certain API changes that are necessary for the RMTS to be broadly useful, rather than solely tailored to the block tree verification process.

I strongly disagree with making this function more broadly useful as the merkle tree function used for calculating the tx commitment is flawed and has a vulnerability (CVE-2012-2459). It's possible to create a collision using duplicate leaves and it's described in detail at: https://github.com/bitcoin/bitcoin/blob/79e8247ddb166f9b980f40249b7372a502402a4d/src/consensus/merkle.cpp#L8-L41

If anyone wants to use the following code for merkle tree stuff, they should just use https://github.com/utreexo/utreexo as it's essentially the same.
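The duplicate-leaf collision referred to above can be demonstrated with a short sketch. The `merkleRoot` function here is a hypothetical, simplified level-by-level implementation (not btcd's code) that follows the Bitcoin rule of pairing the last node with itself when a level has an odd number of nodes. Under that rule, the list [a, b, c] and the list [a, b, c, c] produce the same root.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hash is a hypothetical stand-in for chainhash.Hash.
type hash [32]byte

// hashPair returns the double SHA-256 of l || r.
func hashPair(l, r hash) hash {
	var buf [64]byte
	copy(buf[:32], l[:])
	copy(buf[32:], r[:])
	first := sha256.Sum256(buf[:])
	return sha256.Sum256(first[:])
}

// merkleRoot builds the tree level by level, duplicating the last node of
// any odd-length level. leaves must be non-empty.
func merkleRoot(leaves []hash) hash {
	level := append([]hash(nil), leaves...)
	for len(level) > 1 {
		if len(level)%2 != 0 {
			level = append(level, level[len(level)-1])
		}
		next := make([]hash, 0, len(level)/2)
		for i := 0; i < len(level); i += 2 {
			next = append(next, hashPair(level[i], level[i+1]))
		}
		level = next
	}
	return level[0]
}

func main() {
	a, b, c := hash{1}, hash{2}, hash{3}
	honest := merkleRoot([]hash{a, b, c})
	mutated := merkleRoot([]hash{a, b, c, c})
	fmt.Println("roots collide:", honest == mutated)
}
```

This is why a general-purpose merkle API built on the same rule would need extra checks for duplicated trailing entries, which is outside what a block's tx commitment calculation needs.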

Collaborator Author

@kcalvinalvin kcalvinalvin left a comment

Changed a few things. Left comments on comment changes.

Collaborator

@yyforyongyu yyforyongyu left a comment

Latest round looks good🎉 Left some comments re style nits and tests. Think once addressed we are good to go🏖

type rollingMerkleTreeStore struct {
// roots are where the temporary merkle roots get stored while the
// merkle root is being calculated. Every root has 2^n leaves and the
// tallest tree is furthest to the left and the shortest tree is furthest to
Collaborator

nit: line too long, happens at some other places too. Guess we don't enforce the 80 column-size limits in btcd yet so this is non-blocking. Will submit a PR to enforce it later.

s.add(leaf)
}

require.Equal(t, s.roots, test.expectedRoots)
Collaborator

lets also check s.numLeaves?

Collaborator Author

Added check for s.numLeaves in the latest push

{0x00},
},
},

Collaborator

It'd be nice to add two more cases, one with 2 leaves and one with 4 leaves. It helps with understanding, also shows how the roots are "rolling".

Collaborator Author

Added the test cases in the latest push

mining/mining.go Outdated
@@ -804,6 +804,38 @@ mempoolLoop:
var witnessCommitment []byte
if witnessIncluded {
witnessCommitment = AddWitnessCommitment(coinbaseTx, blockTxns)
// The witness of the coinbase transaction MUST be exactly 32-bytes
Collaborator

Could we make the whole logical branch into its own method to stop growing the already very long method?

Collaborator Author

Oh that seemed like it was refactored away to a separate function a while ago. It's just bad git rebasing on my end.

Got rid of the entire branch as it's a duplicate of AddWitnessCommitment

kcalvinalvin and others added 5 commits August 3, 2023 15:23

HashMerkleBranches used to return a pointer, but it is changed to
return a chainhash.Hash directly.  This allows the compiler to make
optimizations in some cases and avoids a memory allocation.

RollingMerkleTree is a much more memory efficient way of calculating the
merkle root of a tx commitment inside the bitcoin block header.  The
current way of calculating the merkle root allocates 2*N elements. With
the RollingMerkleTree, we are able to reduce the memory allocated to
log2(N).

This results in significant memory savings (99.9% in an average block),
allowing for a faster block verification.
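The log2(N) figure can be checked against the B/op column of the benchstat output earlier in this conversation. The sketch below assumes the store preallocates one 32-byte hash per bit of N-1 (the helper `predictedBytes` is hypothetical); under that assumption the predicted sizes for the benchmarked transaction counts come out to 320, 352, 384, 416, 448, and 480 bytes, which are the values reported for rolling_merkle.txt.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hashSize is the size of a SHA-256 based hash in bytes.
const hashSize = 32

// predictedBytes estimates the single allocation made for n leaves,
// assuming capacity bits.Len64(n-1). n must be at least 1.
func predictedBytes(n uint64) int {
	return hashSize * bits.Len64(n-1)
}

func main() {
	for _, n := range []uint64{1000, 2000, 4000, 8000, 16000, 32000} {
		fmt.Printf("%5d txs -> %d B\n", n, predictedBytes(n))
	}
}
```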

CalcMerkleRoot uses the rolling merkle root algorithm to calculate the
merkle root commitment inside the Bitcoin block header.  It allocates
significantly less memory than the BuildMerkleTreeStore function that's
currently in use (99.9% in an average block with 2000 txs).
Collaborator

@yyforyongyu yyforyongyu left a comment

LGTM⛵️

Member

@Roasbeef Roasbeef left a comment

LGTM 🌭

@Roasbeef
Member

Completed sync tests on mainnet+testnet as a final sanity check!

7 participants