Commit
This PR adds two different algorithms to create snapshots of merkle trees.

## Notation

- N - number of non-zero nodes
- H - height of tree
- M - number of snapshots

## ~Incremental~ Full snapshots

This algorithm stores the trees in a database instance in the same way they would be stored using pointers in memory. Each node has two key-value entries in the database: one for its left child and one for its right child.

Taking a snapshot means just walking the tree (BF order), storing any new nodes. If we hit a node that already exists in the database, that means the subtree rooted at that node has not changed and so we can skip checking it.

Building a sibling path is just a traversal of the tree from historic root to historic leaf.

Pros:

1. it's generic enough that it works with any type of merkle tree (one caveat: we'll need an extension to store historic leaf values for indexed-trees)
2. it shares structure with other versions of the tree: if some subtree hasn't changed between two snapshots then it will be reused for both snapshots (would work well for append-only trees)
3. getting a historic sibling path is extremely cheap, only requiring O(H) database reads

Cons:

1. it takes up space to store every snapshot. Worst case scenario it would take up an extra O(N * M) space (i.e. the tree changes entirely after every snapshot). For append-only trees it would be more space-efficient (e.g. once we start filling the right subtree, the left subtree will be shared by every future snapshot).

More details in this comment #3207 (comment).

## Append-only snapshots

For append-only trees we can have an infinite number of snapshots with just O(N + M) extra space (N - number of nodes in the tree, M - number of snapshots) at the cost of sibling paths possibly requiring O(H) hashes.

This way of storing snapshots only works for `AppendOnlyTree` and exploits two properties of this type of tree:

1. once a leaf is set, it can never be changed (i.e. the tree is append-only).
   This property can be generalised to: once a subtree is completely filled then none of its nodes can ever change.
2. there are no gaps in the leaves: at any moment in time T we can say that the first K leaves of the tree are filled and the last 2^H-K leaves are zeroes.

The algorithm stores a copy of the tree in the database. For each node it keeps:

- the last snapshot at which that node changed (i.e. v1, v2, etc.)
- the node value at that snapshot

Taking the snapshot is also a BF traversal of the tree, comparing the current value of each node with its previously stored value. If the value is different we update both entries. If the values are the same then the node hasn't changed and, by the first property, none of the nodes in its subtree have changed, so the traversal returns early. For each snapshot we also store how many leaves have been filled (e.g. at v1 we had 10 leaves, at v2 33 leaves, etc.).

Building a sibling path is more involved and could potentially require O(H) hashes:

1. walk the historic tree from leaf to root
2. check the sibling's last snapshot
   - if it's before the target snapshot version then we can safely use the node (e.g. node A was last changed at time T3 and we're building a proof for the tree at time T10, then by property 1 we can conclude that neither node A nor the subtree rooted at A will ever change at any moment in time Tn where n > 3)
   - if the node has changed then we have to rebuild its historic value at the snapshot we're interested in
   - to do this we check how "wide" the subtree rooted at that node is (e.g. what's the first leaf index of the subtree) and if it intersects with the filled leaf set at that snapshot. If it doesn't intersect at all (e.g. my subtree's first leaf is 11 but only 5 leaves were filled at the time) then we can safely conclude that the whole subtree is some hash of zeros
   - if it does intersect, go down one level and apply step 2 again.
   - the invariant is: we will either reach a leaf and that leaf was changed in the version we're interested in, or we reach a node that was changed before the version we're interested in (and we return early) or we reach a node that was historically a hash of zero

Two (big) SVGs showing the sibling path algorithm step-by-step:

- [Average case](https://github.com/AztecProtocol/aztec-packages/assets/3816165/b87fa6eb-bcf4-42ca-879d-173a76d802bb)
- [Drawing of sibling path algorithm in the worst case](https://github.com/AztecProtocol/aztec-packages/assets/3816165/dd3788ec-6357-4fab-bf78-3496a2948040)

Pros:

- low space requirements only needing O(N+M) extra space.

Cons:

- generating a sibling path involves mixed workload: partly reading from the database and partly hashing. Worst case scenario O(H) db reads + O(H) hashing
- doesn't work for `IndexedTree`s because even though it only supports appending new leaves, internally it updates its leaf values.

Fix #3207

---------

Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>
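
For illustration only, the append-only snapshot and sibling-path steps described above could be sketched roughly as follows. This is not the code from this PR. The class name `AppendOnlySnapshotSketch`, its methods, the in-memory `Map`s standing in for the database, and the string-based toy `hash` are all assumptions made for the example. It also assumes snapshot versions are strictly increasing and treats a node last changed at exactly the target version as safe to reuse.

```typescript
// Hypothetical sketch: none of these names come from the PR itself.
type Stored = { version: number; value: string };

// Toy stand-in for a real hash function, chosen so results stay readable.
const hash = (l: string, r: string): string => `H(${l},${r})`;

class AppendOnlySnapshotSketch {
  private readonly height: number;
  private readonly zeros: string[] = ['0']; // zeros[level] = value of an empty subtree
  private readonly current = new Map<string, string>(); // live tree
  private readonly stored = new Map<string, Stored>(); // stand-in for the database
  private readonly leafCounts = new Map<number, number>(); // version -> filled leaves
  private size = 0;

  constructor(height: number) {
    this.height = height;
    for (let l = 1; l <= height; l++) this.zeros.push(hash(this.zeros[l - 1], this.zeros[l - 1]));
  }

  private key(level: number, index: number): string {
    return `${level}:${index}`;
  }

  private live(level: number, index: number): string {
    return this.current.get(this.key(level, index)) ?? this.zeros[level];
  }

  // Append a leaf (level 0) and recompute its ancestors up to the root.
  append(leaf: string): void {
    let index = this.size++;
    this.current.set(this.key(0, index), leaf);
    for (let level = 1; level <= this.height; level++) {
      index >>= 1;
      const value = hash(this.live(level - 1, 2 * index), this.live(level - 1, 2 * index + 1));
      this.current.set(this.key(level, index), value);
    }
  }

  // BF traversal from the root; an unchanged node means an unchanged subtree.
  snapshot(version: number): void {
    const queue: [number, number][] = [[this.height, 0]];
    while (queue.length > 0) {
      const [level, index] = queue.shift()!;
      const value = this.live(level, index);
      const previous = this.stored.get(this.key(level, index))?.value ?? this.zeros[level];
      if (value === previous) continue; // return early for this subtree
      this.stored.set(this.key(level, index), { version, value });
      if (level > 0) queue.push([level - 1, 2 * index], [level - 1, 2 * index + 1]);
    }
    this.leafCounts.set(version, this.size);
  }

  // Value of a node as it was at `version`.
  private historic(level: number, index: number, version: number): string {
    const entry = this.stored.get(this.key(level, index));
    if (!entry) return this.zeros[level]; // never non-zero in any snapshot
    if (entry.version <= version) return entry.value; // not changed since the target snapshot
    const firstLeaf = index * 2 ** level;
    if (firstLeaf >= this.leafCounts.get(version)!) return this.zeros[level]; // empty back then
    // Partially filled at `version`: go down one level and rebuild by hashing.
    return hash(this.historic(level - 1, 2 * index, version), this.historic(level - 1, 2 * index + 1, version));
  }

  // Walk from leaf to root, collecting the historic value of each sibling.
  siblingPath(leafIndex: number, version: number): string[] {
    const path: string[] = [];
    for (let level = 0, index = leafIndex; level < this.height; level++, index >>= 1) {
      path.push(this.historic(level, index ^ 1, version));
    }
    return path;
  }
}

const tree = new AppendOnlySnapshotSketch(3);
['a', 'b', 'c'].forEach(leaf => tree.append(leaf));
tree.snapshot(1);
['d', 'e'].forEach(leaf => tree.append(leaf));
tree.snapshot(2);
console.log(tree.siblingPath(1, 1)); // ['a', 'H(c,0)', 'H(H(0,0),H(0,0))']
```

In the usage at the end, the sibling at level 1 was overwritten at version 2, so its version 1 value `H(c,0)` is rebuilt by hashing. The sibling at level 2 starts at leaf 4, beyond the 3 leaves filled at version 1, so it resolves to the zero value for that level without descending.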