feat: add tree snapshots (#3468)

This PR adds two different algorithms to create snapshots of Merkle trees.

## Notation

- N: number of non-zero nodes
- H: height of tree
- M: number of snapshots

## ~Incremental~ Full snapshots

This algorithm stores the trees in a database instance in the same way they would be stored using pointers in memory. Each node has two key-value entries in the database: one for its left child and one for its right child.

Taking a snapshot means walking the tree (BF order) and storing any new nodes. If we hit a node that already exists in the database, the subtree rooted at that node has not changed, so we can skip checking it.

Building a sibling path is just a traversal of the tree from the historic root to the historic leaf.

Pros:
1. it's generic enough that it works with any type of Merkle tree (one caveat: we'll need an extension to store historic leaf values for indexed trees)
2. it shares structure with other versions of the tree: if some subtree hasn't changed between two snapshots, it is reused by both snapshots (this would work well for append-only trees)
3. getting a historic sibling path is extremely cheap, requiring only O(H) database reads

Cons:
1. it takes up space to store every snapshot. In the worst case it takes an extra O(N * M) space (i.e. the tree changes entirely after every snapshot). For append-only trees it is more space-efficient (e.g. once we start filling the right subtree, the left subtree is shared by every future snapshot).

More details in this comment #3207 (comment).

## Append-only snapshots

For append-only trees we can have an unlimited number of snapshots with just O(N + M) extra space (N: number of nodes in the tree, M: number of snapshots), at the cost of sibling paths possibly requiring O(H) hashes.

This way of storing snapshots only works for `AppendOnlyTree` and exploits two properties of this type of tree:
1. once a leaf is set, it can never be changed (i.e. the tree is append-only). This property can be generalised to: once a subtree is completely filled, none of its nodes can ever change.
2. there are no gaps in the leaves: at any moment in time T we can say that the first K leaves of the tree are filled and the last 2^H - K leaves are zeroes.

The algorithm stores a copy of the tree in the database, keeping two entries per node:
- the last snapshot that modified that node (i.e. v1, v2, etc.)
- the node value at that snapshot

Taking the snapshot is also a BF traversal of the tree, comparing the current value of each node with its previously stored value. If the value is different we update both entries. If the values are the same, the node hasn't changed and, by the first property, none of the nodes in its subtree have changed either, so the traversal does not descend into it.

For each snapshot we also store how many leaves have been filled (e.g. at v1 we had 10 leaves, at v2 33 leaves, etc.).

Building a sibling path is more involved and could potentially require O(H) hashes:
1. walk the historic tree from leaf to root
2. check the sibling's last snapshot
   - if it's at or before the target snapshot version, we can safely use the stored node (e.g. if node A was last changed at time T3 and we're building a proof for the tree at time T10, then neither node A nor the subtree rooted at A has changed at any time Tn with n > 3)
   - if the node has changed since then, we have to rebuild its historic value at the snapshot we're interested in
   - to do this we check how "wide" the subtree rooted at that node is (i.e. the first leaf index of the subtree) and whether it intersects with the filled leaf set at that snapshot. If it doesn't intersect at all (e.g. the subtree's first leaf is 11 but only 5 leaves were filled at the time), we can safely conclude that the whole subtree is a hash of zeros
   - if it does intersect, go down one level and apply step 2 again to both children
   - the invariant is: we either reach a leaf that was set at or before the version we're interested in, reach a node that was last changed at or before that version (and return early), or reach a node that was historically a hash of zeros

Two (big) SVGs showing the sibling path algorithm step-by-step:
- [Average case](https://github.com/AztecProtocol/aztec-packages/assets/3816165/b87fa6eb-bcf4-42ca-879d-173a76d802bb)
- [Drawing of sibling path algorithm in the worst case](https://github.com/AztecProtocol/aztec-packages/assets/3816165/dd3788ec-6357-4fab-bf78-3496a2948040)

Pros:
- low space requirements, needing only O(N + M) extra space

Cons:
- generating a sibling path involves a mixed workload: partly reading from the database and partly hashing. Worst case: O(H) db reads + O(H) hashes
- doesn't work for `IndexedTree`s because, even though they only support appending new leaves, internally they update existing leaf values.

Fix #3207

---------

Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>
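The full-snapshot scheme described in this commit message can be sketched in memory as follows. This is a minimal illustration, not the code from this PR (`full_snapshot.ts` is not shown in this diff): a `Map` stands in for the LevelDB instance, a toy string-building `hash` stands in for Pedersen, and nodes are assumed to be keyed by their hash, so a subtree whose root is already in the store is skipped during the BF walk. `FullSnapshotStore`, `buildLevels` and `rootFromPath` are hypothetical names.

```typescript
// Toy, non-cryptographic 2-to-1 hash used only for illustration (assumption: stands in for Pedersen).
const hash = (l: string, r: string): string => `H(${l},${r})`;

/** Builds every level of a complete tree: levels[0] is [root], levels[depth] holds the leaves. */
function buildLevels(leaves: string[], depth: number, zero = '0'): string[][] {
  const width = 2 ** depth;
  if (leaves.length > width) throw new Error('too many leaves');
  const padded = [...leaves, ...new Array<string>(width - leaves.length).fill(zero)];
  const levels: string[][] = [padded];
  while (levels[0].length > 1) {
    const below = levels[0];
    const above: string[] = [];
    for (let i = 0; i < below.length; i += 2) above.push(hash(below[i], below[i + 1]));
    levels.unshift(above);
  }
  return levels;
}

/** Hypothetical in-memory stand-in for the key-value db: two entries (left, right) per node. */
class FullSnapshotStore {
  private db = new Map<string, string>();
  private roots = new Map<number, string>();

  get entryCount(): number {
    return this.db.size;
  }

  /** BF walk from the root; stops descending at leaves and at nodes already in the db. */
  snapshot(block: number, levels: string[][]): void {
    const depth = levels.length - 1;
    const queue: [number, number][] = [[0, 0]]; // [level, index]
    while (queue.length > 0) {
      const [level, index] = queue.shift()!;
      const node = levels[level][index];
      // an existing entry means the whole subtree below this node is already stored
      if (level === depth || this.db.has(`${node}:left`)) continue;
      this.db.set(`${node}:left`, levels[level + 1][2 * index]);
      this.db.set(`${node}:right`, levels[level + 1][2 * index + 1]);
      queue.push([level + 1, 2 * index], [level + 1, 2 * index + 1]);
    }
    this.roots.set(block, levels[0][0]);
  }

  /** Walks from the historic root down to the leaf; returns siblings ordered leaf-first. */
  getSiblingPath(block: number, leafIndex: number, depth: number): string[] {
    let node = this.roots.get(block);
    if (node === undefined) throw new Error(`no snapshot at block ${block}`);
    const path: string[] = [];
    for (let level = 0; level < depth; level++) {
      // bit of the leaf index that picks left/right at this level (most significant first)
      const goRight = (leafIndex >> (depth - 1 - level)) & 1;
      const left = this.db.get(`${node}:left`)!;
      const right = this.db.get(`${node}:right`)!;
      path.unshift(goRight ? left : right);
      node = goRight ? right : left;
    }
    return path;
  }
}

/** Recomputes a root from a leaf, its index and a leaf-first sibling path. */
function rootFromPath(leaf: string, leafIndex: number, path: string[]): string {
  let acc = leaf;
  path.forEach((sibling, i) => {
    acc = (leafIndex >> i) & 1 ? hash(sibling, acc) : hash(acc, sibling);
  });
  return acc;
}

const depth = 3;
const store = new FullSnapshotStore();
const v1 = buildLevels(['a', 'b'], depth);
store.snapshot(1, v1);
const v2 = buildLevels(['a', 'b', 'c'], depth);
store.snapshot(2, v2);
// the block-1 sibling path for leaf 1 still verifies against the block-1 root
console.log(rootFromPath('b', 1, store.getSiblingPath(1, 1, depth)) === v1[0][0]); // true
```

In this toy run the second snapshot only adds entries for the three nodes on the path to the new leaf; the unchanged `H(a,b)` subtree and the all-zero right half are shared with the first snapshot, which is the structural sharing listed under Pros.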
AztecProtocol · Dec 1, 2023 · 7a86bb3
1 parent 8b6688a

Showing 21 changed files with 1,235 additions and 8 deletions.
diff --git a/yarn-project/merkle-tree/src/index.ts b/yarn-project/merkle-tree/src/index.ts
@@ -4,8 +4,12 @@ export * from './interfaces/merkle_tree.js';
 export * from './interfaces/update_only_tree.js';
 export * from './pedersen.js';
 export * from './sparse_tree/sparse_tree.js';
-export * from './standard_indexed_tree/standard_indexed_tree.js';
+export { LowLeafWitnessData, StandardIndexedTree } from './standard_indexed_tree/standard_indexed_tree.js';
 export * from './standard_tree/standard_tree.js';
 export { INITIAL_LEAF } from './tree_base.js';
 export { newTree } from './new_tree.js';
 export { loadTree } from './load_tree.js';
+export * from './snapshots/snapshot_builder.js';
+export * from './snapshots/full_snapshot.js';
+export * from './snapshots/append_only_snapshot.js';
+export * from './snapshots/indexed_tree_snapshot.js';
diff --git a/yarn-project/merkle-tree/src/interfaces/append_only_tree.ts b/yarn-project/merkle-tree/src/interfaces/append_only_tree.ts
@@ -1,9 +1,10 @@
+import { TreeSnapshotBuilder } from '../snapshots/snapshot_builder.js';
 import { MerkleTree } from './merkle_tree.js';
 
 /**
  * A Merkle tree that supports only appending leaves and not updating existing leaves.
  */
-export interface AppendOnlyTree extends MerkleTree {
+export interface AppendOnlyTree extends MerkleTree, TreeSnapshotBuilder {
   /**
    * Appends a set of leaf values to the tree.
    * @param leaves - The set of leaves to be appended.

diff --git a/yarn-project/merkle-tree/src/interfaces/update_only_tree.ts b/yarn-project/merkle-tree/src/interfaces/update_only_tree.ts
@@ -1,11 +1,12 @@
 import { LeafData } from '@aztec/types';
 
+import { TreeSnapshotBuilder } from '../snapshots/snapshot_builder.js';
 import { MerkleTree } from './merkle_tree.js';
 
 /**
  * A Merkle tree that supports updates at arbitrary indices but not appending.
  */
-export interface UpdateOnlyTree extends MerkleTree {
+export interface UpdateOnlyTree extends MerkleTree, TreeSnapshotBuilder {
   /**
    * Updates a leaf at a given index in the tree.
    * @param leaf - The leaf value to be updated.

diff --git a/yarn-project/merkle-tree/src/snapshots/append_only_snapshot.test.ts b/yarn-project/merkle-tree/src/snapshots/append_only_snapshot.test.ts
@@ -0,0 +1,28 @@
+import levelup, { LevelUp } from 'levelup';
+
+import { Pedersen, StandardTree, newTree } from '../index.js';
+import { createMemDown } from '../test/utils/create_mem_down.js';
+import { AppendOnlySnapshotBuilder } from './append_only_snapshot.js';
+import { describeSnapshotBuilderTestSuite } from './snapshot_builder_test_suite.js';
+
+describe('AppendOnlySnapshot', () => {
+  let tree: StandardTree;
+  let snapshotBuilder: AppendOnlySnapshotBuilder;
+  let db: LevelUp;
+
+  beforeEach(async () => {
+    db = levelup(createMemDown());
+    const hasher = new Pedersen();
+    tree = await newTree(StandardTree, db, hasher, 'test', 4);
+    snapshotBuilder = new AppendOnlySnapshotBuilder(db, tree, hasher);
+  });
+
+  describeSnapshotBuilderTestSuite(
+    () => tree,
+    () => snapshotBuilder,
+    async tree => {
+      const newLeaves = Array.from({ length: 2 }).map(() => Buffer.from(Math.random().toString()));
+      await tree.appendLeaves(newLeaves);
+    },
+  );
+});
diff --git a/yarn-project/merkle-tree/src/snapshots/append_only_snapshot.ts b/yarn-project/merkle-tree/src/snapshots/append_only_snapshot.ts
@@ -0,0 +1,232 @@
+import { Hasher, SiblingPath } from '@aztec/types';
+
+import { LevelUp } from 'levelup';
+
+import { AppendOnlyTree } from '../interfaces/append_only_tree.js';
+import { TreeBase } from '../tree_base.js';
+import { TreeSnapshot, TreeSnapshotBuilder } from './snapshot_builder.js';
+
+// stores the last block that modified this node
+const nodeModifiedAtBlockKey = (treeName: string, level: number, index: bigint) =>
+  `snapshot:node:${treeName}:${level}:${index}:block`;
+
+// stores the value of the node at the above block
+const historicalNodeKey = (treeName: string, level: number, index: bigint) =>
+  `snapshot:node:${treeName}:${level}:${index}:value`;
+
+// metadata for a snapshot
+const snapshotRootKey = (treeName: string, block: number) => `snapshot:root:${treeName}:${block}`;
+const snapshotNumLeavesKey = (treeName: string, block: number) => `snapshot:numLeaves:${treeName}:${block}`;
+
+/**
+ * A more space-efficient way of storing snapshots of AppendOnlyTrees, at the cost of slower
+ * sibling path reads.
+ *
+ * Complexity:
+ *
+ * N - count of non-zero nodes in tree
+ * M - count of snapshots
+ * H - tree height
+ *
+ * Space complexity: O(N + M) (for each of the N nodes, its value and the last snapshot that modified it; for each of the M snapshots, its root and how many leaves had been written)
+ * Sibling path access:
+ *  Best case: O(H) database reads + O(1) hashes
+ *  Worst case: O(H) database reads + O(H) hashes
+ */
+export class AppendOnlySnapshotBuilder implements TreeSnapshotBuilder {
+  constructor(private db: LevelUp, private tree: TreeBase & AppendOnlyTree, private hasher: Hasher) {}
+  async getSnapshot(block: number): Promise<TreeSnapshot> {
+    const meta = await this.#getSnapshotMeta(block);
+
+    if (typeof meta === 'undefined') {
+      throw new Error(`Snapshot for tree ${this.tree.getName()} at block ${block} does not exist`);
+    }
+
+    return new AppendOnlySnapshot(this.db, block, meta.numLeaves, meta.root, this.tree, this.hasher);
+  }
+
+  async snapshot(block: number): Promise<TreeSnapshot> {
+    const meta = await this.#getSnapshotMeta(block);
+    if (typeof meta !== 'undefined') {
+      // no-op, we already have a snapshot
+      return new AppendOnlySnapshot(this.db, block, meta.numLeaves, meta.root, this.tree, this.hasher);
+    }
+
+    const batch = this.db.batch();
+    const root = this.tree.getRoot(false);
+    const depth = this.tree.getDepth();
+    const treeName = this.tree.getName();
+    const queue: [Buffer, number, bigint][] = [[root, 0, 0n]];
+
+    // walk the tree in BF and store latest nodes
+    while (queue.length > 0) {
+      const [node, level, index] = queue.shift()!;
+
+      const historicalValue = await this.db.get(historicalNodeKey(treeName, level, index)).catch(() => undefined);
+      if (!historicalValue || !node.equals(historicalValue)) {
+        // we've never seen this node before or it's different than before
+        // update the historical tree and tag it with the block that modified it
+        batch.put(nodeModifiedAtBlockKey(treeName, level, index), String(block));
+        batch.put(historicalNodeKey(treeName, level, index), node);
+      } else {
+        // if this node hasn't changed, that means, nothing below it has changed either
+        continue;
+      }
+
+      if (level + 1 > depth) {
+        // short circuit if we've reached the leaf level
+        // otherwise getNode might throw if we ask for the children of a leaf
+        continue;
+      }
+
+      // these could be undefined because zero hashes aren't stored in the tree
+      const [lhs, rhs] = await Promise.all([
+        this.tree.getNode(level + 1, 2n * index),
+        this.tree.getNode(level + 1, 2n * index + 1n),
+      ]);
+
+      if (lhs) {
+        queue.push([lhs, level + 1, 2n * index]);
+      }
+
+      if (rhs) {
+        queue.push([rhs, level + 1, 2n * index + 1n]);
+      }
+    }
+
+    const numLeaves = this.tree.getNumLeaves(false);
+    batch.put(snapshotNumLeavesKey(treeName, block), String(numLeaves));
+    batch.put(snapshotRootKey(treeName, block), root);
+    await batch.write();
+
+    return new AppendOnlySnapshot(this.db, block, numLeaves, root, this.tree, this.hasher);
+  }
+
+  async #getSnapshotMeta(block: number): Promise<
+    | {
+        /** The root of the tree snapshot */
+        root: Buffer;
+        /** The number of leaves in the tree snapshot */
+        numLeaves: bigint;
+      }
+    | undefined
+  > {
+    try {
+      const treeName = this.tree.getName();
+      const root = await this.db.get(snapshotRootKey(treeName, block));
+      const numLeaves = BigInt(await this.db.get(snapshotNumLeavesKey(treeName, block)));
+      return { root, numLeaves };
+    } catch (err) {
+      return undefined;
+    }
+  }
+}
+
+/**
+ * A read-only view of an AppendOnlyTree as it was at a given block.
+ */
+class AppendOnlySnapshot implements TreeSnapshot {
+  constructor(
+    private db: LevelUp,
+    private block: number,
+    private leafCount: bigint,
+    private historicalRoot: Buffer,
+    private tree: TreeBase & AppendOnlyTree,
+    private hasher: Hasher,
+  ) {}
+
+  public async getSiblingPath<N extends number>(index: bigint): Promise<SiblingPath<N>> {
+    const path: Buffer[] = [];
+    const depth = this.tree.getDepth();
+    let level = depth;
+
+    while (level > 0) {
+      const isRight = index & 0x01n;
+      const siblingIndex = isRight ? index - 1n : index + 1n;
+
+      const sibling = await this.#getHistoricalNodeValue(level, siblingIndex);
+      path.push(sibling);
+
+      level -= 1;
+      index >>= 1n;
+    }
+
+    return new SiblingPath<N>(depth as N, path);
+  }
+
+  getDepth(): number {
+    return this.tree.getDepth();
+  }
+
+  getNumLeaves(): bigint {
+    return this.leafCount;
+  }
+
+  getRoot(): Buffer {
+    // we could recompute it, but it's way cheaper to just store the root
+    return this.historicalRoot;
+  }
+
+  async getLeafValue(index: bigint): Promise<Buffer | undefined> {
+    const leafLevel = this.getDepth();
+    const blockNumber = await this.#getBlockNumberThatModifiedNode(leafLevel, index);
+
+    // leaf hasn't been set yet
+    if (typeof blockNumber === 'undefined') {
+      return undefined;
+    }
+
+    // leaf was set some time in the past
+    if (blockNumber <= this.block) {
+      return this.db.get(historicalNodeKey(this.tree.getName(), leafLevel, index));
+    }
+
+    // leaf has been set but in a block in the future
+    return undefined;
+  }
+
+  async #getHistoricalNodeValue(level: number, index: bigint): Promise<Buffer> {
+    const blockNumber = await this.#getBlockNumberThatModifiedNode(level, index);
+
+    // node has never been set
+    if (typeof blockNumber === 'undefined') {
+      return this.tree.getZeroHash(level);
+    }
+
+    // node was set some time in the past
+    if (blockNumber <= this.block) {
+      return this.db.get(historicalNodeKey(this.tree.getName(), level, index));
+    }
+
+    // the node has been modified since this snapshot was taken
+    // because we're working with an AppendOnly tree, historical leaves never change
+    // so what we do instead is rebuild this Merkle path up using zero hashes as needed
+    // worst case this will do O(H) hashes
+    //
+    // we first check whether this subtree held any leaves at the time of the snapshot:
+    // compare the number of leaves the tree had at this snapshot with the leaf interval of this subtree
+    // if they don't intersect then the whole subtree was a hash of zero
+    // if they do then we need to rebuild the merkle tree
+    const depth = this.tree.getDepth();
+    const leafStart = index * 2n ** BigInt(depth - level);
+    if (leafStart >= this.leafCount) {
+      return this.tree.getZeroHash(level);
+    }
+
+    const [lhs, rhs] = await Promise.all([
+      this.#getHistoricalNodeValue(level + 1, 2n * index),
+      this.#getHistoricalNodeValue(level + 1, 2n * index + 1n),
+    ]);
+
+    return this.hasher.hash(lhs, rhs);
+  }
+
+  async #getBlockNumberThatModifiedNode(level: number, index: bigint): Promise<number | undefined> {
+    try {
+      const value: Buffer | string = await this.db.get(nodeModifiedAtBlockKey(this.tree.getName(), level, index));
+      return parseInt(value.toString(), 10);
+    } catch (err) {
+      return undefined;
+    }
+  }
+}
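To see the `AppendOnlySnapshot` reconstruction in isolation, here is a minimal in-memory model of the same idea. It is a sketch under stated assumptions, not the PR's code: `Map`s replace LevelDB, a toy string `hash` replaces Pedersen, current node values are recomputed from the leaf list on demand, and `AppendOnlyHistory`, `historicalNode` and `rootFromPath` are hypothetical names. `historicalNode` mirrors the three cases of `#getHistoricalNodeValue`: never set (zero hash), last changed at or before the snapshot (stored value), or changed later (zero hash if the subtree starts past the snapshot's leaf count, otherwise hash the two historical children).

```typescript
// Toy, non-cryptographic 2-to-1 hash used only for illustration (assumption: stands in for Pedersen).
const hash = (l: string, r: string): string => `H(${l},${r})`;

/** Hypothetical in-memory model of AppendOnlySnapshotBuilder + AppendOnlySnapshot. */
class AppendOnlyHistory {
  private leaves: string[] = [];
  // "level:index" -> [block that last modified the node, node value at that block]
  private nodes = new Map<string, [number, string]>();
  private numLeavesAt = new Map<number, number>();
  private zero: string[] = []; // zero[level] = root of an all-zero subtree at that level

  constructor(private depth: number) {
    this.zero[depth] = '0';
    for (let l = depth - 1; l >= 0; l--) this.zero[l] = hash(this.zero[l + 1], this.zero[l + 1]);
  }

  append(...values: string[]): void {
    this.leaves.push(...values);
  }

  /** Current node value, recomputed from the leaves (a real tree would cache this). */
  node(level: number, index: number): string {
    if (level === this.depth) return this.leaves[index] ?? this.zero[level];
    if (index * 2 ** (this.depth - level) >= this.leaves.length) return this.zero[level];
    return hash(this.node(level + 1, 2 * index), this.node(level + 1, 2 * index + 1));
  }

  /** BF walk: record nodes whose value changed; skip zero nodes and unchanged subtrees. */
  snapshot(block: number): void {
    const queue: [number, number][] = [[0, 0]];
    while (queue.length > 0) {
      const [level, index] = queue.shift()!;
      const value = this.node(level, index);
      if (value === this.zero[level]) continue; // zero subtrees are never stored
      const key = `${level}:${index}`;
      if (this.nodes.get(key)?.[1] === value) continue; // unchanged, so nothing below changed
      this.nodes.set(key, [block, value]);
      if (level < this.depth) queue.push([level + 1, 2 * index], [level + 1, 2 * index + 1]);
    }
    this.numLeavesAt.set(block, this.leaves.length);
  }

  /** Value of node (level, index) as it was at `block`. */
  historicalNode(block: number, level: number, index: number): string {
    const numLeaves = this.numLeavesAt.get(block);
    if (numLeaves === undefined) throw new Error(`no snapshot at block ${block}`);
    const entry = this.nodes.get(`${level}:${index}`);
    if (entry === undefined) return this.zero[level]; // never set
    if (entry[0] <= block) return entry[1]; // last changed at or before the snapshot
    // changed after the snapshot: zero if the subtree starts past the filled leaves, else rebuild
    if (index * 2 ** (this.depth - level) >= numLeaves) return this.zero[level];
    return hash(this.historicalNode(block, level + 1, 2 * index), this.historicalNode(block, level + 1, 2 * index + 1));
  }

  /** Leaf-first sibling path for `leafIndex` as of `block`. */
  siblingPath(block: number, leafIndex: number): string[] {
    const path: string[] = [];
    for (let level = this.depth, index = leafIndex; level > 0; level--, index >>= 1) {
      path.push(this.historicalNode(block, level, index ^ 1));
    }
    return path;
  }
}

/** Recomputes a root from a leaf, its index and a leaf-first sibling path. */
function rootFromPath(leaf: string, leafIndex: number, path: string[]): string {
  let acc = leaf;
  path.forEach((sibling, i) => {
    acc = (leafIndex >> i) & 1 ? hash(sibling, acc) : hash(acc, sibling);
  });
  return acc;
}

const t = new AppendOnlyHistory(3);
t.append('a', 'b', 'c');
t.snapshot(1);
const root1 = t.node(0, 0);
t.append('d', 'e');
t.snapshot(2);
console.log(rootFromPath('a', 0, t.siblingPath(1, 0)) === root1); // true
```

In the final check, the level-2 sibling of leaf 0 covers leaves 2 and 3; it was overwritten when `d` arrived at block 2, so its block-1 value `H(c,0)` is rebuilt from `c` and a zero leaf. That rebuild is the hashing cost behind the O(H) worst case.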