feat: add tree snapshots (#3468)
This PR adds two different algorithms for creating snapshots of Merkle
trees.

## Notation

- N - number of non-zero nodes
- H - height of tree
- M - number of snapshots

## ~Incremental~ Full snapshots

This algorithm stores the trees in a database instance in the same way
they would be stored using pointers in memory. Each node has two
key-value entries in the database: one for its left child and one for
its right child.

Taking a snapshot means walking the tree in breadth-first (BF) order and
storing any new nodes. If we hit a node that already exists in the
database, the subtree rooted at that node has not changed, so we can skip
checking it.
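A rough TypeScript sketch of taking a full snapshot, under stated assumptions: a `Map` stands in for the LevelUp database, every node is keyed by its own hash (so identical subtrees share entries), and `hash` is a toy string-concatenation stand-in for Pedersen. `buildLevels` and `takeFullSnapshot` are hypothetical names, not the API of the PR's `full_snapshot.ts`.

```typescript
// Hypothetical stand-ins: a Map plays the role of the key-value database and
// `hash` is a toy string version of Pedersen (not collision-resistant).
type Db = Map<string, string>;
const hash = (left: string, right: string): string => `H(${left},${right})`;

/** All levels of a tree of height `depth`, root at index 0; unfilled leaves are "0". */
function buildLevels(leaves: string[], depth: number): string[][] {
  const levels: string[][] = [];
  levels[depth] = Array.from({ length: 2 ** depth }, (_, i) => leaves[i] ?? '0');
  for (let level = depth - 1; level >= 0; level--) {
    const below = levels[level + 1];
    levels[level] = Array.from({ length: 2 ** level }, (_, i) => hash(below[2 * i], below[2 * i + 1]));
  }
  return levels;
}

/** BF walk that stores `node -> children` for new nodes only; returns how many nodes were written. */
function takeFullSnapshot(db: Db, levels: string[][], block: number): number {
  const depth = levels.length - 1;
  const queue: Array<[number, number]> = [[0, 0]];
  let written = 0;
  while (queue.length > 0) {
    const [level, index] = queue.shift()!;
    if (level === depth) continue; // leaves have no children to record
    const node = levels[level][index];
    if (db.has(`${node}:left`)) continue; // already stored: the whole subtree is unchanged
    db.set(`${node}:left`, levels[level + 1][2 * index]);
    db.set(`${node}:right`, levels[level + 1][2 * index + 1]);
    written++;
    queue.push([level + 1, 2 * index], [level + 1, 2 * index + 1]);
  }
  db.set(`root:${block}`, levels[0][0]); // remember which root belongs to this snapshot
  return written;
}
```

Taking a second snapshot after appending a third leaf only writes two nodes (the new root and its new right child), because the left subtree is already in the database and is shared with the first snapshot.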

Building a sibling path is just a traversal of the tree from the historic
root down to the historic leaf.
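A matching sketch of the historic sibling-path read, again assuming a `Map` as the database, nodes keyed by their hash with `:left`/`:right` entries, a hypothetical `root:<block>` entry per snapshot, and a toy string `hash`. To keep it self-contained, `storeTree` simply writes every internal node instead of skipping subtrees that are already stored.

```typescript
type Db = Map<string, string>;
const hash = (left: string, right: string): string => `H(${left},${right})`;

/** Stores `node -> children` for every internal node; `leaves.length` must be a power of two. */
function storeTree(db: Db, leaves: string[], block: number): void {
  let level = leaves;
  while (level.length > 1) {
    const parents: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const parent = hash(level[i], level[i + 1]);
      db.set(`${parent}:left`, level[i]);
      db.set(`${parent}:right`, level[i + 1]);
      parents.push(parent);
    }
    level = parents;
  }
  db.set(`root:${block}`, level[0]);
}

/** Walks from the historic root to `leafIndex`, collecting siblings; returns them leaf-first. */
function historicSiblingPath(db: Db, block: number, leafIndex: number, depth: number): string[] {
  let node = db.get(`root:${block}`);
  if (node === undefined) throw new Error(`no snapshot for block ${block}`);
  const path: string[] = [];
  for (let level = 0; level < depth; level++) {
    // bits of leafIndex, most significant first, choose left (0) or right (1)
    const goRight = ((leafIndex >> (depth - 1 - level)) & 1) === 1;
    const left = db.get(`${node}:left`)!;
    const right = db.get(`${node}:right`)!;
    path.push(goRight ? left : right);
    node = goRight ? right : left;
  }
  return path.reverse(); // same order as a SiblingPath: leaf level first
}
```

Each level costs one lookup of the current node's children, which is where the O(H) database reads come from.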

Pros:
1. it's generic enough that it works with any type of Merkle tree (one
caveat: we'll need an extension to store historic leaf values for
indexed trees)
2. it shares structure with other versions of the tree: if some subtree
hasn't changed between two snapshots, it is reused by both snapshots
(this works well for append-only trees)
3. getting a historic sibling path is extremely cheap, requiring only
O(H) database reads

Cons:
1. it takes up space to store every snapshot. In the worst case (i.e.
the tree changes entirely after every snapshot) it would take up an
extra O(N * M) space. For append-only trees it would be more
space-efficient (e.g. once we start filling the right subtree, the left
subtree will be shared by every future snapshot).

More details in this comment on #3207.

## Append-only snapshots

For append-only trees we can keep an unbounded number of snapshots with
just O(N + M) extra space, at the cost of sibling paths possibly
requiring O(H) hashes.

This way of storing snapshots only works for `AppendOnlyTree` and
exploits two properties of this type of tree:

1. once a leaf is set, it can never be changed (i.e. the tree is
append-only). This property can be generalised to: once a subtree is
completely filled then none of its nodes can ever change.
2. there are no gaps in the leaves: at any moment in time T we can say
that the first K leaves of the tree are filled and the last 2^H-K leaves
are zeroes.
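The two properties can be made concrete with a few hypothetical helpers (not part of this PR): given the height H and the filled-leaf count K, a node's position alone tells us whether its subtree is completely filled (so, by property 1, it can never change again) or completely empty (so, by property 2, it is all zeroes). This sketch numbers levels from the root (0) down to the leaves (H), as the PR's code does, and uses plain numbers instead of bigints for brevity.

```typescript
/** Half-open range [start, end) of leaf indices covered by the node at (level, index). */
function subtreeLeafRange(height: number, level: number, index: number): [number, number] {
  const width = 2 ** (height - level); // number of leaves under one node at this level
  return [index * width, (index + 1) * width];
}

/** Every leaf under the node is among the first `filled` leaves: the subtree is frozen. */
function isFrozen(height: number, level: number, index: number, filled: number): boolean {
  const [, end] = subtreeLeafRange(height, level, index);
  return end <= filled;
}

/** No leaf under the node has been filled yet: the subtree is all zeroes. */
function isEmpty(height: number, level: number, index: number, filled: number): boolean {
  const [start] = subtreeLeafRange(height, level, index);
  return start >= filled;
}
```

For example, with H = 4 and K = 10, the node at level 2, index 1 covers leaves 4 to 7 and is frozen; index 2 covers leaves 8 to 11 and is neither frozen nor empty; index 3 covers leaves 12 to 15 and is empty.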

The algorithm stores a copy of the tree in the database, keeping two
entries per node:
- the last snapshot that modified that node (i.e. v1, v2, etc.)
- the node value at that snapshot

Taking the snapshot is also a BF traversal of the tree, comparing the
current value of each node with its previously stored value. If the
value is different we update both entries. If the values are the same
then the node hasn't changed and, by the first property, none of the
nodes in its subtree have changed either, so the traversal skips that
subtree. For each snapshot we also store how many leaves have been
filled (e.g. at v1 we had 10 leaves, at v2 33 leaves, etc.).
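A sketch of the snapshot step for the append-only scheme, kept in memory so it runs on its own: `Map`s replace the database, node keys are `level:index`, `hash` is a toy stand-in for Pedersen, and `AppendOnlyStore`, `buildLevels` and `snapshotAppendOnly` are hypothetical names. The PR's `AppendOnlySnapshotBuilder.snapshot` does the equivalent walk against LevelUp.

```typescript
// Hypothetical stand-ins: Maps instead of LevelUp, toy string hash instead of Pedersen.
const hash = (left: string, right: string): string => `H(${left},${right})`;

interface AppendOnlyStore {
  modifiedAt: Map<string, number>; // `level:index` -> last block that changed the node
  values: Map<string, string>; // `level:index` -> node value as of that block
  numLeaves: Map<number, number>; // block -> number of filled leaves at that snapshot
}

/** All levels of a tree of height `depth`, root at index 0; unfilled leaves are "0". */
function buildLevels(leaves: string[], depth: number): string[][] {
  const levels: string[][] = [];
  levels[depth] = Array.from({ length: 2 ** depth }, (_, i) => leaves[i] ?? '0');
  for (let level = depth - 1; level >= 0; level--) {
    const below = levels[level + 1];
    levels[level] = Array.from({ length: 2 ** level }, (_, i) => hash(below[2 * i], below[2 * i + 1]));
  }
  return levels;
}

/** Snapshots the tree holding `leaves` at `block`; returns how many nodes were (re)written. */
function snapshotAppendOnly(store: AppendOnlyStore, leaves: string[], depth: number, block: number): number {
  const levels = buildLevels(leaves, depth);
  const queue: Array<[number, number]> = [[0, 0]];
  let written = 0;
  while (queue.length > 0) {
    const [level, index] = queue.shift()!;
    const key = `${level}:${index}`;
    const node = levels[level][index];
    // unchanged since the last snapshot: nothing below it changed either, skip the subtree
    if (store.values.get(key) === node) continue;
    store.modifiedAt.set(key, block);
    store.values.set(key, node);
    written++;
    if (level === depth) continue; // leaves have no children
    const childWidth = 2 ** (depth - level - 1); // leaves under each child
    for (const child of [2 * index, 2 * index + 1]) {
      // all-zero subtrees are not stored, so only descend into children with a filled leaf
      if (child * childWidth < leaves.length) queue.push([level + 1, child]);
    }
  }
  store.numLeaves.set(block, leaves.length);
  return written;
}
```

With a height-2 tree, snapshotting one leaf writes 3 nodes; appending two more leaves rewrites 5 (the unchanged leaf `a` is skipped), and snapshotting again without changes writes nothing.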

Building a sibling path is more involved and could require up to O(H)
hashes:
1. walk the historic tree from leaf to root
2. for each sibling on the path, check the snapshot in which it was last
modified
   - if that is at or before the target snapshot version, we can safely
     use the stored node (e.g. if node A was last changed at time T3 and
     we're building a proof for the tree at time T10, then neither node A
     nor the subtree rooted at A has changed at any moment Tn with n > 3,
     so its stored value is also its value at T10)
   - if the node has changed since then, we have to rebuild its historic
     value at the snapshot we're interested in:
     - check how "wide" the subtree rooted at that node is (i.e. the
       first leaf index of the subtree) and whether it intersects the set
       of leaves filled at that snapshot. If it doesn't intersect at all
       (e.g. the subtree's first leaf is 11 but only 5 leaves were filled
       at the time), the whole subtree was the zero hash for its level
     - if it does intersect, go down one level and apply step 2 again to
       both children
   - the invariant is: we either reach a leaf or inner node that was last
     changed at or before the version we're interested in (and return its
     stored value early), or we reach a node whose subtree was entirely
     empty at that version (and return the zero hash for its level)
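The recursion can be sketched in memory as follows, mirroring `#getHistoricalNodeValue` from the diff but with plain numbers and `Map`s instead of bigints and LevelUp. The `History` shape, the `level:index` keys and the toy `hash` are assumptions for illustration; the example state is hand-built for a height-2 tree where leaf `a` was appended at block 1 and `b`, `c` at block 2.

```typescript
const hash = (left: string, right: string): string => `H(${left},${right})`;

/** Value of an all-zero subtree rooted at `level` in a tree of height `depth`. */
function zeroHash(level: number, depth: number): string {
  return level === depth ? '0' : hash(zeroHash(level + 1, depth), zeroHash(level + 1, depth));
}

interface History {
  depth: number;
  modifiedAt: Map<string, number>; // `level:index` -> last block that changed the node
  values: Map<string, string>; // `level:index` -> node value as of that block
}

/** Rebuilds node (level, index) as it was at `block`, when `leafCount` leaves were filled. */
function historicalNode(h: History, level: number, index: number, block: number, leafCount: number): string {
  const key = `${level}:${index}`;
  const modified = h.modifiedAt.get(key);
  if (modified === undefined) return zeroHash(level, h.depth); // never set
  if (modified <= block) return h.values.get(key)!; // not changed since `block`: reuse it
  const leafStart = index * 2 ** (h.depth - level);
  if (leafStart >= leafCount) return zeroHash(level, h.depth); // subtree was still empty then
  // partially filled back then: rebuild from both children
  return hash(
    historicalNode(h, level + 1, 2 * index, block, leafCount),
    historicalNode(h, level + 1, 2 * index + 1, block, leafCount),
  );
}

/** Sibling path for `leafIndex` at `block`, from the leaf level up to just below the root. */
function historicalSiblingPath(h: History, leafIndex: number, block: number, leafCount: number): string[] {
  const path: string[] = [];
  let index = leafIndex;
  for (let level = h.depth; level > 0; level--) {
    path.push(historicalNode(h, level, index ^ 1, block, leafCount)); // index ^ 1 is the sibling
    index >>= 1;
  }
  return path;
}

// Hand-built state: "a" appended at block 1, then "b" and "c" at block 2.
const history: History = {
  depth: 2,
  modifiedAt: new Map([['0:0', 2], ['1:0', 2], ['1:1', 2], ['2:0', 1], ['2:1', 2], ['2:2', 2]]),
  values: new Map([
    ['0:0', 'H(H(a,b),H(c,0))'], ['1:0', 'H(a,b)'], ['1:1', 'H(c,0)'],
    ['2:0', 'a'], ['2:1', 'b'], ['2:2', 'c'],
  ]),
};
```

Asking for the tree at block 1 (one leaf filled) rebuilds `H(a,0)` for the left child by recursing, and returns the zero hash `H(0,0)` for the right child without recursing, because its first leaf (index 2) lies beyond the filled range.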

Two (big) SVGs showing the sibling path algorithm step by step:

- [Average case](https://github.com/AztecProtocol/aztec-packages/assets/3816165/b87fa6eb-bcf4-42ca-879d-173a76d802bb)
- [Drawing of sibling path algorithm in the worst case](https://github.com/AztecProtocol/aztec-packages/assets/3816165/dd3788ec-6357-4fab-bf78-3496a2948040)


Pros:
- low space requirements: only O(N + M) extra space

Cons:
- generating a sibling path is a mixed workload: partly reading from the
database and partly hashing. In the worst case it takes O(H) database
reads + O(H) hashes
- doesn't work for `IndexedTree`s: even though they only support
appending new leaves, internally they update existing leaf values, which
breaks property 1

Fix #3207

---------

Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>
alexghr and PhilWindle authored Dec 1, 2023
1 parent 8b6688a commit 7a86bb3
Showing 21 changed files with 1,235 additions and 8 deletions.
6 changes: 5 additions & 1 deletion yarn-project/merkle-tree/src/index.ts
@@ -4,8 +4,12 @@ export * from './interfaces/merkle_tree.js';
 export * from './interfaces/update_only_tree.js';
 export * from './pedersen.js';
 export * from './sparse_tree/sparse_tree.js';
-export * from './standard_indexed_tree/standard_indexed_tree.js';
+export { LowLeafWitnessData, StandardIndexedTree } from './standard_indexed_tree/standard_indexed_tree.js';
 export * from './standard_tree/standard_tree.js';
 export { INITIAL_LEAF } from './tree_base.js';
 export { newTree } from './new_tree.js';
 export { loadTree } from './load_tree.js';
+export * from './snapshots/snapshot_builder.js';
+export * from './snapshots/full_snapshot.js';
+export * from './snapshots/append_only_snapshot.js';
+export * from './snapshots/indexed_tree_snapshot.js';
3 changes: 2 additions & 1 deletion yarn-project/merkle-tree/src/interfaces/append_only_tree.ts
@@ -1,9 +1,10 @@
+import { TreeSnapshotBuilder } from '../snapshots/snapshot_builder.js';
 import { MerkleTree } from './merkle_tree.js';

 /**
  * A Merkle tree that supports only appending leaves and not updating existing leaves.
  */
-export interface AppendOnlyTree extends MerkleTree {
+export interface AppendOnlyTree extends MerkleTree, TreeSnapshotBuilder {
   /**
    * Appends a set of leaf values to the tree.
    * @param leaves - The set of leaves to be appended.
3 changes: 2 additions & 1 deletion yarn-project/merkle-tree/src/interfaces/update_only_tree.ts
@@ -1,11 +1,12 @@
 import { LeafData } from '@aztec/types';

+import { TreeSnapshotBuilder } from '../snapshots/snapshot_builder.js';
 import { MerkleTree } from './merkle_tree.js';

 /**
  * A Merkle tree that supports updates at arbitrary indices but not appending.
  */
-export interface UpdateOnlyTree extends MerkleTree {
+export interface UpdateOnlyTree extends MerkleTree, TreeSnapshotBuilder {
   /**
    * Updates a leaf at a given index in the tree.
    * @param leaf - The leaf value to be updated.
@@ -0,0 +1,28 @@
import levelup, { LevelUp } from 'levelup';

import { Pedersen, StandardTree, newTree } from '../index.js';
import { createMemDown } from '../test/utils/create_mem_down.js';
import { AppendOnlySnapshotBuilder } from './append_only_snapshot.js';
import { describeSnapshotBuilderTestSuite } from './snapshot_builder_test_suite.js';

describe('AppendOnlySnapshot', () => {
  let tree: StandardTree;
  let snapshotBuilder: AppendOnlySnapshotBuilder;
  let db: LevelUp;

  beforeEach(async () => {
    db = levelup(createMemDown());
    const hasher = new Pedersen();
    tree = await newTree(StandardTree, db, hasher, 'test', 4);
    snapshotBuilder = new AppendOnlySnapshotBuilder(db, tree, hasher);
  });

  describeSnapshotBuilderTestSuite(
    () => tree,
    () => snapshotBuilder,
    async tree => {
      const newLeaves = Array.from({ length: 2 }).map(() => Buffer.from(Math.random().toString()));
      await tree.appendLeaves(newLeaves);
    },
  );
});
232 changes: 232 additions & 0 deletions yarn-project/merkle-tree/src/snapshots/append_only_snapshot.ts
@@ -0,0 +1,232 @@
import { Hasher, SiblingPath } from '@aztec/types';

import { LevelUp } from 'levelup';

import { AppendOnlyTree } from '../interfaces/append_only_tree.js';
import { TreeBase } from '../tree_base.js';
import { TreeSnapshot, TreeSnapshotBuilder } from './snapshot_builder.js';

// stores the last block that modified this node
const nodeModifiedAtBlockKey = (treeName: string, level: number, index: bigint) =>
  `snapshot:node:${treeName}:${level}:${index}:block`;

// stores the value of the node at the above block
const historicalNodeKey = (treeName: string, level: number, index: bigint) =>
  `snapshot:node:${treeName}:${level}:${index}:value`;

// metadata for a snapshot
const snapshotRootKey = (treeName: string, block: number) => `snapshot:root:${treeName}:${block}`;
const snapshotNumLeavesKey = (treeName: string, block: number) => `snapshot:numLeaves:${treeName}:${block}`;

/**
 * A more space-efficient way of storing snapshots of AppendOnlyTrees that trades extra hashing on
 * sibling path reads for lower storage requirements.
 *
 * Complexity:
 *
 * N - count of non-zero nodes in tree
 * M - count of snapshots
 * H - tree height
 *
 * Space complexity: O(N + M): for each of the N nodes it stores the last snapshot that modified it,
 * and for each of the M snapshots it stores how many leaves had been filled at that point.
 * Sibling path access:
 *  Best case: O(H) database reads + O(1) hashes
 *  Worst case: O(H) database reads + O(H) hashes
 */
export class AppendOnlySnapshotBuilder implements TreeSnapshotBuilder {
  constructor(private db: LevelUp, private tree: TreeBase & AppendOnlyTree, private hasher: Hasher) {}

  async getSnapshot(block: number): Promise<TreeSnapshot> {
    const meta = await this.#getSnapshotMeta(block);

    if (typeof meta === 'undefined') {
      throw new Error(`Snapshot for tree ${this.tree.getName()} at block ${block} does not exist`);
    }

    return new AppendOnlySnapshot(this.db, block, meta.numLeaves, meta.root, this.tree, this.hasher);
  }

  async snapshot(block: number): Promise<TreeSnapshot> {
    const meta = await this.#getSnapshotMeta(block);
    if (typeof meta !== 'undefined') {
      // no-op, we already have a snapshot
      return new AppendOnlySnapshot(this.db, block, meta.numLeaves, meta.root, this.tree, this.hasher);
    }

    const batch = this.db.batch();
    const root = this.tree.getRoot(false);
    const depth = this.tree.getDepth();
    const treeName = this.tree.getName();
    const queue: [Buffer, number, bigint][] = [[root, 0, 0n]];

    // walk the tree in BF and store latest nodes
    while (queue.length > 0) {
      const [node, level, index] = queue.shift()!;

      const historicalValue = await this.db.get(historicalNodeKey(treeName, level, index)).catch(() => undefined);
      if (!historicalValue || !node.equals(historicalValue)) {
        // we've never seen this node before or it's different than before
        // update the historical tree and tag it with the block that modified it
        batch.put(nodeModifiedAtBlockKey(treeName, level, index), String(block));
        batch.put(historicalNodeKey(treeName, level, index), node);
      } else {
        // if this node hasn't changed, that means, nothing below it has changed either
        continue;
      }

      if (level + 1 > depth) {
        // short circuit if we've reached the leaf level
        // otherwise getNode might throw if we ask for the children of a leaf
        continue;
      }

      // these could be undefined because zero hashes aren't stored in the tree
      const [lhs, rhs] = await Promise.all([
        this.tree.getNode(level + 1, 2n * index),
        this.tree.getNode(level + 1, 2n * index + 1n),
      ]);

      if (lhs) {
        queue.push([lhs, level + 1, 2n * index]);
      }

      if (rhs) {
        queue.push([rhs, level + 1, 2n * index + 1n]);
      }
    }

    const numLeaves = this.tree.getNumLeaves(false);
    batch.put(snapshotNumLeavesKey(treeName, block), String(numLeaves));
    batch.put(snapshotRootKey(treeName, block), root);
    await batch.write();

    return new AppendOnlySnapshot(this.db, block, numLeaves, root, this.tree, this.hasher);
  }

  async #getSnapshotMeta(block: number): Promise<
    | {
        /** The root of the tree snapshot */
        root: Buffer;
        /** The number of leaves in the tree snapshot */
        numLeaves: bigint;
      }
    | undefined
  > {
    try {
      const treeName = this.tree.getName();
      const root = await this.db.get(snapshotRootKey(treeName, block));
      const numLeaves = BigInt(await this.db.get(snapshotNumLeavesKey(treeName, block)));
      return { root, numLeaves };
    } catch (err) {
      return undefined;
    }
  }
}

/**
 * A read-only view of an AppendOnlyTree as it was at a given block.
 */
class AppendOnlySnapshot implements TreeSnapshot {
  constructor(
    private db: LevelUp,
    private block: number,
    private leafCount: bigint,
    private historicalRoot: Buffer,
    private tree: TreeBase & AppendOnlyTree,
    private hasher: Hasher,
  ) {}

  public async getSiblingPath<N extends number>(index: bigint): Promise<SiblingPath<N>> {
    const path: Buffer[] = [];
    const depth = this.tree.getDepth();
    let level = depth;

    while (level > 0) {
      const isRight = index & 0x01n;
      const siblingIndex = isRight ? index - 1n : index + 1n;

      const sibling = await this.#getHistoricalNodeValue(level, siblingIndex);
      path.push(sibling);

      level -= 1;
      index >>= 1n;
    }

    return new SiblingPath<N>(depth as N, path);
  }

  getDepth(): number {
    return this.tree.getDepth();
  }

  getNumLeaves(): bigint {
    return this.leafCount;
  }

  getRoot(): Buffer {
    // we could recompute it, but it's way cheaper to just store the root
    return this.historicalRoot;
  }

  async getLeafValue(index: bigint): Promise<Buffer | undefined> {
    const leafLevel = this.getDepth();
    const blockNumber = await this.#getBlockNumberThatModifiedNode(leafLevel, index);

    // leaf hasn't been set yet
    if (typeof blockNumber === 'undefined') {
      return undefined;
    }

    // leaf was set some time in the past
    if (blockNumber <= this.block) {
      return this.db.get(historicalNodeKey(this.tree.getName(), leafLevel, index));
    }

    // leaf has been set but in a block in the future
    return undefined;
  }

  async #getHistoricalNodeValue(level: number, index: bigint): Promise<Buffer> {
    const blockNumber = await this.#getBlockNumberThatModifiedNode(level, index);

    // node has never been set
    if (typeof blockNumber === 'undefined') {
      return this.tree.getZeroHash(level);
    }

    // node was set some time in the past
    if (blockNumber <= this.block) {
      return this.db.get(historicalNodeKey(this.tree.getName(), level, index));
    }

    // the node has been modified since this snapshot was taken
    // because we're working with an AppendOnly tree, historical leaves never change
    // so what we do instead is rebuild this Merkle path up using zero hashes as needed
    // worst case this will do O(H) hashes
    //
    // we first check if this subtree was touched by the block
    // compare how many leaves this block added to the leaf interval of this subtree
    // if they don't intersect then the whole subtree was a hash of zero
    // if they do then we need to rebuild the merkle tree
    const depth = this.tree.getDepth();
    const leafStart = index * 2n ** BigInt(depth - level);
    if (leafStart >= this.leafCount) {
      return this.tree.getZeroHash(level);
    }

    const [lhs, rhs] = await Promise.all([
      this.#getHistoricalNodeValue(level + 1, 2n * index),
      this.#getHistoricalNodeValue(level + 1, 2n * index + 1n),
    ]);

    return this.hasher.hash(lhs, rhs);
  }

  async #getBlockNumberThatModifiedNode(level: number, index: bigint): Promise<number | undefined> {
    try {
      const value: Buffer | string = await this.db.get(nodeModifiedAtBlockKey(this.tree.getName(), level, index));
      return parseInt(value.toString(), 10);
    } catch (err) {
      return undefined;
    }
  }
}
