State trie storage #1063

Mirko-von-Leipzig · 2023-05-10T13:29:23Z

Background

Starknet state is made provable using patricia-merkle tries. Such a trie is a DAG containing two internal node variants: Binary and Edge. These are used to traverse a key's path along the DAG until one arrives at the leaf node containing the value. At its heart it is a kv-store with merkle paths, i.e. it's hashes all the way down.

Currently, storing the tries accounts for ~70% of our storage, i.e. it's huge.

Starknet has several such tries, whose root node becomes a commitment to the data it is trie'ing over.
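As a rough illustration of the traversal described above, here is a minimal sketch in Python. The class names, the `get` helper and the 3-bit keys are assumptions made for the example, not pathfinder's actual types, and hashing is left out entirely so that only the path-following logic is shown.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Leaf:
    value: int


@dataclass(frozen=True)
class Binary:
    left: "Node"   # followed when the next key bit is 0
    right: "Node"  # followed when the next key bit is 1


@dataclass(frozen=True)
class Edge:
    path: tuple    # run of key bits skipped by this edge
    child: "Node"


Node = Union[Leaf, Binary, Edge]


def get(root: Node, key_bits: tuple) -> Optional[int]:
    """Follow key_bits from the root; return the leaf value, or None if absent."""
    node, i = root, 0
    while not isinstance(node, Leaf):
        if isinstance(node, Binary):
            if i >= len(key_bits):
                return None
            node = node.right if key_bits[i] else node.left
            i += 1
        else:  # Edge: the next len(path) key bits must match exactly
            n = len(node.path)
            if key_bits[i:i + n] != node.path:
                return None
            node = node.child
            i += n
    return node.value if i == len(key_bits) else None


# Hypothetical trie with 3-bit keys: 001 -> 7 and 111 -> 9.
root = Binary(left=Edge((0, 1), Leaf(7)), right=Edge((1, 1), Leaf(9)))
```

In the real trie every node also carries a hash derived from its children, which is what turns the root into a commitment.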

Transactions. The trie over all the transactions in a block, forming the transaction commitment used in the block header.
Contract storage. Each contract has a trie over its storage elements, key = storage address, value = storage value.
Global storage. A trie consolidating every contract storage trie, key = contract address, value = state hash. state hash is a mapping involving class, contract storage commitment and nonce. It is not entirely relevant to this discussion so I won't go into more detail.
Class trie. A trie over all declared Cairo 1 classes. key = class hash, value = H(CLASS_LEAF_VERSION, casm hash)

Status Quo

The transaction trie is never persisted, but is calculated ephemerally for verification and then discarded.

Each other type of trie has its own table, i.e. there are three tables storing node data. This means all contract storage tries share a single table.

We also store the tries for all time, i.e. so far we have never removed any. This is mostly because we used them for our RPC methods; with #1038, however, we no longer have this requirement. This discussion is about how we can leverage this new freedom to reduce storage requirements.

Use cases

The class and storage commitment tries have one main purpose -- trustless verification of the starknet state. For this purpose we only need the very latest state trie to build on or to reorg from.

The other purpose is to provide merkle proofs. Merkle proofs are currently only used by pathfinder_getProof, but a more important future use-case is to provide state trie chunk proofs for nodes trying to fast sync via p2p.

What do we want to provide

1. We want access to the latest state trie set in order to sync.
2. We want to support pathfinder_getProof for some N most recent blocks.
3. We want to support chunk proofs for p2p such that they are useful for nodes that don't fully trust us or the network.
4. We want less storage used and fewer reads / writes per block.

Proposal

To support (1) and reduce our storage IO, we only persist to disk every Nth block. The latest N diffs are stored in memory until it is time to persist them. Alternatively, we can persist whenever some maximum diff size is reached, i.e. limit by how much memory is used.
We prune (delete) older versions of the tries (currently we persist them all). This will reduce our storage requirements dramatically, and possibly speed up lookups due to smaller tables etc.
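The buffering idea can be sketched roughly as follows. This is only an illustration: `DiffBuffer`, `persist_every`, `max_nodes` and the dict standing in for the database are all hypothetical names, and a real implementation would write inside a database transaction rather than update a dict.

```python
class DiffBuffer:
    """Keeps per-block trie diffs in memory and persists them in batches."""

    def __init__(self, persist_every: int, max_nodes: int):
        self.persist_every = persist_every  # flush after this many blocks
        self.max_nodes = max_nodes          # or once this many nodes are buffered
        self.pending = {}                   # block number -> {node id: node data}
        self.disk = {}                      # stand-in for the node table
        self.flushes = 0

    def add_block(self, block_number: int, new_nodes: dict) -> None:
        self.pending[block_number] = dict(new_nodes)
        buffered = sum(len(diff) for diff in self.pending.values())
        if len(self.pending) >= self.persist_every or buffered >= self.max_nodes:
            self.flush()

    def flush(self) -> None:
        # Apply diffs oldest-first so later blocks win on conflicting ids.
        for block_number in sorted(self.pending):
            self.disk.update(self.pending[block_number])
        self.pending.clear()
        self.flushes += 1
```

With `persist_every=3`, the first two blocks stay in memory and the third triggers a single write of all three diffs.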

pathfinder_getProof archive support can be configured by specifying how quickly to prune older tries. For (1) we only need a single trie copy, but the more block data we retain, the less we have to re-calculate for older proofs. We can make this configurable as well.

For chunk proofs, I think a good middle ground is to enforce the storage of at least one trie that is proven on L1. Peers can then place a high degree of trust in it, with a low chance of failure. Alternatively, depending on how large the L1 to L2 gap is, maybe we just rely on some latest-N again.

Pruning & actual storage

Turns out pruning a forest of DAGs is not a lot of fun in terms of disk IO. Let's say you want to remove a specific trie. You start at the root node, traverse it and delete its children recursively. The issue is that these nodes may also be used in other tries.

Here are some known strategies that work.

1. Perform an offline prune operation (used by geth, I think). This essentially performs a reachability scan of the nodes and deletes those that are no longer reachable from any root. This takes a long time, so we perform it offline. In theory, one could probably build a smart system that utilizes block downtime, but that seems dodgy to me. An advantage is that no extra DAG meta-data needs to be stored, so this is the most compact format in terms of storage space.
   1. Stop node
   2. Start pathfinder --prune
   3. Wait several hours
2. Use absolute addressing. Here each node gets a fully unique ID, even if it has the same hash. A node's ID might include: trie ID, block number, bit path. This uses the most space, but the benefit is that you can simply delete a node without any additional checks, as every "duplicate" node will have its own ID and be a separate entity. This is used by juno atm.
3. Use reference counting. Requires an additional count column, and makes deleting a recursive operation: decrement the ref count; if it is zero, decrement the children's; repeat. A benefit is that all tries can share the same table without much additional thought required.
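To make the offline reachability scan concrete, here is a minimal mark-and-sweep sketch. It assumes the node table can be viewed as a mapping from node id to the ids of its children; the function names and the example ids are hypothetical, and this is not how geth or pathfinder implement it.

```python
def reachable(nodes: dict, roots) -> set:
    """Return the ids of all nodes reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        node_id = stack.pop()
        if node_id in seen or node_id not in nodes:
            continue
        seen.add(node_id)
        stack.extend(nodes[node_id])  # visit the children next
    return seen


def prune(nodes: dict, live_roots) -> dict:
    """Delete every node that no live root can reach (modifies nodes in place)."""
    keep = reachable(nodes, live_roots)
    for node_id in list(nodes):
        if node_id not in keep:
            del nodes[node_id]
    return nodes


# Two tries (roots r1 and r2) sharing the subtree under "shared".
nodes = {
    "r1": ("a", "shared"),
    "r2": ("b", "shared"),
    "a": (),
    "b": (),
    "shared": ("leaf",),
    "leaf": (),
}
prune(nodes, live_roots=["r2"])  # drops r1 and a, keeps the shared subtree
```

The scan has to touch every live node, which is why it is slow on a large table, but nothing beyond the nodes themselves needs to be stored.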

We used to have (3) for inserting (but not deleting, since that wasn't done). However, I removed it, thinking we could simplify without it. Turns out we cannot 😬 I suggest we do reference counting again, with less disk IO by only persisting at selected blocks. And somehow smarter sql queries? We previously found that persisting trie data was causing huge disk IO.

Storage considerations

We should consider using a database that is more specialized for hash lookups to store trie node data. However, this can be left for a future exercise, unless someone has some good ideas right away.

The other consideration is whether storing the full path can somehow be made more storage efficient - I don't know how kv stores work wrt storing a key like "{block_number}_{contract_address}_{node_path}" = {node_data}, but if they hash the key to a const u64 size or something then maybe this way is even better?
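For a feel of what such a full-path key costs, here is one possible fixed-width encoding. The layout is purely an assumption for illustration (8 bytes of block number, 32 bytes of contract address, 1 byte of path length, 32 bytes of path bits), and `node_key` is a hypothetical helper; it says nothing about how any particular kv store would treat the key internally.

```python
def node_key(block_number: int, contract_address: int, path_bits: str) -> bytes:
    """Encode (block, contract, path) as a fixed-width, byte-sortable key.

    Assumes the path is a string of '0'/'1' characters of at most 255 bits.
    """
    path_int = int(path_bits, 2) if path_bits else 0
    return (
        block_number.to_bytes(8, "big")
        + contract_address.to_bytes(32, "big")
        + len(path_bits).to_bytes(1, "big")  # distinguishes "0" from "00"
        + path_int.to_bytes(32, "big")
    )


key = node_key(5, 0xABC, "101")
print(len(key))  # 73 bytes per key under this assumed layout
```

Because the fields are big-endian and fixed-width, keys sort by block number first, so all nodes of one block form a contiguous range in an ordered store.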

How does reference counting work?

Inserting or removing an instance of a state trie is a recursive procedure. Note that there is no difference between doing so for a full trie, or a subtrie within the trie -- both are just about inserting / removing a root and its children.

For insertion:

If the root does not exist, recurse for each of its children
Increment the root's reference count, or insert it with the reference count set to 1 if it does not exist

For deletion:

Decrement the reference count
If the reference count reaches zero, delete the node and recurse for each of its children

Note that this procedure also applies if we are storing multiple such tries in memory.
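The two procedures above can be sketched as follows. This is a minimal in-memory illustration, assuming a node is given as an `(id, children)` tuple and the table is a dict; the names are hypothetical and the real thing would operate on database rows with a count column.

```python
def insert(store: dict, node) -> None:
    """Insert a (sub)trie given as (node_id, [child nodes])."""
    node_id, children = node
    if node_id in store:
        # Already present: its children are present too, so just bump the count.
        store[node_id]["refs"] += 1
        return
    for child in children:
        insert(store, child)
    store[node_id] = {"refs": 1, "children": [child[0] for child in children]}


def remove(store: dict, node_id) -> None:
    """Drop one reference to node_id, deleting unreferenced nodes recursively."""
    entry = store[node_id]
    entry["refs"] -= 1
    if entry["refs"] == 0:
        del store[node_id]
        for child_id in entry["children"]:
            remove(store, child_id)


# Two tries sharing one subtree.
shared = ("shared", [("leaf", [])])
store = {}
insert(store, ("r1", [("a", []), shared]))
insert(store, ("r2", [("b", []), shared]))  # "shared" now has refs == 2
remove(store, "r1")                         # removes r1 and a, keeps shared
```

Note that inserting the second trie never descends into the shared subtree, and removing the first trie stops descending as soon as it hits a node that is still referenced.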

Mirko-von-Leipzig · 2024-05-23T06:31:23Z

Implemented :) And even better - no reference counting required!
