docs: add an overview of the creation and querying of snapshots (#5270)
joshieDo authored Nov 2, 2023
1 parent 3fc776e commit 9a56e4b
Showing 1 changed file with 88 additions and 0 deletions.
88 changes: 88 additions & 0 deletions crates/snapshot/README.md
@@ -0,0 +1,88 @@
# Snapshot

## Overview

Data that has reached a finalized state and won't undergo further changes (essentially frozen) can be read without any concern about modification. A traditional database, which is built to handle mutable data, is therefore a poor fit for it.

This crate copies this data from the current database into multiple static files, aggregated by block ranges. New static files are created every 500_000 blocks.
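As a minimal sketch of the block-range aggregation described above, the helper below maps a block number to the inclusive range of the static file that would hold it. The constant and function names are hypothetical and are not part of this crate's API; only the 500_000 block interval comes from the text.

```rust
use std::ops::RangeInclusive;

/// Number of blocks covered by one static file (taken from the overview above).
const BLOCKS_PER_SNAPSHOT: u64 = 500_000;

/// Hypothetical helper: returns the inclusive block range of the static file
/// that would contain `block_number`.
fn snapshot_block_range(block_number: u64) -> RangeInclusive<u64> {
    // Round down to the start of the 500_000-block window.
    let start = (block_number / BLOCKS_PER_SNAPSHOT) * BLOCKS_PER_SNAPSHOT;
    start..=start + BLOCKS_PER_SNAPSHOT - 1
}
```

Under this assumption, blocks `0` through `499_999` share one file and block `500_000` starts the next one.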

Below are two diagrams illustrating the processes of creating static files (custom format: `NippyJar`) and querying them. A glossary is also provided to explain the different (linked) components involved in these processes.

<details>
<summary>Creation diagram (<code>Snapshotter</code>)</summary>

```mermaid
graph TD;
I("BLOCK_HEIGHT % 500_000 == 0")--triggers-->SP(Snapshotter)
SP --> |triggers| SH["create_snapshot(block_range, SnapshotSegment::Headers)"]
SP --> |triggers| ST["create_snapshot(block_range, SnapshotSegment::Transactions)"]
SP --> |triggers| SR["create_snapshot(block_range, SnapshotSegment::Receipts)"]
SP --> |triggers| ETC["create_snapshot(block_range, ...)"]
SH --> CS["create_snapshot::&lt; T &gt;(DatabaseCursor)"]
ST --> CS
SR --> CS
ETC --> CS
CS --> |create| IF(NippyJar::InclusionFilters)
CS -- iterates --> DC(DatabaseCursor) -->HN{HasNext}
HN --> |true| NJC(NippyJar::Compression)
NJC --> HN
NJC --store--> NJ
HN --> |false| NJ
IF --store--> NJ(NippyJar)
NJ --freeze--> F(File)
F--"on success"--> SP1(Snapshotter)
SP1 --"sends BLOCK_HEIGHT"--> HST(HighestSnapshotTracker)
HST --"read by"-->Pruner
HST --"read by"-->DatabaseProvider
HST --"read by"-->SnapshotProvider
HST --"read by"-->ProviderFactory
```
</details>
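
The creation flow can be approximated with the toy sketch below: a trigger check, a loop that drains a cursor and "compresses" every row, and a final value standing in for the frozen file. All types and functions here are hypothetical stand-ins written for illustration. The real `Snapshotter`, `NippyJar`, and cursor types have different signatures, and inclusion filters are left out.

```rust
/// Hypothetical stand-in for the segment a snapshot file holds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Segment {
    Headers,
    Transactions,
    Receipts,
}

/// Toy in-memory stand-in for a frozen static file.
#[derive(Debug, PartialEq, Eq)]
struct FrozenFile {
    segment: Segment,
    rows: Vec<Vec<u8>>,
}

/// Trigger condition from the diagram: `BLOCK_HEIGHT % 500_000 == 0`.
fn should_snapshot(block_height: u64) -> bool {
    block_height % 500_000 == 0
}

/// Placeholder "compression": copies the row unchanged.
/// A real implementation would run a compressor (e.g. Zstd or lz4) here.
fn compress(row: &[u8]) -> Vec<u8> {
    row.to_vec()
}

/// Drains the cursor (any iterator of raw rows), compresses each row,
/// and returns the collected rows as a "frozen" file.
fn create_snapshot<I>(segment: Segment, cursor: I) -> FrozenFile
where
    I: Iterator<Item = Vec<u8>>,
{
    let mut rows = Vec::new();
    // The `for` loop plays the role of the `HasNext` check in the diagram.
    for row in cursor {
        rows.push(compress(&row));
    }
    // Returning the value corresponds to the `freeze` step.
    FrozenFile { segment, rows }
}
```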


<details>
<summary>Query diagram (<code>Provider</code>)</summary>

```mermaid
graph TD;
RPC-->P
P("Provider::header(block_number)")-->PF(ProviderFactory)
PF--shares-->SP1("Arc(SnapshotProvider)")
SP1--shares-->PD(DatabaseProvider)
PF--creates-->PD
PD--check `HighestSnapshotTracker`-->PD
PD-->DC1{block_number <br> > <br> highest snapshot block}
DC1 --> |true| PD1("DatabaseProvider::header(block_number)")
DC1 --> |false| ASP("SnapshotProvider::header(block_number)")
PD1 --> MDBX
ASP --find correct jar and creates--> JP("SnapshotJarProvider::header(block_number)")
JP --"creates"-->SC(SnapshotCursor)
SC --".get_one&lt; HeaderMask&lt; Header &gt; &gt;(number)"--->NJC("NippyJarCursor")
NJC--".row_by_number(row_index, mask)"-->NJ[NippyJar]
NJ--"&[u8]"-->NJC
NJC--"&[u8]"-->SC
SC--"Header"--> JP
JP--"Header"--> ASP
```
</details>
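
The routing decision in the query diagram boils down to one comparison against the highest snapshotted block. The sketch below shows that decision in isolation. `HeaderSource` and `header_source` are hypothetical names, and modelling "no snapshot exists yet" as `None` is an assumption, not something the diagram specifies.

```rust
/// Hypothetical marker for where a header lookup should be served from.
#[derive(Debug, PartialEq, Eq)]
enum HeaderSource {
    /// Served by the database provider (MDBX in the diagram).
    Database,
    /// Served by the snapshot provider (static files).
    Snapshot,
}

/// Picks the source for `block_number`.
/// Assumption: `None` means that no snapshot has been created yet.
fn header_source(block_number: u64, highest_snapshot_block: Option<u64>) -> HeaderSource {
    match highest_snapshot_block {
        // Blocks at or below the highest snapshotted block live in static files.
        Some(highest) if block_number <= highest => HeaderSource::Snapshot,
        // Anything newer (or everything, without snapshots) comes from the database.
        _ => HeaderSource::Database,
    }
}
```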


### Glossary
In descending order of abstraction hierarchy:

[`Snapshotter`](../../crates/snapshot/src/snapshotter.rs#L20): A `reth` background service that **copies** data from the database to new snapshot files when the block height reaches a certain threshold (e.g., every `500_000` blocks). Upon completion, it sends the highest snapshotted block to the `HighestSnapshotTracker` channel. **It DOES NOT remove data from the database.**

[`HighestSnapshotTracker`](../../crates/snapshot/src/snapshotter.rs#L22): A channel utilized by `Snapshotter` to announce the newest snapshot block to all components with a listener: `Pruner` (to know which additional tables can be pruned) and `DatabaseProvider` (to know which data can be queried from the snapshots).
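
To illustrate the idea of one writer announcing a value that several listeners read, here is a minimal sketch built on a shared `RwLock` from the standard library. It is not the channel type used by the crate. The `HighestSnapshot` type, its methods, and the rule that the value only moves forward are all assumptions made for this example.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical "latest value" cell shared between a writer and its listeners.
#[derive(Clone, Default)]
struct HighestSnapshot(Arc<RwLock<Option<u64>>>);

impl HighestSnapshot {
    /// Called by the writer (the `Snapshotter` role) after a successful snapshot.
    /// Assumption: the announced block only ever moves forward.
    fn publish(&self, block: u64) {
        let mut guard = self.0.write().unwrap();
        let current: Option<u64> = *guard;
        if current.map_or(true, |c| block > c) {
            *guard = Some(block);
        }
    }

    /// Called by listeners (the `Pruner` or `DatabaseProvider` roles).
    fn latest(&self) -> Option<u64> {
        *self.0.read().unwrap()
    }
}
```

Each listener keeps its own clone of the handle, and every clone observes the same latest value.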

[`SnapshotProvider`](../../crates/storage/provider/src/providers/snapshot/manager.rs#L15): A provider similar to `DatabaseProvider`, **managing all existing snapshot files** and selecting the optimal one (by range and segment type) to fulfill a request. **A single instance is shared across all components and should be instantiated only once within `ProviderFactory`**. An immutable reference is handed out every time `ProviderFactory` creates a new `DatabaseProvider`.

[`SnapshotJarProvider`](../../crates/storage/provider/src/providers/snapshot/jar.rs#L42): A provider similar to `DatabaseProvider` that provides access to a **single snapshot file**.

[`SnapshotCursor`](../../crates/storage/db/src/snapshot/cursor.rs#L12): A higher-level abstraction over `NippyJarCursor` for simplified access. It associates the bitmasks with type decoding. For instance, `cursor.get_two::<TransactionMask<Tx, Signature>>(tx_number)` would yield `Tx` and `Signature`, eliminating the need to manage masks or invoke a decoder/decompressor.

[`SnapshotSegment`](../../crates/primitives/src/snapshot/segment.rs#L10): Each snapshot file only contains data of a specific segment, e.g., `Headers`, `Transactions`, or `Receipts`.

[`NippyJarCursor`](../../crates/storage/nippy-jar/src/cursor.rs#L12): Accessor of data in a `NippyJar` file. It enables queries either by row number (e.g., block number 1) or by a predefined key that is not part of the file (e.g., transaction hashes). If a file has multiple columns (e.g., `Tx | TxSender | Signature`) and only some of the column values are needed, they can be selected with a bitmask (e.g., for `TxSender`, the mask would be `0b010`).
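
The column bitmask can be pictured with the small sketch below, which filters the values of one row by mask. `select_columns` is a hypothetical helper, not the cursor's API. It assumes that bit `i`, counted from the least significant bit, selects column `i`; the `0b010` example for `TxSender` is consistent with that assumption but does not confirm the bit order.

```rust
/// Hypothetical helper: returns the values of the columns selected by `mask`.
/// Assumption: bit `i` (counting from the least significant bit) selects column `i`.
fn select_columns<'a>(row: &[&'a [u8]], mask: usize) -> Vec<&'a [u8]> {
    row.iter()
        .enumerate()
        // Keep a column only if its bit is set in the mask.
        .filter(|(index, _)| (mask & (1usize << *index)) != 0)
        .map(|(_, value)| *value)
        .collect()
}
```

For a row laid out as `Tx | TxSender | Signature`, the mask `0b010` keeps only the middle value, while `0b101` keeps the first and the last.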

[`NippyJar`](../../crates/storage/nippy-jar/src/lib.rs#L57): A create-only file format. No data can be appended after creation. It supports multiple columns, compression (e.g., Zstd with and without dictionaries, lz4, or uncompressed) and inclusion filters (e.g., cuckoo filter: `is hash X part of this dataset`). Snapshots are organized by block ranges (e.g., `TransactionSnapshot_499_999.jar` contains one transaction per row for all transactions from block `0` to block `499_999`). For more details, see the struct documentation.
