Add documentation for state snapshot operations
erikd committed Jun 3, 2021
1 parent 41fd5c3 commit 632993b
Showing 2 changed files with 54 additions and 0 deletions.
1 change: 1 addition & 0 deletions Readme.md
@@ -75,6 +75,7 @@ possible solutions.
* [Example SQL queries][ExampleQueries]: Some example SQL and Haskell/Esqueleto queries.
* [PoolOfflineData][PoolOfflineData]: Explanation of how stake pool offline data is retried.
* [SchemaManagement][Schema Management]: How the database schema is managed and modified.
* [StateSnapshot][StateSnapshot]: Document the creation and restoration of state snapshot files.
* [SQL DB Schema][DB Schema]: The current PostgreSQL DB schema, as generated by the code.
* [Validation][Validation]: Explanation of validation done by the db-sync node and assumptions made.

53 changes: 53 additions & 0 deletions doc/state-snapshot.md
@@ -0,0 +1,53 @@
# State Snapshot

As the blockchain grows and the number of transactions and other data on the chain increases, so
does the time required to sync the full chain. At epoch 266 a full sync took about 18 hours.
The other issue is that most major upgrades also update the database schema, which means the
database needs to be synced from scratch.

To overcome these issues, we are providing a `cardano-db-sync` state snapshot, which should
drastically reduce the time required to get `db-sync` back up and running after the database is
dropped and recreated. This snapshot is compatible with both `cardano-db-sync` and
`cardano-db-sync-extended` (which maintains an extra `epoch` table).

**Note:** A snapshot created with one version of the database schema cannot be restored for use
with a `db-sync` that uses a different schema version.

All of the following assumes that the `cardano-db-tool` executable and the `postgresql-setup.sh`
script are available on the machine where the snapshot is being created or restored.

Currently (at epoch 269), creating a snapshot takes about 15 minutes and restoring one takes about
45 minutes.

## Things to note:
* Snapshots (because they depend on the database schema) are not portable across `db-sync` versions.
* Snapshots (because they include a snapshot of the ledger state) are not portable across CPU
architectures (i.e. it is not possible to create a snapshot on `x86_64` and expect it to work
correctly on, say, `arm64`).
* Creating and restoring snapshots requires significant amounts of free disk space (at epoch 269
it required about 10 GB). If there is insufficient disk space, `gzip` can give some odd error
messages.
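
Because running out of space partway through tends to surface as confusing `gzip` errors, it can help to check free space before starting. The function below is a minimal sketch, not something shipped with `db-sync`: `has_free_space` is a hypothetical name, and it assumes a POSIX `df` and `awk` are available and that the directory you pass is on the filesystem that will hold the snapshot.

```shell
# Hypothetical pre-flight check (not part of db-sync).
# Usage: has_free_space DIR MIN_GIB
# Succeeds only if the filesystem containing DIR has at least MIN_GIB GiB available.
has_free_space() (
  dir=$1
  min_gib=$2
  # -P: POSIX output format (one line per filesystem, no wrapping).
  # -k: sizes in 1024-byte blocks; column 4 is the available space.
  avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  # An empty value means df failed (e.g. DIR does not exist), so fail too.
  [ -n "$avail_kib" ] && [ "$avail_kib" -ge $((min_gib * 1024 * 1024)) ]
)
```

For example, `has_free_space . 10 || echo "less than 10 GiB free" >&2` before running either script. The 10 GiB figure is the epoch 269 estimate above and will grow with the chain.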

## Creating a Snapshot

Before creating a snapshot, stop the `cardano-db-sync` executable. Taking a snapshot is then a
two-step process. First run:

```
PGPASSFILE=config/pgpass-mainnet cardano-db-tool prepare-snapshot --state-dir ledger-state/mainnet/
```
which will then print out the command required to generate the snapshot. The command combines the
database schema version, the block number in the database, the slot number used by the ledger state
and the CPU architecture:
```
PGPASSFILE=config/pgpass-mainnet scripts/postgresql-setup.sh --create-snapshot \
db-sync-snapshot-schema-9-block-5796064-x86_64 ledger-state/mainnet/31021676-f3873e4bec.lstate
```

## Restoring from a Snapshot

Restoring the state from a snapshot will drop the current database, recreate the tables and then
populate them. It can be done as simply as:
```
PGPASSFILE=config/pgpass-mainnet scripts/postgresql-setup.sh --restore-snapshot \
db-sync-snapshot-schema-9-block-5796064-x86_64 ledger-state/mainnet
```
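
Since a restore drops the current database, it may be worth refusing a snapshot that cannot work on this machine before running the command above. The guard below is a hypothetical sketch that only inspects the snapshot name, assuming the same naming pattern as in the example. You supply the schema version your `db-sync` expects, and by default it compares the architecture suffix with `uname -m`. Note that `uname -m` reports 64-bit ARM as `aarch64` on Linux and `arm64` on macOS, so pass the architecture explicitly if the names differ.

```shell
# Hypothetical guard (not part of db-sync).
# Usage: snapshot_is_compatible SNAPSHOT EXPECTED_SCHEMA [ARCH]
# Succeeds only if SNAPSHOT (assumed form:
#   db-sync-snapshot-schema-<schema>-block-<block>-<arch>)
# has the expected schema version and architecture.
snapshot_is_compatible() (
  name=$(basename "$1")
  expected_schema=$2
  machine_arch=${3:-$(uname -m)}  # defaults to this machine's architecture
  # Quoted parts match literally; only the block number is a wildcard.
  case "$name" in
    "db-sync-snapshot-schema-$expected_schema-block-"*"-$machine_arch") return 0 ;;
    *) return 1 ;;
  esac
)
```

For example, `snapshot_is_compatible db-sync-snapshot-schema-9-block-5796064-x86_64 9 && PGPASSFILE=config/pgpass-mainnet scripts/postgresql-setup.sh --restore-snapshot db-sync-snapshot-schema-9-block-5796064-x86_64 ledger-state/mainnet` only restores when both checks pass.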
