From 632993b139d5ffd8510d7060e7f1fc995b45b7aa Mon Sep 17 00:00:00 2001
From: Erik de Castro Lopo
Date: Thu, 3 Jun 2021 15:56:36 +1000
Subject: [PATCH] Add documentation for state snapshot operations

---
 Readme.md             |  1 +
 doc/state-snapshot.md | 53 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 doc/state-snapshot.md

diff --git a/Readme.md b/Readme.md
index 657463fa7..0f16cb57f 100644
--- a/Readme.md
+++ b/Readme.md
@@ -75,6 +75,7 @@ possible solutions.
 * [Example SQL queries][ExampleQueries]: Some example SQL and Haskell/Esqueleto queries.
 * [PoolOfflineData][PoolOfflineData]: Explanation of how stake pool offline data is retried.
 * [SchemaManagement][Schema Management]: How the database schema is managed and modified.
+* [StateSnapshot][StateSnapshot]: Document the creation and restoration of state snapshot files.
 * [SQL DB Schema][DB Schema]: The current PostgreSQL DB schema, as generated by the code.
 * [Validation][Validation]: Explanation of validation done by the db-sync node and assumptions made.
 
diff --git a/doc/state-snapshot.md b/doc/state-snapshot.md
new file mode 100644
index 000000000..3e402c1a1
--- /dev/null
+++ b/doc/state-snapshot.md
@@ -0,0 +1,53 @@
+# State Snapshot
+
+As the size of the blockchain itself and the number of transactions and other data on the chain
+increase, so does the time required to sync the full chain. At epoch 266 it was about 18 hours.
+The other issue is that most major upgrades also update the database schema, meaning the database
+needs to be synced from scratch.
+
+To overcome these issues, we are providing a `cardano-db-sync` state snapshot, which should
+drastically reduce the time required to get `db-sync` back up and running after the database is
+dropped and recreated. This snapshot is compatible with both `cardano-db-sync` and
+`cardano-db-sync-extended` (which maintains an extra `epoch` table).
+
+**Note:** It is not possible to create a snapshot with one version of the database schema and
+restore it for use with a `db-sync` that uses another version of the schema.
+
+All of the following assumes that the executable `cardano-db-tool` and the script
+`postgresql-setup.sh` are available on the machine where the snapshot is being created or restored.
+
+Currently (at epoch 269), creating a snapshot takes about 15 minutes and restoring one takes about
+45 minutes.
+
+## Things to note
+* Snapshots (because they depend on the database schema) are not portable across `db-sync` versions.
+* Snapshots (because they include a snapshot of the ledger state) are not portable across CPU
+  architectures (i.e. it is not possible to create a snapshot on `x86_64` and expect it to work
+  correctly on, say, `arm64`).
+* Creating and restoring snapshots require significant amounts of free disk space (at epoch 269
+  about 10G was needed). If there is insufficient disk space, `gzip` can give some odd error
+  messages.
+
+## Creating a Snapshot
+
+To create a snapshot, the `cardano-db-sync` executable should be stopped. Taking a snapshot is
+then a two-step process. First run:
+
+```
+PGPASSFILE=config/pgpass-mainnet cardano-db-tool prepare-snapshot --state-dir ledger-state/mainnet/
+```
+which will then print the second command, the one that generates the snapshot. It combines the
+schema version, the database block number, the ledger state slot number and the CPU architecture:
+```
+PGPASSFILE=config/pgpass-mainnet scripts/postgresql-setup.sh --create-snapshot \
+  db-sync-snapshot-schema-9-block-5796064-x86_64 ledger-state/mainnet/31021676-f3873e4bec.lstate
+```
+
+## Restoring from a Snapshot
+
+Restoring the state from a snapshot will drop the current database, recreate the tables and then
+populate them. It can be done as simply as:
+```
+PGPASSFILE=config/pgpass-mainnet scripts/postgresql-setup.sh --restore-snapshot \
+  db-sync-snapshot-schema-9-block-5796064-x86_64 ledger-state/mainnet
+```
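The example snapshot name `db-sync-snapshot-schema-9-block-5796064-x86_64` encodes the schema version, the block number and the CPU architecture, and snapshots are not portable across schema versions or architectures. The following is a minimal POSIX shell sketch of a pre-restore sanity check. It assumes that every snapshot name follows the `db-sync-snapshot-schema-<schema>-block-<block>-<arch>` pattern, which is inferred from that single example rather than from the `cardano-db-tool` source. `parse_snapshot_name` and `snapshot_arch_matches` are hypothetical helpers and are not part of `cardano-db-sync`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of cardano-db-sync. They assume snapshot names
# look like db-sync-snapshot-schema-<schema>-block-<block>-<arch>.

# Print "<schema> <block> <arch>" for a snapshot name (a leading directory is
# ignored), or print an error to stderr and return 1 if the name does not match.
parse_snapshot_name() {
  local name parsed
  name=$(basename "$1")
  # BRE capture groups: schema version, block number, architecture suffix.
  parsed=$(printf '%s\n' "$name" |
    sed -n 's/^db-sync-snapshot-schema-\([0-9][0-9]*\)-block-\([0-9][0-9]*\)-\(..*\)$/\1 \2 \3/p')
  if [ -z "$parsed" ]; then
    echo "unrecognised snapshot name: $1" >&2
    return 1
  fi
  printf '%s\n' "$parsed"
}

# Succeed only if the snapshot's architecture suffix equals `uname -m` here.
# Assumption: the suffix uses the same spelling as `uname -m` (for example,
# Linux on 64-bit ARM reports `aarch64`, not `arm64`).
snapshot_arch_matches() {
  local parsed
  parsed=$(parse_snapshot_name "$1") || return 1
  # The architecture is the last space-separated field of the parsed output.
  [ "${parsed##* }" = "$(uname -m)" ]
}
```

For example, `snapshot_arch_matches db-sync-snapshot-schema-9-block-5796064-x86_64 || echo "snapshot was built for another CPU architecture"` could be run before `--restore-snapshot`. The check is deliberately conservative: an unrecognised name is treated as a mismatch, not as a pass.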