This BEP describes state sync enhancement on the BNB Beacon Chain.
State sync is a way to help newly-joined users sync the latest status of the BNB Beacon Chain. It syncs the latest sync-able peer's status so that fullnode user (who wants to catch up with chain as soon as possible with a cost that discards all historical blocks locally) doesn't need sync from block height 0.
BEP-18 Proposal describes an enhancement of existing state sync implementation to improve user experience. The status of the blockchain that can be synced is represented in a "snapshot", which consists of a manifest file and a bunch of snapshot chunk files. The manifest file summarizes version, height, and checksums of snapshot chunk files of this snapshot. The snapshot chunk files contain encoded essential state data to recover a full node.
This BEP introduces the following details:
- What's the procedure to take a snapshot
- What's the procedure to sync snapshot from other peers
- Snapshot (manifest, snapshot chunks) format
This BEP is already implemented.
We propose this BEP to enhance full node user experience (and ease their pain) on using state sync because of the following implementation limitations.
-
Users complain most about state syncing testnet is very slow and usually stuck on some requests.
In this enhancement, we want data to respond more evenly across peers so that syncing can continuously make progress and the overall syncing time can reduce from 30 - 45 min to around 5 min. -
Interruption during state sync (node process get killed because of reboot computer or user impatience) would make already synced data in vain (because the current full node doesn't persist synced part on disk). Worsely it mistakenly writes a lock file prevents user state sync again.
In this enhancement, we want support break-resume downloading and keep the consistent status for arbitrarily restart.
State sync will download manifest and snapshot chunks from other peers.
There are two ways to take snapshots from a fullnode: automatically or manually. Snapshots will be put under $HOME/data/snapshot/<height>
. All types involved in the snapshot are encoded by go-amino and compressed by snappy. More details will be explained later.
- To make fullnode automatically take snapshots, just make sure
state_sync_reactor
in$HOME/config.toml
is set to true. When set automatically snapshot, the fullnode will take a snapshot for blocks with a blocking time of 00:00 UTC each day. No snapshot will be taken for any other blocks during the day. - To manually take snapshots, stop the node if it is running, then run
./bnbchaind snapshot --home <home> --height <height>
.
If the snapshot taking procedure is interrupted, the node will be still in good status, but it cannot provide the interrupted height for other peers to sync.
Note: Automatic snapshot files will keep occupying disk space. Fullnode would not delete them automatically, so the user should periodically delete unneeded snapshots manually if they want to save disk space.
Syncing snapshot is designed to be only run once during full node first start. To enable state sync from others, state_sync_reactor
should be true and state_sync_height
should be set to non-negative (default -1
means disable syncing from others).
If a user wants to sync from (majority) peers' latest sync-able height, they should set state_sync_height
to 0.
Stop and restart fullnode during state sync is allowed. The next time full node is started, it will resume by loading Manifest and downloaded snapshot chunks then download missing snapshot chunks.
Once state sync is successful, a STATESYNC.LOCK
file will be created under $HOME/data
to prevent state sync next time.
Manifest serves as a summary of snapshot chunks to be synced. It also maintains the order and types of snapshot chunks. Fullnode firstly asks peer's for the manifest file at the beginning of state sync and will trust majority peers with the same manifest.
SHA256 hash sum of each chunk synced will be checked against the hash declared within the manifest file.
Field | Type | Description |
---|---|---|
Version | int32 | snapshot version |
Height | int64 | height of this snapshot |
StateHashes | []SHA256Sum | hashes of tendermint state chunks |
AppStateHashes | []SHA256Sum | hashes of app state chunks |
BlockHashes | []SHA256Sum | hashes of the blocks in this snapshot, currently only the block of requested height is synced. This synced block is needed mainly to make sure local databases are consistent with each other after state sync. It also provides block metadata like a timestamp for tendermint abci application. |
NumKeys | []int64 | number of keys for each sub-store. |
App state chunk includes iavl tree nodes. Usually, each app state chunk takes up to 4MB serialized iavl tree nodes (before snappy compression).
Iavl tree node bigger than 4MB is split into different incomplete chunks, that's where Completeness
field effect.
Field | Type | Description |
---|---|---|
StartIdx | int64 | compare (startIdx and number of complete nodes) against (Manifest.NumKeys) we can know each node should be persisted to which application db's sub-store. For example, acc and token store each has 10 and 5 nodes (with NumKeys = []int64{10, 5} in Manifest). An app state chunk whose StartIdx is 0 and completeness is 0 (complete) with 12 nodes, the first of 10 nodes will be persisted to acc store and last 2 nodes will be persisted to token store. After above chunk, there might be 4 app chunks whose StartIdx are 12 , but Completeness would be 1 , 2 , 2 , 3 respectively and each chunk contains only one element in Nodes. The actual 3rd token store iavl tree node should be recovered by combining these 4 chunks' node elements together. At the recovering side, we know the order of 2 middle chunks because their order is kept in the Manifest file. |
Completeness | uint8 | flag of completeness of this chunk, not enum because of go-amino doesn't support enum encoding. possible values: 0 (Complete), 1 (InComplete_First), 2 (InComplete_Mid), 3 (InComplete_Last) the InComplete flags are used to identify continuous large nodes' boundary. |
Nodes | [][]byte | iavl tree serialized node, one big node (i.e. active orders and order book) might be split into different chunks (they share same StartIdx with different completeness flag), the order is ensured in the manifest file |
Field | Type | Description |
---|---|---|
Statepart | []byte | current tendermint state |
Field | Type | Description |
---|---|---|
Block | []byte | amino encoded block |
SeenCommit | []byte | amino encoded Commit we need this because Block keeps seen commit for the last block. To save this block, we need to load and pass it in the same way it was saved |
-
As mentioned in section 5.1 Take snapshot, fullnode cannot delete snapshot directories (
$HOME/data/snapshot/<height>
) automatically. This needs to be noticed by full node users who enabledstate_sync_reactor
. Either run a script periodically delete the snapshots or turn offstate_sync_reactor
(if they want to be selfish!) should be considered. -
Once state sync succeeds, later full node restart would not state sync anymore (in case the local blocks are not continuous).
But if users do want state sync again (don't care that there are missing blocks between last stop and latest state sync snapshot height) and he wants to keep already synced blocks, he should delete $BNCHOME/data/STATESYNC.LOCK
.
The content is licensed under CC0.