Bonsai archive feature #7475

Open
wants to merge 43 commits into
base: main

matthew1001
Contributor

@matthew1001 matthew1001 commented Aug 16, 2024

PR description

Introduces a new (experimental) "Bonsai Archive" DB mode which creates a full archive of the chain it syncs with. This allows JSON-RPC calls to be made with historic blocks as context: for example, eth_getBalance to get the balance of an account at a historic block, or eth_call to simulate a transaction at a given block in history.
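As a rough illustration of what "historic blocks as context" means for a client, the sketch below builds the JSON-RPC request bodies for the two calls mentioned above. It only constructs the payloads and does not contact a node. The helper name `historic_request` and the empty `data` field in the eth_call example are assumptions made for illustration; the address and block number are the ones used in the example further down in this description.

```python
import json


def historic_request(method, params, block_number, request_id=1):
    """Build a JSON-RPC 2.0 request whose final parameter is a block number.

    Block numbers are passed as hex quantities (e.g. 37834 -> "0x93ca").
    """
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": [*params, hex(block_number)],
        "id": request_id,
    }


ADDRESS = "0x0e79065B5F11b5BD1e62B935A600976ffF3754B9"
BLOCK = 37834

# Balance of the account as of block 37834.
balance_req = historic_request("eth_getBalance", [ADDRESS], BLOCK)

# Simulate a call against the state at block 37834.
# The call object here is a placeholder, not a meaningful transaction.
call_req = historic_request("eth_call", [{"to": ADDRESS, "data": "0x"}], BLOCK)

print(json.dumps(balance_req))
print(json.dumps(call_req))
```

On a non-archive node, requests like these are generally only answerable for recent blocks; the archive mode is what makes older block numbers usable here.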

The PR is intended to provide part of the functionality currently offered by the (now deprecated) FOREST DB mode. Specifically, it allows state to be queried at an arbitrary block in history, but does not currently offer eth_getProof for that state. A subsequent PR will implement eth_getProof for historic blocks.

Summary of the overall design & changes

This PR builds on PR #5865 which proved the basic concept of archiving state in the Bonsai flat DB by suffixing entries with the block in which they were changed.

For example, the state for account 0x0e79065B5F11b5BD1e62B935A600976ffF3754B9 at block 37834 (0x93ca) is stored under the key

<account-hash><block-num-hex> = 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca
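The key layout can be reproduced with a few lines of Python. This is a minimal sketch that assumes, based only on the example key above, that the suffix is the block number encoded as 8 big-endian bytes appended to the 32-byte account hash. The function names are hypothetical and are not taken from the Besu code base; the account hash is copied from the example rather than recomputed.

```python
# Account hash taken verbatim from the example key above.
ACCOUNT_HASH_HEX = "9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec"


def archive_key(account_hash_hex: str, block_number: int) -> bytes:
    """Concatenate a 32-byte account hash with an 8-byte big-endian block number."""
    account_hash = bytes.fromhex(account_hash_hex)
    if len(account_hash) != 32:
        raise ValueError("account hash must be 32 bytes")
    return account_hash + block_number.to_bytes(8, "big")


def split_archive_key(key: bytes):
    """Inverse of archive_key: return (account_hash_hex, block_number)."""
    return key[:32].hex(), int.from_bytes(key[32:], "big")


key = archive_key(ACCOUNT_HASH_HEX, 37834)
print("0x" + key.hex())
# prints 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca
```

A fixed-width big-endian suffix keeps all versions of one account adjacent and in block order under byte-wise key comparison, which is presumably why the suffix is zero-padded.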

In order to minimise performance degradation over time, historic state and storage entries in the DB are "archived" by moving them into a separate DB segment.

Where account state is stored in segment ACCOUNT_INFO_STATE, state that has been archived is stored in ACCOUNT_INFO_STATE_ARCHIVE. Likewise, where storage is held in segment ACCOUNT_STORAGE_STORAGE, archived storage entries are stored in ACCOUNT_STORAGE_ARCHIVE.

An example RocksDB query to retrieve the state of the example account above would be:

ldb --db=. get --column_family=ACCOUNT_INFO_STATE_ARCHIVE --key_hex --value_hex 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca
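The ldb query above fetches one exact key. A read at an arbitrary historic block has to find the most recent entry written at or before that block, because an account only gets a new entry in blocks where it changed. The sketch below models that "floor" lookup over an in-memory sorted key list. It is an illustration of the idea under the 8-byte big-endian suffix assumption, not the PR's implementation; the class and method names are hypothetical. A store with ordered keys such as RocksDB can do the equivalent with a seek-for-previous iterator operation.

```python
import bisect


class VersionedStore:
    """Toy model of a flat DB whose keys are <account-hash><block-number>."""

    def __init__(self):
        self._keys = []    # all keys, kept sorted byte-wise
        self._values = {}  # key -> value

    @staticmethod
    def _key(account_hash: bytes, block_number: int) -> bytes:
        return account_hash + block_number.to_bytes(8, "big")

    def put(self, account_hash: bytes, block_number: int, value: bytes) -> None:
        key = self._key(account_hash, block_number)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_at(self, account_hash: bytes, block_number: int):
        """Return the value from the latest entry at or before block_number."""
        target = self._key(account_hash, block_number)
        idx = bisect.bisect_right(self._keys, target) - 1
        if idx < 0:
            return None
        candidate = self._keys[idx]
        # The floor key may belong to a different account; reject it if so.
        if candidate[: len(account_hash)] != account_hash:
            return None
        return self._values[candidate]


store = VersionedStore()
acct_a = b"\x01" * 32
acct_b = b"\x02" * 32
store.put(acct_a, 100, b"a@100")
store.put(acct_a, 200, b"a@200")
store.put(acct_b, 50, b"b@50")

print(store.get_at(acct_a, 150))  # entry written at block 100
print(store.get_at(acct_a, 99))   # no entry yet at block 99
```

The prefix check matters: without it, a query for an account with no entry at or before the requested block would return the last entry of whichever account sorts just before it.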

Creating a Bonsai Archive node

The PR introduces an entirely new data storage format (as opposed to making it a configuration option of the existing BONSAI storage format).

To create a Bonsai Archive node, set --data-storage-format=x_bonsai_archive when creating it.

An existing FOREST or BONSAI node cannot be migrated to BONSAI_ARCHIVE mode.

Storage requirements

An archive node intrinsically requires more storage space than a non-archive node. Every state update is retained in the archive DB segments as outlined above. An archive node for the Holesky testnet, as of the raising of this PR, requires approximately 160 GiB of storage.

Sync time

In order to create an archive of an entire chain, FULL sync mode must be used. This PR does not prevent SNAP syncing an archive node, but this will result in only a partial archive of the chain.

While the node is performing a FULL sync with the chain, it is also migrating entries from the regular DB segments to the archive DB segments. Overall, this increases the time needed to create the archive node. For a public chain, syncing and archiving might take a week or more to complete.

@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch 2 times, most recently from 88f3968 to 7d4a524 Compare August 20, 2024 10:19
@matthew1001 matthew1001 changed the title Multi version flat db rebase Bonsai archive feature Sep 4, 2024
@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch 7 times, most recently from 782ae60 to 5752732 Compare October 2, 2024 17:10
@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch 4 times, most recently from 5b06b50 to dce531e Compare October 4, 2024 16:11
@matthew1001 matthew1001 marked this pull request as ready for review October 7, 2024 16:31
jframe and others added 15 commits October 8, 2024 08:38
Signed-off-by: Jason Frame <jason.frame@consensys.net>
…se constructor that reuses worldStateStorage so that we don't lose values in the EvmToolSpecTests

Signed-off-by: Jason Frame <jason.frame@consensys.net>
Signed-off-by: Jason Frame <jason.frame@consensys.net>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
…d state, and freeze it

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
…ten for blocks and move account state to new DB segment

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
…t block state has been frozen for

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
…age from the freezer segment

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
…Use the term archive, not freezer

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
…to fail the block

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
matthew1001 and others added 5 commits November 5, 2024 16:37
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>
@garyschulte
Contributor

Reviewing now. What networks have you synced with this method? Do you have block execution performance metrics?

I will kick off a full sync of mainnet alongside a regular bonsai full sync so we can get a signal about performance comparison. 👍

@matthew1001
Contributor Author

So far I've full-synced Holesky to completion and I'm currently full-syncing mainnet. Mainnet is taking a long time, but I don't know if it's longer than it should be or if my Besu/Teku setup is sub-optimal.

Here's a snapshot of the current sync state with mainnet:
[image: chart of current mainnet sync state]

Yellow line at the top is number of blocks synced. Blue line is number of blocks archived.

@matthew1001
Contributor Author

If you're able to try syncing as well it would be useful as a comparison.

3 participants