Persistent State Storage for Improved State Syncing #4

Open
Tracked by #56
bkchr opened this issue Jun 3, 2023 · 1 comment
Labels
I5-enhancement An additional feature request.

Comments

@bkchr
Member

bkchr commented Jun 3, 2023

Currently, during the state syncing process from the network, our node retains its internal state, including downloaded portions of the blockchain state and the target block, in memory. This poses a significant challenge: if the node restarts mid-sync, all the data must be re-downloaded.

To improve this process, I propose persistently storing the internal state outside of memory. This could be in a database or a plain file, allowing us to preserve the internal state across node restarts. This would be especially beneficial when handling a large blockchain state, which doesn't need to remain in memory.
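
To make the "plain file" option concrete, here is a minimal sketch of what a persisted checkpoint could look like. `SyncCheckpoint`, its fields, and the encoding are hypothetical and chosen only for illustration; none of them are existing polkadot-sdk types, and a real implementation would more likely go through the node's database layer.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical snapshot of state-sync progress (not a polkadot-sdk type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SyncCheckpoint {
    /// Hash of the block whose state is being downloaded.
    pub target_hash: [u8; 32],
    /// Last storage key received; a resumed sync would continue after it.
    pub last_key: Vec<u8>,
}

impl SyncCheckpoint {
    /// Illustrative encoding: the 32-byte hash followed by the raw key bytes.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(32 + self.last_key.len());
        out.extend_from_slice(&self.target_hash);
        out.extend_from_slice(&self.last_key);
        out
    }

    /// Returns `None` if the input is too short to contain a hash.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 32 {
            return None;
        }
        let mut target_hash = [0u8; 32];
        target_hash.copy_from_slice(&bytes[..32]);
        Some(Self { target_hash, last_key: bytes[32..].to_vec() })
    }

    /// Writes to a sibling `.tmp` file first and then renames it over `path`,
    /// so `path` is only ever replaced by a fully written file.
    /// No fsync is performed here, so durability is not guaranteed.
    pub fn save(&self, path: &Path) -> io::Result<()> {
        let tmp = path.with_extension("tmp");
        fs::write(&tmp, self.encode())?;
        fs::rename(&tmp, path)
    }

    /// Returns `Ok(None)` when no checkpoint file exists yet.
    pub fn load(path: &Path) -> io::Result<Option<Self>> {
        match fs::read(path) {
            Ok(bytes) => Ok(Self::decode(&bytes)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}
```

On startup, a node following this sketch would call `SyncCheckpoint::load` and, if a checkpoint for the same target block is found, resume from `last_key` instead of starting over. The downloaded key-value pairs themselves would need their own persistent store; only the progress marker is shown here.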

However, one potential issue is that if the node is offline for an extended period, the state of the target block may no longer be available on full nodes within the network, but only on archive nodes. To mitigate this, we could consider the implementation suggested in Issue paritytech/polkadot-sdk#523.

@the-right-joyce the-right-joyce transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce the-right-joyce added I5-enhancement An additional feature request. and removed J0-enhancement labels Aug 25, 2023
fixxxedpoint pushed a commit to fixxxedpoint/polkadot-sdk that referenced this issue Jun 19, 2024
…) (paritytech#4)

This PR is fixing a bug in the sync mechanism between wasmi and
pallet-contracts. This bug leads to essentially double charging all the
gas that was used during the execution of the host function. When the
`call` host function is used for recursion this will lead to a quadratic
amount of gas consumption with regard to the nesting depth. We also took
the chance to refactor the code in question and improve the rust docs.

The bug was caused by not updating `GasMeter::executor_consumed`
(previously `engine_consumed`) when leaving the host function. This led
to the value being stale (too low) when entering another host function.

---------

Co-authored-by: Alexander Theißen <alex.theissen@me.com>
Co-authored-by: PG Herveou <pgherveou@gmail.com>
liuchengxu added a commit to subcoin-project/Grants-Program that referenced this issue Aug 22, 2024
In light of recent developments, it has become evident that fully syncing
to the tip of the Bitcoin network and enabling new nodes to perform fast
sync to the latest Bitcoin state is more challenging than initially anticipated
because of the huge size of the UTXO set state (over 12GiB). As a result, I propose
adjusting the delivery goal for this milestone.

The most significant known blocker is paritytech/polkadot-sdk#4.
Other underlying issues may also contribute to the difficulty. Recent experiments
have shown that fast sync from around block height 580,000 is currently infeasible,
succeeding only on machines with 128GiB of memory (paritytech/polkadot-sdk#5053 (comment)),
which is impractical for most users. Nevertheless, we have successfully demonstrated that
decentralized fast sync is possible within a prototype implementation.

While syncing to the Bitcoin network's tip remains a future target, addressing
the existing technical challenges will require substantial R&D efforts.
We remain committed to exploring potential solutions, including architectural
changes and contributing to resolving issue paritytech/polkadot-sdk#4,
Noc2 pushed a commit to w3f/Grants-Program that referenced this issue Aug 23, 2024
liuchengxu added a commit to subcoin-project/polkadot-sdk that referenced this issue Sep 20, 2024
* Set flag --execute-block as true by default

* Ignore *.log

* Finalize blocks with enough confirmations

It was observed that memory usage can become extremely high without
finalization once the chain grows to 220000+ blocks. Concretely, the
culprit of the high memory usage is the creation of `NonCanonicalOverlay`.

There are also a few other improvements to the import-blocks command.
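
The "finalize blocks with enough confirmations" rule can be sketched roughly as follows. The helper name, the use of `u32` heights, and the depth value in the usage note are assumptions for illustration, not code from the referenced commit.

```rust
/// Hypothetical helper: returns the height that may be finalized, given the
/// current best height, the last finalized height, and the number of blocks
/// that must have been built on top of a block before it is finalized.
/// Returns `None` when there is nothing new to finalize.
pub fn finalizable_height(best: u32, finalized: u32, depth: u32) -> Option<u32> {
    // The chain may still be shorter than the required depth.
    let candidate = best.checked_sub(depth)?;
    // Only report heights strictly above what is already finalized.
    (candidate > finalized).then_some(candidate)
}
```

For example, with a best height of 100, a finalized height of 90, and an assumed depth of 6, the helper yields `Some(94)`. Finalizing that block lets the node prune the non-canonical entries below it instead of keeping them in memory.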
@liuchengxu
Contributor

liuchengxu commented Oct 26, 2024

Given that the initial step in #5053 (comment) is already underway, as implemented in #5956, I am planning to initiate work on the persistent state sync improvements.

My immediate focus will be on a pure refactoring of the existing state sync process to modularize and clarify the data processing logic, so that it is clear what data is retained in memory and how the pieces interact. This refactoring will streamline the addition of persistent state storage and ensure a clear, maintainable transition path for future updates.

Rough plan:

  1. Refactor to create a centralized handler for writing state. This handler will later be modified to forward the state key values directly to the DB layer.
  2. Encapsulate StateSyncMetadata, which will be modified for persistent storage later.
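
A rough sketch of the shape this plan could take, assuming a trait-based split between "where the key values go" and "how far the sync has progressed". All names here (`StateWriter`, `InMemoryWriter`, `SyncProgress`, `handle_response`) are hypothetical and are not the actual identifiers in the sync code.

```rust
use std::collections::BTreeMap;

/// Hypothetical destination for downloaded state entries.
pub trait StateWriter {
    fn write(&mut self, entries: Vec<(Vec<u8>, Vec<u8>)>);
}

/// Mirrors the current behaviour in spirit: everything accumulates in memory.
/// A persistent variant would implement the same trait on top of the DB layer.
#[derive(Default)]
pub struct InMemoryWriter {
    pub state: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl StateWriter for InMemoryWriter {
    fn write(&mut self, entries: Vec<(Vec<u8>, Vec<u8>)>) {
        self.state.extend(entries);
    }
}

/// Hypothetical stand-in for the encapsulated sync metadata:
/// progress bookkeeping only, small enough to persist cheaply.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SyncProgress {
    /// Last key seen in a response, if any.
    pub last_key: Option<Vec<u8>>,
    /// Total size of all keys and values handled so far.
    pub imported_bytes: u64,
}

/// Single entry point through which every response's entries pass.
pub fn handle_response<W: StateWriter>(
    writer: &mut W,
    progress: &mut SyncProgress,
    entries: Vec<(Vec<u8>, Vec<u8>)>,
) {
    if let Some((key, _)) = entries.last() {
        progress.last_key = Some(key.clone());
    }
    progress.imported_bytes += entries
        .iter()
        .map(|(k, v)| (k.len() + v.len()) as u64)
        .sum::<u64>();
    writer.write(entries);
}
```

With this split, moving to persistent sync would mean swapping `InMemoryWriter` for a database-backed implementation and persisting `SyncProgress`, without touching the response-handling path.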

github-merge-queue bot pushed a commit that referenced this issue Nov 18, 2024
This pure refactoring of state sync prepares for #4. As outlined in the
rough plan in #4 (comment), there will be two PRs for the state sync
refactoring.

This first PR focuses on isolating the function
`process_state_key_values()` as the central point for storing received
state data in memory. This function will later be adapted to forward the
state data directly to the DB layer for persistent sync. A follow-up PR
will handle the encapsulation of `StateSyncMetadata` to support this
persistent storage.

Although there are many commits in this PR, each commit is small and
intentionally incremental to facilitate a smoother review. Please review
them commit by commit. Each commit should represent an equivalent
rewrite of the existing logic, with one exception: bb447b2, which
deviates slightly from the original but is correct IMHO. Please give
this commit special attention during the review.