Persistent State Storage for Improved State Syncing #4

bkchr · 2023-06-03T21:30:19Z

Currently, during the state syncing process from the network, our node retains its internal state, including downloaded portions of the blockchain state and the target block, in memory. This poses a significant challenge: if the node restarts mid-sync, all the data must be re-downloaded.

To improve this process, I propose persistently storing the internal state outside of memory. This could be in a database or a plain file, allowing us to preserve the internal state across node restarts. This would be especially beneficial when handling a large blockchain state, which doesn't need to remain in memory.
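A minimal sketch of the "plain file" variant, assuming the progress to preserve is the target block plus the last storage key received. `SyncCheckpoint` and its fields are hypothetical names for illustration, not types from polkadot-sdk, and the fixed-layout encoding stands in for whatever codec the real implementation would use.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical record of state sync progress (not a polkadot-sdk type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SyncCheckpoint {
    /// Hash of the block whose state is being downloaded.
    pub target_hash: [u8; 32],
    /// Number of the target block.
    pub target_number: u64,
    /// Last storage key received; the next request would resume after it.
    pub last_key: Vec<u8>,
}

impl SyncCheckpoint {
    /// Layout: 32-byte hash, 8-byte little-endian number, then the raw key.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(40 + self.last_key.len());
        out.extend_from_slice(&self.target_hash);
        out.extend_from_slice(&self.target_number.to_le_bytes());
        out.extend_from_slice(&self.last_key);
        out
    }

    /// Returns `None` if the input is too short to hold the fixed header.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 40 {
            return None;
        }
        let mut target_hash = [0u8; 32];
        target_hash.copy_from_slice(&bytes[..32]);
        let mut number = [0u8; 8];
        number.copy_from_slice(&bytes[32..40]);
        Some(Self {
            target_hash,
            target_number: u64::from_le_bytes(number),
            last_key: bytes[40..].to_vec(),
        })
    }

    /// Writes to a temporary file and renames it over `path`, so a process
    /// crash mid-write does not leave a half-written checkpoint at `path`.
    pub fn save(&self, path: &Path) -> io::Result<()> {
        let tmp = path.with_extension("tmp");
        fs::write(&tmp, self.encode())?;
        fs::rename(&tmp, path)
    }

    /// Returns `Ok(None)` when no checkpoint file exists, i.e. a fresh sync.
    pub fn load(path: &Path) -> io::Result<Option<Self>> {
        match fs::read(path) {
            Ok(bytes) => Ok(Self::decode(&bytes)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}
```

On startup the node would call `load` and either resume from `last_key` or start a fresh sync. The downloaded key-value pairs themselves would need to be persisted separately (for example in the database), which this sketch does not cover.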

However, one potential issue is that if the node is offline for an extended period, the state of the target block may no longer be available on full nodes in the network, only on archive nodes. To mitigate this, we could consider the implementation suggested in Issue paritytech/polkadot-sdk#523.

…) (paritytech#4)

This PR fixes a bug in the sync mechanism between wasmi and pallet-contracts. The bug essentially double-charges all the gas used during the execution of a host function. When the `call` host function is used for recursion, this leads to gas consumption that is quadratic in the nesting depth.

We also took the chance to refactor the code in question and improve the Rust docs. The bug was caused by not updating `GasMeter::executor_consumed` (previously `engine_consumed`) when leaving the host function. This led to the value being stale (too low) when entering another host function.

---------

Co-authored-by: Alexander Theißen <alex.theissen@me.com>
Co-authored-by: PG Herveou <pgherveou@gmail.com>

In light of recent developments, it has become evident that fully syncing to the tip of the Bitcoin network and enabling new nodes to fast sync to the latest Bitcoin state is more challenging than initially anticipated, because of the huge size of the UTXO set (over 12 GiB). As a result, I propose adjusting the delivery goal for this milestone. The most significant known blocker is paritytech/polkadot-sdk#4. Other underlying issues may also contribute to the difficulty. Recent experiments have shown that fast sync from around block height 580,000 currently succeeds only on machines with 128 GiB of memory (paritytech/polkadot-sdk#5053 (comment)), which is impractical for most users. Nevertheless, we have successfully demonstrated in a prototype implementation that decentralized fast sync is possible. While syncing to the Bitcoin network's tip remains a future target, addressing the existing technical challenges will require substantial R&D effort. We remain committed to exploring potential solutions, including architectural changes and contributing to resolving issue paritytech/polkadot-sdk#4.

* Set flag --execute-block as true by default
* Ignore *.log
* Finalize blocks with enough confirmations

Memory usage can become extremely high without finalization once the chain grows to 220000+ blocks. Concretely, the culprit of the high memory usage is the creation of `NonCanonicalOverlay`. There are also a few other improvements to the import-blocks command.
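As a rough illustration of the "finalize blocks with enough confirmations" item, the helper below computes the highest block that could be finalized for a given best block. The function name and the convention that the best block itself counts as one confirmation are assumptions for this sketch. The commit message does not state the confirmation threshold that is actually used.

```rust
/// Returns the highest block height that has at least
/// `required_confirmations` confirmations, or `None` if no block has yet.
///
/// Assumption: a block at height `h` has `best_height - h + 1`
/// confirmations, so the best block itself has exactly one.
pub fn finalizable_height(best_height: u32, required_confirmations: u32) -> Option<u32> {
    // Treat a threshold of 0 like 1: the best block is then finalizable.
    let n = required_confirmations.max(1);
    // `n - 1` blocks must already sit on top of the block to finalize.
    best_height.checked_sub(n - 1)
}
```

Finalizing up to this height lets the backend move those blocks out of the non-canonical overlay, which is presumably what keeps memory bounded during a long import.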

liuchengxu · 2024-10-26T13:44:57Z

Given that the initial step in #5053 (comment) is already underway, as implemented in #5956, I am planning to initiate work on the persistent state sync improvements.

My immediate focus will be a pure refactoring of the existing state sync process to modularize and clarify the data processing logic, so that it is clear what data is retained in memory and how the pieces interact. This refactoring will streamline the addition of persistent state storage and ensure a clear, maintainable transition path for future updates.

Rough plan:

* Refactor to create a centralized handler for writing state. This handler will later be modified to forward the state key values directly to the DB layer.
* Encapsulate StateSyncMetadata, which will be modified for persistent storage later.
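To make the first step of the plan concrete, here is a minimal sketch of what a centralized write path could look like. `StateSink`, `InMemorySink`, and `StateDownload` are hypothetical names chosen for illustration and do not mirror the actual types in polkadot-sdk. The point is only that every received batch goes through one method, so swapping the in-memory sink for a database-backed one later would not touch the rest of the sync logic.

```rust
use std::collections::HashMap;

/// A batch of storage key-value pairs received from a peer.
pub type KeyValues = Vec<(Vec<u8>, Vec<u8>)>;

/// Hypothetical destination for downloaded state (memory today, DB later).
pub trait StateSink {
    fn write(&mut self, state_root: &[u8], key_values: KeyValues);
}

/// Keeps everything in memory, grouped by the trie root it belongs to.
#[derive(Default)]
pub struct InMemorySink {
    pub state: HashMap<Vec<u8>, KeyValues>,
}

impl StateSink for InMemorySink {
    fn write(&mut self, state_root: &[u8], key_values: KeyValues) {
        self.state
            .entry(state_root.to_vec())
            .or_default()
            .extend(key_values);
    }
}

/// Hypothetical sync driver: owns the sink plus the small amount of
/// bookkeeping needed to resume (the last key seen and a counter).
pub struct StateDownload<S: StateSink> {
    sink: S,
    last_key: Option<Vec<u8>>,
    received: u64,
}

impl<S: StateSink> StateDownload<S> {
    pub fn new(sink: S) -> Self {
        Self { sink, last_key: None, received: 0 }
    }

    /// The single place where received state is stored.
    pub fn process_key_values(&mut self, state_root: &[u8], key_values: KeyValues) {
        if let Some((key, _)) = key_values.last() {
            self.last_key = Some(key.clone());
        }
        self.received += key_values.len() as u64;
        self.sink.write(state_root, key_values);
    }

    pub fn last_key(&self) -> Option<&[u8]> {
        self.last_key.as_deref()
    }

    pub fn received(&self) -> u64 {
        self.received
    }

    pub fn sink(&self) -> &S {
        &self.sink
    }
}
```

In this shape, the bookkeeping fields (`last_key`, `received`) play the role of the metadata that the second step of the plan would encapsulate and eventually persist.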

This pure refactoring of state sync prepares for #4. As outlined in the rough plan in #4 (comment), there will be two PRs for the state sync refactoring. This first PR focuses on isolating the function `process_state_key_values()` as the central point for storing received state data in memory. This function will later be adapted to forward the state data directly to the DB layer for persistent sync. A follow-up PR will handle the encapsulation of `StateSyncMetadata` to support this persistent storage. Although this PR contains many commits, each one is small and intentionally incremental to make the review smoother, so please review them commit by commit. Each commit should be an equivalent rewrite of the existing logic, with one exception: bb447b2 deviates slightly from the original but is correct IMHO. Please give this commit special attention during the review.

bkchr added the J0-enhancement label Jun 3, 2023

This was referenced Jun 3, 2023

Optimizing Warp Syncing Support and User Experience paritytech/roadmap#32

Open

warp sync paritytech/substrate#13202

Closed

the-right-joyce transferred this issue from paritytech/substrate Aug 24, 2023

the-right-joyce added I5-enhancement An additional feature request. and removed J0-enhancement labels Aug 25, 2023

bkchr mentioned this issue May 28, 2024

StateStrategy assumes all the state (of a block) fits into RAM #4608

Closed

2 tasks

bkchr mentioned this issue Jul 22, 2024

Using warp sync on parachain triggers OOM #5053

Open

2 tasks

liuchengxu mentioned this issue Aug 11, 2024

Compress the state response to reduce the state sync data transfer #5312

Open

2 tasks

liuchengxu mentioned this issue Aug 22, 2024

Amendment for Subcoin milestone3 w3f/Grants-Program#2376

Merged

liuchengxu mentioned this issue Sep 19, 2024

Tracking Issue: Full Fast Sync Support subcoin-project/subcoin#56

Open

5 tasks

nazar-pc mentioned this issue Sep 23, 2024

Add domain snap sync algorithm autonomys/subspace#3027

Closed

1 task

liuchengxu mentioned this issue Sep 29, 2024

Support updating the trie changes directly into the database #5862

Open

This was referenced Oct 27, 2024

Pure state sync refactoring (part-1) #6249

Merged

Compress the State Response Message in State Sync polkadot-fellows/RFCs#112

Open
