Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why request block body for fast sync? #5406

Open
2 tasks done
liuchengxu opened this issue Aug 19, 2024 · 10 comments
Open
2 tasks done

Why request block body for fast sync? #5406

liuchengxu opened this issue Aug 19, 2024 · 10 comments
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@liuchengxu
Copy link
Contributor

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

If I read correctly, the docs https://github.com/paritytech/polkadot-sdk/blob/3fe22d17c4904c5ae016034b6ec2c335464b4165/substrate/client/network/README.md#fast-sync indicates that the fast sync is expected only to download and verify the header chain, and the subsequent state snapshot.

But the inline docs are contradictory, saying Downloading blocks:

/// Download blocks without executing them. Download latest state with proofs.
Fast,
/// Download blocks without executing them. Download latest state without proofs.
FastUnsafe,

and the block body is indeed requested for --sync fast and --sync fast-unsafe.

ChainSyncMode::LightState { storage_chain_mode: false, .. } =>
BlockAttributes::HEADER | BlockAttributes::JUSTIFICATION | BlockAttributes::BODY,
ChainSyncMode::LightState { storage_chain_mode: true, .. } =>
BlockAttributes::HEADER |
BlockAttributes::JUSTIFICATION |
BlockAttributes::INDEXED_BODY,

I'm looking for a sync mode that only downloads the headers and the state snapshot. I initially assumed that the --sync fast/fast-unsafe options would fulfill this requirement. This looks like a bug to me, could you please clarify this confusion?

Steps to reproduce

No response

@liuchengxu liuchengxu added I10-unconfirmed Issue might be valid, but it's not yet known. I2-bug The node fails to follow expected behavior. labels Aug 19, 2024
@liuchengxu
Copy link
Contributor Author

After looking into the original PR of fast sync paritytech/substrate#8884, I found this comment paritytech/substrate#8884 (comment), so it was intentionally to request the body in fast sync. However, this is a mismatch between the implementation and spec IMHO. The existing documentation always mentions downloading the header chain, but the block history is downloaded silently. While it may not be a big problem for the chain without a huge block history, for a project like subcoin, it matters about downloading hundreds of MiB or hundreds of GiB data before downloading the state. We should fix this mismatch in the implementation, what do you think? @arkpar

@arkpar
Copy link
Member

arkpar commented Aug 28, 2024

Indeed, fast sync is supposed to download and verify the header chain first, then the state, then block bodies in the background.

@liuchengxu
Copy link
Contributor Author

Great that we're on the same page! Any insights on the implementation? If I read the code correctly, the gap sync will be started automatically to download the block history once the block gap is detected in client.info().block_gap within the chain sync restart function. How about reusing the gap sync to download the block bodies by updating the block_gap for fast sync? @arkpar

@arkpar
Copy link
Member

arkpar commented Aug 29, 2024

Right, the backgroudn block body download should utilize the same mechanism as the warp sync. The gap in this case is in block bodies, rather than whole block. The Client should have no issue importing a body for an existing block header.
@dmitry-markin can give you further hints, as I have not touched that code for a while.

@dmitry-markin
Copy link
Contributor

dmitry-markin commented Aug 29, 2024

Going through Backend::try_commit_operation() it looks like if block header is imported, it is not considered to be a block gap anymore. So gap sync probably won't run after fast sync because no gap will be detected. Makes sense double checking by not requesting block bodies during fast sync and checking if gap sync was performed.

@liuchengxu
Copy link
Contributor Author

liuchengxu commented Aug 29, 2024

So I guess we need to either change the logic of detecting the block gap within Backend::try_commit_operation() or update the block gap manually after the state sync is finished (and ensure the Client can import the body with an existing header) so that the gap sync can be started as expected. The former seems to be more straightforward.

@dmitry-markin
Copy link
Contributor

Yes, the first approach is better. Worth checking where else the block gap is used and if it won't break anything.

@liuchengxu
Copy link
Contributor Author

We need a formal definition of a block gap before implementation. The concept of a block gap has evolved. Initially, it could only be created during warp sync; now, it can be created by both warp sync and fast sync. Warp sync creates a block gap by first downloading the finality proof and state at a specific height and then proceeding to download the block history. In contrast, fast sync downloads the header chain and state at a specific height before downloading the block history.

Block Gap Creation and Updates

  • Warp Sync
  • Fast Sync
    • Creating a Block Gap
      • A block gap is created when a new header is imported and the following condition is met: parent_header_exists && !body_exists && number == best_num + 1
    • Updating a Block Gap (During Header Chain and Block History Download)
      • Header Chain Download: The block gap is increased when a new header is downloaded:
        • Condition: number == gap.end + 1 && !body_exists
        • Action: gap.end += 1 => gap increased
      • Block History Download: The block gap is decreased when a block in the history is downloaded:
        • Condition number == block_gap.start
        • Action: gap.start += 1 => gap decreased

Block Gap Check

The logic for decreasing a block gap is consistent across both warp sync and fast sync, but fast sync can increase the gap during the header chain download.

Implementation Plan

  1. Refactor BlockGap.

The current BlockGap is represented as Option<(NumberFor<Block>, NumberFor<Block>)>. To improve extensibility, this tuple should be refactored into a struct. Compatibility concerns should be addressed by upgrading the old gap storage to the new structure.

pub enum BlockGapType {
    // Gap created by warp sync.
    MissingHeaderAndBody,
    // Gap created by fast sync.
    MissingBody,
}

pub struct BlockGap<N> {
    start: N,
    end: N,
    gap_type: BlockGapType,
}
  1. Add Block Gap Support for Fast Sync.

The primary changes will occur in try_commit_operation, with adjustments to the ChainSync logic as well.

I've implemented a working prototype locally based on the ideas above, and it appears to be functioning well. I would appreciate it if you could review the changes and provide feedback in case I missed something important. @arkpar @dmitry-markin

@dmitry-markin
Copy link
Contributor

@liuchengxu sure, feel free to publish your local changes as a PR / draft PR.

@liuchengxu
Copy link
Contributor Author

Please have a look at #5540 before I send a new PR @dmitry-markin .

github-merge-queue bot pushed a commit that referenced this issue Sep 4, 2024
There are basically three commits in this PR. Since all these commits
essentially have no logical changes, I packed them into one PR. Review
by per-commit is recommended.
- The first commit avoids unnecessarily updating the block gap storage
when the value remains unchanged, as discovered when I worked on
#5406.
- The second commit is purely about format string style changes but
deletes ~10 lines of code, which slightly helps me look into this file
:P
- The third commit is added to avoid the unnecessary block gap update in
`BlockchainDb`.

---------

Co-authored-by: Davide Galassi <davxy@datawok.net>
liuchengxu added a commit to liuchengxu/polkadot-sdk that referenced this issue Sep 4, 2024
Previously, block gaps could only be created by warp sync, but with upcoming
changes (paritytech#5406), block gaps will also be generated by fast sync, thus
`BlockGapType` is needed.

This refactor converts the existing `(NumberFor<Block>, NumberFor<Block>)`
into a dedicated `BlockGap<NumberFor<Block>>` struct. This change is purely
structural and does not alter existing logic, but it lays the groundwork for
future changes.
github-merge-queue bot pushed a commit that referenced this issue Sep 6, 2024
Previously, block gaps could only be created by warp sync, but block
gaps will also be generated by fast sync once #5406 is fixed. This PR is
part 1 of the detailed implementation plan in
#5406 (comment):
refactor `BlockGap`.

This refactor converts the existing `(NumberFor<Block>,
NumberFor<Block>)` into a dedicated `BlockGap<NumberFor<Block>>` struct.
This change is purely structural and does not alter existing logic, but
lays the groundwork for the follow-up PR.

The compatibility concern caused by the new structure is addressed in
the second commit.

cc @dmitry-markin

---------

Co-authored-by: Bastian Köcher <git@kchr.de>
github-merge-queue bot pushed a commit that referenced this issue Sep 10, 2024
This PR addresses an issue where state sync may fail to start if the
conditions required for its initiation are not met when a finalized
block notification is received. `pending_state_sync_attempt` is
introduced to trigger the state sync later when the conditions are
satisfied.

This issue was spotted when I worked on #5406, specifically,
`queue_blocks` was not empty when the finalized block notification was
received, and then the state sync was stalled. cc @dmitry-markin

---------

Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Bastian Köcher <git@kchr.de>
nazar-pc pushed a commit to autonomys/polkadot-sdk that referenced this issue Sep 27, 2024
This PR addresses an issue where state sync may fail to start if the
conditions required for its initiation are not met when a finalized
block notification is received. `pending_state_sync_attempt` is
introduced to trigger the state sync later when the conditions are
satisfied.

This issue was spotted when I worked on paritytech#5406, specifically,
`queue_blocks` was not empty when the finalized block notification was
received, and then the state sync was stalled. cc @dmitry-markin

---------

Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Bastian Köcher <git@kchr.de>
(cherry picked from commit 8236718)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.
Projects
None yet
Development

No branches or pull requests

3 participants