Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

overseer: signal completion of major sync #6694

Closed
ordian opened this issue Feb 9, 2023 · 3 comments · Fixed by #6727
Closed

overseer: signal completion of major sync #6694

ordian opened this issue Feb 9, 2023 · 3 comments · Fixed by #6727
Assignees
Labels
T0-node This PR/Issue is related to the topic “node”. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.

Comments

@ordian
Copy link
Member

ordian commented Feb 9, 2023

The idea is to use a sync oracle in the overseer similar to #6691 and either add a separate InitialSyncComplete signal here, or add a bool is_major_syncing to ActiveLeaves and BlockFinalized signals.

Approval-voting deliberately made a choice to only track initial major sync. That is if a node is synced, but then goes offline or is slow somehow and major syncs again, it will not detect/react to that. To be consistent with this logic, a separate signal approach would be preferred.

Another consideration: some subsystems are awaiting on the first activated leaf. If for some reason block production stalls and a node restarts, it will not import new blocks. To address that concern, we send active leaves signal on startup based on the current db state here. Consider merging InitialSyncComplete and the initial leaves into one signal to simplify logic, i.e. adding InitialSyncComplete(ActiveLeaves), where the active leaves would be fetched from either the db or the latest imported block depending on whether we major sync on startup.

@ordian ordian added T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance. J7-mentor An easy task where a mentor is available. Please indicate in the issue who the mentor could be. T0-node This PR/Issue is related to the topic “node”. labels Feb 9, 2023
@bkchr
Copy link
Member

bkchr commented Feb 9, 2023

#6689 this pr already goes a little bit into this direction.

@ordian ordian removed the J7-mentor An easy task where a mentor is available. Please indicate in the issue who the mentor could be. label Feb 9, 2023
@tdimitrov tdimitrov self-assigned this Feb 9, 2023
@tdimitrov
Copy link
Contributor

I stumbled upon this problem while investigating paritytech/polkadot-sdk#793. I think most of the 'state discarded' errors mentioned there are caused by doing work during sync.

#6689 was supposed to be my secret 'test on versi' PR but it's gaining popularity now :) For the moment I just try to delay ActiveLeavesUpdate there and it seems not to cause any side effects (for now). But this is not the best solution.

For now I think the best approach might be not to change any events logic. Meaning we'll broadcast initial ActiveLeavesUpdate with the heads from db and we'll broadcast BlockFinalized events during sync. But once the sync completes - we'll broadcast InitialSyncComplete. It will serve as an indication for the subsystems that they can start working with the runtime.

I see two benefits for this:

  1. There is a clear indication when the system is in sync and runtime calls will work as expected.
  2. Subsystems receive data in advance and can do work in advance, if they decide to and it is safe to (of course this is up to the developer).

I'm not completely rejecting the other approaches though. As @ordian said we can also:

  1. Add flag to ActiveLeavesUpdate, which has got all the benefits listed above (and no drawbacks).
  2. Don't broadcast any signals until initial sync is complete.

@tdimitrov
Copy link
Contributor

#6689 is ready for a first iteration of reviews.

What's done in nutshell:

  • No ActiveLeaves and BlockFinalized events are generated until full initial sync is complete.
  • No new events or flags are generated.

@ordian requested not to send any BlockFinalized events during sync so in this case I think there is no benefit in introducing new events.

What do you think about this approach?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T0-node This PR/Issue is related to the topic “node”. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.
Projects
None yet
3 participants