RFC: sync refactor #1073
-
Some more thoughts, noting them down before I keep re-writing them in different forms again and again. Let's split our design into two modes, sync and tracking, where the latter is when we are following the tip of the chain. The reason to split these is that there are many different options for a sync mode, and I don't want sync to constrain the design of tracking. Some potential sync possibilities:
These are all somewhat different, and should not be coupled to the design of when we are tracking -- which should probably be the most optimised part, since this is where we are going to be 99% of the time.
Tracking mode
There are several ways to approach this design. Let's first consider what data events we will need to deal with. Note that because we are in tracking mode, we are currently "in sync", i.e. we are within some reasonable number of blocks of the known tip of the chain, in terms of both L1 and L2. And ideally we are literally at the tip.
Data events
These are the possible data events as I see them. Maybe in the future there will be more. I am also assuming that the data coming in has been verified somehow, i.e. at this point it is canonical data.
Concurrency
The simplest solution here is to process all events sequentially, i.e. something like the sketch below.
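A minimal sketch of that, assuming tokio channels; every name here is a hypothetical stand-in, not a real pathfinder type:

```rust
use tokio::sync::mpsc;

// Hypothetical stand-ins for the real pathfinder types.
struct Block;
struct BlockNumber(u64);
struct State;

async fn apply_block(_state: &mut State, _block: Block) { /* write to storage */ }
async fn repair(_state: &mut State, _from: BlockNumber) { /* network IO + reprocessing */ }

enum Event {
    NewBlock(Block),
    RepairNeeded(BlockNumber),
}

// Sequential: each event is handled to completion before the next one is
// taken off the queue, so a slow repair stalls tracking entirely.
async fn track(mut events: mpsc::Receiver<Event>, mut state: State) {
    while let Some(event) = events.recv().await {
        match event {
            Event::NewBlock(block) => apply_block(&mut state, block).await,
            Event::RepairNeeded(from) => repair(&mut state, from).await,
        }
    }
}
```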
The issue with this is that the repair process includes network IO, and some substantial processing, especially if it involves state updates or many class definitions, for example. This means our tracking process will block while a repair is underway. Probably not good, and difficult to assess the impact, because hopefully we always hit the happy path. An alternative is to process all these events and processes concurrently. However, this leads to the problem of state consistency.
State consistency
It's essentially the same problem Rust solves, except in our case it also involves a database, which Rust cannot capture semantically. If we have concurrent tasks then we need some synchronisation over updating the pathfinder state. A task has some view of the local state which may become invalidated by another task mutating it. One can devise complicated schemes where each task gets to decide when it's invalidated, but I think it's probably better to have a central place which has sole dominion over mutating state. This central driver can also decide when a task has become invalidated and may be cancelled.
Task lifecycle
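A rough sketch of what such a lifecycle could look like, assuming a tokio mpsc channel for task results and a simple version counter for deciding invalidation (all names hypothetical):

```rust
use tokio::sync::mpsc;

// Hypothetical sketch, not pathfinder's real types.
struct StateVersion(u64);

struct TaskResult {
    based_on: StateVersion, // the state version the task started from
    // ...plus whatever data the task produced
}

struct Controller {
    version: StateVersion,
    results: mpsc::Receiver<TaskResult>,
}

impl Controller {
    async fn run(&mut self) {
        while let Some(result) = self.results.recv().await {
            if result.based_on.0 != self.version.0 {
                // The task's view of state was invalidated by a later
                // mutation: discard its result (and optionally respawn it).
                continue;
            }
            // Sole dominion: only the controller mutates state.
            self.apply(result);
            self.version.0 += 1;
        }
    }

    fn apply(&mut self, _result: TaskResult) {
        // verify the result against current state, then write it
    }
}
```

The key property is that tasks never touch state directly; they only produce results which the controller may accept or discard.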
Testability
I think we can neatly split a task and the controller into different parts. We can test the task by itself -- especially nice if it's pure. We can test the state diff verification and the application separately. And then the logic of the controller over when it spawns tasks, i.e. a small state machine of sorts.
-
Mostly implemented at this point.
-
This is a proposal for rewriting our sync module to be simpler and to allow for different data sources, in particular p2p.
Status Quo
The current sync module is event based. This was a response to keeping all sqlite IO in a single thread / location -- the top level sync loop. This had the desired effect, but unfortunately logic and responsibilities are now shared between the outer loop receiving the events and the L1 and L2 threads generating the events. This has gotten worse over time, as the abstraction became... difficult.
One particular pain point is that the L2 logic has to "query" the database via events with an embedded oneshot channel. Additionally, the knowledge of when it is safe to switch between pending state and other business logic lives in L2, but must also be acted on in the outer event loop.
Testing is particularly annoying, to the point where the main test has just been commented out / ignored.
There is also no distinction between syncing and not-syncing.
Proposal
Instead of an event based system, I propose just using async and a central logic unit driving it. Effectively, instead of having L1 & L2 be separate threads with an event channel connecting to the outer sync loop, just have L1 and L2 be async functions which sync controls.
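Roughly this shape, as a sketch (names are placeholders, assuming tokio's select!):

```rust
use tokio::select;

// All names here are placeholders, not pathfinder's real types.
struct State;
struct L1Update;
struct L2Update;

async fn poll_l1() -> L1Update { L1Update }
async fn poll_l2() -> L2Update { L2Update }

impl State {
    fn apply_l1(&mut self, _update: L1Update) { /* all sqlite IO stays here */ }
    fn apply_l2(&mut self, _update: L2Update) { /* all sqlite IO stays here */ }
}

// Sync is the single owner of state; L1 and L2 are just futures it
// drives, instead of threads pushing events over a channel.
async fn sync(mut state: State) {
    loop {
        select! {
            update = poll_l1() => state.apply_l1(update),
            update = poll_l2() => state.apply_l2(update),
        }
    }
}
```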
Data source abstraction
Create a p2p-focussed data source interface. The current gateway source can then be adapted to match this interface, until such a time as we enable p2p in general (and it can still be a fallback).
Something simple, like:
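As an illustration only -- the real method set, batching, and error type would be dictated by the p2p protocol, and every name below is a placeholder:

```rust
// Hypothetical stand-ins for the real types.
struct BlockNumber(u64);
struct BlockHeader;
struct BlockBody;
struct StateUpdate;
struct SourceError;

// async fn in traits is fine on Rust 1.75+; older code would use the
// async-trait crate instead.
trait DataSource {
    /// A contiguous run of headers starting at `start`.
    async fn block_headers(&self, start: BlockNumber, count: usize)
        -> Result<Vec<BlockHeader>, SourceError>;

    /// Transactions and receipts for the same run of blocks.
    async fn block_bodies(&self, start: BlockNumber, count: usize)
        -> Result<Vec<BlockBody>, SourceError>;

    /// State updates (and referenced class definitions) for the run.
    async fn state_updates(&self, start: BlockNumber, count: usize)
        -> Result<Vec<StateUpdate>, SourceError>;
}
```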
The exact semantics are coupled with the p2p protocol, but this should be a rough match.
The gateway implementation of this API could easily cache its results (since it gets block header + body in one go), and still return from cache first.
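A sketch of that idea (hypothetical names; a real cache would also bound its size):

```rust
use std::collections::HashMap;

// Hypothetical: one gateway request returns header + body together, so
// fetching one half can populate a cache for the other.
struct BlockHeader;
struct BlockBody;

struct GatewaySource {
    body_cache: HashMap<u64, BlockBody>,
}

impl GatewaySource {
    async fn block_header(&mut self, number: u64) -> BlockHeader {
        // The gateway returns the full block; keep the body around so a
        // later body request is served from cache first.
        let (header, body) = self.fetch_block(number).await;
        self.body_cache.insert(number, body);
        header
    }

    async fn block_body(&mut self, number: u64) -> BlockBody {
        if let Some(body) = self.body_cache.remove(&number) {
            return body; // cache hit from an earlier header fetch
        }
        let (_, body) = self.fetch_block(number).await;
        body
    }

    async fn fetch_block(&mut self, _number: u64) -> (BlockHeader, BlockBody) {
        (BlockHeader, BlockBody) // stand-in for the actual gateway call
    }
}
```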
Sync function
There are several potential approaches here, and I think just trying it out will lead to a better feeling of what is good. Most of them boil down to a state machine implementation of some kind.
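For example, one could make the states explicit as an enum with a transition function -- this is just one illustrative shape, not a committed design:

```rust
// Purely illustrative: sync as an explicit state machine, with tracking
// as the steady state.
enum SyncState {
    // Far behind: bulk-download historic blocks.
    Syncing { next: u64 },
    // At (or near) the tip: process new blocks as they arrive.
    Tracking { head: u64 },
}

fn step(state: SyncState, chain_tip: u64) -> SyncState {
    match state {
        SyncState::Syncing { next } if next >= chain_tip => {
            // Caught up: switch over to tracking mode.
            SyncState::Tracking { head: next }
        }
        // Remaining transitions (reorgs, falling behind, ...) elided.
        other => other,
    }
}
```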
Testing
Ideally we can just test each sub-component in isolation, using pure functions for handlers so we can trigger them with manual test input, instead of requiring a future to complete, an event to be sent, or an API to be mocked.
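For instance, a pure handler can be tested by calling it directly with values (hypothetical names):

```rust
// Illustrative: if a handler is a pure function of (state, input), a test
// just calls it -- no mocks, channels, or futures needed.
#[derive(Debug, PartialEq)]
struct TrackingState {
    head: u64,
}

fn on_new_block(state: TrackingState, block_number: u64) -> TrackingState {
    TrackingState { head: block_number.max(state.head) }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn new_block_advances_head() {
        let state = TrackingState { head: 5 };
        assert_eq!(on_new_block(state, 6), TrackingState { head: 6 });
    }
}
```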