
GRANDPA: Warp Sync #1208

Closed
rphmeier opened this issue Dec 4, 2018 · 20 comments · Fixed by #9227
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@rphmeier
Contributor

rphmeier commented Dec 4, 2018

Assuming a

trait AncestryApi<Block: BlockT> {
    // Return the most recent ancestor of `at` where `ancestor.number() % 2^n == 0`,
    // or as close to it as possible.
    // NOTE: only safe when traversing backwards from a FINALIZED block.
    fn log2ancestor(at: &BlockId<Block>, n: Block::Number) -> (Block::Number, Block::Hash);
}

The log2ancestor proofs would be useful for warp sync:

  1. Provide a commit message with a "big" ancestry proof for the most recent finalized block.
  2. Walk backwards to the last handoff, doing a log2-based walk. Prove as much of the handoff's commit-message ancestry as possible with this walk. For any remaining portions, prove it with a "big" ancestry proof.
  3. Repeat step 2 until the genesis or last checkpoint is reached.
@rphmeier rphmeier added the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Dec 4, 2018
@rphmeier rphmeier added this to the As-and-when milestone Dec 4, 2018
@rphmeier
Contributor Author

rphmeier commented Dec 4, 2018

(We will probably want to put some kind of generalized warp-sync in place first)

@rphmeier
Contributor Author

rphmeier commented Aug 12, 2020

Rather than log2-ancestors, Merkle Mountain Ranges (MMRs) could be used. A pallet can keep the MMR peaks in the state and the MMR nodes in the offchain DB. Full nodes can then produce ancestry proofs for any block in the prior chain.

With that, the scheme would look like this:

We assume that there is some block that we refer to as our checkpoint where the validator-set and block information are hardcoded into the client. This may be the genesis block.

When we intend to warp-sync GRANDPA, we are presented with a series of validator-set hand-offs in reverse order, starting from the head of the chain and moving backwards to the checkpoint. Our goal is to show that every validator set since the checkpoint has signed off on the following validator set, and then to show that some block we'll call the Goal has been finalized by the most recent validator set.

Checkpoint <- Intermediate Change 1 <- Intermediate Change 2 <- .... <- Last Change <- Goal
                                                                             ^
                                                                             |
                                                                             |
                                         This is where the most recent validator set was activated

In practice, this is done by presenting a series of finality proofs of blocks that triggered validator set changes. Validator set changes in GRANDPA are handled with a signal-execution duality: first, there is a block that signals that a change will happen in N blocks, and then N blocks later (in the same chain), the change is applied. On most chains, N is always equal to zero, but it's best not to lean on that assumption for reasons of flexibility.

So proving any particular hand-off from a validator set V to a new set V' is done by proving that V finalized a block B' which triggered a change signaled by its ancestor B, where the change indicated that V' would be the new set.

(signal)
B <--------- B'
                (applied at B' = B + N)

Note that in most chains, N = 0 and thus the signal block B and the activation block B' are one and the same. However, the implementation should support chains with N != 0.

The substrate node already keeps finality proofs (Justifications) of all validator-set change blocks.

We don't need to ensure that all the validator-set changes are actually in the same chain because we lean on the property that each validator-set only finalizes 1 block at each height. The chain of validator-set hand-offs thus justifies the entire chain between Checkpoint and Goal.
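
Concretely, the verification loop over this chain of hand-offs might look something like the sketch below. The types and helpers here (HandoffFragment, verify_justification, scheduled_change) are assumptions for illustration, not the actual Substrate/GRANDPA APIs.

// Structural sketch only: hypothetical types and helpers, not the real
// sc-finality-grandpa API.

struct AuthoritySet;   // the public keys (and set id) of one validator set
struct Header;         // a block header, including its digest
struct Justification;  // a GRANDPA commit carrying validator signatures

// One validator-set hand-off: the block B' at which the change is applied,
// plus a justification for it signed by the outgoing set.
struct HandoffFragment {
    header: Header,
    justification: Justification,
}

// Walk the fragments from the checkpoint forwards, switching authority sets as
// we go. On success, returns the set that is active at the head of the list,
// i.e. the set that finalizes Goal.
fn verify_warp_proof(
    checkpoint_set: AuthoritySet,
    fragments: &[HandoffFragment],
) -> Result<AuthoritySet, ()> {
    let mut current_set = checkpoint_set;
    for fragment in fragments {
        // More than 2/3 of `current_set` must have signed a commit for this header.
        verify_justification(&current_set, &fragment.header, &fragment.justification)?;
        // The header (or its signalling ancestor, when N != 0) names the next set V'.
        current_set = scheduled_change(&fragment.header).ok_or(())?;
    }
    Ok(current_set)
}

// Assumed helpers, left unimplemented in this sketch.
fn verify_justification(_: &AuthoritySet, _: &Header, _: &Justification) -> Result<(), ()> {
    unimplemented!()
}
fn scheduled_change(_: &Header) -> Option<AuthoritySet> {
    unimplemented!()
}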

And the last piece of the puzzle is making sure that the light client can still fetch ancient blocks even though it has never synced them. The simplest way to do this is to provide the entire blockchain between Checkpoint and Goal, but that is exactly what we are trying to avoid with a light sync. So this is where MMRs come in. The state of the latest finalized block commits to the set of peaks of the MMR, which can be used to prove the header-hash of any block in the ancestry.

Initially, right after the warp sync, Goal will be the highest finalized block the light client is aware of, but as the client stays active on the network and the chain progresses, the highest finalized block will move forward. This technique can be used to safely fetch any block up to the last finalized block, which means that the light client can discard finalized blocks as it proceeds.

However, fetching data based on un-finalized MMR peaks is DANGEROUS and should not be done.
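
As a rough illustration of the ancestry check against finalized peaks (this is not the pallet-mmr API; the proof shape and hasher are assumptions), verifying that an ancient header hash is in the finalized ancestry amounts to hashing the leaf up its Merkle path and comparing the result against one of the peaks committed in the finalized block's state:

// Sketch only: `peaks` are the MMR peaks read from the finalized block's state,
// `leaf` is the hash of the ancient header being proven, and `proof` is the
// sibling-hash path from that leaf up to one of the peaks. `hash_pair` stands
// in for the chain's hasher (e.g. BLAKE2b-256).
fn verify_mmr_ancestry(
    peaks: &[[u8; 32]],
    leaf: [u8; 32],
    proof: &[([u8; 32], bool)], // (sibling hash, true if the sibling is the left child)
    hash_pair: impl Fn(&[u8; 32], &[u8; 32]) -> [u8; 32],
) -> bool {
    let mut node = leaf;
    for (sibling, sibling_is_left) in proof {
        node = if *sibling_is_left {
            hash_pair(sibling, &node)
        } else {
            hash_pair(&node, sibling)
        };
    }
    // The reconstructed node must be one of the peaks committed in state.
    peaks.iter().any(|peak| *peak == node)
}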

@tomaka
Contributor

tomaka commented Aug 27, 2020

When we intend to warp-sync GRANDPA, we are presented with a series of validator-set hand-offs in reverse order, starting from the head of the chain and moving backwards to the checkpoint.

I guess that this should be implemented with a networking protocol that is separate from the traditional sync.
A light client would ask "please show me the list of validator-set hand-offs to <checkpoint>", and the remote answers with that list.

Since the identifier of the chain is part of the protocol name that is negotiated between peers, the <checkpoint> in the network message could, in theory, consist of only a block height.

@tomaka
Contributor

tomaka commented Aug 27, 2020

Pragmatically speaking, if I'm not mistaken, the authorities list changes every 600 blocks on Kusama.
Supposing that the checkpoint is block #0 and that Kusama is at block 4 million, that'd be 6666 blocks to send over the network (and then to verify).
Empirical tests show that a light client in the browser syncs at around ~1000 blocks/sec, so the full sync would only take ~7 seconds (again, assuming the checkpoint is block #0, the worst-case scenario).

6666 block headers is low enough that their total size can reasonably be put in a single networking message, but I'm a bit worried about the future. We might need to ask light clients with a far-enough checkpoint to do the warp sync in multiple steps by sending multiple requests.
I would therefore suggest that, when a warp sync request is made, the full node answer with the list starting at <checkpoint> and going up in block numbers.
This way, the full node is free to send back only a partial list, and the light client can then resume the sync by sending a second request with a checkpoint corresponding to the last block of the first response.
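
To make that concrete, here is a rough sketch of what the request and response could look like; the type and field names are assumptions for illustration, not actual protocol definitions:

// Request sent by the light client on the dedicated warp-sync protocol. Since
// the chain is identified by the protocol name, this could in theory be just a
// block height, but a 32-byte hash is cheap and lets the server disambiguate.
struct WarpSyncRequest {
    // Block to start from (the client's current checkpoint).
    begin: [u8; 32],
}

// Response: hand-off fragments in ascending block-number order, possibly
// truncated so that it fits in one network message. If `is_finished` is false,
// the client resumes by sending a new request starting at the last block of
// this response.
struct WarpSyncResponse {
    // Each fragment is an encoded (header, justification) pair for one hand-off.
    fragments: Vec<Vec<u8>>,
    is_finished: bool,
}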

@rphmeier
Contributor Author

rphmeier commented Aug 27, 2020

I guess that this should be implemented with a networking protocol that is separate from the traditional sync.
A light client would ask "please show me the list of validator-set hand-offs to <checkpoint>", and the remote answers with that list.

Since the identifier of the chain is part of the protocol name that is negotiated between peers, the <checkpoint> in the network message could, in theory, consist of only a block height.

Yeah, that's right, in theory. The verifier on the other side would have to check that the checkpoint block they get back actually has that same hash. The block hash is only 32 bytes though so there wouldn't be much point in jumping through hoops to avoid it.

Empirical tests show that a light client in the browser syncs at around ~1000 blocks/sec, so the full sync would only take ~7 seconds (again, assuming the checkpoint is block #0, the worst-case scenario).

Right, although we don't actually import these blocks and check block signatures, which I imagine is the bottleneck for sync. We do need to verify GRANDPA signatures, so that is going to be our bottleneck after networking. Each finality proof will have several hundred signatures.

We might need to ask light clients with a far-enough checkpoint to do the warp-sync in multiple steps by sending multiple requests.

In the long term, we can roll all of these handoffs up into an inductive zero-knowledge proof that gives us all of this data in only a few hundred bytes. Until then, yes, we might need to do multiple steps. However, if we have a checkpoint that advances once every month or even once every quarter, then it won't actually be a problem. We want to update the checkpoints every so often anyway because of long-range attacks: very old validator sets might be compromised and no longer have anything at stake.

More near-term, we can avoid the space used by the multiple signatures by switching GRANDPA to BLS, where we can aggregate all signatures into a single piece of data the size of a single signature.

@cheme
Contributor

cheme commented Sep 4, 2020

I have been looking a bit at this issue. From what I understand, you first need to check finality of the 'Goal' block by rebuilding the 'Goal' authority set from all the authority-set-change digests (validator-set hand-offs) since the 'Checkpoint' block (I did some coding around this to get an idea of the sizes involved).
We are fine with this because only one block gets signed at any height, so I guess we rely on GRANDPA equivocation to assume that.

One thing I am still missing from this picture is:
how do you prove that the light client received all the validator-set hand-offs from the full node without checking all consecutive headers?
(Skipping a validator-set hand-off in the reply leads to '(set_id, round_number)' pairs that cannot be equivocated against because they never existed in the first place.)

@rphmeier
Contributor Author

rphmeier commented Sep 4, 2020

How do you prove that the light client receive all the validator-set hand-offs from the full node without checking all consecutive headers?

Why would you need consecutive headers to prove this? Yes, we are assuming that there are no safety violations and that GRANDPA finalizes only one block at each height. So you can just skip all the headers between any two blocks whose finality you actually need to prove.

@tomaka
Contributor

tomaka commented Sep 23, 2020

One little uncertain point: after we've reached the head of the finalized chain and can switch to regular syncing, we need information about the current and next Babe epochs in order to verify incoming blocks.

As far as I understand, the Babe epoch info should, most of the time but not always, be in the blocks sent during the warp sync. Is that accurate?

If a Babe epoch info is missing, we can always do an exhaustive block request for the blocks before the latest finalized one (network block requests support asking for blocks in decreasing order starting from a certain hash), so it's not an unsolvable problem.

@cheme
Contributor

cheme commented Sep 23, 2020

I had the same question, mainly: is the Babe state reconstructible from a storage proof at the warp block? And should we maybe add info to the Babe pallet's storage?
But I do not have an answer for it.

Just replying to the previous post with the information I collected in the meantime:

Why would you need consecutive headers to prove this?

Current GRANDPA equivocation reporting seems to only check for double signing within a given validator set and a given round. I think this is currently enough because we replay all of history, IIUC (which I am not sure about).
With warping, a majority of validators from any past set can sign some (set_id, round) that never actually happened (e.g. if set 1 ended at round 5, they can produce a set 1 signature at round 8, serve it through warp sync, change the validator set this way, and then build a different chain) to make a client sync onto a different finalized chain, and there will be no equivocation for it.
The light client will be unable to restore all the history from block Goal to Checkpoint (that would require an equivocable block), but it will still think that Goal is finalized.
So I suppose we also need to be able to slash validators for any signature on a round that is not valid for their validator set (a new kind of equivocation to handle; it may require writing the round bounds of the last 28 days of validator sets into the state).

@expenses
Contributor

I'm looking into implementing this on the Substrate side now. As far as I'm aware, the network request should include a checkpoint block hash, and the response should include a list of block headers that contain the validator-set hand-offs to that checkpoint. Is that correct? Could we reduce the size of the response by sending finality proofs instead?

@cheme
Contributor

cheme commented Nov 23, 2020

a list of block headers that contain the validator set hand-offs to that checkpoint. Is that correct?

Not sure, but that is what I started doing in master...cheme:light_grandpa_warp. I did test it a bit (I don't remember well, but I think I hardcoded the construction and checking of such a sync proof from block 0 to block n, executed when opening the db on an existing chain; see the method 'test_it'. I am not sure what the deletions in the linked diff are about). It was preliminary work to check that it works well.
(I stopped working on it for the reason mentioned in my previous post, but it is not much work up to this point, so it is probably best to start from a clean state.)

Could we reduce the size of the response by sending finality proofs instead?

Not sure, but I think that to trust a finality proof we would need to sync the underlying state first (from what I understand, we trust a justification because equivocation is possible on it).

@expenses
Contributor

expenses commented Dec 4, 2020

@cheme how close is this branch to being mergeable?

@cheme
Contributor

cheme commented Dec 4, 2020

It is not mergeable. IIRC I was just building and validating the proof at node start for some hardcoded range, but it is not integrated at all (it was more to assert that everything goes well).

@tomaka
Contributor

tomaka commented Dec 4, 2020

Quick networking implementation guide:

  • Duplicate this file (the link is from an un-merged PR, which will hopefully be merged soon) and adjust it to serve warp sync requests instead of block requests. The protocol name (line 57) should be adjusted. The request payload (line 87) should probably consist of just a block height to start from.

  • Add a field to sc_network::config::Params like here.

  • Also duplicate this block, and pass the thing you create to the Params a couple of lines below.
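
For illustration, here is a minimal sketch of what the request-handling task might look like once wired up; the types, channels, and fragment lookup are assumptions, not the actual sc_network interfaces:

use std::sync::mpsc;

// One inbound warp-sync request, as it might be delivered by the networking layer.
struct IncomingWarpRequest {
    // Block height the client wants to start from (per the request payload above).
    begin: u64,
    // Channel on which to send back the encoded response.
    reply: mpsc::Sender<Vec<u8>>,
}

// Answer warp-sync requests until the inbound channel is closed.
// `fetch_fragments` looks up the encoded hand-off fragments starting at `begin`,
// capped so that the response fits in a single network message.
fn run_warp_sync_handler(
    requests: mpsc::Receiver<IncomingWarpRequest>,
    fetch_fragments: impl Fn(u64) -> Vec<u8>,
) {
    for request in requests {
        let response = fetch_fragments(request.begin);
        // Ignore send errors: the peer may have disconnected in the meantime.
        let _ = request.reply.send(response);
    }
}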

@expenses
Contributor

expenses commented Dec 7, 2020

I've created a branch: https://github.com/paritytech/substrate/compare/ashley-grandpa-warp-sync that follows @tomaka's instructions (#1208 (comment)) and merges @cheme's branch (#1208 (comment)). I haven't tested it yet but I believe the basic functionality is there.

What's the best way to go about testing this?

@cheme
Contributor

cheme commented Dec 7, 2020

Not really sure about testing (the tests in client/finality-grandpa look interesting since there are some authority set changes, but I am not sure that is the best approach; the tricky part would be testing with an invalid authority change).
Also, I realized I mistakenly deleted "bin/node/inspect/src/lib.rs" in my branch.

@tomaka
Contributor

tomaka commented Apr 7, 2021

Update: the "server" side has been implemented. See here.

@stale

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@rphmeier
Contributor Author

rphmeier commented Jul 7, 2021

Server-side implemented, client not.

@stale stale bot removed the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@ghost ghost closed this as completed in #9227 Aug 2, 2021