PeerDAS: Fix initial sync #14208

nalepae · 2024-07-11T14:11:38Z

Please read commit by commit.

This pull request fixes the initial sync for data columns.
This is a first working "quick and dirty" version of the initial sync, mainly intended for peerdas-devnet-3.

The dirty part is:

- Data columns are only retrieved from nodes that custody a superset of the columns we need (very probably: only super nodes).
- All needed data columns are retrieved, whereas, if more than 64 data columns are needed, we could retrieve only 64 of them and then reconstruct the rest.
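The reconstruction shortcut in the second point can be sketched as below. This is a minimal illustration, not code from this PR. `columnsToRequest` is a hypothetical helper, and the sketch assumes 128 columns per block, any 64 of which are enough to reconstruct the others.

```go
package main

import "sort"

// numberOfColumns is the number of data columns per block (assumed: 128).
const numberOfColumns = 128

// columnsToRequest is a hypothetical helper. When more than half of the
// columns are needed, fetching any half of them is enough, because the
// remaining columns can be reconstructed locally afterwards.
func columnsToRequest(needed map[uint64]bool) []uint64 {
	cols := make([]uint64, 0, len(needed))
	for c, isNeeded := range needed {
		if isNeeded {
			cols = append(cols, c)
		}
	}
	// Sort for a deterministic result.
	sort.Slice(cols, func(i, j int) bool { return cols[i] < cols[j] })

	half := numberOfColumns / 2
	if len(cols) > half {
		// Retrieve only 64 of the needed columns, then reconstruct the rest.
		return cols[:half]
	}
	// Otherwise every needed column has to be retrieved.
	return cols
}
```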
If, let's say:
- Columns 1 and 5 are missing for slot 40
- Column 3 is missing for slot 41
- No columns are missing for slot 42
- Columns 1 and 4 are missing for slot 43

Then we are going to retrieve columns 1, 3, 4, and 5 for slots 40 to 43. Thus:

- Columns 3 and 4 for slot 40
- Columns 1, 4 and 5 for slot 41
- Columns 1, 3, 4 and 5 for slot 42
- Columns 3 and 5 for slot 43

are retrieved for nothing, which creates useless network traffic.
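The arithmetic of this example can be checked with a short sketch. `requestedColumns` and `wastedColumns` are hypothetical helpers written for illustration. They model the behaviour described above, where the union of the missing columns is requested for every slot in the range.

```go
package main

import "sort"

// requestedColumns returns the union of the columns missing across all
// slots. With the approach described above, this same set is requested
// for every slot in the range.
func requestedColumns(missing map[uint64][]uint64) []uint64 {
	set := make(map[uint64]bool)
	for _, cols := range missing {
		for _, c := range cols {
			set[c] = true
		}
	}
	out := make([]uint64, 0, len(set))
	for c := range set {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// wastedColumns returns, for one slot, the requested columns that were not
// actually missing for that slot.
func wastedColumns(requested, missingForSlot []uint64) []uint64 {
	isMissing := make(map[uint64]bool, len(missingForSlot))
	for _, c := range missingForSlot {
		isMissing[c] = true
	}
	var wasted []uint64
	for _, c := range requested {
		if !isMissing[c] {
			wasted = append(wasted, c)
		}
	}
	return wasted
}
```

On the example above, `requestedColumns` returns [1 3 4 5]. `wastedColumns` then returns [3 4] for slot 40 and [1 3 4 5] for slot 42.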

However, this PR has been tested with:

- E2E tests
- Kurtosis with 2 permanent Teku super nodes, 1 permanent Prysm full node, and 1 Prysm super node started later.
- Kurtosis with 2 permanent Teku super nodes, 1 permanent Prysm full node, and 1 Prysm full node started later.

nisdas

So in this PR, we would still rely on supernodes to provide all the relevant columns. The part where we can allow normal nodes to provide columns is still a TODO. From what I understand, this PR does not change the status quo? It does appear that we are mostly reordering methods.

nisdas · 2024-07-15T08:45:09Z

beacon-chain/sync/initial-sync/blocks_fetcher.go

@@ -489,8 +495,8 @@ func (r *blobRange) RequestDataColumns() *p2ppb.DataColumnSidecarsByRangeRequest
 var errBlobVerification = errors.New("peer unable to serve aligned BlobSidecarsByRange and BeaconBlockSidecarsByRange responses")
 var errMissingBlobsForBlockCommitments = errors.Wrap(errBlobVerification, "blobs unavailable for processing block with kzg commitments")

-func verifyAndPopulateBlobs(bwb []blocks2.BlockWithROBlobs, blobs []blocks.ROBlob, req *p2ppb.BlobSidecarsByRangeRequest, bss filesystem.BlobStorageSummarizer) ([]blocks2.BlockWithROBlobs, error) {
-	blobsByRoot := make(map[[32]byte][]blocks.ROBlob)
+func verifyAndPopulateBlobs(bwb []blocks2.BlockWithROBlobs, blobs []blocks2.ROBlob, req *p2ppb.BlobSidecarsByRangeRequest, bss filesystem.BlobStorageSummarizer) ([]blocks2.BlockWithROBlobs, error) {


Why not just call it blocks instead of blocks2?

nisdas · 2024-07-15T09:36:23Z

beacon-chain/sync/initial-sync/blocks_fetcher.go

+
+	// Find the last block for which some data columns to retrieve are not in our store.
+	lastBlockWithMissingColumnsIndex := lastIndex
+	for i := lastIndex; i >= firstBlockWithMissingColumnsIndex; i-- {


I do not understand this loop. The moment it is decremented, the condition would be false, since i < firstBlockWithMissingColumnsIndex. Do we mean to check this only once?

The loop starts at lastBlockWithMissingColumnsIndex and goes backward until either:

- firstBlockWithMissingColumnsIndex is reached, or
- a block is met for which at least one data column we should custody is not actually in our store.
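Here is a minimal sketch of the loop shape described in this reply. It uses a hypothetical `missing` slice in place of the real per-block store lookup. Because i starts at lastIndex, the condition i >= firstIndex holds on the first iteration whenever lastIndex >= firstIndex, so the loop can run several times before it breaks.

```go
package main

// lastBlockWithMissingColumnsIndex starts at lastIndex and walks backward
// towards firstIndex (inclusive). It returns the index of the first block
// met that still has a custody column missing from the store. If no such
// block is met, it keeps its initial value, lastIndex.
// missing[i] is a hypothetical stand-in for the per-block store check.
func lastBlockWithMissingColumnsIndex(missing []bool, firstIndex, lastIndex int) int {
	result := lastIndex
	for i := lastIndex; i >= firstIndex; i-- {
		if missing[i] {
			result = i
			break
		}
	}
	return result
}
```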

nisdas · 2024-07-15T09:38:21Z

beacon-chain/sync/initial-sync/blocks_fetcher.go

+	}
+
+	// Find the first and last block for which some data columns to retrieve are missing in our store.
+	someColumnsAreMisisng, firstIndex, lastIndex := f.blocksWithMissingDataColumnsBoundaries(bwb, firstIndex, lastIndex, localCustodyColumns)


We should combine blocksWithBlobsCommitmentsBoundaries and blocksWithMissingDataColumnsBoundaries into one method, as we are primarily looking for the block range from which to start requesting columns.
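Below is one possible shape for the merged method suggested here, written as a hypothetical sketch. The `block` type and `missingColumnsBoundaries` are illustrative names, not the existing blocksWithBlobsCommitmentsBoundaries or blocksWithMissingDataColumnsBoundaries signatures. A single pass keeps only blocks that carry commitments and still miss columns, and returns the range to request.

```go
package main

// block is a hypothetical, simplified view of one block in the batch.
type block struct {
	hasCommitments bool // the block carries at least one KZG commitment
	missingColumns bool // some custody columns are not yet in our store
}

// missingColumnsBoundaries is a hypothetical merged boundary helper: one scan
// that considers only blocks with commitments and missing columns, and
// reports the first and last such index. ok is false when nothing needs
// to be requested.
func missingColumnsBoundaries(blks []block) (ok bool, first, last int) {
	first, last = -1, -1
	for i, b := range blks {
		if !b.hasCommitments || !b.missingColumns {
			continue
		}
		if first == -1 {
			first = i
		}
		last = i
	}
	return first != -1, first, last
}
```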

nalepae · 2024-07-15T10:54:03Z

> So in this PR, we would still rely on supernodes to provide all the relevant columns. The part where we can allow normal nodes to provide columns is still a TODO. From what I understand, this PR does not change the status quo? It does appear that we are mostly reordering methods.

On the peerDAS branch, we can pull data columns from super nodes only.
But the peerDAS branch also does not tolerate any non-super node: as soon as we meet one, the initial sync stops. This branch fixes that.

This branch also fixes other bugs, including some introduced by the initial copy/paste from blobs to data columns.

For example, this branch no longer treats the peer that served us the block as the privileged peer for pulling the data columns.

And there are many other things like this, which were well suited for blobs but are not for data columns.
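The peer selection described in this thread, and in the first "dirty" point of the PR description, can be sketched as a custody-superset filter. `peerCustody` and `peersCustodyingAll` are hypothetical and do not reproduce fetchDataColumnsFromPeers. The sketch only shows that any peer covering the needed columns qualifies, regardless of which peer served the block.

```go
package main

import "sort"

// peerCustody maps a (hypothetical) peer ID to the set of column indices
// that peer custodies.
type peerCustody map[string]map[uint64]bool

// peersCustodyingAll returns, sorted, the peers whose custody set is a
// superset of the needed columns. The peer that served the block gets no
// special treatment.
func peersCustodyingAll(custody peerCustody, needed []uint64) []string {
	var out []string
	for id, cols := range custody {
		coversAll := true
		for _, c := range needed {
			if !cols[c] {
				coversAll = false
				break
			}
		}
		if coversAll {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}
```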

nisdas · 2024-07-25T05:31:44Z

beacon-chain/sync/initial-sync/blocks_fetcher.go

+	}
+
+	if coreTime.PeerDASIsActive(start) {
+		response.err = f.fetchDataColumnsFromPeers(ctx, response.bwb, peers)


Can we follow the same pattern where we return the blocks with blobs / blocks with columns? That is, instead of returning just the error, return the blocks too.

Actually I changed it on purpose.

The reason is that this function (as well as fetchBlobsFromPeer) mutates the response.bwb argument.

A function that both mutates its response.bwb argument and returns it is quite misleading for the caller.

The caller may think that the argument is left unchanged and that a mutated copy is returned.
(A bit like peers = filterPeers(peers, ...), where the peers argument is not mutated.)

==> If we want consistency between functions, I would rather advocate changing the other functions so that they do not return response.bwb.

Note: I added this to the function documentation:

// This function mutates `bwb` by adding the retrieved data columns.

Fixed the opposite way in 41f2a85.
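This minimal Go sketch shows the convention argued for in this thread: mutate the slice argument and return only an error. `blockWithColumns` and `fetchColumns` are hypothetical stand-ins for the bwb elements and fetchDataColumnsFromPeers. The key detail is writing through bwb[i], so the caller's backing array is updated.

```go
package main

import "errors"

// blockWithColumns is a hypothetical stand-in for one element of bwb.
type blockWithColumns struct {
	slot    uint64
	columns []uint64
}

// fetchColumns fills in the columns of each element of bwb in place and
// returns only an error. Because bwb shares its backing array with the
// caller's slice, the caller sees the change without a returned slice.
//
// This function mutates `bwb` by adding the retrieved data columns.
func fetchColumns(bwb []blockWithColumns, fetched map[uint64][]uint64) error {
	for i := range bwb {
		cols, ok := fetched[bwb[i].slot]
		if !ok {
			return errors.New("missing data columns for slot")
		}
		// Write through the index: a range value variable would be a copy.
		bwb[i].columns = append(bwb[i].columns, cols...)
	}
	return nil
}
```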

* `SendDataColumnsByRangeRequest`: Add some new fields in logs.
* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.
* Implement `fetchDataColumnsFromPeers`.
* `fetchBlobsFromPeer`: Return only one error.
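The two `BlobStorageSummary` additions can be pictured with a hypothetical analogue. The real `HasDataColumnIndex` and `AllDataColumnsAvailable` signatures do not appear on this page, so `columnSummary` and its methods are illustrative only. The sketch assumes 128 columns per block.

```go
package main

// numberOfColumns is the number of data columns per block (assumed: 128).
const numberOfColumns = 128

// columnSummary is a hypothetical analogue of a storage summary recording,
// for one block root, which data column indices are stored on disk.
type columnSummary struct {
	stored [numberOfColumns]bool
}

// hasDataColumnIndex reports whether column idx is stored.
// Out-of-range indices are reported as not stored.
func (s columnSummary) hasDataColumnIndex(idx uint64) bool {
	if idx >= numberOfColumns {
		return false
	}
	return s.stored[idx]
}

// allDataColumnsAvailable reports whether every wanted column is stored.
func (s columnSummary) allDataColumnsAvailable(want map[uint64]bool) bool {
	for idx, wanted := range want {
		if wanted && !s.hasDataColumnIndex(idx) {
			return false
		}
	}
	return true
}
```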

nalepae force-pushed the peerdas-initial-sync branch 8 times, most recently from 6d8ff10 to 6e0dbe5, July 14, 2024 20:11

nalepae marked this pull request as ready for review July 15, 2024 07:37

nalepae requested a review from a team as a code owner July 15, 2024 07:37

nalepae requested review from prestonvanloon, saolyn and james-prysm and removed request for a team July 15, 2024 07:37

nisdas reviewed Jul 15, 2024

nalepae force-pushed the peerDAS branch from 164d743 to 3c9f5ef, July 18, 2024 09:40

nalepae force-pushed the peerdas-initial-sync branch from 6e0dbe5 to 859336a, July 18, 2024 14:02

nalepae added 2 commits July 24, 2024 15:25

SendDataColumnsByRangeRequest: Add some new fields in logs.

cfb9082

BlobStorageSummary: Implement HasDataColumnIndex and AllDataColumnsAvailable.

a5b9e52

nalepae force-pushed the peerdas-initial-sync branch 2 times, most recently from e9936bc to 156380e, July 24, 2024 23:29

Implement fetchDataColumnsFromPeers.

0a76aee

nalepae force-pushed the peerdas-initial-sync branch from 156380e to 0a76aee, July 24, 2024 23:36

nalepae added the peerDAS label Jul 25, 2024

nisdas reviewed Jul 25, 2024

fetchBlobsFromPeer: Return only one error.

41f2a85

nisdas approved these changes Jul 25, 2024

nalepae merged commit ea57ca7 into peerDAS Jul 25, 2024
13 of 16 checks passed

nalepae deleted the peerdas-initial-sync branch July 25, 2024 11:24


nisdas Jul 15, 2024

nisdas Jul 15, 2024

nalepae Jul 17, 2024

nisdas Jul 15, 2024

nalepae commented Jul 15, 2024

nisdas Jul 25, 2024

nalepae Jul 25, 2024

nalepae Jul 25, 2024
