Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PeerDAS: Fix initial sync #14208

Merged
merged 4 commits into from
Jul 25, 2024
Merged

PeerDAS: Fix initial sync #14208

merged 4 commits into from
Jul 25, 2024

Conversation

nalepae
Copy link
Contributor

@nalepae nalepae commented Jul 11, 2024

Please read commit by commit.

This pull requests fixes the initial sync regarding data columns.
This is a first working "quick and dirty" version of the initial sync, mainly destined to be part of peerdas-devnet-3.

The dirty part is:

  • Data columns are only retrieved from nodes which custody a superset of columns we need. (Very probably: Only super nodes).
  • All needed data columns are retrieved, while, if more than 64 data columns are needed, we could only retrieve 64 of them then do a reconstruction.
  • If, let's say:
    • Columns 1 and 5 are missing for slot 40
    • Column 3 is missing for slot 41
    • No columns are missing for slot 42
    • Columns 1 and 4 are missing for slot 43

Then we are going to retrieve columns 1, 3, 4, and 5 for slots 40 to 43. Thus:

  • Columns 2, 3, 4 for slot 40
  • Columns 1, 2, 4 and 5 for slot 41
  • Columns 1, 2, 3, 4 and 5 for slot 42
  • Columns 2, 3 and 5 for slot 43

are retrieved for nothing, which creates useless network trafic.

However, this PR is tested against:

  • E2E tests
  • Kurtosis with 2 permanent Teku super nodes, 1 permanent Prysm full node, and 1 lately started Prysm super node.
  • Kurtosis with 2 permanent Teku super nodes, 1 permanent Prysm full node, and 1 lately started Prysm full node.

@nalepae nalepae force-pushed the peerdas-initial-sync branch 8 times, most recently from 6d8ff10 to 6e0dbe5 Compare July 14, 2024 20:11
@nalepae nalepae marked this pull request as ready for review July 15, 2024 07:37
@nalepae nalepae requested a review from a team as a code owner July 15, 2024 07:37
@nalepae nalepae requested review from prestonvanloon, saolyn and james-prysm and removed request for a team July 15, 2024 07:37
Copy link
Member

@nisdas nisdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this PR, we would still rely on supernodes to provide all the relevant columns. The part where we can allow normal nodes to provide columns is still a TODO. From what I understand this PR does not change the current status quo ? It does appear that we are mostly reordering methods.

@@ -489,8 +495,8 @@ func (r *blobRange) RequestDataColumns() *p2ppb.DataColumnSidecarsByRangeRequest
var errBlobVerification = errors.New("peer unable to serve aligned BlobSidecarsByRange and BeaconBlockSidecarsByRange responses")
var errMissingBlobsForBlockCommitments = errors.Wrap(errBlobVerification, "blobs unavailable for processing block with kzg commitments")

func verifyAndPopulateBlobs(bwb []blocks2.BlockWithROBlobs, blobs []blocks.ROBlob, req *p2ppb.BlobSidecarsByRangeRequest, bss filesystem.BlobStorageSummarizer) ([]blocks2.BlockWithROBlobs, error) {
blobsByRoot := make(map[[32]byte][]blocks.ROBlob)
func verifyAndPopulateBlobs(bwb []blocks2.BlockWithROBlobs, blobs []blocks2.ROBlob, req *p2ppb.BlobSidecarsByRangeRequest, bss filesystem.BlobStorageSummarizer) ([]blocks2.BlockWithROBlobs, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just call it blocks ? instead of blocks2


// Find the last block for which some data columns to retrieve are not in our store.
lastBlockWithMissingColumnsIndex := lastIndex
for i := lastIndex; i >= firstBlockWithMissingColumnsIndex; i-- {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand this loop. the moment it is negatively incremented, the condition would be false as i < firstBlockWithMissingColumnsIndex . Do we meant to only check this once ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop starts at lastBlockWithMissingColumnsIndex and goes backward until:

  • firstBlockWithMissingColumnsIndex is met, or if
  • A block which at least one data columns we should custody and which we actually don't custody is met.

}

// Find the first and last block for which some data columns to retrieve are missing in our store.
someColumnsAreMisisng, firstIndex, lastIndex := f.blocksWithMissingDataColumnsBoundaries(bwb, firstIndex, lastIndex, localCustodyColumns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should combine blocksWithBlobsCommitmentsBoundaries and blocksWithMissingDataColumnsBoundaries into one method, as we are primarily looking for the block range to start requesting columns from .

@nalepae
Copy link
Contributor Author

nalepae commented Jul 15, 2024

So in this PR, we would still rely on supernodes to provide all the relevant columns. The part where we can allow normal nodes to provide columns is still a TODO. From what I understand this PR does not change the current status quo ? It does appear that we are mostly reordering methods.

On the peerDAS branch, we can pull data columns from super nodes only.
But also the peerDAS branch does not tolerate any non super node. As soon as we meet a non super node, the initial sync stops. This branch fixes that.

This branch fixes also other bugs, including bugs due to the initial copy/paste from blobs to data columns.

Also, just for example, this branch stops considering the peer which served us the block as a privileged peer for pulling the data columns.

And many others things like this, which was well suited for blobs, but are not for data columns.

@nalepae nalepae force-pushed the peerdas-initial-sync branch 2 times, most recently from e9936bc to 156380e Compare July 24, 2024 23:29
@nalepae nalepae force-pushed the peerdas-initial-sync branch from 156380e to 0a76aee Compare July 24, 2024 23:36
}

if coreTime.PeerDASIsActive(start) {
response.err = f.fetchDataColumnsFromPeers(ctx, response.bwb, peers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we follow the same pattern where we return the blocks with blobs / blocks with columns . So instead of returning just the error you return the blocks too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I changed it on purpose.

The reason was, this function (as well as fetchBlobsFromPeer ) mutates the response.bwb argument.

The fact that the function both changes the input response.bwb argument, and returns it is quite misleading for the function user.

The user may think that the argument is not changed, and the mutated structure is returned.
(A little bit like peers = filterPeers(peers, ...) where in this case the peers in argument is not mutated.)

==> If we want to have consistency between functions, I would then advocate to change others functions not to return response.bwb.

Note: I added this in the function documentation:

// This function mutates `bwb` by adding the retrieved data columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the opposite way in 41f2a85.

@nalepae nalepae merged commit ea57ca7 into peerDAS Jul 25, 2024
13 of 16 checks passed
@nalepae nalepae deleted the peerdas-initial-sync branch July 25, 2024 11:24
nalepae added a commit that referenced this pull request Jul 29, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nisdas pushed a commit that referenced this pull request Aug 14, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nisdas pushed a commit that referenced this pull request Aug 15, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nisdas pushed a commit that referenced this pull request Aug 20, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Aug 27, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Aug 28, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Aug 28, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Oct 7, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Oct 7, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Oct 7, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Oct 8, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Oct 23, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Oct 23, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Oct 24, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Nov 20, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Nov 22, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Nov 22, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Nov 25, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Nov 25, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
nalepae added a commit that referenced this pull request Nov 27, 2024
* `SendDataColumnsByRangeRequest`: Add some new fields in logs.

* `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`.

* Implement `fetchDataColumnsFromPeers`.

* `fetchBlobsFromPeer`: Return only one error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants