PeerDAS: Fix major bug in `dataColumnSidecarsByRangeRPCHandler` and allow syncing from full nodes. #14532

nalepae · 2024-10-14T12:57:08Z

Please read commit by commit, with commit messages.

This PR brings two major changes:

* Fix `dataColumnSidecarsByRangeRPCHandler`. (Read the corresponding commit message for details.)
* Allow initial sync from all nodes, no longer only from super nodes.

When selecting peers to fetch columns from, the node picks the admissible peers that hold the most of the needed columns, which minimises the number of peers it has to fetch from.
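As a rough illustration of that selection strategy, here is a minimal greedy sketch. It is not the code from this PR: `selectPeers`, the string peer IDs, and the `peerColumns` map are hypothetical names chosen for the example, and every peer passed in is assumed to be admissible already.

```go
package main

import "sort"

// selectPeers is a hypothetical helper, not the function used in this PR.
// It greedily picks the peer covering the most still-missing columns until
// every needed column is assigned or no peer can help any more.
// It returns the columns to request from each chosen peer, plus the columns
// that no peer could provide.
func selectPeers(
	needed map[uint64]bool,
	peerColumns map[string]map[uint64]bool,
) (map[string][]uint64, map[uint64]bool) {
	// Copy the needed set so the caller's map is left untouched.
	remaining := make(map[uint64]bool, len(needed))
	for c := range needed {
		remaining[c] = true
	}

	// Sort peer IDs so that ties are broken deterministically.
	peers := make([]string, 0, len(peerColumns))
	for p := range peerColumns {
		peers = append(peers, p)
	}
	sort.Strings(peers)

	assignment := make(map[string][]uint64)
	for len(remaining) > 0 {
		best, bestCols := "", []uint64(nil)
		for _, p := range peers {
			var cols []uint64
			for c := range peerColumns[p] {
				if remaining[c] {
					cols = append(cols, c)
				}
			}
			if len(cols) > len(bestCols) {
				best, bestCols = p, cols
			}
		}
		if len(bestCols) == 0 {
			break // No peer holds any of the remaining columns.
		}
		sort.Slice(bestCols, func(i, j int) bool { return bestCols[i] < bestCols[j] })
		assignment[best] = bestCols
		for _, c := range bestCols {
			delete(remaining, c)
		}
	}
	return assignment, remaining
}
```

A greedy choice like this does not always give the smallest possible set of peers, but it is simple and tends to keep the number of requests low.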

Remaining task on initial sync:

* Run verifications (KZG verifications, among others) in batches.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there are 8 data columns stored for a given block, the node was able to respond for data column indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns.
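The symptom can be reproduced with a small sketch. This is a hypothetical reconstruction for illustration only, not the actual `streamDataColumnBatch` code: `servableBuggy`, `servable`, and the `[128]bool` storage layout are assumptions made for the example.

```go
package main

// numberOfColumns is the total column count mentioned above (128).
const numberOfColumns = 128

// servableBuggy is a hypothetical reconstruction of the symptom: a requested
// index is compared against the COUNT of stored columns, so any stored
// column whose index is greater than or equal to that count is dropped.
func servableBuggy(stored [numberOfColumns]bool, requested []uint64) []uint64 {
	count := uint64(0)
	for _, ok := range stored {
		if ok {
			count++
		}
	}

	var out []uint64
	for _, idx := range requested {
		if idx < count && stored[idx] {
			out = append(out, idx)
		}
	}
	return out
}

// servable bounds the index by the total number of columns instead, so
// every stored column can be served regardless of how many are stored.
func servable(stored [numberOfColumns]bool, requested []uint64) []uint64 {
	var out []uint64
	for _, idx := range requested {
		if idx < numberOfColumns && stored[idx] {
			out = append(out, idx)
		}
	}
	return out
}
```

With 8 stored columns at indices 1, 3, 5, 10, 16, 42, 99 and 127, the buggy variant returns only 1, 3 and 5 for a request covering 1, 3, 5, 10, 16 and 127, while the second variant returns all six. A super node stores all 128 columns, so its count never cuts anything off, which matches the observation that only full nodes were affected.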

nisdas · 2024-10-16T10:13:36Z

beacon-chain/sync/initial-sync/blocks_fetcher.go

+	missingColumnsByRoot map[[fieldparams.RootLength]byte]map[uint64]bool,
+	batchSize uint64,
+	bwbSlice bwbSlice) error {
+	lastSlot := bwbs[bwbSlice.end].Block.Block().Slot()


Why is this repeated with `endSlot`?

Fixed in bf8a152.

nisdas · 2024-10-16T10:15:54Z

beacon-chain/sync/initial-sync/blocks_fetcher.go

+
+func (f *blocksFetcher) fetchBwbSliceFromPeers(
+	ctx context.Context,
+	mu *sync.RWMutex,


Why are we passing mutexes as an argument? This looks like an antipattern.

See the other equivalent comment.

nisdas · 2024-10-16T10:22:59Z

beacon-chain/sync/initial-sync/blocks_fetcher.go

@@ -1089,15 +1201,43 @@ func processDataColumn(
 // - `missingColumnsByRoot` by removing the fetched data columns.
 func (f *blocksFetcher) fetchDataColumnFromPeer(
 	ctx context.Context,
+	mu *sync.RWMutex,


Same here, it's odd to be passing in mutexes as an argument.

Usually, we declare mutexes as struct fields because they protect other fields of the same struct.
Here, `mu` only protects concurrent access to `bwbs` and `missingColumnsByRoot`, which are not struct fields but variables defined inside a function. That's why `mu` does not need to be a struct field, and why it is passed as an argument instead.

Is it possible to pass in the shared resource rather than just the mutex? Basically, have all these items enclosed in a single logical object.

Fixed in 4b4ba60.
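For readers following this thread, the suggestion of enclosing the shared state and its mutex in a single object could look roughly like the sketch below. It does not reproduce what 4b4ba60 actually does: `columnTracker`, `markFetched`, `missingCount`, and `fetchConcurrently` are hypothetical names, only `missingColumnsByRoot` comes from the diff, and the root length is written as a literal 32 in place of `fieldparams.RootLength`.

```go
package main

import "sync"

// columnTracker is a hypothetical type that bundles the shared map with the
// mutex guarding it, so callers never handle the mutex directly.
type columnTracker struct {
	mu                   sync.RWMutex
	missingColumnsByRoot map[[32]byte]map[uint64]bool
}

func newColumnTracker(missing map[[32]byte]map[uint64]bool) *columnTracker {
	return &columnTracker{missingColumnsByRoot: missing}
}

// markFetched removes a column from the missing set of a block root and
// drops the root entirely once none of its columns are missing.
func (t *columnTracker) markFetched(root [32]byte, column uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()

	columns, ok := t.missingColumnsByRoot[root]
	if !ok {
		return
	}
	delete(columns, column)
	if len(columns) == 0 {
		delete(t.missingColumnsByRoot, root)
	}
}

// missingCount returns the total number of columns still missing.
func (t *columnTracker) missingCount() int {
	t.mu.RLock()
	defer t.mu.RUnlock()

	total := 0
	for _, columns := range t.missingColumnsByRoot {
		total += len(columns)
	}
	return total
}

// fetchConcurrently simulates several workers reporting fetched columns at
// the same time; all locking stays inside the tracker.
func fetchConcurrently(t *columnTracker, root [32]byte, columns []uint64) {
	var wg sync.WaitGroup
	for _, c := range columns {
		wg.Add(1)
		go func(c uint64) {
			defer wg.Done()
			t.markFetched(root, c)
		}(c)
	}
	wg.Wait()
}
```

The functions that previously took `mu *sync.RWMutex` would then take a `*columnTracker` instead, which keeps the locking discipline next to the data it protects.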

…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.
* `validateRequest`: Extract `remotePeer` variable.
* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.
* `streamDataColumnBatch`: Fix major bug.

  Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there are 8 data columns stored for a given block, the node was able to respond for data column indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns.
* Initial sync: Fetch data columns from all peers. (Not only from supernodes.)
* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.
* Address Nishant's comment.

nalepae added the peerDAS label Oct 14, 2024

Base automatically changed from initial-sync-rework to peerDAS October 15, 2024 10:08

nalepae added 4 commits October 15, 2024 17:05

validateDataColumnsByRange: current ==> currentSlot.

6ce798e

validateRequest: Extract remotePeer variable.

a9e4878

dataColumnSidecarsByRangeRPCHandler: Small non functional refactor.

a4539bb

nalepae force-pushed the initial-sync-rework-2 branch 3 times, most recently from a8930c6 to 16b679c Compare October 16, 2024 08:33

nalepae marked this pull request as ready for review October 16, 2024 08:34

nalepae requested a review from a team as a code owner October 16, 2024 08:34

nalepae requested review from terencechain, rkapka and nisdas and removed request for a team October 16, 2024 08:34

nalepae changed the title ~~Initial sync rework 2~~ PeerDAS: Fix major bug in dataColumnSidecarsByRangeRPCHandler and allow syncing from full nodes. Oct 16, 2024

Initial sync: Fetch data columns from all peers.

9f679e8

(Not only from supernodes.)

nalepae force-pushed the initial-sync-rework-2 branch from 16b679c to 9f679e8 Compare October 16, 2024 08:59

nisdas reviewed Oct 16, 2024

nalepae added 2 commits October 16, 2024 12:56

Nishant's comment: Fix lastSlot and endSlot duplication.

bf8a152

Address Nishant's comment.

4b4ba60

nisdas approved these changes Oct 16, 2024

nalepae merged commit c53f209 into peerDAS Oct 16, 2024
13 of 16 checks passed

nalepae deleted the initial-sync-rework-2 branch October 16, 2024 12:42
