Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PeerDAS: Fix major bug in dataColumnSidecarsByRangeRPCHandler and allow syncing from full nodes. #14532

Merged
merged 7 commits into from
Oct 16, 2024

Conversation

nalepae
Copy link
Contributor

@nalepae nalepae commented Oct 14, 2024

Please read commit by commit, with commit messages.

This PR brings two major changes:

  • Fix dataColumnSidecarsByRangeRPCHandler. (Read the corresponding commit message for details.)
  • Allow initial sync from all nodes, not only from super nodes any more.

When selecting peers to fetch columns from, the node will select admissible peers with the highest corresponding columns, to minimise the number of peers to fetch columns from.

Remaining task on initial sync:

  • Run verifications (among others, KZG verifications) by batch

Base automatically changed from initial-sync-rework to peerDAS October 15, 2024 10:08
Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.
@nalepae nalepae force-pushed the initial-sync-rework-2 branch 3 times, most recently from a8930c6 to 16b679c Compare October 16, 2024 08:33
@nalepae nalepae marked this pull request as ready for review October 16, 2024 08:34
@nalepae nalepae requested a review from a team as a code owner October 16, 2024 08:34
@nalepae nalepae requested review from terencechain, rkapka and nisdas and removed request for a team October 16, 2024 08:34
@nalepae nalepae changed the title Initial sync rework 2 PeerDAS: Fix major bug in dataColumnSidecarsByRangeRPCHandler and allow syncing from full nodes. Oct 16, 2024
@nalepae nalepae force-pushed the initial-sync-rework-2 branch from 16b679c to 9f679e8 Compare October 16, 2024 08:59
missingColumnsByRoot map[[fieldparams.RootLength]byte]map[uint64]bool,
batchSize uint64,
bwbSlice bwbSlice) error {
lastSlot := bwbs[bwbSlice.end].Block.Block().Slot()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this repeated with endSlot ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bf8a152.


func (f *blocksFetcher) fetchBwbSliceFromPeers(
ctx context.Context,
mu *sync.RWMutex,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we passing mutexes as an argument ? this looks like an antipattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the other equivalent comment.

@@ -1089,15 +1201,43 @@ func processDataColumn(
// - `missingColumnsByRoot` by removing the fetched data columns.
func (f *blocksFetcher) fetchDataColumnFromPeer(
ctx context.Context,
mu *sync.RWMutex,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, its odd to be passing in mutexes as an argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, we use mutexes as struct field because mutexes are protecting struct fields as well.
Here, mu is only protecting concurrent access to bwbs and missingColumnsByRoot, which are not struct fields, but variables defined inside a function. That's why the corresponding mutex mu does not need to be defined as a struct field, and so the mutex is passed as an argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to pass in the shared resource rather than just the mutex ? Basically have all these items enclosed in a single logical object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4b4ba60.

@nalepae nalepae merged commit c53f209 into peerDAS Oct 16, 2024
13 of 16 checks passed
@nalepae nalepae deleted the initial-sync-rework-2 branch October 16, 2024 12:42
nalepae added a commit that referenced this pull request Oct 23, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Oct 23, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Oct 23, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Oct 24, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Nov 20, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Nov 22, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Nov 22, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Nov 25, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Nov 25, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
nalepae added a commit that referenced this pull request Nov 27, 2024
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there is 8 data columns stored for a given block, the node was
able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127.

The issue was visible only for full nodes, since super nodes always store 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants