-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PeerDAS: Fix major bug in dataColumnSidecarsByRangeRPCHandler
and allow syncing from full nodes.
#14532
Conversation
Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns.
a8930c6
to
16b679c
Compare
dataColumnSidecarsByRangeRPCHandler
and allow syncing from full nodes.
(Not only from supernodes.)
16b679c
to
9f679e8
Compare
missingColumnsByRoot map[[fieldparams.RootLength]byte]map[uint64]bool, | ||
batchSize uint64, | ||
bwbSlice bwbSlice) error { | ||
lastSlot := bwbs[bwbSlice.end].Block.Block().Slot() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this repeated with endSlot
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in bf8a152.
|
||
func (f *blocksFetcher) fetchBwbSliceFromPeers( | ||
ctx context.Context, | ||
mu *sync.RWMutex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we passing mutexes as an argument ? this looks like an antipattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the other equivalent comment.
@@ -1089,15 +1201,43 @@ func processDataColumn( | |||
// - `missingColumnsByRoot` by removing the fetched data columns. | |||
func (f *blocksFetcher) fetchDataColumnFromPeer( | |||
ctx context.Context, | |||
mu *sync.RWMutex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, its odd to be passing in mutexes as an argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, we use mutexes as struct field because mutexes are protecting struct fields as well.
Here, mu
is only protecting concurrent access to bwbs
and missingColumnsByRoot
, which are not struct fields, but variables defined inside a function. That's why the corresponding mutex mu
does not need to be defined as a struct field, and so the mutex is passed as an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to pass in the shared resource rather than just the mutex ? Basically have all these items enclosed in a single logical object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 4b4ba60.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
…llow syncing from full nodes. (#14532) * `validateDataColumnsByRange`: `current` ==> `currentSlot`. * `validateRequest`: Extract `remotePeer` variable. * `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor. * `streamDataColumnBatch`: Fix major bug. Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns. For example, if there is 8 data columns stored for a given block, the node was able to respond for data columns indices 1, 3, and 5, but not for 10, 16 or 127. The issue was visible only for full nodes, since super nodes always store 128 data columns. * Initial sync: Fetch data columns from all peers. (Not only from supernodes.) * Nishant's comment: Fix `lastSlot` and `endSlot` duplication. * Address Nishant's comment.
Please read commit by commit, with commit messages.
This PR brings two major changes:
dataColumnSidecarsByRangeRPCHandler
. (Read the corresponding commit message for details.)When selecting peers to fetch columns from, the node will select admissible peers with the highest corresponding columns, to minimise the number of peers to fetch columns from.
Remaining task on initial sync: