Fix pending block/blob zero peer edge case #13625
Merged
What type of PR is this?
Bug fix
What does this PR do? Why is it needed?
There appears to be an edge case where the pending block queue can send a block to ReceiveBlock without requesting blobs first. The pending block broadcast code tries to fetch any necessary blobs before calling ReceiveBlock and broadcasting, but will silently skip the blob fetch if there are no connected peers.
I originally believed this could cause a deadlock when coupled with a Resync. I no longer think there's a deadlock, but it does make the syncing process messy and leaves the pending block wedged in the DA check for at least 3 slots before Resync can do its job.
This PR also removes blob fetching from the step where the batch of pending blocks is retrieved, deferring that work until broadcast time. This simplifies the flow: today blobs can be requested in multiple places, with unclear timing consequences due to asynchrony in adding fetched blocks to the queue before blobs are retrieved, and the pending queue task spawning new goroutines on a timer. It also allows the blob request to be balanced to a different peer and prevents requesting excess blobs when processing a batch of multiple blocks.
Which issue(s) does this PR fix?
Unclear whether this will fix the issue, so I won't mark it as fixed for now.