This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
net_plugin bugfix for privacy test corner case #10349
Merged
Change Description
There is a flaky bug (it appears in at least 30-50% of runs) in net_plugin that reproduces in my newly added test for the privacy feature. For now I work around it by restarting the node.
What happens?
topology:
[Prod1, Prod2]
->Node1
->Node2
At some point, via security group removal, we cause Node1 (and hence Node2) to stop receiving any blocks. After a while we add Node1 back and it starts to receive blocks again. It gets new blocks and sends them to Node2.
Shortly afterwards it gets an unlinkable block exception because of the gap, sends a handshake, gets a notice and enters the LIB catchup state [sync 1 (head < lib received)].
Once in sync, it sends a handshake to everyone, including Node2.
Node2 enters the head catchup state [sync 3 (head < head received)], because the head/LIB sent by Node1 was smaller than those sent by Prod1/Prod2.
Then it receives one of the most recent blocks (probably queued earlier), gets an unlinkable block exception, sends a handshake and, after a back-and-forth notice exchange, receives a LIB notice. It responds to that notice with sync_request_message without leaving the current head catchup. The issue arises here, because it does this while still in the head catchup state.
Why?
Node1 receives the sync_request_message and overwrites the peer_requested structure, which contains the range of blocks to be sent.
Node2 has saved in its state that it wants HEAD, and once it gets it, it will send a handshake.
However, because of this interrupting LIB catchup, Node1 now thinks that Node2 requires a different range and, since LIB < HEAD, it doesn’t send the requested HEAD block.
So Node2 is waiting for a few more blocks before it sends a handshake, while Node1 thinks it has sent everything and waits for a handshake before moving Node2 back to in sync and continuing to send fresh blocks.
FIX:
Do not overwrite the peer_requested structure if it contains a valid range; instead, extend it to the maximum end_block (either the existing one or the newly requested one). This looks like the easiest fix that does not change the protocol. Discussed with @brianjohnson5972 and @heifner
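The difference between overwriting and extending the pending range can be sketched roughly as below. This is only an illustration, not the actual net_plugin code: `requested_range`, `overwrite_request` and `merge_request` are hypothetical names, the field layout is assumed, and keeping the existing start block when merging is an assumption made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-connection peer_requested state:
// the inclusive range of blocks the peer has asked us to send.
struct requested_range {
    uint32_t start_block = 0;
    uint32_t end_block   = 0;   // 0 is used here to mean "nothing pending"

    bool valid() const { return end_block != 0 && start_block <= end_block; }
};

// Behaviour described in the bug report: a new sync request simply
// replaces whatever range was pending, so a higher end_block is lost.
inline requested_range overwrite_request(const requested_range& /*pending*/,
                                         uint32_t start, uint32_t end) {
    return requested_range{start, end};
}

// Sketch of the fix: if a valid range is already pending, keep it and
// only extend end_block to the larger of the two values.
// Assumption: the pending start_block is left untouched.
inline requested_range merge_request(const requested_range& pending,
                                     uint32_t start, uint32_t end) {
    assert(start <= end);  // a request must describe a non-empty range
    if (!pending.valid()) {
        return requested_range{start, end};
    }
    return requested_range{pending.start_block,
                           std::max(pending.end_block, end)};
}
```

With made-up numbers: if Node2 is owed blocks 101..120 (up to HEAD) and a LIB-driven request for 101..110 arrives, `overwrite_request` leaves the range ending at 110, so block 120 is never sent, whereas `merge_request` keeps the end at 120.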
Change Type
Select ONE:
Testing Changes
Select ANY that apply:
Consensus Changes
API Changes
Documentation Additions