-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eip4844 gossip #4444
eip4844 gossip #4444
Conversation
a5e806a
to
373fcad
Compare
@tersec updated per our offline discussion: nim-kzg dependency removed for now, and fixed URLs to point to alpha.2. |
373fcad
to
7ea35b6
Compare
return blockRes | ||
|
||
let sidecarRes = validateBeaconBlockAndBlobsSidecar(signedBlockAndBlobsSidecar) | ||
if sidecarRes.isOk(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codewise, it makes more sense to use isErr
here and return early so as to keep the same flow for all error returns and have the success condition at the end..
also, I wonder if we should structure the code more towards the blob being decoupled from each other - not in this PR perhaps, but probably we want to run blob/sidecar validation in parallel (in a decoupled design, we'd need a quarantine / holding area for unmatched blob/block pairs etc but it would be interesting to explore, code-wise, what the impact is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codewise, it makes more sense to use isErr here and return early so as to keep the same flow for all error returns and have the success condition at the end..
fully agreed, done in c03e57a
but probably we want to run blob/sidecar validation in parallel
Is the motivation to limit validation latency? I guess I don't have much intuition at this point of how much extra sequential processing time blob validation will take, but if the additional latency is notable then yes it makes sense.
Or maybe this is this future-proofing for a world where gossip ends up being reworked to have separate blocks and blobs topics? I agree with that benefit (ofc depends on if/when that gossip change would be made).
beacon_chain/nimbus_beacon_node.nim
Outdated
template installBeaconBlocksValidator(digest: auto, phase: auto) = | ||
node.network.addValidator( | ||
getBeaconBlocksTopic(digest), | ||
proc (signedBlock: phase.SignedBeaconBlock): ValidationResult = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this phase.SignedBeaconBlock
method works reliably, that'd be useful more broadly.
The current codebase doesn't do this, I'm not sure if it's documented, and Nim does some strange things with it, though:
let x = 3
template f(y: auto) = echo y.x # Builds, and
f(system) # outputs "3"; same with f(y: untyped)
# template g(y: untyped) = echo y.3 # Error: identifier expected, but got '3'
# echo system.3 # Error: identifier expected, but got '3'
#echo system.x # Error: undeclared identifier: 'x'
Also, Nim 1.6 still doesn't properly typecheck these situations -- nim-lang/Nim#1027 was only fixed a couple months ago in Nim devel
. I've asked for a backport of its fix nim-lang/Nim#20631 to 1.6, since Nimbus uses this pattern extensively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One approach used is to pass the entire type (phase0.SignedBeaconBlock
, altair.SignedBeaconBlock
, et cetera) as a type or generics parameter to a template or function. Repeats itself slightly, but the semantics are better-defined -- those things are in themselves types, while it's not clear at least to me what Nim thinks is happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm partly concerned because the local testnets in CI keep failing (e.g., https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Flinux%2Fx86_64/detail/PR-4444/7/pipeline https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Fmacos%2Faarch64/detail/PR-4444/7/pipeline https://ci.status.im/job/nimbus-eth2/job/platforms/job/macos/job/x86_64/job/PR-4444/7/display/redirect).
Block receive gossip just stops working, regardless of fork or slot, entirely with this PR.
So, blocks get either sent by a node, which sends them through the block processing pipeline locally or sent and ... apparently not really processed properly. This works for a while, because nodes manage to request backfill blocks in time via req/resp, e.g.:
{"lvl":"DBG","ts":"2023-01-03 21:56:07.723+00:00","msg":"Requesting detected missing blocks","topics":"beacnde","blocks":"[f1db17e9]"}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.723+00:00","msg":"Requesting blocks by root","topics":"requman","peer":"16U*ZvWiek","blocks":"[f1db17e9]","peer_score":1000}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.733+00:00","msg":"Block resolved","blockRoot":"f1db17e9","blck":{"slot":20,"proposer_index":84,"parent_root":"264c4bde","state_root":"07584766","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":12,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":21,"block_number":0,"fee_recipient":"0x0000000000000000000000000000000000000000"},"blockVerified":true,"heads":1,"stateDataDur":{"value":8417},"sigVerifyDur":{"value":6546958},"stateVerifyDur":{"value":229417},"putBlockDur":{"value":210083},"epochRefDur":{"value":4417}}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.733+00:00","msg":"Updated head block","topics":"chaindag","stateRoot":"07584766","justified":"0:00000000","finalized":"0:00000000","isOptHead":false,"newHead":"f1db17e9:20","lastHead":"264c4bde:19"}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.734+00:00","msg":"Block processed","localHeadSlot":20,"blockSlot":20,"validationDur":{"value":0},"queueDur":{"value":197417},"storeBlockDur":{"value":7515041},"updateHeadDur":{"value":282667}}
Where node 3 initially sent that:
{"lvl":"NOT","ts":"2023-01-03 21:56:05.029+00:00","msg":"Block sent","blockRoot":"f1db17e9","blck":{"slot":20,"proposer_index":84,"parent_root":"264c4bde","state_root":"07584766","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":12,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":21,"block_number":0,"fee_recipient":"0x0000000000000000000000000000000000000000"},"signature":"a66896bc","delay":"29ms419us"}
{"lvl":"DBG","ts":"2023-01-03 21:56:05.037+00:00","msg":"Block resolved","blockRoot":"f1db17e9","blck":{"slot":20,"proposer_index":84,"parent_root":"264c4bde","state_root":"07584766","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":12,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":21,"block_number":0,"fee_recipient":"0x0000000000000000000000000000000000000000"},"blockVerified":true,"heads":1,"stateDataDur":{"value":10625},"sigVerifyDur":{"value":7561541},"stateVerifyDur":{"value":192250},"putBlockDur":{"value":268833},"epochRefDur":{"value":3417}}
{"lvl":"DBG","ts":"2023-01-03 21:56:05.038+00:00","msg":"Updated head block","topics":"chaindag","stateRoot":"07584766","justified":"0:00000000","finalized":"0:00000000","isOptHead":false,"newHead":"f1db17e9:20","lastHead":"264c4bde:19"}
{"lvl":"DBG","ts":"2023-01-03 21:56:05.038+00:00","msg":"Block processed","localHeadSlot":20,"blockSlot":20,"validationDur":{"value":0},"queueDur":{"value":38042},"storeBlockDur":{"value":8647625},"updateHeadDur":{"value":257000}}
This is the entirety of "Block received" via gossip, across all nodes and all slots:
local-testnet-minimal$ rg "Block received"
log0.txt
6294:{"lvl":"DBG","ts":"2023-01-03 21:54:53.019+00:00","msg":"Block received","topics":"beacnde","delay":"19ms893us","blockRoot":"83cebea6","wallSlot":8,"signature":"91a80241","blck":{"slot":8,"proposer_index":1003,"parent_root":"1aad008d","state_root":"2e2f695d","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":7,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":0,"block_number":0,"fee_recipient":""}}
7098:{"lvl":"DBG","ts":"2023-01-03 21:54:59.023+00:00","msg":"Block received","topics":"beacnde","delay":"23ms518us","blockRoot":"7f2a43ea","wallSlot":9,"signature":"aa987743","blck":{"slot":9,"proposer_index":804,"parent_root":"83cebea6","state_root":"094c2578","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":5,"block_number":0,"fee_recipient":""}}
7994:{"lvl":"DBG","ts":"2023-01-03 21:55:05.028+00:00","msg":"Block received","topics":"beacnde","delay":"28ms185us","blockRoot":"2a130009","wallSlot":10,"signature":"b72d5a20","blck":{"slot":10,"proposer_index":230,"parent_root":"7f2a43ea","state_root":"e82082e7","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":20,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":13,"block_number":0,"fee_recipient":""}}
8929:{"lvl":"DBG","ts":"2023-01-03 21:55:11.012+00:00","msg":"Block received","topics":"beacnde","delay":"12ms368us","blockRoot":"d1b30239","wallSlot":11,"signature":"b0c231e7","blck":{"slot":11,"proposer_index":920,"parent_root":"2a130009","state_root":"65d0af91","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
9820:{"lvl":"DBG","ts":"2023-01-03 21:55:17.032+00:00","msg":"Block received","topics":"beacnde","delay":"32ms541us","blockRoot":"73c1b8f5","wallSlot":12,"signature":"8571e96a","blck":{"slot":12,"proposer_index":876,"parent_root":"d1b30239","state_root":"1bd3e6c9","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
10744:{"lvl":"DBG","ts":"2023-01-03 21:55:23.033+00:00","msg":"Block received","topics":"beacnde","delay":"33ms487us","blockRoot":"0830c947","wallSlot":13,"signature":"ae0d9289","blck":{"slot":13,"proposer_index":456,"parent_root":"73c1b8f5","state_root":"67ba293e","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":8,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
11584:{"lvl":"DBG","ts":"2023-01-03 21:55:29.018+00:00","msg":"Block received","topics":"beacnde","delay":"18ms649us","blockRoot":"f7bec601","wallSlot":14,"signature":"877a66b2","blck":{"slot":14,"proposer_index":686,"parent_root":"0830c947","state_root":"157c5ba6","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
12431:{"lvl":"DBG","ts":"2023-01-03 21:55:35.032+00:00","msg":"Block received","topics":"beacnde","delay":"32ms632us","blockRoot":"64f7df29","wallSlot":15,"signature":"b484524e","blck":{"slot":15,"proposer_index":96,"parent_root":"f7bec601","state_root":"4548c429","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":5,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
There should be 3 block received logs for every slot (not the one which sent the block,, of the 4 nodes).
The trouble with relying on req/resp, aside from that block gossip shouldn't be failing to begin with, is that since the attestations get dropped, these blocks don't necessarily get enough votes to keep the head moving. By the time req/resp gets processed, it's too late for the nodes to count votes in a similar way.
So, the combination of those Nim issues/bugs and this particular failure mode causes me to be skeptical that this refactoring is working as intended.
N=0; while ./scripts/launch_local_testnet.sh --preset minimal --nodes 4 --disable-htop --stop-at-epoch 8 -- --verify-finalization --discv5:no; do N=$((N+1)); echo "That was run #${N}"; sleep 67; done
where that variation runs the minimal preset version in a loop until it fails for some reason.
./scripts/launch_local_testnet.sh --preset minimal --nodes 4 --disable-htop --stop-at-epoch 8 -- --verify-finalization --discv5:no
from the nimbus-eth2
directory runs the local testnet a single time, and the various options can of course be adjusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging into this and for the explanations. I've removed the template. I figured the refactoring was straightforward enough; lesson learned. In any case, it belonged at the least in a separate commit and probably in a different PR.
As a sidenote, this also changed the order in which validators were installed. I did that that to group related validators for clarity, assuming the change had to be innocuous, but given the specific order in which the validators were being installed, am now wondering if there might have been a reason for that order. Anyway, order restored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging into this and for the explanations. I've removed the template. I figured the refactoring was straightforward enough; lesson learned. In any case, it belonged at the least in a separate commit and probably in a different PR.
Sure. I agree, in principle, with the refactoring, but also that's it probably shouldn't be part of this PR.
As a sidenote, this also changed the order in which validators were installed. I did that that to group related validators for clarity, assuming the change had to be innocuous, but given the specific order in which the validators were being installed, am now wondering if there might have been a reason for that order. Anyway, order restored.
The order was not intended to be important. They'll get called by nim-libp2p
in whatever order messages come in, regardless. Might as well keep the order though.
Adds gossip for the
beacon_block_and_blobs_sidecar
topic. Validation is mostly stubbed out pending integration of nim-kzg library.