Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eip4844 gossip #4444

Merged
merged 10 commits into from
Jan 4, 2023
Merged

eip4844 gossip #4444

merged 10 commits into from
Jan 4, 2023

Conversation

henridf
Copy link
Contributor

@henridf henridf commented Dec 21, 2022

Adds gossip for the beacon_block_and_blobs_sidecar topic. Validation is mostly stubbed out pending integration of nim-kzg library.

@github-actions
Copy link

github-actions bot commented Dec 21, 2022

Unit Test Results

         9 files  ±0    1 047 suites  ±0   28m 45s ⏱️ - 4m 40s
  3 407 tests ±0    3 170 ✔️ ±0  237 💤 ±0  0 ±0 
14 735 runs  ±0  14 470 ✔️ ±0  265 💤 ±0  0 ±0 

Results for commit c03e57a. ± Comparison against base commit aff0505.

♻️ This comment has been updated with latest results.

@henridf henridf mentioned this pull request Dec 22, 2022
21 tasks
@henridf henridf force-pushed the eip4844-gossip branch 2 times, most recently from a5e806a to 373fcad Compare December 23, 2022 08:18
@henridf henridf marked this pull request as ready for review December 23, 2022 08:18
@henridf
Copy link
Contributor Author

henridf commented Dec 23, 2022

@tersec updated per our offline discussion: nim-kzg dependency removed for now, and fixed URLs to point to alpha.2.

@henridf henridf mentioned this pull request Jan 3, 2023
return blockRes

let sidecarRes = validateBeaconBlockAndBlobsSidecar(signedBlockAndBlobsSidecar)
if sidecarRes.isOk():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codewise, it makes more sense to use isErr here and return early so as to keep the same flow for all error returns and have the success condition at the end..

also, I wonder if we should structure the code more towards the blob being decoupled from each other - not in this PR perhaps, but probably we want to run blob/sidecar validation in parallel (in a decoupled design, we'd need a quarantine / holding area for unmatched blob/block pairs etc but it would be interesting to explore, code-wise, what the impact is)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codewise, it makes more sense to use isErr here and return early so as to keep the same flow for all error returns and have the success condition at the end..

fully agreed, done in c03e57a

but probably we want to run blob/sidecar validation in parallel

Is the motivation to limit validation latency? I guess I don't have much intuition at this point of how much extra sequential processing time blob validation will take, but if the additional latency is notable then yes it makes sense.

Or maybe this is this future-proofing for a world where gossip ends up being reworked to have separate blocks and blobs topics? I agree with that benefit (ofc depends on if/when that gossip change would be made).

template installBeaconBlocksValidator(digest: auto, phase: auto) =
node.network.addValidator(
getBeaconBlocksTopic(digest),
proc (signedBlock: phase.SignedBeaconBlock): ValidationResult =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this phase.SignedBeaconBlock method works reliably, that'd be useful more broadly.

The current codebase doesn't do this, I'm not sure if it's documented, and Nim does some strange things with it, though:

let x = 3
template f(y: auto) = echo y.x      # Builds, and
f(system)                           # outputs "3"; same with f(y: untyped)

# template g(y: untyped) = echo y.3 # Error: identifier expected, but got '3'
# echo system.3                     # Error: identifier expected, but got '3'

#echo system.x                       # Error: undeclared identifier: 'x'

Also, Nim 1.6 still doesn't properly typecheck these situations -- nim-lang/Nim#1027 was only fixed a couple months ago in Nim devel. I've asked for a backport of its fix nim-lang/Nim#20631 to 1.6, since Nimbus uses this pattern extensively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One approach used is to pass the entire type (phase0.SignedBeaconBlock, altair.SignedBeaconBlock, et cetera) as a type or generics parameter to a template or function. Repeats itself slightly, but the semantics are better-defined -- those things are in themselves types, while it's not clear at least to me what Nim thinks is happening here.

Copy link
Contributor

@tersec tersec Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm partly concerned because the local testnets in CI keep failing (e.g., https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Flinux%2Fx86_64/detail/PR-4444/7/pipeline https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Fmacos%2Faarch64/detail/PR-4444/7/pipeline https://ci.status.im/job/nimbus-eth2/job/platforms/job/macos/job/x86_64/job/PR-4444/7/display/redirect).

Block receive gossip just stops working, regardless of fork or slot, entirely with this PR.

So, blocks get either sent by a node, which sends them through the block processing pipeline locally or sent and ... apparently not really processed properly. This works for a while, because nodes manage to request backfill blocks in time via req/resp, e.g.:

{"lvl":"DBG","ts":"2023-01-03 21:56:07.723+00:00","msg":"Requesting detected missing blocks","topics":"beacnde","blocks":"[f1db17e9]"}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.723+00:00","msg":"Requesting blocks by root","topics":"requman","peer":"16U*ZvWiek","blocks":"[f1db17e9]","peer_score":1000}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.733+00:00","msg":"Block resolved","blockRoot":"f1db17e9","blck":{"slot":20,"proposer_index":84,"parent_root":"264c4bde","state_root":"07584766","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":12,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":21,"block_number":0,"fee_recipient":"0x0000000000000000000000000000000000000000"},"blockVerified":true,"heads":1,"stateDataDur":{"value":8417},"sigVerifyDur":{"value":6546958},"stateVerifyDur":{"value":229417},"putBlockDur":{"value":210083},"epochRefDur":{"value":4417}}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.733+00:00","msg":"Updated head block","topics":"chaindag","stateRoot":"07584766","justified":"0:00000000","finalized":"0:00000000","isOptHead":false,"newHead":"f1db17e9:20","lastHead":"264c4bde:19"}
{"lvl":"DBG","ts":"2023-01-03 21:56:07.734+00:00","msg":"Block processed","localHeadSlot":20,"blockSlot":20,"validationDur":{"value":0},"queueDur":{"value":197417},"storeBlockDur":{"value":7515041},"updateHeadDur":{"value":282667}}

Where node 3 initially sent that:

{"lvl":"NOT","ts":"2023-01-03 21:56:05.029+00:00","msg":"Block sent","blockRoot":"f1db17e9","blck":{"slot":20,"proposer_index":84,"parent_root":"264c4bde","state_root":"07584766","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":12,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":21,"block_number":0,"fee_recipient":"0x0000000000000000000000000000000000000000"},"signature":"a66896bc","delay":"29ms419us"}
{"lvl":"DBG","ts":"2023-01-03 21:56:05.037+00:00","msg":"Block resolved","blockRoot":"f1db17e9","blck":{"slot":20,"proposer_index":84,"parent_root":"264c4bde","state_root":"07584766","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":12,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":21,"block_number":0,"fee_recipient":"0x0000000000000000000000000000000000000000"},"blockVerified":true,"heads":1,"stateDataDur":{"value":10625},"sigVerifyDur":{"value":7561541},"stateVerifyDur":{"value":192250},"putBlockDur":{"value":268833},"epochRefDur":{"value":3417}}
{"lvl":"DBG","ts":"2023-01-03 21:56:05.038+00:00","msg":"Updated head block","topics":"chaindag","stateRoot":"07584766","justified":"0:00000000","finalized":"0:00000000","isOptHead":false,"newHead":"f1db17e9:20","lastHead":"264c4bde:19"}
{"lvl":"DBG","ts":"2023-01-03 21:56:05.038+00:00","msg":"Block processed","localHeadSlot":20,"blockSlot":20,"validationDur":{"value":0},"queueDur":{"value":38042},"storeBlockDur":{"value":8647625},"updateHeadDur":{"value":257000}}

This is the entirety of "Block received" via gossip, across all nodes and all slots:

local-testnet-minimal$ rg "Block received"
log0.txt
6294:{"lvl":"DBG","ts":"2023-01-03 21:54:53.019+00:00","msg":"Block received","topics":"beacnde","delay":"19ms893us","blockRoot":"83cebea6","wallSlot":8,"signature":"91a80241","blck":{"slot":8,"proposer_index":1003,"parent_root":"1aad008d","state_root":"2e2f695d","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":7,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":0,"block_number":0,"fee_recipient":""}}
7098:{"lvl":"DBG","ts":"2023-01-03 21:54:59.023+00:00","msg":"Block received","topics":"beacnde","delay":"23ms518us","blockRoot":"7f2a43ea","wallSlot":9,"signature":"aa987743","blck":{"slot":9,"proposer_index":804,"parent_root":"83cebea6","state_root":"094c2578","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":5,"block_number":0,"fee_recipient":""}}
7994:{"lvl":"DBG","ts":"2023-01-03 21:55:05.028+00:00","msg":"Block received","topics":"beacnde","delay":"28ms185us","blockRoot":"2a130009","wallSlot":10,"signature":"b72d5a20","blck":{"slot":10,"proposer_index":230,"parent_root":"7f2a43ea","state_root":"e82082e7","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":20,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":13,"block_number":0,"fee_recipient":""}}
8929:{"lvl":"DBG","ts":"2023-01-03 21:55:11.012+00:00","msg":"Block received","topics":"beacnde","delay":"12ms368us","blockRoot":"d1b30239","wallSlot":11,"signature":"b0c231e7","blck":{"slot":11,"proposer_index":920,"parent_root":"2a130009","state_root":"65d0af91","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
9820:{"lvl":"DBG","ts":"2023-01-03 21:55:17.032+00:00","msg":"Block received","topics":"beacnde","delay":"32ms541us","blockRoot":"73c1b8f5","wallSlot":12,"signature":"8571e96a","blck":{"slot":12,"proposer_index":876,"parent_root":"d1b30239","state_root":"1bd3e6c9","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
10744:{"lvl":"DBG","ts":"2023-01-03 21:55:23.033+00:00","msg":"Block received","topics":"beacnde","delay":"33ms487us","blockRoot":"0830c947","wallSlot":13,"signature":"ae0d9289","blck":{"slot":13,"proposer_index":456,"parent_root":"73c1b8f5","state_root":"67ba293e","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":8,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
11584:{"lvl":"DBG","ts":"2023-01-03 21:55:29.018+00:00","msg":"Block received","topics":"beacnde","delay":"18ms649us","blockRoot":"f7bec601","wallSlot":14,"signature":"877a66b2","blck":{"slot":14,"proposer_index":686,"parent_root":"0830c947","state_root":"157c5ba6","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":4,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}
12431:{"lvl":"DBG","ts":"2023-01-03 21:55:35.032+00:00","msg":"Block received","topics":"beacnde","delay":"32ms632us","blockRoot":"64f7df29","wallSlot":15,"signature":"b484524e","blck":{"slot":15,"proposer_index":96,"parent_root":"f7bec601","state_root":"4548c429","eth1data":{"deposit_root":"1681181313acd0d68b1e8d8d275665fe521a9ce5724858c3121cd2f2d84f3e33","deposit_count":1024,"block_hash":"4242424242424242424242424242424242424242424242424242424242424242"},"graffiti":"Nimbus/v22.12.0-c03e57-stateofus","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":5,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":29,"block_number":0,"fee_recipient":""}}

There should be 3 block received logs for every slot (not the one which sent the block,, of the 4 nodes).

The trouble with relying on req/resp, aside from that block gossip shouldn't be failing to begin with, is that since the attestations get dropped, these blocks don't necessarily get enough votes to keep the head moving. By the time req/resp gets processed, it's too late for the nodes to count votes in a similar way.

So, the combination of those Nim issues/bugs and this particular failure mode causes me to be skeptical that this refactoring is working as intended.

N=0; while ./scripts/launch_local_testnet.sh --preset minimal --nodes 4 --disable-htop --stop-at-epoch 8 -- --verify-finalization --discv5:no; do N=$((N+1)); echo "That was run #${N}"; sleep 67; done

where that variation runs the minimal preset version in a loop until it fails for some reason.

./scripts/launch_local_testnet.sh --preset minimal --nodes 4 --disable-htop --stop-at-epoch 8 -- --verify-finalization --discv5:no

from the nimbus-eth2 directory runs the local testnet a single time, and the various options can of course be adjusted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into this and for the explanations. I've removed the template. I figured the refactoring was straightforward enough; lesson learned. In any case, it belonged at the least in a separate commit and probably in a different PR.

As a sidenote, this also changed the order in which validators were installed. I did that that to group related validators for clarity, assuming the change had to be innocuous, but given the specific order in which the validators were being installed, am now wondering if there might have been a reason for that order. Anyway, order restored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into this and for the explanations. I've removed the template. I figured the refactoring was straightforward enough; lesson learned. In any case, it belonged at the least in a separate commit and probably in a different PR.

Sure. I agree, in principle, with the refactoring, but also that's it probably shouldn't be part of this PR.

As a sidenote, this also changed the order in which validators were installed. I did that that to group related validators for clarity, assuming the change had to be innocuous, but given the specific order in which the validators were being installed, am now wondering if there might have been a reason for that order. Anyway, order restored.

The order was not intended to be important. They'll get called by nim-libp2p in whatever order messages come in, regardless. Might as well keep the order though.

@tersec tersec enabled auto-merge (squash) January 4, 2023 11:21
@tersec tersec merged commit 8251cc2 into status-im:unstable Jan 4, 2023
@henridf henridf deleted the eip4844-gossip branch February 6, 2023 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants