Descendants of INVALID block are not deleted in optsync #3954

Closed
etan-status opened this issue Aug 11, 2022 · 1 comment

Describe the bug
On testing-large-01.aws-eu-central-1a.nimbus.prater, the Nimbus beacon node entered a restart loop, crashing on every start:

{"lvl":"INF","ts":"2022-08-11 10:14:13.534+00:00","msg":"Loading block DAG from database","topics":"beacnde","path":"/data/beacon-node-prater-testing/data/db"}
/data/beacon-node-prater-testing/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(456) main
/data/beacon-node-prater-testing/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(449) NimMain
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(2177) main
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(2045) handleStartUpCmd
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(1863) doRunBeaconNode
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(646) init
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(191) loadChainDag
/data/beacon-node-prater-testing/repo/beacon_chain/consensus_object_pools/blockchain_dag.nim(836) init
/data/beacon-node-prater-testing/repo/vendor/nim-stew/stew/results.nim(756) expect
/data/beacon-node-prater-testing/repo/vendor/nim-stew/stew/results.nim(348) raiseResultDefect
Error: unhandled exception: not nil [ResultDefect]

The crash occurs because one block is missing from the DB when loading blocks from head back to finalizedHead.
The logs show that the missing block was judged INVALID by the EL. Whether that verdict was caused by the server's configuration (see Additional context) is a separate problem.

{"lvl":"DBG","ts":"2022-08-11 09:42:07.883+00:00","msg":"newPayload: succeeded","parentHash":"893a5a27","blockHash":"1642d9bc","blockNumber":7384505,"payloadStatus":2}
{"lvl":"DBG","ts":"2022-08-11 09:42:07.883+00:00","msg":"runQueueProcessingLoop: execution payload invalid","executionPayloadStatus":2,"blck":{"blck":{"slot":3641909,"proposer_index":330973,"parent_root":"c8752b72","state_root":"692dbc94","eth1data":{"deposit_root":"a9256931ff65af92d47c829fced0582b07c055019c468be5c4b20190b3745ee0","deposit_count":169791,"block_hash":"4aecd111f7df76364d8c8cf46917ee938543ebd6f7c3026e1d3c5fecbe412a19"},"graffiti":"teku/v22.7.0","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":128,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":391},"signature":"8efaf57a"}}
{"lvl":"DBG","ts":"2022-08-11 09:42:07.883+00:00","msg":"markBlockInvalid","topics":"chaindag","blck":"7b63b0d2:3641909"}
{"lvl":"NOT","ts":"2022-08-11 09:42:07.884+00:00","msg":"Received invalid block","topics":"requman","peer":"16U*Sr8Grn","blocks":"[7b63b0d2, 7b63b0d2, 7b63b0d2, c11444f7, c11444f7, c11444f7]","peer_score":600}
{"lvl":"DBG","ts":"2022-08-11 09:42:07.884+00:00","msg":"Peer was removed from PeerPool due to low score","topics":"beacnde","peer":"16U*Sr8Grn","peer_score":-400,"score_low_limit":0,"score_high_limit":1000}
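
To illustrate why one missing block is fatal at startup, here is a minimal Python sketch of a head-to-finalized walk over parent links. It is not the Nimbus code: `load_chain` and the dictionary-based `db` are hypothetical stand-ins, and the real loader lives in `blockchain_dag.nim`. The assumption is only that every block between head and finalizedHead must be present.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the block DB: block root -> parent root.
BlockDb = Dict[str, Optional[str]]


def load_chain(db: BlockDb, head: str, finalized: str) -> List[str]:
    """Walk parent links from head back to finalized, failing on any gap."""
    chain: List[str] = []
    cur = head
    while True:
        if cur not in db:
            # This is the situation in the report: an ancestor of head
            # was removed, so the walk cannot be completed.
            raise LookupError(f"block {cur} missing from DB")
        chain.append(cur)
        if cur == finalized:
            return chain
        parent = db[cur]
        if parent is None:
            raise LookupError("reached the root without finding finalized")
        cur = parent
```

If the invalid block is deleted while its descendants and the head pointer stay in place, the walk reaches the gap and raises, which corresponds to the unhandled `ResultDefect` in the stack trace above.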

markBlockInvalid needs to handle this case better: when it happens, it should also delete all descendants of the invalid block from the DB and revert the DAG head back to the parent.
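
The requested behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual fix: `ToyDag`, `descendants`, and `mark_block_invalid` are hypothetical names, and the DAG is modeled as a plain root-to-parent mapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ToyDag:
    parents: Dict[str, Optional[str]]  # block root -> parent root
    head: str

    def descendants(self, root: str) -> Set[str]:
        """Collect every block that has `root` as an ancestor."""
        children: Dict[Optional[str], List[str]] = {}
        for blk, parent in self.parents.items():
            children.setdefault(parent, []).append(blk)
        found: Set[str] = set()
        stack = [root]
        while stack:
            cur = stack.pop()
            for child in children.get(cur, []):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found

    def mark_block_invalid(self, root: str) -> None:
        """Remove `root` and its descendants; move head off removed blocks."""
        if root not in self.parents:
            return
        parent = self.parents[root]
        if parent is None:
            raise ValueError("cannot invalidate the root of the DAG")
        doomed = self.descendants(root) | {root}
        if self.head in doomed:
            # Simplification: fall back to the invalid block's parent.
            self.head = parent
        for blk in doomed:
            del self.parents[blk]
```

Reverting to the parent mirrors the wording of this report. A real client would more likely re-run fork choice over the remaining blocks, which this sketch does not attempt.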

Note that it is probably safe to ignore markBlockInvalid for any slot <= dag.finalizedHead.slot, as DAG updateHead won't allow reverting to anything before that.
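
As a sketch, the slot check suggested above could be a small guard evaluated before any deletion. The function name and signature are hypothetical, and the finalized slot in the usage line is an example value, not one taken from the logs.

```python
def should_handle_invalid(block_slot: int, finalized_slot: int) -> bool:
    """Return True only for blocks strictly newer than the finalized head.

    Blocks at or before the finalized slot are ignored, following the
    note that updateHead will not revert past finalization.
    """
    return block_slot > finalized_slot


# The invalid block in the logs is at slot 3641909; the finalized slot
# below is an assumed example value.
if should_handle_invalid(3641909, finalized_slot=3641800):
    pass  # proceed with deleting descendants and reverting the head
```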

To Reproduce
Steps to reproduce the behavior:

  1. Platform details (OS, architecture): testing-large-01.aws-eu-central-1a.nimbus.prater
  2. Branch/commit used: v22.7.0-7cac6f-stateofus
  3. Commands being executed: Regular startup
  4. Relevant log lines: See above

Additional context
This was on a server where multiple CLs were configured against the same EL. It is unlikely for a block so deep to be reported as INVALID during normal operation. However, if it does happen, it is still a bug: it should not corrupt the CL database.

tersec commented Oct 3, 2022

#4174

@tersec tersec closed this as completed Oct 3, 2022