Descendants of `INVALID` block are not deleted in optsync #3954

etan-status · 2022-08-11T16:47:52Z

Describe the bug
On testing-large-01.aws-eu-central-1a.nimbus.prater, the Nimbus beacon node entered a restart loop, crashing on every start:

{"lvl":"INF","ts":"2022-08-11 10:14:13.534+00:00","msg":"Loading block DAG from database","topics":"beacnde","path":"/data/beacon-node-prater-testing/data/db"}
/data/beacon-node-prater-testing/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(456) main
/data/beacon-node-prater-testing/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(449) NimMain
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(2177) main
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(2045) handleStartUpCmd
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(1863) doRunBeaconNode
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(646) init
/data/beacon-node-prater-testing/repo/beacon_chain/nimbus_beacon_node.nim(191) loadChainDag
/data/beacon-node-prater-testing/repo/beacon_chain/consensus_object_pools/blockchain_dag.nim(836) init
/data/beacon-node-prater-testing/repo/vendor/nim-stew/stew/results.nim(756) expect
/data/beacon-node-prater-testing/repo/vendor/nim-stew/stew/results.nim(348) raiseResultDefect
Error: unhandled exception: not nil [ResultDefect]

The crash occurs because one block on the path from head back to finalizedHead, which is walked when loading the block DAG, is missing from the DB.
Examining the logs shows that the missing block was judged INVALID by the EL. Whether that verdict was caused by the configuration issue (see Additional context) is a separate problem.

{"lvl":"DBG","ts":"2022-08-11 09:42:07.883+00:00","msg":"newPayload: succeeded","parentHash":"893a5a27","blockHash":"1642d9bc","blockNumber":7384505,"payloadStatus":2}
{"lvl":"DBG","ts":"2022-08-11 09:42:07.883+00:00","msg":"runQueueProcessingLoop: execution payload invalid","executionPayloadStatus":2,"blck":{"blck":{"slot":3641909,"proposer_index":330973,"parent_root":"c8752b72","state_root":"692dbc94","eth1data":{"deposit_root":"a9256931ff65af92d47c829fced0582b07c055019c468be5c4b20190b3745ee0","deposit_count":169791,"block_hash":"4aecd111f7df76364d8c8cf46917ee938543ebd6f7c3026e1d3c5fecbe412a19"},"graffiti":"teku/v22.7.0","proposer_slashings_len":0,"attester_slashings_len":0,"attestations_len":128,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":391},"signature":"8efaf57a"}}
{"lvl":"DBG","ts":"2022-08-11 09:42:07.883+00:00","msg":"markBlockInvalid","topics":"chaindag","blck":"7b63b0d2:3641909"}
{"lvl":"NOT","ts":"2022-08-11 09:42:07.884+00:00","msg":"Received invalid block","topics":"requman","peer":"16U*Sr8Grn","blocks":"[7b63b0d2, 7b63b0d2, 7b63b0d2, c11444f7, c11444f7, c11444f7]","peer_score":600}
{"lvl":"DBG","ts":"2022-08-11 09:42:07.884+00:00","msg":"Peer was removed from PeerPool due to low score","topics":"beacnde","peer":"16U*Sr8Grn","peer_score":-400,"score_low_limit":0,"score_high_limit":1000}
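To make the startup failure concrete, here is a minimal Python sketch of the load path described above: walking parent links from the head back to the finalized block. It is an illustration only, not Nimbus code (Nimbus is written in Nim), and `Block`, `load_chain` and `MissingBlockError` are hypothetical names. It shows why deleting an INVALID block while leaving its descendants (and the head) in place makes the walk fail, which corresponds to the `expect` defect in the stack trace.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Block:
    root: str                   # hypothetical block identifier
    parent_root: Optional[str]  # None for the oldest known block
    slot: int


class MissingBlockError(Exception):
    """Raised when a parent link points at a block that is not in the store."""


def load_chain(db: Dict[str, Block], head_root: str, finalized_root: str) -> List[Block]:
    """Walk parent links from the head back to the finalized block.

    Assumes the finalized block itself is present in `db`. If any block on
    the path is missing, loading cannot continue and an error is raised.
    """
    chain: List[Block] = []
    root: Optional[str] = head_root
    while root != finalized_root:
        if root is None or root not in db:
            raise MissingBlockError(f"block {root} missing between head and finalized head")
        blck = db[root]
        chain.append(blck)
        root = blck.parent_root
    chain.append(db[finalized_root])
    return chain
```

With a complete store the walk returns the blocks from head to finalized; if a block in the middle (the INVALID one) has been removed but its child is still the head, the walk raises instead.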

markBlockInvalid needs to handle this case better: when it happens, it should also delete all descendants of the invalid block from the DB and revert the DAG head to that block's parent.
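The proposed behavior can be sketched as follows. This is a hypothetical Python model, not the actual Nimbus `markBlockInvalid`: `ToyDag` stands in for the block DB plus the current head, and `mark_block_invalid` collects the invalid block and all of its descendants with an iterative depth-first search, deletes them, and, following the issue text, reverts the head to the invalid block's parent if the head was on the pruned branch. A real client might instead re-run fork choice to pick the new head.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Block:
    root: str
    parent_root: Optional[str]
    slot: int


@dataclass
class ToyDag:
    blocks: Dict[str, Block]  # hypothetical stand-in for the block DB
    head: Optional[str]       # root of the current head

    def children(self, root: str) -> List[str]:
        return [b.root for b in self.blocks.values() if b.parent_root == root]

    def mark_block_invalid(self, invalid_root: str) -> Set[str]:
        """Delete the invalid block and every descendant, then fix up the head."""
        invalid = self.blocks.get(invalid_root)
        if invalid is None:
            return set()
        # Collect the invalid block plus all of its descendants before deleting.
        to_delete: Set[str] = set()
        stack = [invalid_root]
        while stack:
            root = stack.pop()
            if root in to_delete:
                continue
            to_delete.add(root)
            stack.extend(self.children(root))
        for root in to_delete:
            del self.blocks[root]
        # If the head was on the invalidated branch, revert it to the parent.
        if self.head in to_delete:
            self.head = invalid.parent_root
        return to_delete
```

Deleting the whole subtree, rather than just the INVALID block, is what keeps the head-to-finalizedHead walk from hitting a gap on the next start.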

Note that it is probably safe to ignore markBlockInvalid for any slot <= dag.finalizedHead.slot, since the DAG's updateHead does not allow reverting to anything before that.
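A guard along those lines might look like the sketch below. `handle_invalid_verdict` and its `prune` callback are hypothetical names used only for illustration; the callback stands in for whatever pruning logic (such as deleting descendants) runs when the verdict is acted on.

```python
from typing import Callable


def handle_invalid_verdict(
    block_slot: int,
    finalized_slot: int,
    prune: Callable[[], None],
) -> bool:
    """Apply an EL INVALID verdict only for blocks after the finalized slot.

    Blocks at or before the finalized slot are ignored because updateHead
    will not revert to anything before finalizedHead anyway.
    Returns True if pruning was performed.
    """
    if block_slot <= finalized_slot:
        return False
    prune()
    return True
```

The comparison uses `<=` so that the finalized block's own slot is also ignored, matching the condition stated above.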

To Reproduce
Steps to reproduce the behavior:

Platform details (OS, architecture): testing-large-01.aws-eu-central-1a.nimbus.prater
Branch/commit used: v22.7.0-7cac6f-stateofus
Commands being executed: Regular startup
Relevant log lines: See above

Additional context
This was on a server where multiple CLs were configured against the same EL. A block this deep is unlikely to be reported as INVALID during normal operation. However, if it does happen, it should still not corrupt the CL database, so this is a bug.


tersec · 2022-10-03T16:28:28Z

#4174

etan-status assigned tersec Aug 11, 2022

zah added this to the v22.8.0 milestone Aug 12, 2022

This was referenced Sep 1, 2022

clean up more properly after fcU-INVALID #4058 (closed)

handle INVALIDATED forkchoiceUpdated better #4081 (merged)

tersec closed this as completed Oct 3, 2022
