[Ready 1/2] Fork choice rewrite #865

mratsim · 2020-04-06T22:33:36Z

Fork choice complete refactor.

This closes #719. There are no EF test vectors yet (#777); see ethereum/consensus-spec-tests#17 for the upstream discussion.

The implementation closely follows Lighthouse's at https://github.com/sigp/lighthouse/tree/869b0621d6f51e98473bb67cec0446d3623b8957/eth2/proto_array_fork_choice/src

The Lighthouse implementation is derived from Protolambda's array-based stateful DAG at https://github.com/protolambda/lmd-ghost

Fun fact: Protolambda forked the fork at https://github.com/protolambda/eth2-py-hacks/blob/ae2865670dcb0427f10b0725b897cd2d7b887c9c/proto_array.py

Other resources:

Spec: https://github.com/ethereum/eth2.0-specs/blob/v0.11.1/specs/phase0/fork-choice.md
Prysmatic write up: https://hackmd.io/bABJiht3Q9SyV3Ga4FT9lQ#High-level-concept
Gasper white paper from March 6: https://arxiv.org/abs/2003.03052

This highlighted 2 critical bugs:

- In Nimcrypto, fixed by "bump nimcrypto: fix equality check of hashes" (#864). When the nimcrypto.hash module was imported, == was sometimes wrong, causing mayhem in the fork choice. This might be the final solution to the finality issues we have.
- var openarray seems unstable: uncommenting the debugEcho at the following spot makes the fork choice compute_deltas test give wrong results. This only affects the "isMainModule" sanity checks; the unit tests are not affected: https://github.com/status-im/nim-beacon-chain/blob/65889784fc7b8a65451f1e3b7f7f56ba1c529d8c/beacon_chain/fork_choice/fork_choice.nim#L182-L208
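
For readers unfamiliar with compute_deltas, here is a minimal Python sketch of what a proto-array style delta computation does, assuming the Lighthouse-like logic this PR follows. It is not the code of the Nim module: VoteTracker, the string roots, and the argument names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoteTracker:
    # Hypothetical record: one per validator.
    current_root: Optional[str]  # block whose weight currently includes this validator
    next_root: Optional[str]     # block targeted by the validator's latest attestation


def compute_deltas(
    indices: Dict[str, int],      # block root -> index in the proto array
    votes: List[VoteTracker],
    old_balances: List[int],
    new_balances: List[int],
) -> List[int]:
    """Return, per proto-array node, the weight change caused by moved votes
    and changed balances. Votes are updated in place."""
    deltas = [0] * len(indices)
    for validator, vote in enumerate(votes):
        # A validator that has never voted contributes nothing.
        if vote.current_root is None and vote.next_root is None:
            continue
        # Validators missing from a balance list count as zero balance.
        old = old_balances[validator] if validator < len(old_balances) else 0
        new = new_balances[validator] if validator < len(new_balances) else 0
        if vote.current_root != vote.next_root or old != new:
            # Roots unknown to the array (for example already pruned) are ignored.
            if vote.current_root in indices:
                deltas[indices[vote.current_root]] -= old
            if vote.next_root in indices:
                deltas[indices[vote.next_root]] += new
            vote.current_root = vote.next_root
    return deltas
```

Because the votes are mutated, calling the function a second time with unchanged votes and balances yields all-zero deltas, which is a convenient sanity check.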

TODO:

- The current branch is littered with (commented out) debugEcho calls that have to be removed
- In a second PR: replace the old fork choice in the attestation_pool with the new one

Note: The PR will be split in 2, as this one is already 3000 lines of complex code.
This one only adds the fork choice and accompanying tests as a separate module.
It does not replace the current fork choice used in testnet.

The replacement will be done in a second PR, which will update the attestation_pool: https://github.com/status-im/nim-beacon-chain/blob/bd5400aea4f23a722c207826b21070db7966bee5/beacon_chain/attestation_pool.nim#L378-L440

Postponed / Not in scope:

- Internally we are using Nim options and tables, but those raise exceptions. Ideally we would have a fork that doesn't raise.
- The implementation keeps the slot and state_root as unused metadata, mirroring Lighthouse. This might be useless for us, or we might need more metadata; in that case maybe we should have a generic "metadata" field here: https://github.com/status-im/nim-beacon-chain/blob/65889784fc7b8a65451f1e3b7f7f56ba1c529d8c/beacon_chain/fork_choice/fork_choice_types.nim#L93-L103
- The current implementation (like Lighthouse / Prysmatic) does not implement should_update_justified_checkpoint, which protects against a class of attacks called "bouncing attacks": https://github.com/ethereum/eth2.0-specs/blob/v0.11.1/specs/phase0/fork-choice.md#should_update_justified_checkpoint
- Implementing the spec 1-to-1 might help with generating test cases, as the control flow is quite complex internally. However, the interface and error model are very different, and it is a significant effort: https://github.com/status-im/nim-beacon-chain/tree/fork-choice/beacon_chain/fork_choice
- Protolambda has a variant at https://github.com/protolambda/eth2-py-hacks/blob/ae2865670dcb0427f10b0725b897cd2d7b887c9c/proto_array.py that always prunes, made practical by cheaper pruning. Instead of scanning the proto_array and subtracting from every stored index on a prune, it keeps the pruned index offset as a field and only subtracts on query.
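
To illustrate the offset-based pruning idea from the last item, here is a minimal Python sketch. It is not Protolambda's code: the class OffsetProtoArray, its methods, and the string roots are hypothetical, and weights and best-child tracking are left out. Stored indices are absolute and never rewritten; the offset is subtracted only when a node is looked up.

```python
from typing import Dict, List, Optional, Tuple


class OffsetProtoArray:
    """Hypothetical append-only block array with offset-based pruning."""

    def __init__(self) -> None:
        # Each entry is (root, absolute index of parent or None).
        self.nodes: List[Tuple[str, Optional[int]]] = []
        self.offset = 0                    # number of nodes pruned so far
        self.indices: Dict[str, int] = {}  # root -> absolute index

    def add(self, root: str, parent_root: Optional[str] = None) -> int:
        abs_index = self.offset + len(self.nodes)
        self.nodes.append((root, self.indices.get(parent_root)))
        self.indices[root] = abs_index
        return abs_index

    def get(self, root: str) -> Tuple[str, Optional[int]]:
        # The only place where the offset is subtracted.
        return self.nodes[self.indices[root] - self.offset]

    def parent_of(self, root: str) -> Optional[str]:
        _, parent_abs = self.get(root)
        # A parent below the offset has been pruned away.
        if parent_abs is None or parent_abs < self.offset:
            return None
        return self.nodes[parent_abs - self.offset][0]

    def prune(self, finalized_root: str) -> None:
        # Drop everything stored before the finalized node; the surviving
        # nodes keep their absolute indices, so nothing is rewritten.
        finalized_abs = self.indices[finalized_root]
        count = finalized_abs - self.offset
        for root, _ in self.nodes[:count]:
            del self.indices[root]
        del self.nodes[:count]
        self.offset = finalized_abs
```

The cost of a prune is then proportional to the number of dropped nodes rather than to the size of the whole array, at the price of one subtraction per lookup.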

Tips for debugging in the future

Commit 9146bcd removed the debugEcho calls I found useful for debugging the fork choice. It can be reverted locally to restore them.