[Ready 1/2] Fork choice rewrite #865
  votes: var openArray[VoteTracker],
  old_balances: openarray[Gwei],
  new_balances: openarray[Gwei]
): ForkChoiceError {.raises: [KeyError].}
Also returning a `Result[void, ForkChoiceError]` here will make the other code more convenient - you can then start using `?` and other helpers, making it more "regular" because there's a common language for "success" across the codebase.
It's internal to fork choice and used in only a single place.
It's a useful habit once you start using these tools - it lets you reuse the utilities developed for it and maintain consistency - but sure, it doesn't really matter in a private API.
since I found some places where errors were returned as
  justified_epoch: Epoch,
  finalized_epoch: Epoch,
  finalized_root: Eth2Digest
): Result[ForkChoice, string] {.raises: [KeyError].} =
Ditto
# Sanity checks on internal private procedures

when isMainModule:
  import stew/endians2
These tests of private procedures are relevant to the recent discussions regarding the module tests feature. @arnetheduck was insisting that our policy should mandate that only public procs should be tested and preferably from suites that import the tested modules.
I think it's quite important to have those. Debugging the fork choice is quite complex, and isolated sanity checks of the smaller parts help make the codebase more robust. In particular, those tests highlight a potential `var openarray` bug that is not caught through the public API: https://github.com/status-im/nim-beacon-chain/blob/65889784fc7b8a65451f1e3b7f7f56ba1c529d8c/beacon_chain/fork_choice/fork_choice.nim#L182-L208
The public API is tested with the normal framework.
I used to think testing private api was important too - but then I noticed that something is wrong almost every single time I feel that need - something is too complicated or not factored correctly - and it adds to the burden when refactoring, something I want to optimize for because refactoring is what makes the difference between an average codebase and an excellent one.
In addition to the code smell, it's also a sign that the testing of your public API is not comprehensive - it's along the same lines as when control-flow bugs happen - it means that there are gaps in the tests and failures are not being accounted for.
static: doAssert ProtoNode.supportsCopyMem(), "ProtoNode must be a trivial type"
let tail = self.nodes.len - finalized_index
# TODO: can we have an unallocated `self.nodes`? i.e. self.nodes[0] is nil
moveMem(self.nodes[0].addr, self.nodes[finalized_index].addr, tail * sizeof(ProtoNode))
The complicated logic here makes me wonder why the `Deque` type was not used and why the `parent` links don't use relative addressing. Perhaps pruning is considered a much rarer operation and the additional indirections would hurt the more common use cases?
TODO for another PR. For now I use the Lighthouse implementation, but as I mentioned, Protolambda has a variant at https://github.com/protolambda/eth2-py-hacks/blob/ae2865670dcb0427f10b0725b897cd2d7b887c9c/proto_array.py that always prunes, which it can afford by making pruning cheaper. Instead of scanning the proto_array and subtracting on a prune, it keeps the pruned index offset as a field and only subtracts on query.
It's also listed as a TODO at the top: https://github.com/status-im/nim-beacon-chain/blob/afad6485611f3ff3b98c385fe861f58c584d1ea3/beacon_chain/fork_choice/proto_array.nim#L256-L263
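For reference, the offset idea could look roughly like the sketch below. This is not the code from this PR nor a port of Protolambda's script: `OffsetNode`, `OffsetArray`, `addNode`, `lookup` and `prune` are hypothetical names, and the node is reduced to a root and a parent link. Parent links are stored as absolute indices, so a prune only drops the prefix and bumps the offset, and the subtraction happens at query time.

```nim
import std/options

type
  Index = int

  OffsetNode = object          # hypothetical, simplified node
    root: string               # stand-in for a block root
    parent: Option[Index]      # absolute index, never rewritten after insertion

  OffsetArray = object         # hypothetical container
    nodes: seq[OffsetNode]
    offset: Index              # number of nodes pruned so far

proc addNode(self: var OffsetArray, root: string,
             parent: Option[Index]): Index {.discardable.} =
  ## Appends a node and returns its absolute index.
  self.nodes.add OffsetNode(root: root, parent: parent)
  result = self.offset + self.nodes.high

proc lookup(self: OffsetArray, absIdx: Index): Option[OffsetNode] =
  ## The offset is subtracted only here, at query time.
  let physical = absIdx - self.offset
  if physical < 0 or physical >= self.nodes.len:
    result = none(OffsetNode)
  else:
    result = some(self.nodes[physical])

proc prune(self: var OffsetArray, finalizedIdx: Index) =
  ## Drops every node before `finalizedIdx`.
  ## Stored parent links are left untouched.
  let physical = finalizedIdx - self.offset
  if physical <= 0 or physical >= self.nodes.len:
    return                     # nothing to prune, or index out of range
  self.nodes = self.nodes[physical .. ^1]
  self.offset = finalizedIdx
```

With this layout a lookup of a pruned parent simply returns `none`, at the cost of one subtraction and one bounds check on every access.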
…e test DB and test timing
- raises: [Defect] doesn't work -> TODO
- process_attestation cannot fail
- try/except as expression pending Nim v1.2.0
- cleanup TODOs
Fork choice complete refactor.
This closes #719, but there are no EF test vectors yet (#777); see ethereum/consensus-spec-tests#17 for upstream discussion.
The implementation closely follows Lighthouse's at https://github.com/sigp/lighthouse/tree/869b0621d6f51e98473bb67cec0446d3623b8957/eth2/proto_array_fork_choice/src
The Lighthouse implementation is derived from Protolambda's array-based stateful DAG at https://github.com/protolambda/lmd-ghost
Fun fact: Protolambda forked the fork at https://github.com/protolambda/eth2-py-hacks/blob/ae2865670dcb0427f10b0725b897cd2d7b887c9c/proto_array.py
Other resources:
This highlighted 2 critical bugs:
- `==` was sometimes wrong, causing mayhem in fork choice. This might be the final solution to the finality issues we have.
- `var openarray` seems unstable: uncommenting the `debugEcho` in the following spot makes the fork choice compute_deltas test give the wrong results. This only affects the "isMainModule" sanity checks; the unit tests are not affected: https://github.com/status-im/nim-beacon-chain/blob/65889784fc7b8a65451f1e3b7f7f56ba1c529d8c/beacon_chain/fork_choice/fork_choice.nim#L182-L208

TODO:
Note: The PR will be split in 2, as this one is already 3000 lines of complex code.
This PR only adds the fork choice and accompanying tests as a separate module.
It does not replace the current fork choice used in the testnet.
That will be done in a second PR, which will update the attestation_pool: https://github.com/status-im/nim-beacon-chain/blob/bd5400aea4f23a722c207826b21070db7966bee5/beacon_chain/attestation_pool.nim#L378-L440
Postponed / Not in scope:
- `should_update_justified_checkpoint`, which protects against a class of attacks called "bouncing attacks": https://github.com/ethereum/eth2.0-specs/blob/v0.11.1/specs/phase0/fork-choice.md#should_update_justified_checkpoint

Tips for debugging in the future
Commit 9146bcd removed the `debugEcho` calls I found useful to debug fork choice. It can be locally reverted to restore them for debugging.