Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ready 1/2] Fork choice rewrite #865

Merged
merged 22 commits into from
Apr 9, 2020
Merged

[Ready 1/2] Fork choice rewrite #865

merged 22 commits into from
Apr 9, 2020

Conversation

mratsim
Copy link
Contributor

@mratsim mratsim commented Apr 6, 2020

Fork choice complete refactor.

This closes #719 but no EF test vectors yet (#777) see ethereum/consensus-spec-tests#17 for upstream discussion.

The implementation closely follows Lighthouse's at https://github.com/sigp/lighthouse/tree/869b0621d6f51e98473bb67cec0446d3623b8957/eth2/proto_array_fork_choice/src

Lighthouse implementation is derived from Protolambda's Array-based stateful DAG at https://github.com/protolambda/lmd-ghost

Fun fact, Protolambda's forked the fork at: https://github.com/protolambda/eth2-py-hacks/blob/ae2865670dcb0427f10b0725b897cd2d7b887c9c/proto_array.py

Other resources:

This highlighted 2 critical bugs

  1. in Nimcrypto, fixed by bump nimcrypto: fix equality check of hashes #864. When nimcrypto.hash module was imported == was sometimes wrong causing mayhem in fork choice. This might be the final solution to the finality issues we have.
  2. var openarray seems unstable, uncommenting the debugEcho in the following spot make the fork choice compute_deltas test give the wrong results, this is only for the "isMainModule" sanity checks, the unit tests are not affected: https://github.com/status-im/nim-beacon-chain/blob/65889784fc7b8a65451f1e3b7f7f56ba1c529d8c/beacon_chain/fork_choice/fork_choice.nim#L182-L208

TODO:

  • - The current branch is littered with (commented out) debugEcho that have to be removed
  • - in a second PR - Replacing the old fork choice in the attestation_pool with the new one

Note: The PR will be separated in 2 as this one is already 3000 lines of complex code.
This only add the fork choice and accompanying tests as a separate module.
It does not replace the current fork choice used in testnet.

This will be done in a second PR, that will update the attestation_pool https://github.com/status-im/nim-beacon-chain/blob/bd5400aea4f23a722c207826b21070db7966bee5/beacon_chain/attestation_pool.nim#L378-L440

Postponed / Not in scope:

Tips for debugging in the future

The commit 9146bcd removed the debugEcho I found useful to debug fork choice. It can be locally reverted to restore them for debugging.

@mratsim mratsim changed the title WIP - Fork choice rewrite [Ready 1/2] Fork choice rewrite Apr 7, 2020
beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
votes: var openArray[VoteTracker],
old_balances: openarray[Gwei],
new_balances: openarray[Gwei]
): ForkChoiceError {.raises: [KeyError].}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also returning a Result[void, ForkChoiceError] here will make the other code more convenient - you can then start using ? and other helpers making it more "regular" because there's a common language for "success" across the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's internal to fork choice and used in only a single place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a useful habit once you start using these tools - let's you reuse the utilities developed for it an maintain consistency - but sure, doesn't really matter in private api

beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
@arnetheduck
Copy link
Member

since I found some places where errors were returned as ok (due to missing return), it would seem that error cases are not covered by the test suite - this is worth documenting at least - I stopped reviewing after I found a few, will make another pass when they're fixed - basically, everywhere that uses result.err should be changed to return result.err or functional style (which makes the compiler detect missing results)

beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
justified_epoch: Epoch,
finalized_epoch: Epoch,
finalized_root: Eth2Digest
): Result[ForkChoice, string] {.raises: [KeyError].} =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
@mratsim mratsim added this to the Apr 2020 milestone Apr 7, 2020
beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
beacon_chain/fork_choice/fork_choice.nim Outdated Show resolved Hide resolved
# Sanity checks on internal private procedures

when isMainModule:
import stew/endians2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests of private procedures are relevant to the recent discussions regarding the module tests feature. @arnetheduck was insisting that our policy should mandate that only public procs should be tested and preferably from suites that import the tested modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's quite important to have those. Debugging the fork choice is quite complex and isolated sanity checks of the smaller part help having a more robust codebase. In particular those tests highlight a potential var openarray bug here: https://github.com/status-im/nim-beacon-chain/blob/65889784fc7b8a65451f1e3b7f7f56ba1c529d8c/beacon_chain/fork_choice/fork_choice.nim#L182-L208
that is not caught in the public API.

The public API is tested with the normal framework.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to think testing private api was important too - but then I noticed that something is wrong almost every single time I feel that need - something is too complicated or not factored correctly - and it adds to the burden when refactoring, something I want to optimize for because refactoring is what makes the difference between an average codebase and an excellent one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the code smell, it's also a sign that the testing of your public api is not comprehensive - it's along the same lines as when control-flow bugs happen - it means that there are gaps in the tests and failures are not being accounted for..

beacon_chain/fork_choice/proto_array.nim Outdated Show resolved Hide resolved
beacon_chain/fork_choice/proto_array.nim Outdated Show resolved Hide resolved
beacon_chain/fork_choice/proto_array.nim Outdated Show resolved Hide resolved
static: doAssert ProtoNode.supportsCopyMem(), "ProtoNode must be a trivial type"
let tail = self.nodes.len - finalized_index
# TODO: can we have an unallocated `self.nodes`? i.e. self.nodes[0] is nil
moveMem(self.nodes[0].addr, self.nodes[finalized_index].addr, tail * sizeof(ProtoNode))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complicated logic here makes me wonder why the Deque types was not used and why the parent links don't use relative addressing. Perhaps pruning is considered a much more rare operation and the additional indirections will hurt the more common use cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for another PR, for now I use the lighthouse implementation but as I mentioned

Protolambda has a variant that always prune at https://github.com/protolambda/eth2-py-hacks/blob/ae2865670dcb0427f10b0725b897cd2d7b887c9c/proto_array.py by making pruning cheaper. Instead of scanning the proto_array and substracting on a prune, it keeps the pruned index offset as a field and only substract on query.

It's also listed as a todo on top: https://github.com/status-im/nim-beacon-chain/blob/afad6485611f3ff3b98c385fe861f58c584d1ea3/beacon_chain/fork_choice/proto_array.nim#L256-L263

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants