-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement v1.2.0 optimistic sync tests #4174
Changes from 1 commit
28ede8f
bdad06b
313a35d
840d097
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1898,6 +1898,45 @@ proc updateHead*( | |
dag.finalizedHead.blck.root, stateRoot, dag.finalizedHead.slot.epoch) | ||
dag.onFinHappened(dag, data) | ||
|
||
proc getEarliestInvalidRoot*( | ||
dag: ChainDAGRef, initialSearchRoot: Eth2Digest, lvh: Eth2Digest, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
defaultEarliestInvalidRoot: Eth2Digest): Eth2Digest = | ||
# Earliest within a chain/fork in question, per LVH definition. Intended to | ||
# be called with `initialRoot` as the parent of the block regarding which a | ||
# newPayload or forkchoiceUpdated execution_status has been received as the | ||
# tests effectively require being able to access this before the BlockRef's | ||
# made. Therefore, to accommodate the EF consensus spec sync tests, and the | ||
# possibilities that the LVH might be an immediate parent or a more distant | ||
# ancestor special-case handling of an earliest invalid root as potentially | ||
# not being from this function's search, but being provided as a default by | ||
# the caller with access to the block. | ||
var curBlck = dag.getBlockRef(initialSearchRoot).valueOr: | ||
# Being asked to traverse a chain which the DAG doesn't know about -- but | ||
# that'd imply the block's otherwise invalid for CL as well as EL. | ||
return static(default(Eth2Digest)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, would it be correct to return If yes, the return type could be changed to |
||
|
||
# Only allow this special case outside loop; it's when the LVH is the direct | ||
# parent of the reported invalid block | ||
if curBlck.executionBlockRoot.isSome and | ||
curBlck.executionBlockRoot.get == lvh: | ||
return defaultEarliestInvalidRoot | ||
|
||
while true: | ||
# This was supposed to have been either caught by the pre-loop check or the | ||
# parent check. | ||
if curBlck.executionBlockRoot.isSome and | ||
curBlck.executionBlockRoot.get == lvh: | ||
doAssert false, "getEarliestInvalidRoot: unexpected LVH in loop body" | ||
|
||
if (curBlck.parent.isNil) or | ||
curBlck.parent.executionBlockRoot.isNone or | ||
(curBlck.parent.executionBlockRoot.isSome and | ||
curBlck.parent.executionBlockRoot.get == lvh): | ||
tersec marked this conversation as resolved.
Show resolved
Hide resolved
|
||
break | ||
curBlck = curBlck.parent | ||
|
||
curBlck.root | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not necessarily correct. When someone switches from a different CL to Nimbus but retains their EL, the EL's If There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, in extended periods of non-finality, the linear scan may become expensive if performed repeatedly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In theory, latest-known In particular, https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#payloadstatusv1 defines:
This also addresses the concern about the EL's As far as cost in general, this should be a relatively one-time thing per invalid branch -- once invalidated, it shouldn't be doing that backwards search again, so potentially better to do it once than multiple times, along the lines of how pruning huge swaths of finalized blocks after a long-unfinalizing network finalizes is expensive, but one-time. Doing halfway versions introduces less-well-defined state and doesn't necessarily save time overall, in a throughput sense. There is an issue here where the quarantine unviables are sometimes checked only as direct parents rather than ancestors, which is an argument for not using the LVH-based search for adding to quarantine. The other aspect is that it should rediscover all of this by just asking the EL, if it misses something. So everything here should be considered a cache. For the moment, this is all only used in tests, so another approach is to move this out from While the initial intent, and still one that seems broadly ideal, is to maximally match test behavior with non-test behavior (otherwise, what is the test testing?), just the status quo here is that this LVH infrastructure is test-only, so it can/should reflect that. |
||
|
||
proc isInitialized*(T: type ChainDAGRef, db: BeaconChainDB): Result[void, cstring] = | ||
# Lightweight check to see if we have the minimal information needed to | ||
# load up a database - we don't check head here - if something is wrong with | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -167,7 +167,7 @@ func setOptimisticHead*( | |
proc runForkchoiceUpdated*( | ||
eth1Monitor: Eth1Monitor, | ||
headBlockRoot, safeBlockRoot, finalizedBlockRoot: Eth2Digest): | ||
Future[PayloadExecutionStatus] {.async.} = | ||
Future[(PayloadExecutionStatus, Option[BlockHash])] {.async.} = | ||
# Allow finalizedBlockRoot to be 0 to avoid sync deadlocks. | ||
# | ||
# https://github.com/ethereum/EIPs/blob/master/EIPS/eip-3675.md#pos-events | ||
|
@@ -199,11 +199,11 @@ proc runForkchoiceUpdated*( | |
latestValidHash = $fcuR.payloadStatus.latestValidHash, | ||
validationError = $fcuR.payloadStatus.validationError | ||
|
||
return fcuR.payloadStatus.status | ||
return (fcuR.payloadStatus.status, fcuR.payloadStatus.latestValidHash) | ||
except CatchableError as err: | ||
error "runForkchoiceUpdated: forkchoiceUpdated failed", | ||
err = err.msg | ||
return PayloadExecutionStatus.syncing | ||
return (PayloadExecutionStatus.syncing, none BlockHash) | ||
|
||
proc runForkchoiceUpdatedDiscardResult*( | ||
eth1Monitor: Eth1Monitor, | ||
|
@@ -228,17 +228,31 @@ proc updateExecutionClientHead( | |
return Opt[void].ok() | ||
|
||
# Can't use dag.head here because it hasn't been updated yet | ||
let payloadExecutionStatus = await self.eth1Monitor.runForkchoiceUpdated( | ||
headExecutionPayloadHash, | ||
newHead.safeExecutionPayloadHash, | ||
newHead.finalizedExecutionPayloadHash) | ||
let (payloadExecutionStatus, latestValidHash) = | ||
await self.eth1Monitor.runForkchoiceUpdated( | ||
headExecutionPayloadHash, | ||
newHead.safeExecutionPayloadHash, | ||
newHead.finalizedExecutionPayloadHash) | ||
|
||
case payloadExecutionStatus | ||
of PayloadExecutionStatus.valid: | ||
self.dag.markBlockVerified(self.quarantine[], newHead.blck.root) | ||
of PayloadExecutionStatus.invalid, PayloadExecutionStatus.invalid_block_hash: | ||
# This is a CL root, not EL hash | ||
let earliestKnownInvalidRoot = | ||
if latestValidHash.isSome: | ||
self.dag.getEarliestInvalidRoot( | ||
newHead.blck.root, latestValidHash.get.asEth2Digest, | ||
newHead.blck.root) | ||
else: | ||
newHead.blck.root | ||
|
||
self.attestationPool[].forkChoice.mark_root_invalid( | ||
earliestKnownInvalidRoot) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm. Not sure whether this is correct in situations where the EL is restarted or changed to a different EL while Nimbus is running. Or, multi-EL scenarios which may not have the same sync progress. The previous mechanism of only marking the specific block invalid, and then re-selecting a new head, doing fcu to it and then marking it invalid/valid one at a time seems more robust, in general, while not losing correctness (just a bit of latency), especially when considering how persistent an We could shortcut the recovery mechanism though, by using the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm fine with keeping the previous logic in place (plus The previous mechanism still exists, too, and it's necessary in intial optimistic sync, where there can't be an LVH because there hasn't yet been a Intermittent EL I'm not sure exactly what the multi-EL scenario will look like, but if they disagree with each other on this, well, that's not going to work out well. But, I'm fine with keeping this unchanged in this PR and exploring this in a future PR, while the LVH handling remains in place to pass the EF optimistic sync tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 313a35d reverts the LVH parts, while still adding |
||
self.dag.markBlockInvalid(newHead.blck.root) | ||
self.dag.markBlockInvalid(earliestKnownInvalidRoot) | ||
self.quarantine[].addUnviable(newHead.blck.root) | ||
self.quarantine[].addUnviable(earliestKnownInvalidRoot) | ||
return Opt.none(void) | ||
of PayloadExecutionStatus.accepted, PayloadExecutionStatus.syncing: | ||
self.dag.optimisticRoots.incl newHead.blck.root | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -442,6 +442,7 @@ func mark_root_invalid*(self: var ForkChoice, root: Eth2Digest) = | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
self.backend.proto_array.nodes.offset | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
if nodePhysicalIdx < self.backend.proto_array.nodes.buf.len: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
self.backend.proto_array.nodes.buf[nodePhysicalIdx].invalid = true | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
self.backend.proto_array.propagateInvalidity(nodePhysicalIdx) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this already be covered for purpose of scoring / selecting heads? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The updating is done in nimbus-eth2/beacon_chain/fork_choice/proto_array.nim Lines 424 to 451 in ad286b9
Which is called by nimbus-eth2/beacon_chain/fork_choice/proto_array.nim Lines 270 to 282 in ad286b9
Which is correct for the scores as such -- it allows quick updates. However, invalidity propagates the other direction, from ancestors to descendants, unlike the scores, and there's no existing mechanism in fork choice to propagate information from ancestors to descendants. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
# Best-effort; attempts to mark unknown roots invalid harmlessly ignored | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
except KeyError: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
discard | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -558,6 +558,28 @@ func nodeIsViableForHead(self: ProtoArray, node: ProtoNode): bool = | |
(self.checkpoints.finalized.epoch == GENESIS_EPOCH) | ||
) | ||
|
||
func propagateInvalidity*( | ||
self: var ProtoArray, startPhysicalIdx: Index) = | ||
# Called when startPhysicalIdx is updated in a parent role, so the pairs of | ||
# indices generated of (parent, child) where both >= startPhysicalIdx, mean | ||
# the loop in general from the child's perspective starts one index higher. | ||
for nodePhysicalIdx in startPhysicalIdx + 1 ..< self.nodes.len: | ||
let nodeParent = self.nodes.buf[nodePhysicalIdx].parent | ||
if nodeParent.isNone: | ||
continue | ||
|
||
let | ||
parentLogicalIdx = nodeParent.unsafeGet() | ||
parentPhysicalIdx = parentLogicalIdx - self.nodes.offset | ||
|
||
# Former case is orphaned, latter is invalid, but caught in score updates | ||
if parentPhysicalIdx < 0 or parentPhysicalIdx >= self.nodes.len: | ||
continue | ||
|
||
# Invalidity transmits to all descendents | ||
if self.nodes.buf[parentPhysicalIdx].invalid: | ||
self.nodes.buf[nodePhysicalIdx].invalid = true | ||
Comment on lines
+580
to
+581
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about new descendants of an invalid block that are being added after There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In theory, they shouldn't be added in the first place, as they'd be rejected as unviable by the DAG/quarantine. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The other fallback here is that they'll ask the EL again if someone does try to build on them, and get that block instead. Since by then things should have mostly caught up to head, it's more likely to be the only remaining set of blocks on which building is still happening, and therefore should iteratively converge. |
||
|
||
# Diagnostics | ||
# ---------------------------------------------------------------------- | ||
# Helpers to dump internal state | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getEarliestInvalidHashBlockRoot
orgetDescendantOfLatestValidHash
would be clearer. Or, at the very least,getEarliestInvalidBlockRoot
to remove ambiguity with other roots e.g. state rootThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd want to use
Root
in some capacity here, because while it's dealing with the LVH (a hash, from the EL), what it's returning is firmly in CL-land, an SSZ root.HashBlockRoot
is a bit redundant in that sense, and confusing the boundary between EL-"hash" and CL-"root".But yes, state vs block root is a useful distinction to make, so
getEarliestInvalidBlockRoot
is a good change.840d097