Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow execution clients several seconds to construct blocks #4012

Merged
merged 6 commits into from
Aug 23, 2022
Merged

Conversation

tersec
Copy link
Contributor

@tersec tersec commented Aug 22, 2022

Addresses #3952 and #3953

Copy link
Contributor

@etan-status etan-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are mostly of best-effort nature and might be okay for 22.8, given that validator performance is tied to it. I don't have the metrics though to see how frequently this change actually would improve the situation.

proc runProposalForkchoiceUpdated(self: ref ConsensusManager) {.async.} =
withState(self.dag.headState):
let
nextSlot = state.data.slot + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge cases:

  • During sync
  • When the latest N slot were empty, e.g., we will propose wallSlot + 1, but dag.headState.slot < wallSlot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wastes a bit of time on the EL locally, nothing else.

finalizedBlockRoot: finalizedBlockRoot,
feeRecipient: feeRecipient)
except CatchableError as err:
error "Engine API fork-choice update failed", err = err.msg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictVerification assert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if ideal. Those are maybe best when it's verifying something truly Nimbus-internal, but this isn't. There are exogenous factors involved here, and strictVericiation, though strict, shouldn't trigger except on actual Nimbus logic errors fixable solely in Nimbus code.

if lastFcU.headBlockRoot == latestHead and
lastFcU.finalizedBlockRoot == latestFinalized and
lastFcU.feeRecipient == feeRecipient:
some bellatrix.PayloadID(lastFcU.payloadId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long do these payloadId stay valid? Considering we also have other fcu tasks scheduled concurrently that may select different heads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few seconds at least, within a slot. That's fine though, if the head/finalized/fee recipient (i.e. fork choice parameters, though maybe prudent to add the rest, e.g., timestamp) mismatch, it just won't be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

223f949 adds timestamp, so the validity length per se isn't that crucial. It's always within a slot this way, and that's definitely fine.

Comment on lines 357 to 358
proc getFeeRecipient(node: BeaconNode,
pubkey: ValidatorPubKey,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct, but it doesn't seem necessary to move this if slated for 22.8. It introduces quite some noise to this PR, also with the keymanagerHost moving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah, it was one of the issues I wondered about. For now, I agree, it's better to leave in place, reduce risk, though the duplicate logic should be removed eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

223f949 restores the validator_duties getFeeRecipient setup, except for adding a comment to remove it later.

Comment on lines +286 to +288
# TODO after things stabilize with this, check for upcoming proposal and
# don't bother sending first fcU, but initially, keep both in place
asyncSpawn self.runProposalForkchoiceUpdated()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is yet another forkChoiceUpdated in block_processor.nim (expectValidForkchoiceUpdated). The function here (updateHeadWithExecution) is only called when a block is processed with a payload that initially returned an inconclusive result (syncing/accepted). The other one is called on blocks with payloads that already have valid verdict. It seems a bit counterintuitive to only do this if there was any recent syncing/accepted block but not if there only werevalid block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to wait until dag.updateHead runs, which happens after the bulk of the block_processor/consensus_manager logic, because that's when/where dag.headState is updated, which is necessary to get the RANDAO information from said state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f8b4984 runs it when newPayload is VALID

let proposer = dag.getProposer(dag.head, slot + 1)
if proposer.isNone():
return Opt.none((ValidatorIndex, ValidatorPubKey))
Opt.some((proposer.get, dag.validatorKey(proposer.get).get().toPubKey))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, dag.getProposer already validates that dag.validatorKey will return some result.

get_randao_mix(state.data, get_current_epoch(state.data)).data
feeRecipient = self.getFeeRecipient(
nextProposer, validatorIndex, nextSlot.epoch)
headBlockRoot = self.dag.loadExecutionBlockRoot(self.dag.head)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is called from an async context, dag.head may already point to a new head by this time. Around the merge, this could be a non-execution block with headBlockRoot.isZero, which is not documented as an allowed value for forkChoiceUpdated (in the runForkchoiceUpdated wrapper, it even asserts for that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 400 to 402
if lastFcU.headBlockRoot == latestHead and
lastFcU.finalizedBlockRoot == latestFinalized and
lastFcU.feeRecipient == feeRecipient:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If latestHead ends up as zero (due to no terminalBlockHash being reached / pre-merge), this may be triggered unintentionally (as lastFcu was never initialized), and resolve to some default(PayloadId), so will run execution_engine.getPayload instead of using the pre-merge default(bellatrix.ExecutionPayload).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getExecutionPayload is only called in a merge context:

elif slot.epoch < node.dag.cfg.BELLATRIX_FORK_EPOCH or
not (
is_merge_transition_complete(proposalState.bellatrixData.data) or
((not node.eth1Monitor.isNil) and
node.eth1Monitor.terminalBlockHash.isSome)):
# https://github.com/nim-lang/Nim/issues/19802
(static(default(bellatrix.ExecutionPayload)))
else:
let
pubkey = node.dag.validatorKey(validator_index)
maybeExecutionPayload = (await getExecutionPayload(
node, proposalState,
slot.epoch, validator_index,
# TODO https://github.com/nim-lang/Nim/issues/19802
if pubkey.isSome: pubkey.get.toPubKey else: default(ValidatorPubKey)))
if maybeExecutionPayload.isNone:
warn "Unable to get execution payload. Skipping block proposal",
slot, validator_index
return ForkedBlockResult.err("Unable to get execution payload")
maybeExecutionPayload.get

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

223f949 (a) adds a timestamp check which should never accidentally match and (b) adds an explicit initialization check (which makes things uglier enough I'd typically avoid it, but it's more risk-conservative)

@tersec
Copy link
Contributor Author

tersec commented Aug 22, 2022

The changes are mostly of best-effort nature and might be okay for 22.8, given that validator performance is tied to it. I don't have the metrics though to see how frequently this change actually would improve the situation.

Tha's the intent, yes. It's not an optimal design from scratch.

@@ -34,20 +47,34 @@ type
# ----------------------------------------------------------------
eth1Monitor*: Eth1Monitor

# Allow determination of preferred fee recipient during proposals
# ----------------------------------------------------------------
dynamicFeeRecipientsStore: DynamicFeeRecipientsStore
Copy link
Contributor

@zah zah Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, is this a second instance besides the one in the BeaconNode? This is not reasonable, because this object is configured through the REST API (see "/eth/v1/validator/prepare_beacon_proposer"). A separate instance would be useless unless hooked with that API end-point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.keymanagerHost[].getSuggestedFeeRecipient(pubkey).valueOr:
self.keymanagerHost[].defaultFeeRecipient
else:
self.keymanagerHost[].defaultFeeRecipient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.keymanagerHost is nil here (see the check above). That's why the original definition of this function needed to access the value from the BeaconNode config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link

Unit Test Results

       12 files  ±0       864 suites  ±0   1h 25m 19s ⏱️ + 2m 53s
  1 988 tests ±0    1 841 ✔️ ±0  147 💤 ±0  0 ±0 
10 686 runs  ±0  10 496 ✔️ ±0  190 💤 ±0  0 ±0 

Results for commit 00ba2b2. ± Comparison against base commit f1ddcff.

@zah zah merged commit 1d55743 into unstable Aug 23, 2022
@zah zah deleted the pMb branch August 23, 2022 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants