-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dynamic Protocol State] EpochStateContainer
stores epoch active identities
#4834
[Dynamic Protocol State] EpochStateContainer
stores epoch active identities
#4834
Conversation
…dentities to contain only epoch related identities
…hub.com/onflow/flow-go into yurii/4649-prev-epoch-refactoring-attempt
• revised goDoc of protocol state implementation • changed `ProtocolStateEntry.PreviousEpoch` to pointer (consistent with `ProtocolStateEntry.NextEpoch`) • There should be no significant algorithmic changes. Though, I have switched the order of the if and else branches when processing an Epoch Setup event. For me, this is much more in line with the intuitive order of documentation. state/protocol/protocol_state/updater.go: • notable updates and revisions of goDoc • tried to address concerns around inconsistent handling of invariances • updated code to work with `ProtocolStateEntry.PreviousEpoch` being potentially a nil pointer storage/badger/protocol_state_test.go: • detailed goDoc revisions for test `assertRichProtocolStateValidity`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This greatly improved code readability and mental structure for me. Amazing work. 👏
There are minor some concerns about a couple places in the code. I have created PR #4854 (targeting your branch) that addresses the first of my concerns but not the second one. Furthermore, the PR updated, polishes and extends a lot of goDocs.
@@ -187,17 +150,20 @@ func (u *Updater) ProcessEpochCommit(epochCommit *flow.EpochCommit) error { | |||
// No errors are expected during normal operations. | |||
func (u *Updater) UpdateIdentity(updated *flow.DynamicIdentityEntry) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️ ❓
I have a hard time convincing myself that this code is correct. Just as a brief sketch of my thoughts:
at the point when we are leaving an epoch, the Identity Table (EpochStateContainer.ActiveIdentities
) essentially snapshots the latest state when these nodes were active.In my opinion, changes of the dynamic identities essentially should only carry forward (causality) but not backwards.Hence,ProtocolStateEntry.PreviousEpoch
should never be modified. But we do here:flow-go/state/protocol/protocol_state/updater.go
Lines 153 to 156 in e765750
prevEpochIdentity, foundInPrev := u.prevEpochIdentitiesLookup[updated.NodeID] if foundInPrev { prevEpochIdentity.Dynamic = updated.Dynamic }
- changing dynamic properties of node X affects the current and next epoch. However, as the weight might change at epoch boundaries, we cannot simply overwrite the weight for the current and next epoch by the same value. I think the following code is an incorrect simplification:
flow-go/state/protocol/protocol_state/updater.go
Lines 157 to 164 in e765750
currentEpochIdentity, foundInCurrent := u.currentEpochIdentitiesLookup[updated.NodeID] if foundInCurrent { currentEpochIdentity.Dynamic = updated.Dynamic } nextEpochIdentity, foundInNext := u.nextEpochIdentitiesLookup[updated.NodeID] if foundInNext { nextEpochIdentity.Dynamic = updated.Dynamic }
This is not addressed in my PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand we will address it in subsequent PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update:
-
We discussed this point and decided that we really should continue to modify the identities also from past epochs.
- We are essentially creating a snapshot of the identity table as of a certain block. By modifying an identity, we only carry this change forward.
- Furthermore, we need to be able to modify "our current value" for a node X that participated in the last epoch but not in the current. Otherwise, we would not be able to eject X in the current epoch despite it misbehaving.
Conclusion: current approach is correct.
-
This will be addressed in a subsequent PR: we will replace (dynamic) weight by an enum representing the participation state (joining, active, leaving)
- Currently, we maintain a dynamic weight but only differentiate between zero weight (not active) and positive (active).
- It is not clear whether the protocol would benefit from dynamically changing trust weight in the future. Reasoning:
-
For consensus participants (consensus nodes, collector nodes), we anyway use their initial weight. The practical benefits are massive, because this enables light clients that don't need to download and locally check every block header.
Leader selection is also pre-determined for an entire epoch based on initial weight.
-
For other Verification [VNs] and Execution [ENs] nodes we currently don't have meaningful ways for weight-based load balancing. ENs anyway need to execute every block and our chunk-assignment algorithm generates uniform load across all VNs. It is not clear whether weighted protocol variants even exist (and even if such algorithms exist [open research topic], a large amount of complicated software changes would likely be needed).
-
Reworked TODO in [Dynamic Protocol State] Remaining work and ToDos #4649 (second item in
High priority
) to reflect our latest approach to this challenge
…-attempt_-_suggestions
// - Per convention, the system smart contracts should list the IdentitySkeletons in canonical order. This is useful for | ||
// most efficient construction of the full active Identities for an epoch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The system contract does not list identities in canonical order. Conceptually, the identity table is an unordered set. We assume it is ordered a particular way at various points in the flow-go code to simplify various implementation details, like the DKG.
We enforce that the identity list used within flow-go is ordered by ordering it during the conversion from cadence to Go:
flow-go/model/convert/service_event.go
Line 638 in 51a572b
participants = participants.Sort(order.Canonical) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update our smart contract to do this? Either way we would need to work on it in future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have the smart contract do sorting. It's a resource-constrained environment and it is not critical that the set be sorted within the system contract.
It seems fine to me for the protocol layer to translate the event into a representation that is easier for it to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jordan for flagging that my comment in the code was incorrect. I have updated it in 49934ce to read:
flow-go/state/protocol/protocol_state/updater.go
Lines 92 to 98 in 49934ce
// sanity checking SAFETY-CRITICAL INVARIANT (II): | |
// - Per convention, the `flow.EpochSetup` event should list the IdentitySkeletons in canonical order. This is useful | |
// for most efficient construction of the full active Identities for an epoch. We enforce this here at the gateway | |
// to the protocol state, when we incorporate new information from the EpochSetup event. | |
// - Note that the system smart contracts manage the identity table as an unordered set! For the protocol state, we desire a fixed | |
// ordering to simplify various implementation details, like the DKG. Therefore, we order identities in `flow.EpochSetup` during | |
// conversion from cadence to Go in the function `convert.ServiceEvent(flow.ChainID, flow.Event)` in package `model/convert` |
Thereby, the documentation precisely describes the current state, which I think is all that is needed for this PR (?)
My thoughts on where ordering could be implemented in the future
-
On the one hand, I see (weak) benefits of not having an ordering requirement in the system smart contracts. Conceptually, a set is sufficient for the current functionality of the system smart contracts. Not requiring an ordering is one less details that engineers have know about and that could accidentally be broken.
For similar reasons, a block proposer is not required to include seals in an ordered manner. Instead, we re-order them when ingesting the block, which improves modularity of the code by removing inter-dependencies.
-
On the other hand, in the future, the system smart contracts will also need to have the canonical ordering. This is because we want to eventually implement performance-dependent rewards for consensus nodes, which in turn requires a smart contract to decode the signer indices for blocks to determine which node contributed to a QC (the basis for the subsequent reward payout at the end of the epoch). To decode the signer indices, the smart contract needs to know the ordering.
- We then have two choices: (i) the system smart contract maintains an un-ordered set and we re-order the identities at every block (a huge waste of computation) or (ii) the system smart maintains the identities in canonical ordering
- Btw, note that to decode QC signer identities, we also need a notion of identities for each epoch. In other words, we would no longer be able top the current epoch's identity table at the end of the staking phase 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requires a smart contract to decode the signer indices for blocks to determine which node contributed to a QC
Or, we could implement an API on the injected Block
Cadence type that does this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could implement an API on the injected
Block
Cadence type that does this?
We could. At which point to we change the publisher of the data (system smart contract), vs layering on logic on the consumers (protocol state, dynamic reward's logic)? If we have only one consumer, I don't feel it makes much of a difference for the implementation because the ordering has to live either on the consumer or the publisher -- but putting the ordering into the consumer improves modularity. With two consumers, we are starting to duplicate the ordering logic. To me, that would be the point, where I think it would be more beneficial to put the ordering in the publisher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the insightful discussion on this point. I feel for the scope of this PR, the comment is addressed (?)
} | ||
} | ||
currentEpochParticipants := entry.CurrentEpochIdentityTable.Filter(func(identity *flow.Identity) bool { | ||
_, found := entry.CurrentEpochSetup.Participants.ByNodeID(identity.NodeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May not matter for a fixture, but ByNodeID
does linear search so this is going to be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I feel it's more expressive and it's worth paying the cost in such case(especially in tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, ignoring our discussion about removing Weight
altogether from this morning, which we'll address in a separate PR.
The one change I'd like to make sure we include is using sentinels to denote when a service event is invalid: InvalidServiceEventError. That way we can handle that failure path in the upper-level logic later on.
…och-refactoring-attempt
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## feature/dynamic-protocol-state #4834 +/- ##
==================================================================
+ Coverage 49.69% 55.83% +6.13%
==================================================================
Files 480 946 +466
Lines 47751 87792 +40041
==================================================================
+ Hits 23729 49017 +25288
- Misses 22253 35092 +12839
- Partials 1769 3683 +1914
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
…mit. Updated tests, error handling, docs
…ithub.com/onflow/flow-go into yurii/4649-prev-epoch-refactoring-attempt
… ordering assumptions in the system smart contracts vs the core protocol layer.
if protocol.IsInvalidServiceEventError(err) { | ||
// we have observed an invalid service event, which triggers epoch fallback mode | ||
updater.SetInvalidStateTransitionAttempted() | ||
return dbUpdates, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️
I am a little worried about this:
At this point, we don't know that the updater
's internal state is a valid Protocol State. For all we know, it could be complete garbage. There is no atomicity requirement on the StateUpdater
as far as I can tell, in that it either applies the update entirely or not at all.
Suggestion:
- I think it it is ok to treat all protocol state operations mutations in a block as one atomic state transition. It either succeeds in its entirety or in case of a failure, there is no update to the protocol state (except setting
InvalidStateTransitionAttempted
totrue
). - I think it is straight forward to extend the protocol state
Updater
to "drop all modifications ifInvalidStateTransitionAttempted
" and document this properly throughout the code base. In a nutshell, within theBuild
method I would check thestate.InvalidStateTransitionAttempted
first; in that case we could just return a copy of theparentState
with only thestate.InvalidStateTransitionAttempted
set to true. - While the code changes are relatively small, we need to properly test this and add detailed documentation to the
Updater
, the corresponding interface, and ideally also the places where we call theUpdater
Further thoughts:
- If the updater is signalling that it encountered an invalid state transition via an
InvalidStateTransitionAttempted
, should the updater maybe already set its ownInvalidStateTransitionAttempted
flag? We would still raise the error to signal the failure to the caller. However, the caller would no longer be required to explicitly callupdater.SetInvalidStateTransitionAttempted()
(I'd leave that method, to allowUpdater
external logic to set this flag for other reasons).
I think we should not do this as part of the current PR. The PR is already big enough and we keep layering on changes which makes it hard to keep track of without re-reviewing the entire PR. Added todo to #4649
…`Updater.ProcessEpochSetup()` such that the documentation and implementation are in the same order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is ready to be merged. Great work!
There are a few remaining comments, but I think those should all be addressed in subsequent PRs. I went through the PR comments that are still open and added the respective items as todos to #4649
if u.state.InvalidStateTransitionAttempted { | ||
return nil // won't process new events if we are in EECC | ||
return nil // won't process new events if we are in epoch fallback mode. | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding these checks:
flow-go/state/protocol/protocol_state/updater.go
Lines 210 to 212 in 49934ce
if u.state.InvalidStateTransitionAttempted { return fmt.Errorf("invalid state transition has been attempted, no transition is allowed") } flow-go/state/protocol/protocol_state/updater.go
Lines 70 to 72 in 49934ce
if u.state.InvalidStateTransitionAttempted { return nil // won't process new events if we are in epoch fallback mode. } flow-go/state/protocol/protocol_state/updater.go
Lines 156 to 158 in 49934ce
if u.state.InvalidStateTransitionAttempted { return nil // won't process new events if we are going to enter epoch fallback mode }
I would be inclined to move these check to the very beginning of the respective methods. Otherwise, we would probably continue encountering errors after we once set InvalidStateTransitionAttempted
, because later events will not make sense anymore as the protocol state stoped updating.
// sanity checking invariant (I): | ||
currentEpochDynamicProperties, found := activeIdentitiesLookup[nextEpochIdentitySkeleton.NodeID] | ||
if found && currentEpochDynamicProperties.Dynamic.Ejected { // invariance violated | ||
return protocol.NewInvalidServiceEventErrorf("node %v is ejected in current epoch %d but readmitted by EpochSetup event for epoch %d", nextEpochIdentitySkeleton.NodeID, u.parentState.CurrentEpochSetup.Counter, epochSetup.Counter) | ||
} | ||
|
||
nextEpochIdentities := make(flow.DynamicIdentityEntryList, 0, len(currentEpochIdentities)) | ||
currentEpochIdentitiesLookup := currentEpochIdentities.Lookup() | ||
// For an `identity` participating in the upcoming epoch, we effectively perform steps 2 and 3 from above within a single loop. | ||
for _, identity := range epochSetup.Participants { | ||
// Step 2: node is _not_ participating in the current epoch, but joining in the upcoming epoch. | ||
// The node is allowed to join the network already in this epoch's Setup Phase, but has weight 0. | ||
if _, found := currentEpochIdentitiesLookup[identity.NodeID]; !found { | ||
currentEpochIdentities = append(currentEpochIdentities, &flow.DynamicIdentityEntry{ | ||
NodeID: identity.NodeID, | ||
Dynamic: flow.DynamicIdentity{ | ||
Weight: 0, | ||
Ejected: false, | ||
}, | ||
}) | ||
// sanity checking invariant (II): | ||
if idx > 0 && !order.IdentifierCanonical(prevNodeID, nextEpochIdentitySkeleton.NodeID) { | ||
return protocol.NewInvalidServiceEventErrorf("epoch setup event lists active participants not in canonical ordering") | ||
} | ||
prevNodeID = nextEpochIdentitySkeleton.NodeID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty to reorder these and the respective in-code documentation above such that the documentation and implementation are in the same order 👉 c638874
Context
This PR is an attempt to change
ActiveIdentities
to store only participants that can contribute to extending the chain instead of storing participants for current and prev/next epoch depending on phase.Previously
ActiveIdentities
were storing current + prev epoch participants in staking phase and current + next in setup/commit phases. By storing identities for previous epoch it allows us to store only identities that were introduced in current epoch as part ofActiveIdentities
. In nutshellActiveIdentities
holds dynamic part of identities that were introduced in epoch setup event.I will be updating docs and tests but don't expect any major changes to the implementation itself, so feel free to review.