Implement support for validator next-epoch proposer duties #3782
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #3782      +/-   ##
==========================================
+ Coverage   36.26%   36.49%   +0.22%
==========================================
  Files         324      326       +2
  Lines        9099     9886     +787
  Branches     1465     1693     +228
==========================================
+ Hits         3300     3608     +308
- Misses       5626     6072     +446
- Partials      173      206      +33
Performance Report: ✔️ no performance regression detected. Full benchmark results
@@ -58,6 +59,7 @@ const SYNC_TOLERANCE_EPOCHS = 1;
 */
export function getValidatorApi({chain, config, logger, metrics, network, sync}: ApiModules): routes.validator.Api {
  let genesisBlockRoot: Root | null = null;
  const nextEpochProposerDutyCache = new Map<Epoch, ValidatorIndex[]>();
Keeping caches here is not a good separation of concerns. We must keep all caches attached to the BeaconChain class; these functions should be stateless.
I don't think this is the best architecture. The result futureProposers is dependent on that specific state. If you just cache by epoch you are committing to a "head" that may change later. Instead, this result of future proposers should be cached in the epoch context and computed lazily on demand. @tuyennhv What do you think?

I agree. We shouldn't attach the cache to the chain directly, because a re-org can make the proposer cache invalid, and that's currently not handled here. Caching at the level of the beacon state (in our epoch-long cache object, the "epoch context") would be the way to go 👍

Since the result of the next-epoch proposer duties computation is not final at all and is only consumed by Rocket Pool, I initially didn't want to cache it in beacon-state-transition, because caching both the current and the next epoch's proposers could be confusing. @dadepo if the direction is to cache in beacon-state-transition, I guess we just need to name it carefully and add a comment that it's not the final result of the next epoch's proposers; at each epoch transition we still need to compute the proposers for the current epoch again.
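To make the re-org concern concrete, here is a hypothetical sketch of an epoch-keyed, chain-level cache like the Map in the diff above; the key carries no information about which head the result was computed from, so a re-org leaves a stale entry in place:

```ts
// Hypothetical illustration: an epoch-keyed cache cannot tell forks apart.
type Epoch = number;
type ValidatorIndex = number;

const nextEpochProposerDutyCache = new Map<Epoch, ValidatorIndex[]>();

function getNextEpochProposers(epoch: Epoch, computeFromHead: () => ValidatorIndex[]): ValidatorIndex[] {
  let indexes = nextEpochProposerDutyCache.get(epoch);
  if (indexes === undefined) {
    indexes = computeFromHead(); // tied to whatever the head block is at call time
    nextEpochProposerDutyCache.set(epoch, indexes);
  }
  // If the chain re-orgs before `epoch` arrives, this entry is now stale but
  // is still served, because the key (the epoch number) did not change.
  return indexes;
}
```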
@@ -400,6 +417,7 @@ export class EpochContext {
  const nextEpoch = currEpoch + 1;
  this.nextShuffling = computeEpochShuffling(state, epochProcess.nextEpochShufflingActiveValidatorIndices, nextEpoch);
  this.proposers = computeProposers(state, this.currentShuffling, this.effectiveBalanceIncrements);
  this.nextEpochProposers = computeProposers(state, this.nextShuffling, this.effectiveBalanceIncrements);
I think we should only calculate this on demand (when the API is called) and cache it, instead of on every epoch transition.
I actually thought about that also, but decided to go with this implementation due to its simplicity (no need for code to only generate this after the first call, and then code to purge previous values when a new epoch is reached). It also does not introduce any performance regression, which is why I went with it.
I actually thought about that also, but decided to go with this implementation due to its simplicity

It should not be complicated at all to implement. Just type it as T | null and compute it if it is requested while null.

It also does not introduce any performance regression, which is why I went with it.

Our performance CI only rejects ~3x function run-time changes. computeProposers() costs ~100ms on top of a state transition that takes ~1000ms, so roughly a 10% increase.
It should not be complicated at all to implement. Just type it as T | null and compute it if it is requested while null.

The complexity is not about the typing, though. It's more about passing CachedBeaconStateAllForks on the HTTP requests to computeProposers, something which is already handled in the transition phase, and about adding a prune call to remove past caches, something which is not needed with the current approach since the current and next proposers are always refreshed on state transition.

Our performance CI only rejects ~3x function run-time changes. computeProposers() costs ~100ms on top of a state transition that takes ~1000ms, so roughly a 10% increase.

Okay. Avoiding a ~10% performance hit is not a bad thing. I'll update to have it computed on demand.
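A minimal sketch of the compute-on-demand pattern being agreed on here; the names and the discriminated-union shape are illustrative (they mirror the proposersNextEpoch field that shows up later in the thread):

```ts
// Illustrative sketch: defer the ~100ms computeProposers() call until the
// first request, then reuse the result for the rest of the epoch.
type ValidatorIndex = number;

type ProposersDeferred =
  | {computed: false}
  | {computed: true; indexes: ValidatorIndex[]};

class EpochContextSketch {
  // Reset to {computed: false} at every epoch transition, so the cached
  // prediction lives and dies with the epoch context (re-org safe).
  private proposersNextEpoch: ProposersDeferred = {computed: false};

  constructor(private readonly computeNextProposers: () => ValidatorIndex[]) {}

  getBeaconProposersNextEpoch(): ValidatorIndex[] {
    if (!this.proposersNextEpoch.computed) {
      this.proposersNextEpoch = {computed: true, indexes: this.computeNextProposers()};
    }
    return this.proposersNextEpoch.indexes;
  }
}
```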
Together with @tuyennhv we looked at how to get the state needed by computeProposers, which makes implementing the suggestion to compute on demand possible, as can be seen in this commit here.

A couple of observations where I think this approach makes the code less straightforward:

- getNextEpochBeaconProposer has to be moved to BeaconStateContext in order to have access to the allForks.BeaconState needed for computeProposers (while the other similar method, getBeaconProposer, remains on EpochContext)
- A getState method has to be added to BeaconStateContext to get the value of allForks.BeaconState
- Even though getNextEpochBeaconProposer is on BeaconStateContext, it still has to reach into EpochContext to update the cache
- The getBeaconProposer method now needs knowledge of the previous proposer cache, since there is now the need to clear the previous nextEpochProposers in the cache on a request for the current proposer

I guess at the end of the day it boils down to whether this is okay and better than incurring the ~10% decrease in performance, which I am fine with not incurring.

Also open to other suggestions that could simplify the compute-on-demand approach.
@g11tech the "Clone nethermind interop branch" step of the Sim merge tests failed, can you take a look?
@@ -150,6 +163,9 @@ export class EpochContext {
  epoch: Epoch;
  syncPeriod: SyncPeriod;

  currentProposerSeed: Uint8Array;
Why does this need to be persisted? What's the advantage of keeping it around?
@@ -150,6 +163,9 @@ export class EpochContext {
  epoch: Epoch;
  syncPeriod: SyncPeriod;

  currentProposerSeed: Uint8Array;
  nextProposerSeed: Uint8Array;
Same here, what's the advantage of keeping this around? Why not compute it on demand and discard it after caching nextEpochProposers?
The issue with that is related to what you observed in the comment here:

getNextEpochBeaconProposer should be in the epoch context and use only data available there, immutable during an epoch

Computing the seed via the getSeed method requires the state, which makes it impossible to compute using only data available in the context.
Then please add comments 🙏: all non-obvious decisions must be rationalized in committed comments.
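For illustration, a sketch of the rationale being asked for, assuming a hypothetical computeProposersFromSeed helper (getSeed and DOMAIN_BEACON_PROPOSER come from the consensus spec; the wiring here is not the actual implementation): the seed is snapshotted at epoch transition, while the full state is in hand, so the later on-demand computation can run from epoch-context data alone.

```ts
// Illustrative sketch. getSeed() reads the state's RANDAO mixes, so it needs
// the full BeaconState, which the epoch context does not keep. Snapshotting
// the seed at epoch transition removes that dependency, e.g.:
//
//   // during epoch transition, where `state` is available:
//   this.nextProposerSeed = getSeed(state, nextEpoch, DOMAIN_BEACON_PROPOSER);
//
// Later, on demand, with no state in scope:
type ValidatorIndex = number;

// Assumed variant of computeProposers() that takes an explicit seed instead
// of deriving it from the state.
declare function computeProposersFromSeed(
  seed: Uint8Array,
  activeIndices: ValidatorIndex[],
  effectiveBalanceIncrements: Uint16Array
): ValidatorIndex[];

function getBeaconProposersNextEpoch(ctx: {
  nextProposerSeed: Uint8Array;
  nextShufflingActiveIndices: ValidatorIndex[];
  effectiveBalanceIncrements: Uint16Array;
}): ValidatorIndex[] {
  // Everything needed here is immutable for the duration of the epoch.
  return computeProposersFromSeed(ctx.nextProposerSeed, ctx.nextShufflingActiveIndices, ctx.effectiveBalanceIncrements);
}
```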
* balances are sampled to adjust the probability of the next selection (32 per epoch on average). So to invalidate
* the prediction, the effective balance of one of those 32 samples should change and change the random_byte inequality.
*/
getBeaconProposersNextEpoch(): ValidatorIndex[] {
Added a rationale on when and why the proposer prediction stands or is invalidated
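For context on the random_byte inequality referenced above, here is a sketch of the spec's per-candidate acceptance test in compute_proposer_index, using the mainnet constants:

```ts
// Spec sketch: a candidate index drawn from the shuffled active set is
// accepted as proposer when its effective balance passes this check against
// one byte of randomness derived from the seed.
const MAX_RANDOM_BYTE = 2 ** 8 - 1; // 255
const MAX_EFFECTIVE_BALANCE = 32e9; // 32 ETH in Gwei (mainnet)

function isAcceptedProposerCandidate(effectiveBalanceGwei: number, randomByte: number): boolean {
  return effectiveBalanceGwei * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * randomByte;
}

// A validator at max effective balance always passes; lower balances pass with
// probability effectiveBalance / MAX_EFFECTIVE_BALANCE. This is why a change
// to one of the ~32 sampled balances can flip the predicted proposer.
```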
    this.effectiveBalanceIncrements
  );
  this.proposersNextEpoch = {computed: true, indexes};
}
Now computation of proposers is deferred AND cached when computed
@dapplion this won't cache though; on every request to get the next proposers, the this.proposersNextEpoch.computed check here will always return false, and the if branch will always run.

You can confirm this by running the application and triggering the endpoint multiple times: the if branch always runs.

I noticed this while working on the implementation, and I think the reason is how chain.getHeadStateAtCurrentEpoch works here, but I did not go ahead with caching once I realised that the request to the endpoint itself was actually pretty fast, so there seemed to be no need to cache.
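A hypothetical sketch of the symptom being described: if each call obtains a freshly built epoch context, an instance-scoped cache is written and then immediately thrown away.

```ts
// Hypothetical: instance-scoped caches only help if the same instance is
// returned across calls.
class Ctx {
  proposersNextEpoch: {computed: false} | {computed: true; indexes: number[]} = {computed: false};
}

function getProposers(getCtx: () => Ctx, compute: () => number[]): number[] {
  const ctx = getCtx();
  if (!ctx.proposersNextEpoch.computed) {
    ctx.proposersNextEpoch = {computed: true, indexes: compute()}; // cached on this instance...
  }
  return ctx.proposersNextEpoch.indexes;
}

// ...which is lost if getCtx() builds a new instance each time:
getProposers(() => new Ctx(), () => [1, 2, 3]); // computes
getProposers(() => new Ctx(), () => [1, 2, 3]); // computes again: the cache never hits
```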
I noticed this while working on the implementation, and I think the reason is how chain.getHeadStateAtCurrentEpoch works here

If this is true, then it's a big issue! Can you open a dedicated issue for it? It should be investigated later.

I did not go ahead with caching once I realised that the request to the endpoint itself was actually pretty fast, so there seemed to be no need to cache.

This code runs on the main thread and, according to benchmarks, it takes ~100ms. That's still a significant amount of time to block the main thread, and repeated computations must be avoided if possible.
And please link back to this context after opening a new dedicated issue. :)
(Merge of master into this branch. The merge commit message lists many unrelated master changes; the commits specific to this PR, "Implement support for validator next-epoch proposer duties (#3782)", were:)

- Implementation to be able to get block proposers an epoch ahead (still needs optimization)
- Revert changes made to waitForSlot
- Cache the results of computing future proposers; also extended tests
- Use effectiveBalanceIncrements from the state instead of recomputing it
- Revert a check not needed in getBeaconProposer
- Update tests to include assertion messages
- Move caching of next proposer duties to the BeaconChain class
- Delete the block proposer previously cached when the next proposer was requested at the current epoch
- Move next-epoch proposers from the chain to the state
- Compute the next proposer on demand and cache it
- Update the implementation to work with changes from master
- Cache the epoch seed in the context so that getNextEpochBeaconProposer can be independent of the state (reverted once, then re-applied)
- Remove the need to delete from nextEpochProposers in calls to getBeaconProposer
- Remove an empty file left after fixing merge conflicts; remove unnecessary variables from the epoch context; add comments; fix lint and imports; address review
Motivation
Description
Provides the ability to request block proposers one epoch in the future. See the corresponding issue #3746 for more background information.
Closes #3746
Steps to test or reproduce

If the current epoch is N:

1. Make a request to get the proposer duties for epoch N+1 (see the example request below).
2. Take note of the returned validator indices.
3. Using a chain explorer like https://prater.beaconcha.in/, wait until epoch N+1 becomes the current epoch, then confirm that the validator indices now listed for epoch N+1 are the same ones returned by the earlier request to the beacon endpoint.
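For illustration, a request for the next epoch's proposer duties might look like the following sketch; the route is the standard Beacon Node API proposer-duties endpoint, while the host, port 9596 (Lodestar's default REST port), and the epoch value are assumptions:

```ts
// Hypothetical example: fetch proposer duties for epoch N+1 from a local node.
async function printNextEpochProposers(nextEpoch: number): Promise<void> {
  const res = await fetch(`http://localhost:9596/eth/v1/validator/duties/proposer/${nextEpoch}`);
  const body = (await res.json()) as {data: {pubkey: string; validator_index: string; slot: string}[]};
  // Note these indices to compare once epoch N+1 becomes the current epoch.
  console.log(body.data.map((duty) => duty.validator_index));
}

void printNextEpochProposers(100_001); // placeholder epoch value for N+1
```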