-
Notifications
You must be signed in to change notification settings - Fork 975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Withdrawals into queue in single function #3042
Conversation
e6952b3
to
9bf10af
Compare
Like the simplicity of this. the downside is that full withdrawals won't necessarily be processed immediately the validator is withdrawable, but I don't see that being a major issue. I do want to go back to The other related, but somewhat more nebulous, concern I have is that I can envisage smart contracts wanting to obtain proof on-chain that a withdrawal has come from a certain validator. Providing and verifying a proof for a 0.025 ETH amount could end up eating a fair chunk of the value of the withdrawal. If |
I'll leave the same comment here as in Paul's PR, an option is to process full withrawals in the same loop as effective balance updates. That may be equivalent to the original approach without the extra loop |
At least for Lodestar, (on a preliminary look, not implemented yet) the previous approach won't require a full sweep of the validator set. We do all sweeps in epoch processing in one go, essentially merging multiple spec functions into one. |
In order to do this you need to prove that doing it that way is equivalent to the specified version which explicitly adds a loop after balances changes. It needs to be checked that these processes are independent |
Right. Then you can collect an initial list of fully withdraw-able validators at the beginning of epoch processing, and then re-check that balance > 0. The diff between those lists will be none in most cases. I can only see cases like a validator being slashed for all its balance causing a diff there |
To me, I don't see why full withdrawals should be prioritised, although there are definite periods where you may exit second and get processed before an earlier full exit. Full withdrawals getting bumped to the top of the queue immediately does basically unfairly advantage them compared to active contributing members of the network, I'm not sure why the prioritisation call was made in the initial instance... The 'fairness' is not the reason I made the change though, it was purely reducing processing for the epoch transition. |
I fixed the tests. Some notes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new queue solution looks good to me.
@potuz: I'll leave the same comment here as in Paul's PR, an option is to process full withrawals in the same loop as effective balance updates. That may be equivalent to the original approach without the extra loop
I think the drawback is that the optimized solution is less easy to test and debug the new withdrawal feature?
To all CL clients: we are going to cut a new spec release soon and are likely to include this PR. Let me know if there is any strong objection 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great job!
tests/core/pyspec/eth2spec/test/capella/epoch_processing/test_process_withdrawals_into_queue.py
Outdated
Show resolved
Hide resolved
@with_capella_and_later | ||
@spec_state_test | ||
def test_random_5(spec, state): | ||
yield from run_random_partial_withdrawals_test(spec, state, Random(5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want a randomizer that does both but the full randomized block tests do as well
IMO opinion this PR does not add any tangible benefits in performance, while slowing full withdrawals for no reason. Partial withdrawals are processed "slowly" because we have to. Full withdrawals are processed "faster" because we can. It's purely technical, no fairness argument influences those facts. |
I think we should benchmark this. How long does it realistically take to loop through the validator set? Isn't this premature optimization? |
We know in teku that looping through the full set of validators is expensive and it makes any algorithm O(n) in the number of validators. So yes, we would definitely have to use an optimised version of this. The argument for having a separate loop for full vs partial withdrawals was to make testing simpler, but that isn't the case once we need to add a cache for which validators have become withdrawn since tests would need to cover that caching. In other words, a separate loop actually expands the scope of testing required rather than reducing it since we now need to cover every way a validator may become withdrawable. Besides which, we aim for simplicity in the way consensus works and having a single approach to withdrawal is simpler than having two different approaches depending on why the validator has balance that could be withdrawn. |
I tried to make a summary table:
My impression:
My read is that it’s slightly toward 👍…? 😬 |
There are good arguments on both sides here and I'll chime in with another view against this change :) I think we should prioritize spec simplicity above all else, weighted against speed to ship the feature to mainnet. There will always be things we need to do to optimize implementations in clients so I don't think this is a big ask for client teams. Caches etc have been a large source of bugs in the past and to mitigate the concern we develop/maintain a large corpus of consensus test cases. And the more the spec itself is optimized for the production/maintenance of these test cases, the better mitigation it provides against bugs due to implementation complexity. And alongside this, I want to get to a place where we stop changing the capella spec as soon as possible so there is something stable for client teams to target by end of the year. And the best way to get stability is to stop changing the spec, i.e. do not make this change. |
One big difference that this PR introduces is that the withdrawal queue is much smaller. This is the first time we have a queue in the specs and I am worried about memory fragmentation. I started implementing this in Go and all implementations we are evaluating either leak memory or suffer from continuous reallocations. Either way, if this PR passes I think it would be good to put Having the queue of constant size allows us to optimize this further by keeping a rolling index and adding on place. This would be application specific, but I imagine all languages will benefit from having a constant size rather than potentially a big chunk of the validator set. |
Coming to think about this: if we go with rolling sweeps as in this PR, there's no need for the queue at all: knowing what are the next withdrawal indices determines uniquely the next withdrawals that a block can include, so we can just check against that and there's not even the need to keep the queue in the state it seems. |
@potuz One of the nice things about the queue is that it fixes the |
Balances can already change on each slot due to slashing. |
@etan-status That doesn't effect the queued approach though, because the balance decrement for the withdrawal happens immediately on the epoch transition. If a validator that is getting partially withdrawn gets slashed before their withdrawal is processed then the slashing just applies to the remaining balance (which I think is fine). |
closing due to #3068 |
Rework of @rolfyone's PR (#3032)
There were some minor bugs introduced in the last one. I think I cleaned it up but have not reworked tests.
I'd prefer to only do so once we have consensus on the general feature
Original comment:
Partially to initiate a discussion.
Basically use a single pipeline for withdrawal processing to simplify the flow and reduce the need for full validator scans during epoch processing.
This should make the feature much more testable amongst other benefits, and makes withdrawal of partial or full amounts an equal priority based on your position in the validator list.
The timing remains approximately equal to previous, except there may have previously been 4 exits per epoch to add that are now just done during sweeping, where previously they were done as a priority
Danny's comments:
Note, that this does have some shifts in "fairness" wrt when you become withdrawable and when you actually are withdrawn. It used to be immediate but not it could be quite a long time between when you become withdrawable and when withdrawn (on the order of a week at current vset size). Additioanlly, some people might become withdrawable sooner than others (thus exited sooner in some cases) but might then be later in the sweep.
I like the previous iteration a bit better for a few reasons:
Re (2). If this is a major engineering complexity/hurdle, then I'm okay with the changes. But it does seem a bit weird to prioritize partial withdrawals (which usually all the set are ready for) over the much smaller (and more important) set of full withdrawals.