Withdrawals into queue in single function #3042

djrtwo · 2022-10-16T21:02:07Z

There were some minor bugs introduced in the last one. I think I cleaned it up but have not reworked tests.
I'd prefer to only do so once we have consensus on the general feature

Original comment:

Partially to initiate a discussion.

Basically use a single pipeline for withdrawal processing to simplify the flow and reduce the need for full validator scans during epoch processing.

This should make the feature much more testable amongst other benefits, and makes withdrawal of partial or full amounts an equal priority based on your position in the validator list.

The timing remains approximately equal to previous, except there may have previously been 4 exits per epoch to add that are now just done during sweeping, where previously they were done as a priority

Danny's comments:

Note, that this does have some shifts in "fairness" wrt when you become withdrawable and when you actually are withdrawn. It used to be immediate but not it could be quite a long time between when you become withdrawable and when withdrawn (on the order of a week at current vset size). Additioanlly, some people might become withdrawable sooner than others (thus exited sooner in some cases) but might then be later in the sweep.

I like the previous iteration a bit better for a few reasons:

The two mechanisms were isolated so were easier to test on the spec side. That said reworking tests shouldn't be too hard
The timing and fairness thing. In the prior mechanism, once you become withdrawable you get to get all of your ETH out immediately rather than waiting for what might be a very long partial withdrawals sweep. I understand the complexity of doing a full validator sweep to find the withdrawable folks but note that it's not like those that are withdrawable can change quickly, you can easily have the ones to check precomputed.

Re (2). If this is a major engineering complexity/hurdle, then I'm okay with the changes. But it does seem a bit weird to prioritize partial withdrawals (which usually all the set are ready for) over the much smaller (and more important) set of full withdrawals.

mcdee · 2022-10-16T21:25:32Z

Like the simplicity of this. the downside is that full withdrawals won't necessarily be processed immediately the validator is withdrawable, but I don't see that being a major issue.

I do want to go back to MAX_WITHDRAWALS_PER_EPOCH, however. The current value of 256 will work through the entire current validator active set in around 7 days. Looking at current rewards, an average validator would expect to earn around 0.025 ETH in that 7 day period, That seems like quite a small number as the "unit of value" to be going to the effort of transferring across chains.

The other related, but somewhat more nebulous, concern I have is that I can envisage smart contracts wanting to obtain proof on-chain that a withdrawal has come from a certain validator. Providing and verifying a proof for a 0.025 ETH amount could end up eating a fair chunk of the value of the withdrawal.

If MAX_WITHDRAWALS_PER_EPOCH was reduced to 64 it would still get through the current active set in a month, and would be skimming values of around 0.1 ETH for active validators. It would reduce the number of withdrawal operations on-chain, and would results in proofs taking up significantly less ETH as a percentage of the transfer. Is there a reason outside of responsiveness (which I do understand) to keep this at 256?

potuz · 2022-10-16T22:19:40Z

I'll leave the same comment here as in Paul's PR, an option is to process full withrawals in the same loop as effective balance updates. That may be equivalent to the original approach without the extra loop

dapplion · 2022-10-17T13:28:13Z

At least for Lodestar, (on a preliminary look, not implemented yet) the previous approach won't require a full sweep of the validator set. We do all sweeps in epoch processing in one go, essentially merging multiple spec functions into one.

potuz · 2022-10-17T13:32:34Z

At least for Lodestar, (on a preliminary look, not implemented yet) the previous approach won't require a full sweep of the validator set. We do all sweeps in epoch processing in one go, essentially merging multiple spec functions into one.

In order to do this you need to prove that doing it that way is equivalent to the specified version which explicitly adds a loop after balances changes. It needs to be checked that these processes are independent

dapplion · 2022-10-17T13:37:42Z

At least for Lodestar, (on a preliminary look, not implemented yet) the previous approach won't require a full sweep of the validator set. We do all sweeps in epoch processing in one go, essentially merging multiple spec functions into one.

In order to do this you need to prove that doing it that way is equivalent to the specified version which explicitly adds a loop after balances changes. It needs to be checked that these processes are independent

Right. Then you can collect an initial list of fully withdraw-able validators at the beginning of epoch processing, and then re-check that balance > 0. The diff between those lists will be none in most cases. I can only see cases like a validator being slashed for all its balance causing a diff there

rolfyone · 2022-10-17T19:58:01Z

To me, I don't see why full withdrawals should be prioritised, although there are definite periods where you may exit second and get processed before an earlier full exit.

Full withdrawals getting bumped to the top of the queue immediately does basically unfairly advantage them compared to active contributing members of the network, I'm not sure why the prioritisation call was made in the initial instance...
Processing them during updated balance calculation is probably ok, but the spec calling out that it's right at the end and the code being completely at odds does seem to be a little odd (if balance calculation also exits validators)...

The 'fairness' is not the reason I made the change though, it was purely reducing processing for the epoch transition.

hwwhww · 2022-10-17T22:40:14Z

I fixed the tests. Some notes:

Merged run_process_full_withdrawals and run_process_partial_withdrawals into run_process_withdrawals_into_queue
Since full and partial withdrawals are now in the same function, I removed randomize_state call from run_random_partial_withdrawals_test because random_state might also trigger the full withdrawal cases and lead to unexpected partial withdrawal num.

hwwhww

This new queue solution looks good to me.

@potuz: I'll leave the same comment here as in Paul's PR, an option is to process full withrawals in the same loop as effective balance updates. That may be equivalent to the original approach without the extra loop

I think the drawback is that the optimized solution is less easy to test and debug the new withdrawal feature?

To all CL clients: we are going to cut a new spec release soon and are likely to include this PR. Let me know if there is any strong objection 🙏

djrtwo

great job!

tests/core/pyspec/eth2spec/test/capella/epoch_processing/test_process_withdrawals_into_queue.py

djrtwo · 2022-10-18T14:02:04Z

tests/core/pyspec/eth2spec/test/capella/epoch_processing/test_process_withdrawals_into_queue.py

+@with_capella_and_later
+@spec_state_test
+def test_random_5(spec, state):
+    yield from run_random_partial_withdrawals_test(spec, state, Random(5))


We probably want a randomizer that does both but the full randomized block tests do as well

dapplion · 2022-10-18T15:00:22Z

IMO opinion this PR does not add any tangible benefits in performance, while slowing full withdrawals for no reason. Partial withdrawals are processed "slowly" because we have to. Full withdrawals are processed "faster" because we can. It's purely technical, no fairness argument influences those facts.

rkapka · 2022-10-18T15:09:03Z

I think we should benchmark this. How long does it realistically take to loop through the validator set? Isn't this premature optimization?

ajsutton · 2022-10-19T12:34:14Z

We know in teku that looping through the full set of validators is expensive and it makes any algorithm O(n) in the number of validators. So yes, we would definitely have to use an optimised version of this.

The argument for having a separate loop for full vs partial withdrawals was to make testing simpler, but that isn't the case once we need to add a cache for which validators have become withdrawn since tests would need to cover that caching. In other words, a separate loop actually expands the scope of testing required rather than reducing it since we now need to cover every way a validator may become withdrawable.

Besides which, we aim for simplicity in the way consensus works and having a single approach to withdrawal is simpler than having two different approaches depending on why the validator has balance that could be withdrawn.

hwwhww · 2022-10-20T16:11:22Z

I tried to make a summary table:

Solution	Pros	Cons
Separate partial func and full func	prioritize full withdrawal? (but most devs don't think it's benefit)	[@ajsutton] we're all saying we'll optimise it in clients, but then the tests for it will be written based on the simple implementation in the spec and won't provide coverage for the optimisations actual clients would need to do. Given Danny's argument for the separatation of withdrawals was to make it simpler to test, the fact it makes the tests less effective seems to be shooting ourselves in the foot.
#3042	simplest	[@dapplion] I don't find justifiable to add delays on withdrawals for the rationale of the PR. There are optimizations possible that achieve the same end without worsen the validator UX [@etan-status] Downside is that it adds this arbitrary delay of 3-4 days on avg (but more in future) for trying to get out, during which no rewards are earned anymore. This means that exits need to be scheduled as late as possible into a withdrawal interval for a specific validator to not miss out on those rewards. This can be quite tricky because of the exit queue. Tracking the exited-but-not-yet-withdrawn validators separately is straight forward as an optimization. But please provide tests for edge case such as depositing into a validator that has already exited or fully withdrawn to re-enter the priority queue accordingly.
#3042 + loop in effective balance updates (@potuz's suggestion)	most efficient probably closer to client's optimization?	[hww] more difficult to debug and test (but only in specs?)

My impression:

Clients will optimize the loop anyway, so the spec is less likely to be as same as the production implementation.
If (1) is true, the epoch processing tests (process_withdrawals_into_queue) may not be applied on the client side anyway.
Most devs like the simplicity of Withdrawals into queue in single function #3042 on the spec side.
Withdrawals into queue in single function #3042 may introduce some side effects, i.e., the delay. It needs to be tested properly if we go with Withdrawals into queue in single function #3042.

My read is that it’s slightly toward 👍…? 😬

ralexstokes · 2022-10-25T17:48:29Z

There are good arguments on both sides here and I'll chime in with another view against this change :)

I think we should prioritize spec simplicity above all else, weighted against speed to ship the feature to mainnet.

There will always be things we need to do to optimize implementations in clients so I don't think this is a big ask for client teams. Caches etc have been a large source of bugs in the past and to mitigate the concern we develop/maintain a large corpus of consensus test cases. And the more the spec itself is optimized for the production/maintenance of these test cases, the better mitigation it provides against bugs due to implementation complexity.

And alongside this, I want to get to a place where we stop changing the capella spec as soon as possible so there is something stable for client teams to target by end of the year. And the best way to get stability is to stop changing the spec, i.e. do not make this change.

potuz · 2022-10-28T12:57:21Z

One big difference that this PR introduces is that the withdrawal queue is much smaller. This is the first time we have a queue in the specs and I am worried about memory fragmentation. I started implementing this in Go and all implementations we are evaluating either leak memory or suffer from continuous reallocations. Either way, if this PR passes I think it would be good to put WITHRAWAL_QUEUE_LIMIT = MAX_WITHRAWALS_PER_EPOCH or some aggressive limit like this, effectively capping the queue size to a constant bound since anyway we add to this queue on epoch transition and we cannot empty it during the coming epoch.

Having the queue of constant size allows us to optimize this further by keeping a rolling index and adding on place. This would be application specific, but I imagine all languages will benefit from having a constant size rather than potentially a big chunk of the validator set.

potuz · 2022-10-28T13:02:31Z

Coming to think about this: if we go with rolling sweeps as in this PR, there's no need for the queue at all: knowing what are the next withdrawal indices determines uniquely the next withdrawals that a block can include, so we can just check against that and there's not even the need to keep the queue in the state it seems.

michaelsproul · 2022-10-29T23:19:52Z

@potuz One of the nice things about the queue is that it fixes the balance of the withdrawal at the point where it was processed in process_epoch. One thing that concerns me about the queueless approach (and #3068) is that we have to deal with the complexity of balances changing after every block, meaning that whatever cache we use for withdrawals needs to be updated on every slot and needs to handle more frequent re-orgs, etc. It makes withdrawals feel a lot more like attestations, which we know are more complex to pack than e.g. voluntary exits which are epoch-based.

etan-status · 2022-10-29T23:34:25Z

Balances can already change on each slot due to slashing.

michaelsproul · 2022-10-29T23:39:19Z

@etan-status That doesn't effect the queued approach though, because the balance decrement for the withdrawal happens immediately on the epoch transition. If a validator that is getting partially withdrawn gets slashed before their withdrawal is processed then the slashing just applies to the remaining balance (which I think is fine).

hwwhww · 2022-11-11T20:52:23Z

closing due to #3068

djrtwo mentioned this pull request Oct 16, 2022

Capella processing #3032

Closed

rework PR onto current dev from @rolfyone

9bf10af

djrtwo force-pushed the single-withdrawal-func branch from e6952b3 to 9bf10af Compare October 16, 2022 21:05

ethereum deleted a comment from Where-2 Oct 16, 2022

Fix withdrawals tests

b452d71

hwwhww added the Capella label Oct 17, 2022

ethereum deleted a comment Oct 18, 2022

hwwhww approved these changes Oct 18, 2022

View reviewed changes

djrtwo commented Oct 18, 2022

View reviewed changes

Remove debug print

8794d86

timbeiko mentioned this pull request Oct 26, 2022

Consensus-layer Call 97 ethereum/pm#630

Closed

potuz mentioned this pull request Oct 27, 2022

Withdrawal epoch transition prysmaticlabs/prysm#11590

Closed

james-prysm mentioned this pull request Oct 28, 2022

[Tracking] - Withdrawals prysmaticlabs/prysm#11077

Closed

28 tasks

hwwhww closed this Nov 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Withdrawals into queue in single function #3042

Withdrawals into queue in single function #3042

djrtwo commented Oct 16, 2022 •

edited

Loading

mcdee commented Oct 16, 2022

potuz commented Oct 16, 2022

dapplion commented Oct 17, 2022 •

edited

Loading

potuz commented Oct 17, 2022

dapplion commented Oct 17, 2022 •

edited

Loading

rolfyone commented Oct 17, 2022

hwwhww commented Oct 17, 2022 •

edited

Loading

hwwhww left a comment

djrtwo left a comment

djrtwo Oct 18, 2022

dapplion commented Oct 18, 2022

rkapka commented Oct 18, 2022

ajsutton commented Oct 19, 2022

hwwhww commented Oct 20, 2022 •

edited

Loading

ralexstokes commented Oct 25, 2022

potuz commented Oct 28, 2022

potuz commented Oct 28, 2022

michaelsproul commented Oct 29, 2022 •

edited

Loading

etan-status commented Oct 29, 2022

michaelsproul commented Oct 29, 2022

hwwhww commented Nov 11, 2022

Withdrawals into queue in single function #3042

Withdrawals into queue in single function #3042

Conversation

djrtwo commented Oct 16, 2022 • edited Loading

Original comment:

Danny's comments:

mcdee commented Oct 16, 2022

potuz commented Oct 16, 2022

dapplion commented Oct 17, 2022 • edited Loading

potuz commented Oct 17, 2022

dapplion commented Oct 17, 2022 • edited Loading

rolfyone commented Oct 17, 2022

hwwhww commented Oct 17, 2022 • edited Loading

hwwhww left a comment

Choose a reason for hiding this comment

djrtwo left a comment

Choose a reason for hiding this comment

djrtwo Oct 18, 2022

Choose a reason for hiding this comment

dapplion commented Oct 18, 2022

rkapka commented Oct 18, 2022

ajsutton commented Oct 19, 2022

hwwhww commented Oct 20, 2022 • edited Loading

ralexstokes commented Oct 25, 2022

potuz commented Oct 28, 2022

potuz commented Oct 28, 2022

michaelsproul commented Oct 29, 2022 • edited Loading

etan-status commented Oct 29, 2022

michaelsproul commented Oct 29, 2022

hwwhww commented Nov 11, 2022

djrtwo commented Oct 16, 2022 •

edited

Loading

dapplion commented Oct 17, 2022 •

edited

Loading

dapplion commented Oct 17, 2022 •

edited

Loading

hwwhww commented Oct 17, 2022 •

edited

Loading

hwwhww commented Oct 20, 2022 •

edited

Loading

michaelsproul commented Oct 29, 2022 •

edited

Loading