
Withdrawals without queues #3068

Merged · 37 commits · Nov 10, 2022

Conversation

@potuz (Contributor) commented Oct 28, 2022

This PR implements withdrawals without any withdrawal queue. I want to explore the space of solutions like this, since in principle it seems to be much simpler than the current logic, so I may be missing something, and I'm using this space to find out what's wrong.

This solution adds only one field, latest_withdrawal_validator_index, to the BeaconState, which keeps track of the last validator that has performed a withdrawal. Proposers pack up to MAX_WITHDRAWALS_PER_PAYLOAD withdrawals in their execution blocks, starting from the next validator after latest_withdrawal_validator_index.

During block processing, we simply check that the next validators that can withdraw agree with the ones packed in the block, and that their withdrawal amounts agree with the ones we expect from the state.

There is no epoch processing logic as there is no queue to include withdrawals.
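
For illustration, the block-processing check described above amounts to something like the following spec-style sketch (a rough reconstruction for this discussion, not the exact code in the diff; get_expected_withdrawals is assumed to sweep forward from latest_withdrawal_validator_index):

```python
def process_withdrawals(state: BeaconState, payload: ExecutionPayload) -> None:
    # Recompute the withdrawals the proposer was expected to pack, starting
    # from the validator after latest_withdrawal_validator_index.
    expected_withdrawals = get_expected_withdrawals(state)
    assert len(payload.withdrawals) == len(expected_withdrawals)
    for expected, withdrawal in zip(expected_withdrawals, payload.withdrawals):
        assert withdrawal == expected
        decrease_balance(state, withdrawal.validator_index, withdrawal.amount)
    # Advance the sweep pointer to the last validator withdrawn in this block.
    if len(expected_withdrawals) > 0:
        state.latest_withdrawal_validator_index = expected_withdrawals[-1].validator_index
```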

This also removes now-unnecessary constants like MAX_WITHDRAWALS_PER_EPOCH and WITHDRAWAL_QUEUE_LIMIT.

I suppose there can be a problem with validators being slashed right after doing a partial withdrawal. This can only be a bad issue for validators that get slashed for more than MAX_EFFECTIVE_BALANCE, which is very rare. Another issue is that a validator may end up with a balance slightly below MAX_EFFECTIVE_BALANCE right after performing a partial withdrawal, but I don't see anything seriously wrong with this either.

Fixes: #3054

@potuz potuz changed the base branch from dev to single-withdrawal-func October 28, 2022 14:52
@ralexstokes (Member) left a comment

added some comments to consider

Review threads (outdated, resolved) on: specs/capella/beacon-chain.md (×6), specs/capella/fork.md

@rkapka (Contributor) commented Oct 31, 2022

This feels simpler than the queue approach, which is why I am in favor of this change. We are at such an early stage with withdrawals (at least judging from https://notes.ethereum.org/@ralexstokes/SyDqVjr4i#CL-client-implementation), that major changes like this are OK.

@potuz potuz changed the base branch from single-withdrawal-func to dev October 31, 2022 18:36
@hwwhww hwwhww added the Capella label Nov 1, 2022
@g11tech (Contributor) commented Nov 1, 2022

I actually love the simplicity of this; there is no point in pushing withdrawals into a state queue and then processing them later at the block. This will also reduce or remove pressure on epoch processing, and the lookups for finding withdrawals for a block are small since the per-block limit is small (assuming a high success rate of finding a valid withdrawal). The two-step withdrawal isn't giving any advantage.

Full withdrawals having priority over partial ones can still be addressed in this model as well; it just depends on what UX we want to go for.

So 👍 from Lodestar on this approach overall.

@ajsutton (Contributor) commented Nov 1, 2022

I'm in two minds about this approach. I really like the simplicity but I'm a bit nervous about the potential need to have to scan a lot of validators to find the required number of withdrawals given that has to happen on every block import. My concern there is two-fold:

  1. 99.9% of the time we won't be scanning far which means the potential performance issues only show up in rare situations - typically when there are already other issues going on.
  2. The mitigation for those performance problems involves introducing and keeping in sync separate caches which significantly reduces testability and adds a lot of complexity. I was keen on not having the separate loops for full vs partial withdrawals primarily to avoid the need to introduce caches tracking the fully withdrawable validators which are unlikely to be fully exercised by spec tests. The optimisations to avoid a full scan in this case have similar testability issues.

One option to resolve this would be to simply require scanning a fixed number of validators rather than requiring a fixed number of withdrawals be included, i.e. block 0 includes any withdrawals available for validators 0-3, block 1 withdrawals for validators 4-7, etc. We may not use the full withdrawal capacity, but given there's at least some desire to have a minimum withdrawable amount, going at a fixed rate around the validators doesn't seem like an issue.
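
A minimal sketch of this fixed-span variant (VALIDATORS_SCANNED_PER_BLOCK and get_withdrawal_if_eligible are hypothetical names introduced here, not part of the PR):

```python
def get_expected_withdrawals_fixed_span(state: BeaconState) -> Sequence[Withdrawal]:
    # Scan a fixed window of the registry each block, regardless of how many
    # withdrawals are actually found inside that window.
    start = ValidatorIndex((state.latest_withdrawal_validator_index + 1) % len(state.validators))
    withdrawals: List[Withdrawal] = []
    for offset in range(VALIDATORS_SCANNED_PER_BLOCK):  # hypothetical constant, e.g. 4
        index = ValidatorIndex((start + offset) % len(state.validators))
        # Assumed helper: returns a Withdrawal if the validator is fully or
        # partially withdrawable, otherwise None.
        withdrawal = get_withdrawal_if_eligible(state, index, len(withdrawals))
        if withdrawal is not None:
            withdrawals.append(withdrawal)
    return withdrawals
```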

@potuz (Contributor, Author) commented Nov 1, 2022

> I'm in two minds about this approach. I really like the simplicity but I'm a bit nervous about the potential need to have to scan a lot of validators to find the required number of withdrawals given that has to happen on every block import.

I thought about simply changing the for loop to have a fixed bound, which we could impose right now, and I wouldn't be too opposed to it. However, the situation where the scan is not immediate only arises if the network has been leaking for about 5 days (and this gets longer each day), and if that happens, no matter what withdrawal mechanism we have, we will be in trouble anyway; this one does not make it particularly worse, since the scan can be performed after block processing, during the beacon node's "dead time" and in parallel, as opposed to during the epoch transition. The biggest problem with adding a bound is that we do not reuse validator indices, which means the bound becomes less effective over time, and even during happy times we would start to see decreased capacity.

@ajsutton (Contributor) commented Nov 1, 2022

> However, the situation where the scan is not immediate only arises if the network has been leaking for about 5 days (and this gets longer each day), and if that happens, no matter what withdrawal mechanism we have, we will be in trouble anyway; this one does not make it particularly worse, since the scan can be performed after block processing, during the beacon node's "dead time" and in parallel, as opposed to during the epoch transition.

Having to perform the scan separate to block transition is the complexity I'd like to avoid - and it will generally only be done for what's seen as the canonical chain so may be invalidated by a late block etc. Having sub-optimal processing kick in when the chain hasn't been finalising for 5 days is my nightmare scenario as it increases the risk of cascading problems. I'd actually prefer to have to scan large spans regularly rather than having it be so rare as it would then be better tested and optimised.

Also while this particular case probably isn't an issue even if we have to do a full scan, we should be really cautious about depending on caches to ensure good performance. If the cache-miss scenario is noticeably slower it can open up attack scenarios where the attacker releases a block late or something like that to deliberately trigger cache misses (and missing multiple different caches can combine in lots of complex ways to cause problems). Hence things like gas prices are always set based on worst case scenarios where caches are missed.

> The biggest problem with adding a bound is that we do not reuse validator indices, which means the bound becomes less effective over time, and even during happy times we would start to see decreased capacity.

Given the current requests are to slow down the rate of withdrawals I'm not worried about this. At some point we likely need to start reusing validator indices (or find some other way of dropping exited validators) just to bound the growth of the validator set anyway. And if we were unhappy with reduced capacity before that happened it would be easy to say we'd scan until we've either found 4 withdrawals or scanned 50 validators. I'm not against having that dual limit now if we wanted to future proof but I don't think we need it.

@potuz (Contributor, Author) commented Nov 2, 2022

> Having to perform the scan separate to block transition is the complexity I'd like to avoid - and it will generally only be done for what's seen as the canonical chain so may be invalidated by a late block etc. Having sub-optimal processing kick in when the chain hasn't been finalising for 5 days is my nightmare scenario as it increases the risk of cascading problems. I'd actually prefer to have to scan large spans regularly rather than having it be so rare as it would then be better tested and optimised.

> Also while this particular case probably isn't an issue even if we have to do a full scan, we should be really cautious about depending on caches to ensure good performance. If the cache-miss scenario is noticeably slower it can open up attack scenarios where the attacker releases a block late or something like that to deliberately trigger cache misses (and missing multiple different caches can combine in lots of complex ways to cause problems). Hence things like gas prices are always set based on worst case scenarios where caches are missed.

FWIW, the scan is so trivial in the non-leaking case that the way we plan to implement this is to simply perform it after block operations and after any change of head, with no need to optimize anything. If the network hasn't been finalizing for 5 days, the penalty is indeed a full linear scan on each block, but it is not clear that the current system is better: you still rely on reorg-resistant caches for the full queues, and you need to implement the queue itself efficiently to avoid memory fragmentation or even leaking it. It is the very first time we add a queue to a consensus object. While it's true that one can envision the nightmare scenario being bad, a general rule is that protocol complexity has many more edge and untested cases, and there's no doubt that the current specification (with a queue in the beacon state that can be very large or very small depending on the number of exiting validators, with separate logic for full withdrawals and partial withdrawals, with separate logic for the epoch transition, etc.) is far more complicated.

> Given the current requests are to slow down the rate of withdrawals I'm not worried about this. At some point we likely need to start reusing validator indices (or find some other way of dropping exited validators) just to bound the growth of the validator set anyway. And if we were unhappy with reduced capacity before that happened it would be easy to say we'd scan until we've either found 4 withdrawals or scanned 50 validators. I'm not against having that dual limit now if we wanted to future proof but I don't think we need it.

With this I agree; we are very likely going to need to reuse validator indices. I'd be happy to see some expected values, perhaps. If we think we will be reusing indices when we hit something like 50K exited validators, then capping the loop at 5K will keep it in the microseconds.

Notice also that all clients are already performing a full validator set scan within forkchoice on each block insertion.

@rolfyone (Contributor) commented Nov 2, 2022

Limiting the loop to a tweakable constant would enforce a time restriction on how often withdrawals get processed, which would also help ensure that we're not processing too fast, which would be positive.
I wouldn't be surprised if that number is low, like 16 ish, which would ensure we're not going through the entire validator set while we're leaking; it'd be about 3 days per full loop of 400k validators if all validators are leaking. We'd probably also drop max_withdrawals_per_payload to something like 4, which would loop through at full inclusion in around 12 days for 400k validators; 8 would be roughly 6 days, which is probably on the 'fast' side.
But doing this, aren't we getting close to the '6 days for a full withdrawal is too long' discussion?
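
A quick back-of-the-envelope check of these rates, assuming 32 * 225 = 7200 slots per day and the 400k-validator figure used above:

```python
SLOTS_PER_DAY = 32 * 225  # 7200
VALIDATORS = 400_000

for per_slot in (16, 8, 4):
    days = VALIDATORS / (per_slot * SLOTS_PER_DAY)
    print(f"{per_slot} validators per slot -> full sweep in ~{days:.1f} days")
# 16 -> ~3.5 days, 8 -> ~6.9 days, 4 -> ~13.9 days, roughly in line with the
# estimates quoted above.
```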

@ajsutton (Contributor) commented Nov 2, 2022

Ok I ran some benchmarks and had a poke around the caches we already have and I don't think I'm worried anymore.

I was wary because while iterating balances is cheap in Teku, iterating the actual validator list is fairly expensive because of the way we maintain the state. Any balance above 32ETH must be withdrawable and any 0 balance isn't withdrawable, so the only time we have to access the actual validator without always getting a withdrawal out of it is when the balance is less than 32ETH. Most withdrawable validators will have a balance of 0 already and can be skipped. Thus the worst case scenario would be when all the active validators have leaked to have a balance below 32ETH. The timing on this isn't too bad and we can probably just wear the cost but if needed we already have a cache of active validators since that's needed for all kinds of things and that would let us very quickly skip over the vast majority of these additional checks without any extra complexity for maintaining a new cache.
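
A minimal sketch of the cheap-check-first ordering described here (only the zero-balance shortcut is shown; the helper names follow the Capella spec):

```python
def is_withdrawal_candidate(state: BeaconState, index: ValidatorIndex, epoch: Epoch) -> bool:
    balance = state.balances[index]
    if balance == 0:
        # Nothing to withdraw, so there is no need to read the (comparatively
        # expensive) Validator object at all.
        return False
    validator = state.validators[index]  # only now touch the full record
    return (
        is_fully_withdrawable_validator(validator, balance, epoch)
        or is_partially_withdrawable_validator(validator, balance)
    )
```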

Also iterating the full validator set is faster than I remember it anyway.

So I support this proposal.

@paulhauner (Contributor) commented Nov 2, 2022

I am in support of this PR and I think it's a great iteration. Nice work @potuz 👏 Also, thanks to those who did prior iterations!

I think there are two scenarios in which to consider this PR:

  1. During a healthy, finalizing network.
  2. During an unhealthy, non-finalizing network.

In a healthy network, I like this PR because it adds fewer items to the state. It appears to be more "functional" and I like that for reasons that are probably subjective. There's an argument that it doesn't prioritize full withdrawals, but I don't feel strongly about that. I think prioritization can be a separate argument (i.e., this PR could also support prioritized full withdrawals in a subsequent PR).

When it comes to an unhealthy network, I think I also prefer this PR. @ajsutton has done well to highlight how this PR suffers on an unhealthy network, but I believe that the withdrawal_queue (i.e. the status quo) suffers on an unhealthy network, too.

In the "Appendix" section, I describe how the withdrawal_queue has CPU (hashing) and memory (beacon state + hashing cache) downsides when doing "skip slots" across long distances.

As an argument against this PR, it has the unfortunate downside that, in some scenarios (e.g. inactivity leak), we must suffer an iteration across the validator set for every block. The withdrawal_queue suffers for each skipped epoch, which I expect to be less frequent. That is, I propose that a penalty for each block is worse than an equal penalty for each skipped epoch.

However, the penalties suffered are not equal. The withdrawal_queue suffers CPU and memory penalties that are (a) challenging to comprehend and (b) at times significantly worse than a validator set iteration. This PR pays only a CPU penalty, in a tightly-bounded and easy-to-comprehend way.

Another thing about this PR is that I think the "iteration every block" problem is solvable. For example, we could attach to the state a [uint64; MAX_WITHDRAWALS_PER_PAYLOAD * SLOTS_PER_EPOCH] which contains an ordered list of validator indices that are available for withdrawal. When that list is empty, or the last value equals last_withdrawal_validator_index, we know there are no more withdrawals to dequeue.
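
A rough sketch of what such a field could look like (hypothetical name and SSZ shape, not something in the PR diff):

```python
class BeaconState(Container):
    # ... existing fields ...
    # Ordered indices of validators known to be ready for withdrawal,
    # bounded by one epoch's worth of withdrawals.
    withdrawal_ready_validators: List[ValidatorIndex, MAX_WITHDRAWALS_PER_PAYLOAD * SLOTS_PER_EPOCH]
```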

So, I think this PR is certainly a step in the right direction if not a candidate for production. I'm unsure about the latter at this point 🤔

Appendix

This appendix describes my concerns about how the withdrawal_queue behaves during skip slots. Whilst "skipping slots" (i.e., running process_slot from parent.slot to block.slot) we must maintain the withdrawals_queue and compute its hash tree root for each epoch so that we may verify the block.state_root.

The rate at which the withdrawals queue can grow is a little unclear due to "unbounded" full withdrawals, however let's assume it grows at MAX_PARTIAL_WITHDRAWALS_PER_EPOCH == 256 (for brevity, let's call this variable WITHDRAW_PER_EPOCH).

Running some benchmarks on my ~3 year old laptop (i7-10510U), I'm seeing 0.4ms to compute the hash tree root of a withdrawals_queue of length 256.

As we know, it's possible for validators to produce blocks upon the finalized checkpoint and cause all honest clients to run skip-slots from the finalized checkpoint to the wall clock slot. Since EPOCHS_PER_DAY = 225, a day of skip slots will involve at least 90ms (225 * 0.4) of hashing the withdrawal_queue. Notably, 90ms assumes perfect tree-hash caching. A cache-less implementation would end up at ~13s.

Along with this time cost, there's also a memory cost; a day's worth of Withdrawal objects (44 bytes each) is 44 * EPOCHS_PER_DAY * WITHDRAW_PER_EPOCH ~= 2.5mb added to the BeaconState. Assuming we're doing current-generation tree hash caching, we're looking at a cached tree of ~3.6mb (((EPOCHS_PER_DAY * WITHDRAW_PER_EPOCH * 2) - 1) * 32).

So, if we assume that there were 450k validators viable to enter the queue before non-finality started, we'll reach a maximum withdrawal_queue size of 450k within ~7.7 days. By that time we'll have done 693ms of hashing (in the best case), added ~19mb to the BeaconState and ~33mb of tree hash cache. Without tree hash caching we're looking at 12 minutes of hashing!
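
Reproducing the arithmetic behind these figures (queue growth and hashing only; the tree-hash-cache sizes are the over-estimate flagged in the EDIT below):

```python
EPOCHS_PER_DAY = 225
WITHDRAW_PER_EPOCH = 256
HASH_MS_PER_EPOCH = 0.4   # measured: hash tree root of a 256-entry queue
WITHDRAWAL_BYTES = 44

print(EPOCHS_PER_DAY * HASH_MS_PER_EPOCH)                      # ~90 ms of hashing per day of skip slots
print(WITHDRAWAL_BYTES * EPOCHS_PER_DAY * WITHDRAW_PER_EPOCH)  # ~2.5 MB of queue growth per day
print(450_000 / (EPOCHS_PER_DAY * WITHDRAW_PER_EPOCH))         # ~7.8 days to reach a 450k-entry queue
print(WITHDRAWAL_BYTES * 450_000)                              # ~19.8 MB added to the BeaconState by then
```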

EDIT: I think my tree hash cache sizes are wrong. I was assuming that the number of leaves is a power of two, which is incorrect. Computing the real size is a bit more complicated and I'm not sure it's necessary for the argument. Happy to do it later if it turns out to be useful.

@dapplion (Collaborator) commented Nov 2, 2022

I am in support of this PR, expanding on others' points, including:

  • It ensures block space for withdrawals is optimally used when needed. The current dev branch will leave withdrawal spots empty unless the churn limit is high enough.

However, I want to point out that this approach could support an implementation where full withdrawals do not have to incur extra wait time. The snippet below is a naive implementation of that goal. Splitting full and partial withdrawals in half is an arbitrary decision, but as long as churn is low and there's not a mass slashing event, partial withdrawals will always dominate. Also, since the full withdrawal iteration does not have a pointer, older indices will always be prioritized. To prevent having to loop on each block, one can keep an ordered hash set of indices which satisfy the withdrawable_epoch condition and only evaluate those.

def get_expected_withdrawals(state: BeaconState) -> Sequence[Withdrawal]:
    current_epoch = get_current_epoch(state)
    withdrawal_index = state.next_withdrawal_index
    withdrawals: List[Withdrawal] = []

    # Full withdrawals: always scan from index 0, so older indices are served first
    for index in range(len(state.validators)):
        validator = state.validators[index]
        balance = state.balances[index]
        if is_fully_withdrawable_validator(validator, balance, current_epoch):
            withdrawals.append(Withdrawal(
                index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
                validator_index=ValidatorIndex(index),
                address=ExecutionAddress(validator.withdrawal_credentials[12:]),
                amount=balance,
            ))
        if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD // 2:
            break

    # Partial withdrawals: resume the sweep from the pointer stored in the state
    index = ValidatorIndex((state.last_withdrawal_validator_index + 1) % len(state.validators))
    for _ in range(len(state.validators)):
        validator = state.validators[index]
        balance = state.balances[index]
        if is_partially_withdrawable_validator(validator, balance):
            withdrawals.append(Withdrawal(
                index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
                validator_index=index,
                address=ExecutionAddress(validator.withdrawal_credentials[12:]),
                amount=balance - MAX_EFFECTIVE_BALANCE,
            ))
        if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD:
            break
        index = ValidatorIndex((index + 1) % len(state.validators))
    return withdrawals

@ethDreamer (Contributor) commented Nov 2, 2022

> require scanning a fixed number of validators rather than requiring a fixed number of withdrawals be included

Why not both? Scan until you've found MAX_WITHDRAWALS_PER_PAYLOAD withdrawable validators, or you've hit VALIDATOR_SEEK_LIMIT, whichever comes first. This avoids the "sub-optimal processing kick in when the chain hasn't been finalising" death spiral.

Setting VALIDATOR_SEEK_LIMIT = 512 would ensure the entire validator set is processed once per day, even in the worst case scenario of no one being withdrawable, with 98% of ETH staked.
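
A sketch of the dual bound, following the shape of the snippet above (VALIDATOR_SEEK_LIMIT is the hypothetical constant proposed here):

```python
def get_expected_withdrawals(state: BeaconState) -> Sequence[Withdrawal]:
    epoch = get_current_epoch(state)
    withdrawal_index = state.next_withdrawal_index
    index = ValidatorIndex((state.latest_withdrawal_validator_index + 1) % len(state.validators))
    withdrawals: List[Withdrawal] = []
    # Stop after MAX_WITHDRAWALS_PER_PAYLOAD hits or VALIDATOR_SEEK_LIMIT probes,
    # whichever comes first.
    for _ in range(min(VALIDATOR_SEEK_LIMIT, len(state.validators))):
        validator = state.validators[index]
        balance = state.balances[index]
        if is_fully_withdrawable_validator(validator, balance, epoch):
            amount = balance
        elif is_partially_withdrawable_validator(validator, balance):
            amount = balance - MAX_EFFECTIVE_BALANCE
        else:
            amount = None
        if amount is not None:
            withdrawals.append(Withdrawal(
                index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
                validator_index=index,
                address=ExecutionAddress(validator.withdrawal_credentials[12:]),
                amount=amount,
            ))
        if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD:
            break
        index = ValidatorIndex((index + 1) % len(state.validators))
    return withdrawals
```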

@potuz (Contributor, Author) commented Nov 2, 2022

> > require scanning a fixed number of validators rather than requiring a fixed number of withdrawals be included
>
> Why not both? Scan until you've found MAX_WITHDRAWALS_PER_PAYLOAD withdrawable validators, or you've hit VALIDATOR_SEEK_LIMIT, whichever comes first. This avoids the "sub-optimal processing kick in when the chain hasn't been finalising" death spiral.
>
> Setting VALIDATOR_SEEK_LIMIT = 512 would ensure the entire validator set is processed once per day, even in the worst case scenario of no one being withdrawable, with 98% of ETH staked.

It seems that we are all converging on the usage of a constant for that for loop. However, I would strongly advise that this constant be set to a much higher limit than what's being suggested here. I'd suggest something along the lines of 2**17 ~ 130K. The reasons being that:

  • The scan takes less than 1ms on a laptop
  • It's relatively safe against large chunks of exits after the fork.

The main reason not to have this constant anywhere close to 2**9 is that exited validators are unbounded, and until we reuse validator indices we will need to deal with these holes in the registry. As pointed out above, we are already happily performing a full validator registry scan on every block in forkchoice, at less than 5ms per loop.

@potuz (Contributor, Author) commented Nov 4, 2022

> ah, can you also update the BeaconState in upgrade_to_capella in fork.md with the new fields?

Isn't this done already?

@hwwhww (Contributor) left a comment

LGTM :shipit:


@djrtwo (Contributor) commented Nov 7, 2022

A quick note, there were two main reasons I did the withdrawals queue initially:

  1. To ensure that load is bounded and spread across slots even in exceptional scenarios (e.g. withdrawals are initially enabled and tons of validators are withdrawable). With the combined sweep that bounds work per slot, this is no longer an issue.
  2. To ensure that CL and EL can be processed in parallel. I was misguided on this item. By not having a queue, we add complexity to block building for CLs -- they must ensure that the withdrawals are created after any balance updates that happen prior to the process_withdrawals call (which can happen due to process_slots running before process_block). BUT because withdrawals show up independently in the Execution Payload for block processing (placed there by the proposer), they can be passed to the EL immediately even if validated a bit later in the CL. Point being -- I was incorrect on this and it was my sticking point, so this PR seems fine. Will do a deep pass on it but we can merge shortly.

Edit: a final thing would be to not have balance updates mid-epoch, but we already wrecked that with sync committees 😢 so what's another 16.

@g11tech (Contributor) left a comment

lgtm

@djrtwo (Contributor) left a comment

Fantastic work @potuz!


Successfully merging this pull request may close these issues.

withdraw_balance decreases up to the validator's balance but produces a receipt up to the parameter