Withdrawals without queues #3068
Conversation
added some comments to consider
This feels simpler than the queue approach, which is why I am in favor of this change. We are at such an early stage with withdrawals (at least judging from https://notes.ethereum.org/@ralexstokes/SyDqVjr4i#CL-client-implementation) that major changes like this are OK.
I actually love the simplicity of this; there is no point pushing the withdrawals into a state queue and then processing them later at the block. This will also reduce/remove pressure on epoch processing, and the lookups for finding withdrawals for a block are small since the per-block limit is small (assuming a high success rate of finding a valid withdrawal). The two-step withdrawal isn't giving any advantage. Full withdrawals having priority over partial ones can still be addressed in this model as well; it only depends on what UX we want to go for. So 👍 from Lodestar on this approach overall.
I'm in two minds about this approach. I really like the simplicity, but I'm a bit nervous about the potential need to scan a lot of validators to find the required number of withdrawals, given that it has to happen on every block import. My concern there is two-fold:
One option to resolve this would be to simply require scanning a fixed number of validators rather than requiring a fixed number of withdrawals to be included, i.e. block 0 includes any withdrawals available for validators 0-3, block 1 withdrawals for validators 4-8, etc. We may not use the full withdrawal capacity, but given there's at least some desire to have a minimum withdrawable amount, going at a fixed rate around the validators doesn't seem like an issue.
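For illustration, a rough sketch of that fixed-range idea in the style of the spec snippet later in this thread. `VALIDATORS_SCANNED_PER_BLOCK`, the `last_withdrawal_validator_index` field, and the withdrawability helpers are assumed names, not settled spec text:

```python
# Hypothetical constant: each block sweeps exactly this many validators.
VALIDATORS_SCANNED_PER_BLOCK = 4

def get_expected_withdrawals_fixed_range(state: BeaconState) -> Sequence[Withdrawal]:
    epoch = get_current_epoch(state)
    withdrawal_index = state.next_withdrawal_index
    withdrawals: List[Withdrawal] = []
    start = state.last_withdrawal_validator_index + 1
    # Include whatever withdrawals happen to fall inside the fixed window;
    # the block may end up with fewer than MAX_WITHDRAWALS_PER_PAYLOAD.
    for offset in range(VALIDATORS_SCANNED_PER_BLOCK):
        index = ValidatorIndex((start + offset) % len(state.validators))
        validator = state.validators[index]
        balance = state.balances[index]
        if is_fully_withdrawable_validator(validator, balance, epoch):
            amount = balance
        elif is_partially_withdrawable_validator(validator, balance):
            amount = balance - MAX_EFFECTIVE_BALANCE
        else:
            continue
        withdrawals.append(Withdrawal(
            index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
            validator_index=index,
            address=ExecutionAddress(validator.withdrawal_credentials[12:]),
            amount=amount,
        ))
    return withdrawals
```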
I thought about simply changing the
Having to perform the scan separately from block transition is the complexity I'd like to avoid, and it will generally only be done for what's seen as the canonical chain, so it may be invalidated by a late block etc. Having sub-optimal processing kick in when the chain hasn't been finalising for 5 days is my nightmare scenario, as it increases the risk of cascading problems. I'd actually prefer to have to scan large spans regularly rather than having it be so rare, as it would then be better tested and optimised. Also, while this particular case probably isn't an issue even if we have to do a full scan, we should be really cautious about depending on caches to ensure good performance. If the cache-miss scenario is noticeably slower, it can open up attack scenarios where the attacker releases a block late or something like that to deliberately trigger cache misses (and missing multiple different caches can combine in lots of complex ways to cause problems). Hence things like gas prices are always set based on worst-case scenarios where caches are missed.
Given the current requests are to slow down the rate of withdrawals, I'm not worried about this. At some point we likely need to start reusing validator indices (or find some other way of dropping exited validators) just to bound the growth of the validator set anyway. And if we were unhappy with reduced capacity before that happened, it would be easy to say we'd scan until we've either found 4 withdrawals or scanned 50 validators. I'm not against having that dual limit now if we wanted to future-proof, but I don't think we need it.
FWIW, the scan is so trivial in the non-leaking case that the way we plan to implement this is to simply perform it after block operations and after any change of head. No need to optimize anything. If the network hasn't been finalizing for 5 days, the penalty is indeed a full linear scan on each block, but it is not clear that the current system is better: you still rely on reorg-resistant caches for the full queues, and you need to be efficient in implementing the queue itself to avoid memory fragmentation or even leaking it. It is the very first time we add a queue to a consensus object. While it's true that one can envision the nightmare scenario being bad, a general rule is that protocol complexity brings many more edge and untested cases, and there's no doubt that the current specification (with a queue in the beacon state that can be very large or very small depending on the number of exiting validators, separate logic for full withdrawals and partial withdrawals, separate logic for epoch transition, etc.) is far more complicated.
With this I agree: we are very likely going to need to reuse validator indices. I'd be happy to see some expected values, perhaps. If we think we will be reusing indices when we hit something like 50K exited validators, then capping the loop at 5K will keep it in the microseconds. Notice also that all clients are already performing a full validator-set scan within fork choice on each block insertion.
Limiting the loop to a tweakable constant would enforce a time restriction on how often withdrawals get processed, which would also help ensure that we're not processing too fast, which would be a positive.
Ok, I ran some benchmarks and had a poke around the caches we already have, and I don't think I'm worried anymore. I was wary because, while iterating balances is cheap in Teku, iterating the actual validator list is fairly expensive because of the way we maintain the state.

Any balance above 32 ETH must be withdrawable and any 0 balance isn't withdrawable, so the only time we have to access the actual validator without always getting a withdrawal out of it is when the balance is less than 32 ETH. Most withdrawable validators will have a balance of 0 already and can be skipped. Thus the worst-case scenario would be when all the active validators have leaked to a balance below 32 ETH. The timing on this isn't too bad and we can probably just wear the cost, but if needed we already have a cache of active validators (since that's needed for all kinds of things) and that would let us very quickly skip over the vast majority of these additional checks without any extra complexity for maintaining a new cache. Also, iterating the full validator set is faster than I remember anyway.

So I support this proposal.
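A minimal sketch of that balance-first fast path, assuming the same spec helpers used in the snippet later in this thread; the balance check is only a cheap pre-filter, and the real withdrawability helpers still make the final decision:

```python
def find_withdrawable_indices(state: BeaconState, start: ValidatorIndex, limit: int) -> List[ValidatorIndex]:
    # Walk the (cheap, flat) balances list first and only dereference the
    # Validator object when the balance alone cannot rule the index out.
    epoch = get_current_epoch(state)
    found: List[ValidatorIndex] = []
    for offset in range(len(state.validators)):
        index = ValidatorIndex((start + offset) % len(state.validators))
        balance = state.balances[index]
        if balance == 0:
            continue  # a zero balance can never yield a withdrawal
        validator = state.validators[index]
        if (is_fully_withdrawable_validator(validator, balance, epoch)
                or is_partially_withdrawable_validator(validator, balance)):
            found.append(index)
            if len(found) == limit:
                break
    return found
```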
I am in support of this PR and I think it's a great iteration. Nice work @potuz 👏 Also, thanks to those who did prior iterations! I think there are two scenarios in which to consider this PR: a healthy network and an unhealthy (non-finalizing) network.
In a healthy network, I like this PR because it adds fewer items to the state. It appears to be more "functional" and I like that for reasons that are probably subjective. There's an argument that it doesn't prioritize full withdrawals, but I don't feel strongly about that. I think prioritization can be a separate argument (i.e., this PR could also support prioritized full withdrawals in a subsequent PR).

When it comes to an unhealthy network, I think I also prefer this PR. @ajsutton has done well to highlight how this PR suffers on an unhealthy network, but I believe that the

In the "Appendix" section, I describe how the

As an argument against this PR, it has the unfortunate downside that, in some scenarios (e.g. inactivity leak), we must suffer an iteration across the validator set for every block. The

However, the penalties suffered are not equal. The

Another thing about this PR is that I think the "iteration every block" problem is solvable. For example, we could attach to the state a

So, I think this PR is certainly a step in the right direction if not a candidate for production. I'm unsure about the latter at this point 🤔

Appendix

This appendix describes my concerns about how the

The rate at which the withdrawals queue can grow is a little unclear due to "unbounded" full withdrawals, however let's assume it grows at

Running some benchmarks on my ~3 year old laptop (i7-10510U), I'm seeing 0.4ms to compute the hash tree root of a

As we know, it's possible for validators to produce blocks upon the finalized checkpoint and cause all honest clients to run skip-slots from the finalized checkpoint to the wall clock slot. Since

Along with this time cost, there's also a memory cost; a day's worth of

So, if we assume that there were 450k validators viable to enter the queue before non-finality started, we'll reach a maximum

EDIT: I think my tree hash cache sizes are wrong. I was assuming that the number of leaves is a power of two, which is incorrect. Computing the real size is a bit more complicated and I'm not sure it's necessary for the argument. Happy to do it later if it turns out to be useful.
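For a rough sense of scale, a back-of-envelope version of the appendix's concern about queue growth during non-finality. The per-epoch growth rate and per-entry size below are stand-in assumptions for illustration only, not the elided figures above and not spec constants:

```python
# Rough queue growth during non-finality. All inputs except EPOCHS_PER_DAY are
# placeholder assumptions chosen only to illustrate the shape of the problem.
EPOCHS_PER_DAY = 225                  # 86400 s / (32 slots * 12 s per slot)
ASSUMED_ENTRIES_PER_EPOCH = 256       # hypothetical cap on queue entries added per epoch
ASSUMED_BYTES_PER_ENTRY = 64          # hypothetical serialized size of one queue entry

for days in (1, 2, 5):
    entries = days * EPOCHS_PER_DAY * ASSUMED_ENTRIES_PER_EPOCH
    size_mib = entries * ASSUMED_BYTES_PER_ENTRY / (1024 * 1024)
    print(f"{days} day(s) without finality: {entries} queue entries, ~{size_mib:.1f} MiB")
```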
I am in support of this PR, expanding on others' points, including:
However, I want to point out that this approach could support an implementation where full withdrawals do not have to incur extra wait time. The snippet below is a naive implementation of that goal. Splitting the payload capacity in half between full and partial withdrawals is an arbitrary decision, but as long as churn is low and there's no mass slashing event, partial withdrawals will always dominate. Also, since the full-withdrawal iteration does not have a pointer, older indices are always prioritized. To avoid having to loop on each block, one can keep an ordered hash set of the indices that satisfy the withdrawable_epoch condition and only evaluate those.

```python
def get_expected_withdrawals(state: BeaconState) -> Sequence[Withdrawal]:
    current_epoch = get_current_epoch(state)
    withdrawal_index = state.next_withdrawal_index
    withdrawals: List[Withdrawal] = []
    # Full withdrawals: take at most half the payload capacity, scanning from
    # index 0 so that older indices are always prioritized.
    for index in range(len(state.validators)):
        validator = state.validators[index]
        balance = state.balances[index]
        if is_fully_withdrawable_validator(validator, balance, current_epoch):
            withdrawals.append(Withdrawal(
                index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
                validator_index=ValidatorIndex(index),
                address=ExecutionAddress(validator.withdrawal_credentials[12:]),
                amount=balance,
            ))
        if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD // 2:
            break
    # Partial withdrawals: fill the remaining capacity, resuming the sweep from
    # the validator after the last one withdrawn from.
    index = ValidatorIndex((state.last_withdrawal_validator_index + 1) % len(state.validators))
    for _ in range(len(state.validators)):
        validator = state.validators[index]
        balance = state.balances[index]
        if is_partially_withdrawable_validator(validator, balance):
            withdrawals.append(Withdrawal(
                index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
                validator_index=index,
                address=ExecutionAddress(validator.withdrawal_credentials[12:]),
                amount=balance - MAX_EFFECTIVE_BALANCE,
            ))
        if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD:
            break
        index = ValidatorIndex((index + 1) % len(state.validators))
    return withdrawals
```
Why not both? Scan until you've found

Setting
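A minimal sketch of that dual bound (stop after finding enough withdrawals or after visiting a capped number of validators, whichever comes first), reusing the helpers from the snippet above; `MAX_VALIDATORS_PER_SWEEP` is a placeholder name and value, not a proposed constant:

```python
# Hypothetical cap on how many validators a single block's sweep may visit.
MAX_VALIDATORS_PER_SWEEP = 2**14

def get_expected_withdrawals_bounded(state: BeaconState) -> Sequence[Withdrawal]:
    epoch = get_current_epoch(state)
    withdrawal_index = state.next_withdrawal_index
    withdrawals: List[Withdrawal] = []
    index = ValidatorIndex((state.last_withdrawal_validator_index + 1) % len(state.validators))
    # Stop at whichever bound is hit first: payload capacity or the scan cap.
    for _ in range(min(MAX_VALIDATORS_PER_SWEEP, len(state.validators))):
        validator = state.validators[index]
        balance = state.balances[index]
        if is_fully_withdrawable_validator(validator, balance, epoch):
            withdrawals.append(Withdrawal(
                index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
                validator_index=index,
                address=ExecutionAddress(validator.withdrawal_credentials[12:]),
                amount=balance,
            ))
        elif is_partially_withdrawable_validator(validator, balance):
            withdrawals.append(Withdrawal(
                index=WithdrawalIndex(withdrawal_index + len(withdrawals)),
                validator_index=index,
                address=ExecutionAddress(validator.withdrawal_credentials[12:]),
                amount=balance - MAX_EFFECTIVE_BALANCE,
            ))
        if len(withdrawals) == MAX_WITHDRAWALS_PER_PAYLOAD:
            break
        index = ValidatorIndex((index + 1) % len(state.validators))
    return withdrawals
```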
It seems that we are all converging on the use of a constant for that
The main reason not to have this constant anywhere near 2**9 is that exited validators are unbounded, and until we reuse validator indices we will need to deal with these holes in the registry. As pointed out above, we are already happily processing a full validator-registry scan on every block in fork choice at less than 5 ms per loop.
Isn't this done already?
LGTM
A quick note, there were two main reasons I did the withdrawals queue initially:
Edit: a final thing would be to not have balance updates mid-epoch, but we already wrecked that with sync committees 😢 so what's another 16.
lgtm
Fantastic work @potuz!
@@ -11,11 +11,8 @@
- [Constants](#constants)
- [Domain types](#domain-types)
- [Preset](#preset)
- [Misc](#misc)
- [State list lengths](#state-list-lengths)
- [Max operations per block](#max-operations-per-block)
This PR implements withdrawals without any withdrawal queue. I want to explore the space for solutions like this since in principle it seems to be much simpler than the current logic, so I may be missing something and I'm using this space to find out what's wrong.

This solution adds only one field, `latest_withdrawal_validator_index`, to the `BeaconState`, which keeps track of the last validator that has performed a withdrawal. Proposers pack up to `MAX_WITHDRAWALS_PER_PAYLOAD` withdrawals in their execution blocks, starting from the next validator after `latest_withdrawal_validator_index`. During block processing, we simply check that the next validators that can withdraw agree with the ones packed in the block, and that their withdrawal amounts agree with the ones we expect from the state.

There is no epoch processing logic as there is no queue to include withdrawals. This also removes unnecessary constants like `MAX_WITHDRAWALS_PER_EPOCH` and `WITHDRAWAL_QUEUE_LIMIT`.

I suppose there can be a problem with validators being slashed right after doing a partial withdrawal. This can only be a bad issue for validators that get slashed for more than `MAX_EFFECTIVE_BALANCE`, which is very rare. Another issue is that a validator may end up slightly below `MAX_EFFECTIVE_BALANCE` right after performing a partial withdrawal, but I don't see anything seriously wrong with this either.

Fixes: #3054
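To make the block-processing check concrete, a hedged sketch of what verification could look like under this approach, reusing `get_expected_withdrawals` from the snippet earlier in the thread and the spec's `decrease_balance` helper. The field names follow the description above (the earlier snippet spells the pointer `last_withdrawal_validator_index`); this mirrors the idea, not the exact diff in this PR:

```python
def process_withdrawals(state: BeaconState, payload: ExecutionPayload) -> None:
    # Recompute locally which withdrawals the proposer should have packed...
    expected_withdrawals = get_expected_withdrawals(state)
    # ...and require the payload to contain exactly those, in order.
    assert len(payload.withdrawals) == len(expected_withdrawals)
    for expected, withdrawal in zip(expected_withdrawals, payload.withdrawals):
        assert withdrawal == expected
        # Apply the withdrawal by deducting the amount from the validator's balance.
        decrease_balance(state, withdrawal.validator_index, withdrawal.amount)
    # Advance the rolling pointers so the next block resumes the sweep where this one stopped.
    if len(expected_withdrawals) > 0:
        latest = expected_withdrawals[-1]
        state.next_withdrawal_index = WithdrawalIndex(latest.index + 1)
        state.latest_withdrawal_validator_index = latest.validator_index
```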