Rebalance duty times #3433
As the chain keeps growing and duties are being added to the block construction and verification pipeline, it becomes increasingly difficult for clients to complete block duties on time, leading to poor attestation performance and more frequent reorgs. This PR proposes to rebalance the relative timings of the 3 main activities, namely block production, attestation and aggregation, such that they happen at 0/6/9 seconds into each slot instead of 0/4/8. This reduces the time for attestations to reach aggregators and aggregates to reach block producers but increases the time the consensus and execution clients have to produce and validate blocks. Each upgrade has so far increased complexity and processing requirements around block production and so will future upgrades: due to increased size, blocks with blobs will take longer to disseminate and additional verification / cryptography is needed to validate them.
As we discussed on Discord, these proposed numbers may require a considerable rework of Prysm's design at least. The way we handle aggregation currently requires, by default, all attestations to arrive 1.5 seconds before the aggregation time (this would be 7.5 seconds into the slot). Similarly, our implementation of #3034 requires a couple of forkchoice calls before the end of the slot, and this in turn requires all aggregations to have arrived at the node, or we would risk several orphaned blocks on chain. I do believe that these numbers are still within manageable boundaries for our implementation and I fully support increasing the first bracket of the slot. However, I would want to see very good benchmarks of all client implementations and their handling of different parameters to justify these changes. The benchmarks we used to justify the cut at 1.5 seconds before the aggregation time are in prysmaticlabs/prysm#12350. Similarly, we use 10 seconds into the slot as the time at which we decide whether or not to attempt a reorg; shortening this (like moving closer to the boundary as Lighthouse does) works fine in general, but it hurts local execution builders in case of a failed reorg attempt.
Personally I would like to see this happen first, and then measure again the distribution of attestation marks to see what impact increasing the attestation space in the slot would have. As an example: Vouch follows the spec and attests as soon as possible, i.e. at 4 seconds, or earlier if a block arrives. A sample distribution of attestation marks (the time at which the attestation process completes) for one of our nodes is as follows: It can be seen that the majority of attestations are created by the 3s mark, and the tail past 4s is basically down to missed slots. If all validator clients followed the spec here it could well result in less pressure later in the slot, and somewhat mitigate the requirement to extend the first part of the slot. At the least, it would give a better view of what the in-slot timings should be. Separately, I would like to point out that a significant jump in orphan rates started in April, at the point that MEV relays (unilaterally) took over the proposal broadcast responsibilities for MEV blocks. A breakdown of the orphan rate for locally-proposed blocks only would give a better view of the health of the network.
specs/deneb/fork-choice.md
@@ -95,7 +95,7 @@ def on_block(store: Store, signed_block: SignedBeaconBlock) -> None:

     # Add proposer score boost if the block is timely
     time_into_slot = (store.time - store.genesis_time) % SECONDS_PER_SLOT
-    is_before_attesting_interval = time_into_slot < SECONDS_PER_SLOT // INTERVALS_PER_SLOT
+    is_before_attesting_interval = time_into_slot < SECONDS_PER_SLOT // 2
This should be a constant
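A minimal sketch of what the constant-based form could look like, for illustration only. `ATTESTATION_DUE_SECONDS` is a hypothetical name that does not come from the spec or from this diff; its value of 6 matches `SECONDS_PER_SLOT // 2` for 12-second slots.

```python
SECONDS_PER_SLOT = 12  # mainnet slot length

# Hypothetical constant (name is an assumption, not from the spec):
# the point in the slot after which a block no longer counts as timely.
ATTESTATION_DUE_SECONDS = 6


def is_before_attesting_interval(store_time: int, genesis_time: int) -> bool:
    """Return True if store_time falls before the attestation deadline of its slot."""
    time_into_slot = (store_time - genesis_time) % SECONDS_PER_SLOT
    return time_into_slot < ATTESTATION_DUE_SECONDS
```

With a named constant, a network with a different slot length can change the deadline in one place instead of relying on the division working out.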
It can also be seen in the graph that we're approaching the point where the current 4s rule will take over in more and more slots, due to the ever increasing amount of work the client has to do before it can send out an attestation (i.e. in your graph, the 3-4s range is not empty either). What's happening is that we're pushing back the "block arrived" time with additional validation (blob dispersal and validation), and future versions will push it back further. The "attest-early" rule only helps those blocks which arrive early and are easy to validate, not the worst-case blocks that are nonetheless permitted by the spec (i.e. up to MAX_BLOBS_PER_BLOCK blobs etc). It is good for the network that clients support it, but we also need to acknowledge that the "staggering" effect, where the network takes time to observe a block as it is being disseminated, will further increase with Deneb.
I've zoomed in on the graph a little (the gap is an out-of-disk-space event on the collector): I had previously assumed that Capella was more strongly involved, but the overall increase does indeed predate Capella (Apr 12) by a few days.
I agree, so perhaps attestation times aren't the best way of measuring this. You mention the "block arrived" time, I think that this roughly equates to the emission of the "head" event on the event stream, which is after the block has been processed and validated. In which case, here is a graph showing the average time in the slot in which we receive the "head" event (specifically, we take the median arrival time of the "head" event for all beacon nodes in our mainnet environment per slot, and then average this per day to give a single value for each day): So yes we are definitely seeing an increase in the time at which these events are emitted. That said, the average is below 2s at time of writing (but then again, there are various other ways in which we could slice this data that may bring different conclusions as to the headroom that we really have, and note that this methodology definitely misses out the impact of blocks that never make it to be head). One thing I'll try to do is to see if I can find the difference between the "block" and "head" events emitted, as that could give a better indicator of the actual processing time.
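The aggregation described above (median across nodes per slot, then a mean per day) can be sketched roughly as follows. This is an illustrative reconstruction and not the code used to produce the graph: the input shape `(slot, delay_seconds)` and the function name are assumptions, and days are counted from genesis rather than by calendar date.

```python
from collections import defaultdict
from statistics import mean, median

SECONDS_PER_SLOT = 12
SLOTS_PER_DAY = 86_400 // SECONDS_PER_SLOT  # 7200 slots with 12-second slots


def daily_head_delay(samples):
    """Aggregate head-event delays into one value per day.

    samples: iterable of (slot, delay_seconds) pairs, one per beacon node
    per slot (hypothetical input format).
    """
    # Step 1: collect every node's delay for each slot.
    per_slot = defaultdict(list)
    for slot, delay in samples:
        per_slot[slot].append(delay)

    # Step 2: take the median across nodes for each slot, grouped by day.
    per_day = defaultdict(list)
    for slot, delays in per_slot.items():
        per_day[slot // SLOTS_PER_DAY].append(median(delays))

    # Step 3: average the per-slot medians to get one value per day.
    return {day: mean(medians) for day, medians in sorted(per_day.items())}
```

Taking the median first keeps one slow node from skewing a slot, while the daily mean still lets slow slots pull the daily figure up.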
Following on from the previous comment, here is a graph of the processing time for blocks. Methodology here is that we gather the timestamp of nodes emitting the 'block' and 'head' events on the Broadly similar shape to that of the head delay graph. Not as high an increase at the Capella hard fork as I would have guessed prior to seeing the data, and I assume that the pull-back soon after was due to improvements in client code, but still definitely seeing an upwards trend. However, the actual number still seems pretty low in the grand scheme of things, and I wonder if there are more places to optimize before looking at changing the timings. (And unrelated to the above, but if we do change the timings I wonder if we should stop trying to stick to second boundaries and break a slot into 128 increments, which would allow us finer-grained control in future if necessary.)
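To make the 128-increment idea concrete, here is a small sketch of the arithmetic for a 12-second slot. The names are hypothetical and nothing like this exists in the spec; exact fractions are used because one increment is not a whole number of milliseconds.

```python
from fractions import Fraction

SLOT_MS = 12_000            # 12-second slot, in milliseconds
INCREMENTS_PER_SLOT = 128   # hypothetical granularity from the comment above


def increment_to_ms(n: int) -> Fraction:
    """Offset into the slot, in milliseconds, of the n-th increment."""
    return Fraction(SLOT_MS * n, INCREMENTS_PER_SLOT)


def ms_to_increment(ms: int) -> Fraction:
    """Position of a millisecond offset expressed in increments (may be fractional)."""
    return Fraction(ms * INCREMENTS_PER_SLOT, SLOT_MS)
```

Under these assumptions one increment is 93.75 ms, and the proposed 6s and 9s marks fall exactly on increments 64 and 96, whereas the current 4s mark does not fall on a whole increment.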
Additional data on the graph, showing the 90th percentiles and 95th percentiles. 90th percentile is probably the most interesting, in that it suggests that the increase in number of validators has had less of an impact than originally thought in the "normal" case. The 95th percentile does show that the work done in worst case situations has been increasing a fair bit since the merge. And the same graph but topping the Y axis out at 200ms:
Not all networks have 12 second slots, and in the past there have been curious timings on account of networks choosing odd slot times that aren't divisible by 3... Are we better off just making This would allow someone with an odd timing to tweak it for their network rather than get stuck on division issues, as has happened in the past with some clients using ms and some clients using second granularity... you'd end up with something like
or in mainnet
Potentially could set a sync committee message due setting, or expect it to align with attestation duties... or bake MS into the numbers so that we used something like
Anyway this isn't a fully formed idea but figured it was worth sharing given we're talking about changing this area... I should say I agree with the stronger wording about attestation production...
Agreed, it makes more sense to specify it like this
I am personally in favour of integer parameters, from my perspective milliseconds should work better |
The other thing I wasn't sure of was if we have something for specifying the cutoff for blocks like the late block reorg PR... If we're doing these, that could also be just another parameter, so it could be set at whatever value was decided to allow us a consistent way of referencing the drop dead point in a slot for late blocks...
as an example
Also note potential for out-of-order messages
af72748 introduces constants and notes that within a slot, messages may arrive out-of-order and that clients should be prepared to handle this case.
| `ATTESTATION_DUE_MS` | `6000` |
| `AGGREGATE_DUE_MS` | `9000` |
| `SYNC_MESSAGE_DUE_MS` | `6000` |
| `CONTRIBUTION_DUE_MS` | `9000` |
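A rough sketch of how a client might consume these constants with integer millisecond arithmetic. The four constant values come from the table above; the helper names `ms_into_slot` and `ms_until_due` are made up for illustration and are not part of the spec.

```python
SECONDS_PER_SLOT = 12  # mainnet slot length

# Values from the table above.
ATTESTATION_DUE_MS = 6000
AGGREGATE_DUE_MS = 9000
SYNC_MESSAGE_DUE_MS = 6000
CONTRIBUTION_DUE_MS = 9000


def ms_into_slot(now_ms: int, genesis_time_s: int) -> int:
    """Milliseconds elapsed since the start of the current slot."""
    return (now_ms - genesis_time_s * 1000) % (SECONDS_PER_SLOT * 1000)


def ms_until_due(due_ms: int, now_ms: int, genesis_time_s: int) -> int:
    """Milliseconds left before a duty is due in the current slot (0 if already due)."""
    return max(0, due_ms - ms_into_slot(now_ms, genesis_time_s))
```

For example, 2.5 seconds into a slot, `ms_until_due(ATTESTATION_DUE_MS, ...)` gives 3500 and `ms_until_due(AGGREGATE_DUE_MS, ...)` gives 6500. Because everything stays in integer milliseconds, a network whose slot time is not divisible by 3 only has to pick different constants.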
Ah good idea splitting sync committee and contributions to their own constants, I like it.
I think sync messages should be aligned with attestations. The logic is the same as for attestations in that they should be created once the block has been fully validated and it is reasonable to assume that it has propagated.
Do we need to specifically call out that
The possibility is on the receiving end - i.e. an attestation might propagate faster than a block, therefore you might receive them out of order. To handle such cases, clients must cache attestations for the given slot if they have not yet observed the block (and wait with propagation until they've observed the block). On the sending end, I don't think we need to point it out (i.e. it follows from the fact that you're aggregating attestations and not the other way around).
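The receiving-side caching described above could look roughly like this. It is a simplified sketch, not how any particular client does it: the class and method names are hypothetical, attestations are treated as opaque values, and block roots are plain `bytes`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


class PendingAttestations:
    """Hold attestations whose target block has not been observed yet."""

    def __init__(self) -> None:
        self.seen_blocks: Set[bytes] = set()
        # block_root -> list of (slot, attestation) waiting for that block
        self.pending: Dict[bytes, List[Tuple[int, Any]]] = defaultdict(list)

    def on_attestation(self, slot: int, block_root: bytes, attestation: Any) -> List[Any]:
        """Return the attestations that can be processed and propagated now."""
        if block_root in self.seen_blocks:
            return [attestation]
        # Block not observed yet: cache instead of processing or propagating.
        self.pending[block_root].append((slot, attestation))
        return []

    def on_block(self, block_root: bytes) -> List[Any]:
        """Mark a block as observed and release any attestations waiting on it."""
        self.seen_blocks.add(block_root)
        return [att for _, att in self.pending.pop(block_root, [])]

    def prune(self, current_slot: int) -> None:
        """Drop cached attestations from slots older than current_slot."""
        for root in list(self.pending):
            kept = [(s, a) for s, a in self.pending[root] if s >= current_slot]
            if kept:
                self.pending[root] = kept
            else:
                del self.pending[root]
```

Pruning by slot keeps the cache bounded when a block never shows up; a real client would also need to bound `seen_blocks`, which this sketch leaves out.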
Hey @arnetheduck, I wanted to fix the conflicts, but I don't have permission to push commits to Do teams want to experiment with it at dencun-devnet-9?
@arnetheduck To test it with minimal specs (
There seems to be a trend of voluntarily reducing the first slot interval by 1s, down to 3s, by intentionally proposing late: That's sort of the... opposite direction of this PR
Rebalancing duty times, The current honest validator specifications ask block proposers to propose their block at the beginning of the slot, which is not rational to do. A rational block proposer builds their block later in the slot to capture more MEV. They only need to release the block early enough for attesters to see it in time such that they vote for it, effectively making the block canonical.
We have not seen these timing games played to their fullest extent either imo.
I agree with Caspar, before clients shipped the late block reorg feature, seeing blocks at 11 seconds was common. However, I would support increasing the attestation deadline if it's shown that 4 seconds is not enough for honest validators building locally at Deneb.
These are two separate problems - timing games exist before and after this PR. The PR just rebalances the timings to better reflect the reality of the relative weights of the actions involved in a slot - it addresses a "cost misalignment" in the spec where creating and propagating a hundreds-of-kilobytes block with hundreds of signatures, transactions and so on is assumed to have the same cost as sending a 200-byte attestation with a single signature in it. Whether these timings are appropriate for everyone is a topic for separate discussion, and mitigating timing games does not involve picking one specific value for the timeouts over another - it requires a different mechanism entirely - so I'd keep timing game discussions entirely out of this thread.
Would a rational attester also delay the attestation to capture extra rewards for timely voting? They only need to release the attestation early enough for aggregators to see it in time. If they were to attest on time, they may miss out on rewards if a block is received late.
No, rational attesters would attest as early as possible. There's no need to delay the attestation, and in any case there's no need to attest to anything that came after the deadline and is very likely to be reorged.
It's not necessarily likely to be reorged, for example if it was received on time but is still pending validation at the 4s mark, e.g. on a low resource system.
Rational attesters vote as soon as possible, but might still delay their attestation deadline: At the limit, a rational proposer knows their block needs to receive only 40% of the committee's attestation votes (proposer boost). If they wanted to maximally delay their block proposal they would target a split in the committee such that 40% of the committee hears the block before the attestation deadline (and so votes for it), and 60% do not hear the block before the attestation deadline (and so vote for the parent). An attester wants to get their head vote correct and thus wants to ensure they are not part of the 60%. They can achieve this by delaying their attestation slightly (while making sure they propagate it in time for aggregation - timing games all over). This could allow block proposers to delay their blocks even further, incentivizing attesters to further delay their attestation deadline... This could spiral towards the end of the slot. Attestation committees are large enough that targeting splits should be feasible. Practically a proposer would add safety margins around those timings, but in the extreme this is where this could be headed...
If, say, the attestation deadline moved to 6s, could the timing games be limited by... simply no longer gossiping blocks 3s into the slot? As in, if late blocks are reorged anyway, does it make sense to still gossip them when it's already late?
So you make the slot longer, from 4s to 6s, but effectively the slot is now shorter in terms of allowed block propagation time (3s)? I don't follow.
Right now, the timings are:
the PR here proposes to change the deadline timings to 0/6/9, effectively promoting this behaviour:
If 3s is enough for block propagation, what could be done is:
As in, if you run a low resource system, you would be guaranteed to have at least 3s to process the block and reach a validity verdict. That could not be reduced by MEV games
If it really is enough for the whole block creation/signing/propagation process then there should be no need to change the timings. But as @arnetheduck says in the OP this is increasingly not going to be the case, especially as blocks become larger. I'd be against stopping block gossip at an arbitrary time within the first period; it would likely push people towards higher-end hardware and faster 'net connections, and both of those would go against home staking.
The advantage of ignoring blocks after a deadline (maybe not stopping gossip altogether, but ignoring) would be that the people pushing later and later blocks then know the drop dead time... but you could bet they'd be getting as close to that line as possible given the financial advantages... I'm not sure what the answer is...
Stopping gossip before the attestation deadline is simply a bad idea. Any late block that becomes canonical will have to be imported by RPC, and any split view that can be caused by timely delivery is exacerbated, since clients that did not get the block on time will now have to wait much longer to get it, possibly across several hops if their peers don't have it. It risks network partitioning for no reason.
Any chance we can push this further to make it real? 0/6/9 is far better than 0/4/8. Did I miss anything? Did we have a consensus that we will keep it 0/4/8?
I don't think that we saw much consensus to change these values. Most of the agreement was around moving to absolute values rather than fractions of the slot time in the spec, which is reasonable but doesn't involve changes in the practical operation of the chain.
As the chain keeps growing and duties are being added to the block construction and verification pipeline, it becomes increasingly difficult for clients to complete block duties on time, leading to poor attestation performance and more frequent reorgs.
This PR proposes to rebalance the relative timings of the 3 main activities, namely block production, attestation and aggregation, such that they happen at 0/6/9 seconds into each slot instead of 0/4/8.
This reduces the time for attestations to reach aggregators and aggregates to reach block producers but increases the time the consensus and execution clients have to produce and validate blocks.
Each upgrade has so far increased complexity and processing requirements around block production and so will future upgrades: due to increased size, blocks with blobs will take longer to disseminate and additional verification / cryptography is needed to validate them.
Attached are a few graphs that show the number of reorgs growing over the last 6 months as well as typical receipt times of attestations and aggregates relative to the block start.
Generally, ~95% of attestations are submitted within 2s of the broadcast cutoff and >99% of aggregates. This gives us some margin to reach even better numbers when 3s are allotted to each of these activities.
We can see that these numbers tell a story where attestations take longer than aggregates to produce - this could have a number of underlying reasons including the fact that attestations are many and aggregates are few putting load on the network, and clients delaying attestation production slightly already due to natural block processing delays (ie if the pipeline is already clogged with block verification, clients may be blocked and not produce attestations at the same time).
One could thus introduce an uneven balance, ie 0/7/10 but this seems premature - with more time dedicated to block verification, clients should be able to produce timely attestations with higher frequency.
This PR also introduces stronger language around the already existing requirement that clients send out attestations as soon as they have observed a block - doing so would help the network distribute load more evenly and thus better absorb continued growth.
Metrics for attestation / aggregate receipt as observed by a Nimbus node:
The above metrics show that most attestations are observed between 4-6 seconds while some arrive earlier than the current 4s cutoff.
For aggregates, the cutoff is much clearer since there exists no "early broadcast" rule. A possible early broadcast rule would be that all members of the committee have voted and a perfect aggregate has been reached.
The same data, but in a graph over time.
Reorg frequencies for the past 6 months:
Among concerns are:
This PR updates the validator spec - the fork choice spec and possibly other parts would have to be updated accordingly.
Marked draft pending further investigations into relative timings.
This looks like a pretty good idea :)