-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VC: Hardening and optimizing time handling. #4743
Conversation
beacon_chain/beacon_clock.nim
Outdated
@@ -69,19 +73,39 @@ proc fromNow*(c: BeaconClock, t: BeaconTime): tuple[inFuture: bool, offset: Dura | |||
proc fromNow*(c: BeaconClock, slot: Slot): tuple[inFuture: bool, offset: Duration] = | |||
c.fromNow(slot.start_beacon_time()) | |||
|
|||
proc toDuration*(c: BeaconTime): Duration = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toDuration
doesn't really make sense - BeaconTime
is a point in time, you cannot logically convert it to a duration without a reference point - we don't really need toDuration
since we always have a reference point available when working with time (most of the time, the slot but if need be, we can introduce GENESIS_BEACON_TIME
just like we have GENESIS_SLOT
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You think that BeaconTime
is a point at time, but it is not, because BeaconTime
is a point in time with adjusted start. So when you convert it to Duration
it means that you want to calculate distance to the start
.
This procedure is used in this scenario:
info "Initializing beacon clock",
genesis_time = vc.beaconGenesis.genesis_time,
current_slot = "<n/a>", current_epoch = "<n/a>",
time_to_genesis = currentTime.toDuration()
notice "Waiting for genesis",
genesis_time = vc.beaconGenesis.genesis_time,
time_to_genesis = currentTime.toDuration()
So do you want me to but this calculation everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, what this function returns is the "time-since-genesis-or-zero" - it's not a conversion (as toDuration
implies in its name) which is not meaningful, but rather a time one-sided time-arithmetic-function - its correct signature would be fromGenesis(t: BeaconTime): tuple[future: bool, dur: Duration]
- then the caller can choose whether to saturate it or not, but at least the name correctly represents what the function does.
Throwing away the sign here will lead to misuse because it relies on the caller to check "future-or-not" before calling this function which is easy to forget - it's better to return the sign and have the caller decide about discarding it.
chronos.nanoseconds(int64(c.ns_since_genesis)) | ||
else: | ||
chronos.nanoseconds(int64(-c.ns_since_genesis)) | ||
|
||
proc durationToNextSlot*(c: BeaconClock): Duration = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would remove durationToNextSlot
and fromNow
- both of these are compound operations that mix "current time" from the global clock with "time arithmetic" which leads to problems and confusion in code that uses them, since getting the "current time" generally should only be done in a controlled way:
- once per "block" of arithmetic (ie we should not get the current time twice when performing arithmetic because there's an inherent race condition when the clock between the calls)
- not reused between
await
boundaries
therefore:
- remove
fromNow
,durationToNext*
etc - use
now
consistently when the current time is needed as the only function that accesses the global clock - introduce a time difference operation like
durationFrom(a, b: BeaconTime): tuple[future: bool, dur: Duration]
as is done infromNow
- replace all code in both
bn
andvc
to use the above functionsfromNow
becomesslot.start_beacon_time().durationFrom(clock.now())
durationToNextSlot
becomeslet now = clock.now(); saturate((now.slotOrZero() + 1).start_beacon_time().durationFrom(now))
more or less (slotOrZero
is incorrect for pre-genesis - it needs to return duration to genesis for any time before genesis)- the above can be added as helpers that take a
BeaconTime
and do the saturation if need be, but these helpers should never access the clock itself - more
durationFrom
can be added for slot vs time etc to make it less verbose
durationFrom
can also be durationTo
or some other name or shape - doesn't matter which direction it goes in, but since Duration
is undefined for negative values, we need to return a tuple - we also sometimes need to know that genesis hasn't happened yet, so we don't always want to saturate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed solution do not support negative Slot values which is happened when you are running before genesis. So it become impossible to provide such UX experience like it was done now. Because its impossible to supply negative Slot values it is very hard to find proper argument value for durationFrom/durationTo
. Its why durationToNextSlot
looks the most appropriate name for the procedure which is able to calculate duration to next slot which is either before or after genesis.
currentSlot = currentTime.slotOrZero() | ||
currentEpoch = currentSlot.epoch() | ||
|
||
if currentTime.ns_since_genesis >= 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the above removal of fromNow
, this code can still use inFuture
etc by calling durationFrom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code do use only now()
as you asked to do, nothing else. So why should i apply any arithmetic in here if i just trying to obtain current time and trying to understand is this time before genesis or after genesis? Why should i call to arithmetic operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've generally avoided the use of specific time granularity in the code (ns_since_genesis
exposes "nanoseconds") - ie we've strived to not expose the granularity of the clock in code that does time computations, instead using "abstract" types which keep this information as an internal implementation detail.
In fact, I'm not sure why ns_since_genesis
is public at all - I would generally not have considered it something that callers should use / rely on.
That said, iff we want to use ns_since_genesis
, it's important to verify that we use it consistently everywhere across the codebase (both bn and vc) - using "slightly different" ways of dealing with time in different parts of the codebase becomes technical debt.
let | ||
currentTime = vc.beaconClock.now() | ||
currentSlot = currentTime.slotOrZero() | ||
chronosOffset = chronos.nanoseconds( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto - durationFrom
gives inFuture
so there is no need to revert to nanoseconds
or other internal implementation details here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chronosOffset
do not related to any BeaconClock
arithmetic, it just converts TimeDiff
object. Otherwise it will be 2 calls to durationTo/durationFrom
.
47b4038
to
6a50b24
Compare
8452f3b
to
4979867
Compare
…fter Genesis, but also before Genesis. Change VC pre-genesis behavior, add runPreGenesisWaitingLoop() and runGenesisWaitingLoop(). Add checkedWaitForSlot() and checkedWaitForNextSlot() to strictly check current time and print warnings. Fix VC main loop to use checkedWaitForNextSlot(). Fix attestation_service to run attestations processing only until the end of the duty slot. Change attestation_service main loop to use checkedWaitForNextSlot(). Change block_service to properly cancel all the pending proposer tasks. Use checkedWaitForSlot to wait for block proposal. Fix block_service waitForBlockPublished() to be compatible with BN. Fix sync_committee_service to avoid asyncSpawn. Fix sync_committee_service to run only until the end of the duty slot. Fix sync_committee_service to use checkedWaitForNextSlot().
Fix aggregated attestation publishing missing delay.
Fix fallback service sync status spam. Fix false `sync committee subnets subscription error`.
3173017
to
5be531a
Compare
|
||
var currentSlot: Opt[Slot] | ||
while true: | ||
# This loop could look much more nicer/better, when | ||
# https://github.com/nim-lang/Nim/issues/19911 will be fixed, so it could |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this bug is fixed now btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but i have never seen extended test suite about loops and exceptions in Nim language, so i wish to be safe here.
elif wallSlot > destinationSlot + SLOTS_PER_EPOCH: | ||
if showLogs: | ||
warn "Time moved forwards by more than an epoch, skipping ahead" | ||
return Opt.some(wallSlot - SLOTS_PER_EPOCH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this return an "old" slot? the beacon node does this because it "replays" attestations for the old slots but I don't think the VC does? It's also inconsistent with the elif wallSlot > destination
return
No description provided.