Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: add ExternalNow, etc for external time and timers #36141

Open
zx2c4 opened this issue Dec 14, 2019 · 96 comments
Open

time: add ExternalNow, etc for external time and timers #36141

zx2c4 opened this issue Dec 14, 2019 · 96 comments

Comments

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 14, 2019

Update May 5 2021: The current proposed API is in #36141 (comment). - rsc


Vocabulary:

  • Program time: monotonic, but stops when the computer is in S3 sleep.
  • Real time: monotonic, but continues to advance when the computer is in S3 sleep.
  • Wall time: non-monotonic thing on your wristwatch or wall clock that NTP messes with. This one plays no role in this discussion here at all.
  • Operating system: this always refers to the tuple of OS+ParticularVersion+ParticularConfiguration.

(These vocabulary terms can be nitpicked - maybe program time should be cpu time or something - but we've been using them prior in discussion, so let's continue to use them so as not to introduce confusion.)

Proposal:

  • Find some way to introduce "real time" semantics into Go, which currently mostly uses "program time", except on Windows, where it's always been "real time" for historical reasons.

Motivation:

  • Network protocols need to keep track of timeouts independent of whether a computer is asleep, since parties on a network exist in the real world, rather than virtualized on a CPU.

Landscape:

  • On some operating systems, the poll/select/kqueue/WaitFor*Object/futex family of functions takes a timeout that is measured in "real time", and on others measured in "program time".
  • Most operating systems support a "program time" counter. Some support a "real time" counter, but some do not, depending on configuration or existence of S3.
  • Most operating systems offer a notifier for resuming from sleep, though some may not, depending on configuration or existence of S3.
  • Important observation: operating systems that do not offer a notifier support "program time" rather than "real time".

Possibilities:

a. Make the existing time. and time.Timer. functions use "real time" exclusively, when possible. Introduce a function runtime.RealtimeTimers() bool to indicate whether Go successfully enabled "real time" timers rather than "program time" timers, the fallback.

b. Introduce additional duplicated functions to time. and time.Timer. that use "real time" rather than "program time". Introduce a function time.RealtimeTimersAreRealTime() bool to indicate whether Go successfully enabled "real time" timers on this new set of functions, or if the new set of functions behave identically to the old.

c. Introduce additional duplicated functions to time. and time.Timer. that use "real time" rather than "program time", and throw an error if "real time" capabilities aren't available, forcing users to introduce verbose fallback code if they only want to support "real time" opportunistically.

d. Add a function runtime.UseRealtimeTimers() error that attempts to change the runtime to use "real time" timers everywhere, like (a).

e. Add runtime function runtime.UseRealtimeTimers(yes bool) error that attempts to change the runtime to use "real time" or "program time" timers everywhere, like (a) but the ability to toggle. Add runtime function runtime.RealtimeTimers() bool to indicate the current state. The default start-state would be either OS-defined or "real time" or "program time", depending on what we decide.

f. Other options?

My personal preference would be (a) or (e), but I'm open to discussion.

CC @ianlancetaylor @bradfitz @aclements @rsc

@zephyrtronium
Copy link
Contributor

Making time sense a configuration option in runtime as in (d) or (e) means that packages which want to use real time can't be composed with packages which want to use program time, hence almost no one can depend on either behavior. That seems like probably the worst option.

From reading the discussion at #35482, it seems to me like the only cases where real time is mandatory are cases where the program interacts with the outside world, and furthermore that all cases where the program interacts with the outside world really want real time, even if they currently don't (or previously didn't) use it due to operating system behavior. That implies that the suggestion in #35482 (comment) of TimerAt, SleepUntil, &c., which I believe is a strict subset of your possibility (b), could and should be used everywhere in net. At the risk of bikeshedding, I would suggest that those new functions could be in a new x/time package that has runtime support, but then net can't use them. Maybe a new time/realtime package would help keep the API surface area of time small for typical wall time users while supporting those who must have real time.

As a supporting anecdote: I have an IRC bot that I run using Go 1.13.0 on my local Windows machine, which I also put to sleep from time to time. Sometimes-but-not-always after waking my computer, the program will actively spin for 300 seconds while waiting for the network deadlines to "actually" happen. I haven't tried it yet with Go 1.13.3+, but from what I understand, that behavior would be fixed there – the deadlines would always occur immediately after the computer wakes. This indicates to me that real time is the correct time for all network protocols, not just for WireGuard.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Dec 14, 2019

This indicates to me that real time is the correct time for all network protocols, not just for WireGuard.

Yes, probably more so than not. Brad wanted "real time" on a networking CL recently too and settled for the unreliable "wall time" instead.

@ALTree ALTree added this to the Proposal milestone Dec 14, 2019
@networkimprov
Copy link

networkimprov commented Dec 14, 2019

We're expecting a separate proposal from the Go team along the lines described by @aclements in #35482 (comment).

I posted some followup questions to it in #35482 (comment)

See also #24595, #35012, #29485 (comment)

@bradfitz mentions his wall (i.e. pseudo-real) timers in #35482 (comment)

It would probably help if Go allowed us to set a suspend/resume handler.

Note that network protocol use between apps on the same machine is also common, and would generally not want real-time timers.

@ianlancetaylor ianlancetaylor changed the title proposal[s]: "real time" timer semantics for 1.15 proposal: support "real time" timer semantics Dec 14, 2019
@ianlancetaylor
Copy link
Contributor

I agree with @zephyrtronium that a global setting as proposed in possibilities d and e is a non-starter. Large programs include many packages, and those packages may legitimately have different requirements.

Possibility a is essentially #24595 as a choice made for all systems where possible. The Linux kernel attempted to make this choice, and reverted it, as discussed at https://www.spinics.net/lists/linux-tip-commits/msg43709.html. Watchdog timers failing on resume are a particular issue. So are timers for single user games. Very roughly, timers associated with behavior of other programs on the current system, or with behavior of the user of the system, seem likely to want "program time". Timers associated with behavior of other systems on the network seem likely to want "real time". Given that Go has historically used "program time" on all systems other than Windows, shifting all systems to "real time" seems likely to hurt more existing code than it helps.

The distinction between possibilities b and c is subtle, and it's not yet clear to me that it matters. If we have a way to detect that a system was suspended, which is the only case where "program time" differs from "real time", then we can always implement timers that use "real time".

@ianlancetaylor
Copy link
Contributor

From the API perspective I think the relevant functions and methods are:

  • time.Sleep
  • time.NewTimer (and its variants time.After and time.AfterFunc)
  • time.(*Timer).Reset
  • net.Conn.SetDeadline (and SetReadDeadline and SetWriteDeadline and implementations)
  • os.(*File).SetDeadline (and SetReadDeadline and SetWriteDeadline)

If there are other relevant functions/methods, please point them out.

We also need to pay particular attention to time.Sub and the related functions time.Until and time.Since. Currently time.Sub sometimes computes "real time" and sometimes computes "program time". This is documented in the time package docs, which say

On some systems the monotonic clock will stop if the computer goes to sleep. On such a system, t.Sub(u) may not accurately reflect the actual time that passed between t and u.

If we are going to make changes in this area, it would be nice to eliminate this inconsistency if possible.

@ianlancetaylor
Copy link
Contributor

As others have said, one way that an API can permit programs to distinguish between "real time" and "program time" is to use the difference between time.Duration and time.Time. Currently time.NewTimer takes an argument of type time.Duration. We could define that as meaning to send a value on the channel after that duration in "program time" (I believe that is the current meaning on all systems other than Windows 8 and later).

We could add a new function time.NewTimerAt which would take an argument of time.Time. That would be defined as sending a value on the channel on or shortly after that instant in time. Thus time.NewTimer(d) would measure time in "program time" while time.NewTimerAt(time.Now().Add(d) would measure time in "real time."

A disadvantage of this approach is that net.Conn.SetDeadline and friends already take a time.Time value. By analogy with the change in timers, we would expect to change the existing meaning of net.Conn.SetDeadline to measure in "real time" rather than "program time". But that change might affect existing programs that use deadlines for network communication on the same machine.

@ianlancetaylor
Copy link
Contributor

Turning to implementation, I believe that we can straightforwardly implement both "real time" and "program time" timers if we have a way to receive a notification that the system has woken up from a suspension. Unfortunately, as far as I know, GNU/Linux provides no such notification capability. We can detect a suspension after the fact by comparing the results of clock_gettime using CLOCK_MONOTONIC and CLOCK_BOOTTIME. But we don't want to be continually checking those times.

The problem is that for any sort of timer we need to have a way to sleep until the timer is ready to fire. To support both timers that work in "program time" and timers that work in "real time" we need to have a way to sleep in "program time" and in "real time", or a way to detect that we've woken up from suspend and need to recalculate sleep times.

As far as I know, the only way to sleep in "real time" on GNU/Linux is via timer_create, which is somewhat horrible in that it reports timer completion by sending a signal.

I don't know what is available on other systems.

@networkimprov
Copy link

systemd sees suspend/resume events. Go could provide an optional resume hook that pings a unix socket or other pollable widget.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Dec 14, 2019

As far as I know, the only way to sleep in "real time" on GNU/Linux is via timer_create, which is somewhat horrible in that it reports timer completion by sending a signal.

There's timerfd_create(2), actually, which gives you a normal fd that you can then netpoll on. I talked to tglx, the kernel timer maintainer, a few months ago about adding boottime support to futex(2), and he seemed interested, but of course kernel changes mean something far off in the future.

I'll do some research on Linux sleep resumption notification mechanisms. Tailing dmesg sounds pretty bad. Hooking into systemd or android power binder misery or similar sounds less bad but still bad. I wonder if there's some magic sysfs file we can epoll on. I'll look around.

I agree with @zephyrtronium that a global setting as proposed in possibilities d and e is a non-starter. Large programs include many packages, and those packages may legitimately have different requirements.

Yea, I was afraid of that. I guess runtime has a lot of global nobs, but I understand that you don't want to add more, especially important ones.

Possibility a is essentially #24595 as a choice made for all systems where possible.

Sounds like you don't like that either. I'm biased based on my own needs of course, but something to consider is the slight misunderstanding implied by @aclements's original nomenclature we've been using. "Program time" isn't actually the time spent by a program, since the scheduler implies it might not actually be running that whole time, or might be SIGSTOP'd or something. That means that ordinary programs may very well see large leaps in "program time", just as they will for "real time" in the case of resuming from suspend. With large leaps being a potential either way, what's the point of the distinction? Seems like programs are better off coding around "real time" assumptions since "program time" assumptions don't actually apply in the case of SIGSTOP or scheduler contention and such. Maybe the distinction between the two is only important for something like watchdog timers? That seems like more of the exception than the rule though. But, as said, I'm a bit biased here.

The distinction between possibilities b and c is subtle, and it's not yet clear to me that it matters. If we have a way to detect that a system was suspended, which is the only case where "program time" differs from "real time", then we can always implement timers that use "real time".

Do you mean to say that we could perhaps add the new API without needing a function to determine whether or not we have a detection mechanism (or without returning an error), because in the case where we don't have a detection mechanism, there is no S3 sleep anyway, so we might as well consider them the same? That would be fine with me, I suppose.

If there are other relevant functions/methods, please point them out.

time.Ticker is another one.

We also need to pay particular attention to time.Sub and the related functions time.Until and time.Since. Currently time.Sub sometimes computes "real time" and sometimes computes "program time". This is documented in the time package docs, which say

On some systems the monotonic clock will stop if the computer goes to sleep. On such a system, t.Sub(u) may not accurately reflect the actual time that passed between t and u.

Right now a time object has internally both wall time and program time. We could slow it down further and have it always fetch real time. And then introduce more comparison functions. But that seems tricky. Alternatively, we could change the construction functions, for NowReal() and NowProgram() or something, but then there are hairy issues on what to do when comparing unlike objects. Or maybe we'll just want to introduce a whole new object all together?

A disadvantage of this approach is that net.Conn.SetDeadline and friends already take a time.Time value. By analogy with the change in timers, we would expect to change the existing meaning of net.Conn.SetDeadline to measure in "real time" rather than "program time". But that change might affect existing programs that use deadlines for network communication on the same machine.

But actually, isn't the net case the one area where we actually do want to prefer "real time"? For example, see Brad's CLs where he tries to approximate "real time" by using "wall time" (via Round(0)). So maybe that's okay.

@networkimprov
Copy link

On laptops, programs are rarely stopped, and if they are, it affects network peers on the same machine. Suspend/resume is frequent, and does not affect peers on the same machine.

@ianlancetaylor
Copy link
Contributor

Yea, I was afraid of that. I guess runtime has a lot of global nobs, but I understand that you don't want to add more, especially important ones.

As far as I know the runtime package does not have any knobs that change the meaning of any APIs.

@ianlancetaylor
Copy link
Contributor

I did forget about timerfd_create, thanks for pointing it out.

@ianlancetaylor
Copy link
Contributor

But actually, isn't the net case the one area where we actually do want to prefer "real time"?

In general, yes, but not always. For processes communicating on the same system, "program time" may be expected. We could go ahead and decide to change it anyhow, but it requires thought.

@ianlancetaylor
Copy link
Contributor

Sounds like you don't like that either. I'm biased based on my own needs of course, but something to consider is the slight misunderstanding implied by @aclements's original nomenclature we've been using. "Program time" isn't actually the time spent by a program, since the scheduler implies it might not actually be running that whole time, or might be SIGSTOP'd or something. That means that ordinary programs may very well see large leaps in "program time", just as they will for "real time" in the case of resuming from suspend. With large leaps being a potential either way, what's the point of the distinction? Seems like programs are better off coding around "real time" assumptions since "program time" assumptions don't actually apply in the case of SIGSTOP or scheduler contention and such. Maybe the distinction between the two is only important for something like watchdog timers? That seems like more of the exception than the rule though. But, as said, I'm a bit biased here.

It's true of course that "program time" (which I agree is a minomer) can see delays. But consider the case of an interactive game with no network connection that sets up dozens of timers for different times that cause events to occur in the game. If the user suspends a laptop while playing the game, those timers should also suspend; they shouldn't all fire when the laptop is resumed.

@networkimprov
Copy link

Assuming the Windows runtime switches to program/monotonic time, it might help to provide a switch (e.g. env var) to return it to real/boot time.

Conversely, it would help for 1.14 to have a switch to enable program time; the work for that is mostly done.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Dec 16, 2019

Alright here's where we are with technical solutions for supporting "real time":

Windows

  • Reading from 0x7ffe0008 returns "real time", as we do now.
  • The WaitFor functions we're using for sleeps use "real time" on Windows 7.
  • The WaitFor functions we're using for sleeps use "program time" on Windows 8+, but PowerRegisterSuspendResumeNotification is also available for Windows 8+, so we can interrupt the sleep and recalculate. This is what the Go runtime does now.

Linux

  • clock_gettime(CLOCK_BOOTTIME) returns "real time" and is in the VDSO in modern kernels.
  • timerfd_create(CLOCK_BOOTTIME) creates something we can sleep with in "real time".

macOS

  • mach_continuous_time returns "real time".
  • Passing in NOTE_MACH_CONTINUOUS_TIME as the flags argument to any of the event function causes the sleep to be in "real time".

I haven't looked into the BSDs yet.

@networkimprov
Copy link

It might be essential to make real timers on Windows use a high-precision source; current resolution is 2ms, which isn't enough for some protocols, for example #29485 (comment).

Instead of changing time.Sub() to compute real time, we could consider an API which provides any delta between program & real time, e.g. func SuspendedTotal() Duration.

Re SetDeadline() computing a duration internally, I'd suggest this API, and deprecating Set*Deadline().

func (c *T) SetDeadlineAt(t time.Time)
func (c *T) SetDeadlineFor(d time.Duration)

@balasanjay
Copy link
Contributor

SetDeadline is also on interfaces (like net.Conn), where we cannot add methods directly.

@networkimprov
Copy link

True; requiring a type-assert to call a new method.

Maybe new fields in net.Dialer & net.ListenConfig should set the meaning of the Set*Deadline() time.Time argument.

@beoran
Copy link

beoran commented Dec 17, 2019

I would prefer a whole new API, in a separate package, 'time/realtime' that has all the functionality of the time package but that explicitly uses real time clocks on all platform. The current time package then should use program time clocks on all platforms. That is IMHO the cleanest solution.

And @ianlancetaylor, as an aside, thanks of thinking about game programming using Go. I definitely appreciate that core Go developers are starting to take non-server applications for Go seriously. As you say, for single player games, timers of game events really need to be suspended when the program is suspended.

As an aside, over 20 years ago, I learned programming because I wanted to make games. Likewise I want to teach my daughter how to program Go by letting her make games. If we want Go to become more popular, then we have to make it even easier to use for game programming as well.

@networkimprov
Copy link

@ianlancetaylor is the Go team working on a concrete proposal? If so, is there an ETA?

Presumably we'd want to see a draft implementation early in 1.15 cycle...

@ianlancetaylor
Copy link
Contributor

There is no ETA.

@beoran
Copy link

beoran commented Jan 15, 2020

@networkimprov Perhaps this doesn't need to be in the Go standard library? I think it would not be that hard to write an external realtime module that uses real time timers with high precision.

@networkimprov
Copy link

@beoran, it's discussed at length in #35482

@beoran
Copy link

beoran commented Jan 15, 2020

I looked at that discussion, and after that I'm not very happy with the current state of things. What I would like is consistency between platforms. The WireGuard project should not have to use a patched runtime on Linux, but to me it looks like they got a patch into the Windows Go runtime that worsened the inconsistency overall.

As I said above, I would solve the problem by making 'time' use program time/wall time, and have a 'time/realtime' package for real time/monotonic clocks. The former can be done relatively easily in Go itself, the latter could be implemented as a third party library at first.

@rsc
Copy link
Contributor

rsc commented May 26, 2021

Overall it seems like people agree on making Since and Until just work.

@rsc
Copy link
Contributor

rsc commented May 26, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@networkimprov
Copy link

networkimprov commented May 26, 2021

Under this plan, it's easy to mistakenly call time.Now() when you meant time.ExternalNow(), and then make a comparison that silently yields the wrong result. Isn't a panic preferable to a bug?

@hherman1
Copy link

I’m not sure about this case specifically, but sometimes crashing is much worse for users than slightly wrong results.

@networkimprov
Copy link

In this case, you'd see the panic when testing; it wouldn't be intermittent.

@rsc
Copy link
Contributor

rsc commented Jun 2, 2021

Under this plan, it's easy to mistakenly call time.Now() when you meant time.ExternalNow(), and then make a comparison that silently yields the wrong result. Isn't a panic preferable to a bug?

No.

@rsc
Copy link
Contributor

rsc commented Jun 2, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: time: add ExternalNow, etc for external time and timers time: add ExternalNow, etc for external time and timers Jun 2, 2021
@rsc rsc modified the milestones: Proposal, Backlog Jun 2, 2021
@antichris
Copy link

Isn't a panic preferable to a bug?

No.

That's a surprise. I thought that failing fast and loud is preferable to silently accumulating error. Especially considering that panics are not crashes and can be recovered from.

@hherman1
Copy link

hherman1 commented Jun 3, 2021

If you make it fail-able then you make it harder to use, as devs need to consider handling that failure instead of just having a guaranteed successful function

@networkimprov
Copy link

networkimprov commented Jun 3, 2021

Hunter, what you suggest is akin to saying that out-of-bounds writes like s[:1][8] = v should silently do nothing.

@hherman1
Copy link

hherman1 commented Jun 3, 2021

I don’t think it’s black and white. I do think it’s undeniable that can-panic is part of an API and makes that API harder to use. It can still sometimes be the right decision, such as in the out of bounds array access case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests