runtime: non-cooperative goroutine preemption #24543

aclements · 2018-03-26T19:40:35Z

I propose that we solve #10958 (preemption of tight loops) using non-cooperative preemption techniques. I have a detailed design proposal, which I will post shortly. This issue will track this specific implementation approach, as opposed to the general problem.

Edit: Design doc

Go currently uses compiler-inserted cooperative preemption points in function prologues. The majority of the time, this is good enough to allow Go developers to ignore preemption and focus on writing clear parallel code, but it has sharp edges that we've seen degrade the developer experience time and time again. When it goes wrong, it goes spectacularly wrong, leading to mysterious system-wide latency issues (#17831, #19241) and sometimes complete freezes (#543, #12553, #13546, #14561, #15442, #17174, #20793, #21053). And because this is a language implementation issue that exists outside of Go's language semantics, these failures are surprising and very difficult to debug.
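To make the failure mode concrete, here is a minimal sketch of a tight loop with no function calls in its body, and therefore no prologue in which a cooperative check could run. The function name `spinUntilStopped` is made up for this illustration, and the sketch assumes the atomic load is compiled inline. With a single P and only cooperative preemption, the caller would never get to run again to set the stop flag; with non-cooperative preemption the runtime can interrupt the loop.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinUntilStopped (hypothetical helper) starts a goroutine whose loop body
// contains no function calls, sleeps for waitMillis, then asks the loop to
// stop. It returns the number of iterations the loop completed.
func spinUntilStopped(waitMillis int) uint64 {
	// One P, so the spinning goroutine and the caller must share it.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	done := make(chan uint64)
	go func() {
		var n uint64
		// No calls in the loop body: nothing here reaches a function prologue.
		for atomic.LoadInt32(&stop) == 0 {
			n++
		}
		done <- n
	}()

	// The caller can only wake up and store the flag if the runtime is able
	// to take the P away from the spinning goroutine.
	time.Sleep(time.Duration(waitMillis) * time.Millisecond)
	atomic.StoreInt32(&stop, 1)
	return <-done
}

func main() {
	fmt.Println("iterations before stop:", spinUntilStopped(20))
}
```

On a toolchain that relies only on prologue checks, this program would be expected to hang rather than print anything.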

@dr2chase has put significant effort into prototyping cooperative preemption points in loops, which is one way to solve this problem. However, even sophisticated approaches to this led to unacceptable slow-downs in tight loops (where slow-downs are generally least acceptable).

I propose that the Go implementation switch to non-cooperative preemption using stack and register maps at (essentially) every instruction. This would allow goroutines to be preempted without explicit preemption checks. This approach will solve the problem of delayed preemption with zero run-time overhead and have side benefits for debugger function calls (#21678).

I've already prototyped significant components of this solution, including constructing register maps and recording stack and register maps at every instruction, and so far the results are quite promising.

/cc @drchase @RLH @randall77 @minux

gopherbot · 2018-03-26T19:43:33Z

Change https://golang.org/cl/102600 mentions this issue: design: add 24543-non-cooperative-preemption

gopherbot · 2018-03-26T20:05:34Z

Change https://golang.org/cl/102603 mentions this issue: cmd/compile: detect simple inductive facts in prove

gopherbot · 2018-03-26T20:06:49Z

Change https://golang.org/cl/102604 mentions this issue: cmd/compile: don't produce a past-the-end pointer in range loops

aclements · 2018-03-27T14:31:40Z

Forwarding some questions from @hyangah on the CL:

Is code in cgo (or outside Go) considered a non-safe-point?

All of cgo is currently considered a safe-point (one of the reasons it's relatively expensive to enter and exit cgo) and this won't change.

Or will the runtime be careful not to send the signal to threads that may be in cgo land?

I don't think the runtime can avoid sending signals to threads that may be in cgo without expensive synchronization on common paths, but I don't think it matters. When it enters the runtime signal handler it can recognize that it was in cgo and do the appropriate thing (which will probably be to just ignore it, or maybe queue up an action like stack scanning).

Should users or cgo code avoid using the signal?

It should be okay if cgo code uses the signal, as long as it's correctly chained. I'm hoping to use POSIX real-time signals on systems where they're available, so the runtime will attempt to find one that's unused (which is usually all of them anyway), though that isn't an option on Darwin.

And a question from @randall77 (which I answered on the CL, but should have answered here):

Will we stop using the current preemption technique (the dummy large stack bound) altogether, or will the non-coop preemption just be a backstop?

There's really no cost to the current technique and we'll continue to rely on it in the runtime for the foreseeable future, so my current plan is to leave it in. However, we could be much more aggressive about removing stack bounds checks (for example if we can prove that a whole call tree will fit in the nosplit zone).
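For readers unfamiliar with the "dummy large stack bound" technique named in the question, the following toy model shows the idea: a preemption request overwrites the bound that every function prologue already compares against, so the ordinary stack check fails and the slow path notices the request. All names and values here (`goroutine`, `preemptSentinel`, the string results) are hypothetical simplifications for illustration, not the runtime's actual code.

```go
package main

import "fmt"

// preemptSentinel stands in for an impossibly large stack bound.
// The value is an assumption chosen for this illustration.
const preemptSentinel = ^uintptr(0)

// goroutine is a simplified, hypothetical stand-in for per-goroutine state.
type goroutine struct {
	realGuard uintptr // lowest stack pointer that still leaves enough room
	guard     uintptr // value the function prologue compares against
}

// requestPreempt poisons the guard so the next prologue check fails.
func (g *goroutine) requestPreempt() { g.guard = preemptSentinel }

// prologue models the check at function entry. The stack grows down, so
// sp <= guard means the fast path cannot be taken.
func (g *goroutine) prologue(sp uintptr) string {
	if sp > g.guard {
		return "run" // fast path: enough stack, no preemption requested
	}
	if g.guard == preemptSentinel {
		g.guard = g.realGuard // restore the real bound before yielding
		return "yield"
	}
	return "grow stack"
}

func main() {
	g := &goroutine{realGuard: 1128, guard: 1128}
	fmt.Println(g.prologue(5000)) // plenty of stack
	g.requestPreempt()
	fmt.Println(g.prologue(5000)) // same sp, but the poisoned guard forces the slow path
	fmt.Println(g.prologue(5000)) // guard restored
	fmt.Println(g.prologue(1100)) // genuinely out of stack
}
```

The model also shows why the check is free on the fast path: the comparison is one the prologue performs anyway.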

TocarIP · 2018-03-27T16:05:18Z

So it is still possible to make a goroutine non-preemptible with something like:
sha256.Sum(make([]byte,1000000000))
where the inner loop is written in asm?

aclements · 2018-03-27T16:25:02Z

Yes, that would still make a goroutine non-preemptible. However, with some extra annotations in the assembly to indicate registers containing pointers it will become preemptible without any extra work or run-time overhead to reach an explicit safe-point. In the case of sha256.Sum these annotations would probably be trivial since it will never construct a pointer that isn't shadowed by the arguments (so it can claim there are no pointers in registers).

I'll add a paragraph to the design doc about this.

komuw · 2018-03-28T07:21:03Z

will the design doc be posted here?

aclements · 2018-03-28T12:23:31Z

The design doc is under review here: https://golang.org/cl/102600 (As a reminder, please only post editing comments to the CL itself and keep technical discussion on the GitHub issue.)

For golang/go#24543. Change-Id: Iba313a963aafcd93521bb9e006cb32d1f242301b Reviewed-on: https://go-review.googlesource.com/102600 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Keith Randall <khr@golang.org>

aclements · 2018-03-28T21:34:12Z

The doc is now submitted: Proposal: Non-cooperative goroutine preemption

mtstickney · 2018-03-30T18:40:54Z

Disclaimer: I'm not a platform expert, or an expert on language implementations, or involved with go aside from having written a few toy programs in it. That said:

There's a (potentially) fatal flaw here: GetThreadContext doesn't actually work on Windows (see here for details). There are several lisp implementations that have exhibited crashes on that platform because they tried to use GetThreadContext/SetThreadContext to implement preemptive signals on Windows.

As some old notes for SBCL point out, Windows has no working version of preemptive signals without loading a kernel driver, which is generally prohibitive for applications.

JamesBielby · 2018-03-31T06:10:19Z

I think the example code to avoid creating a past-the-end pointer has a problem if the slice has a capacity of 0. You need to declare _p after the first if statement.
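The design doc's example is not reproduced in this thread, so the following is only a sketch of the general shape being discussed, under my own assumptions: the element pointer is created after the emptiness check and is advanced only when another element remains, so it never points past the end of the backing array. The function name `sum` and the use of `unsafe.Add` (Go 1.17 or later) are choices made for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sum walks s with an explicit element pointer, the way a lowered range loop
// might, without ever forming a past-the-end pointer.
func sum(s []int) int {
	n := len(s)
	if n == 0 {
		return 0 // empty (including zero-capacity) slices never create a pointer
	}
	p := &s[0] // declared only after the emptiness check
	total := 0
	for i := 0; ; {
		total += *p
		i++
		if i >= n {
			break // stop before advancing, so p stays inside the array
		}
		p = (*int)(unsafe.Add(unsafe.Pointer(p), unsafe.Sizeof(*p)))
	}
	return total
}

func main() {
	fmt.Println(sum(nil), sum([]int{1, 2, 3}))
}
```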

creker · 2018-03-31T12:04:04Z

@mtstickney looks like it's true but we can look for other implementations, how they go about the same problem. CoreCLR talks about the same problem - they need to preempt threads for GC and talk about the same bugs with wrong thread context. And they also talk about how they solve it without ditching SuspendThread altogether by using redirection.

I'm not an expert in this kind of stuff so I'm sorry if this has nothing to do with solving the problem here.

mtstickney · 2018-03-31T17:18:34Z

@creker Nor me, so we're in the same boat there. I hadn't seen the CoreCLR reference before, but that's the same idea as the lisp approach: SuspendThread, retrieve the current register set with GetThreadContext, change IP to point to the signal code to be run, ResumeThread, then when the handler is finished restore the original registers with SetThreadContext.

The trick is capturing the original register set: you can either do it with an OS primitive (GetThreadContext, which is buggy), or roll your own code for it. If you do the latter, you're at risk for getting a bogus set of registers because your register-collecting code is in user-mode, and could be preempted by a kernel APC.

It looks like on some Windows versions, some of the time, you can detect and avoid the race conditions with GetThreadContext (see this post, particularly the comments concerning CONTEXT_EXCEPTION_REQUEST). The CoreCLR code seems to make some attempts to work around the race condition, although I don't know if it's suitable here.

aclements · 2018-03-31T20:08:38Z

Thanks for the pointers about GetThreadContext! That's really interesting and good to know, but I think it's actually not a problem.

For GC preemption, we can always resume the same goroutine on the same thread after preemption, so there's no need to call SetThreadContext to hijack the thread. We just need to observe its state; not run something else on that thread. Furthermore, my understanding is that GetThreadContext doesn't reliably return all registers if the thread is in a syscall, but in this case there won't be any live pointers in registers anyway (any pointer arguments to the syscall are shadowed on the Go wrapper's stack). Hence, we only need to retrieve the PC and SP in this case. Even this may not matter, since we currently treat a syscall as a giant GC safe-point, so we already save the information we need on the way in to the syscall.

For scheduler preemption, things are a bit more complicated, but I think still okay. In this case we would need to call SetThreadContext to hijack the thread, but we would only do this to threads at Go safe-points, meaning we'd never preempt something in a syscall. Today, if a goroutine has been in a syscall for too long, we don't hijack the thread, we simply flag that it should block upon returning from the syscall and schedule the next goroutine on a different thread (creating a new one or going to a pool). We would keep using that mechanism for rescheduling goroutines that are in system calls.

gopherbot · 2018-04-20T16:30:07Z

Change https://golang.org/cl/108497 mentions this issue: cmd/compile: teach Haspointer about TSSA and TTUPLE

gopherbot · 2018-04-20T16:30:07Z

Change https://golang.org/cl/108496 mentions this issue: cmd/compile: don't lower OpConvert

gopherbot · 2018-04-20T16:41:33Z

Change https://golang.org/cl/108498 mentions this issue: cmd/compile: don't compact liveness maps in place

Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis. This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time. The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register. For #24543. Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6 Reviewed-on: https://go-review.googlesource.com/108496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>

These will appear when tracking live pointers in registers, so we need to know whether they have pointers. For #24543. Change-Id: I2edccee39ca989473db4b3e7875ff166808ac141 Reviewed-on: https://go-review.googlesource.com/108497 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com>

Currently Liveness.compact rewrites the Liveness.livevars slice in place. However, we're about to add register maps, which we'll want to track in livevars, but compact independently from the stack maps. Hence, this CL modifies Liveness.compact to consume Liveness.livevars and produce a new slice of deduplicated stack maps. This is somewhat clearer anyway because it avoids potential confusion over how Liveness.livevars is indexed. Passes toolstash -cmp. For #24543. Change-Id: I7093fbc71143f8a29e677aa30c96e501f953ca2b Reviewed-on: https://go-review.googlesource.com/108498 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
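As a rough illustration of the "consume one slice, produce a new deduplicated slice" shape described in this commit message, here is a small sketch. It is not the compiler's code: liveness bitmaps are modeled as plain `uint64` values, and the function name and return values are assumptions made for the example.

```go
package main

import "fmt"

// compact returns the distinct bitmaps in first-seen order, plus, for each
// input position, the index of its bitmap in that deduplicated slice.
// The input slice is left untouched.
func compact(live []uint64) (maps []uint64, index []int) {
	seen := make(map[uint64]int)
	index = make([]int, len(live))
	for i, bm := range live {
		j, ok := seen[bm]
		if !ok {
			j = len(maps)
			seen[bm] = j
			maps = append(maps, bm)
		}
		index[i] = j
	}
	return maps, index
}

func main() {
	maps, index := compact([]uint64{5, 3, 5, 5, 3, 8})
	fmt.Println(maps, index)
}
```

Returning a fresh slice keeps the question of how the input is indexed separate from how the output is indexed, which is the clarity benefit the message mentions.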

gopherbot · 2018-04-25T21:33:23Z

Change https://golang.org/cl/109351 mentions this issue: cmd/compile: dense numbering for GP registers

gopherbot · 2018-04-25T21:33:24Z

Change https://golang.org/cl/109353 mentions this issue: cmd/compile, cmd/internal/obj: record register maps in binary

gopherbot · 2020-01-08T19:46:00Z

Change https://golang.org/cl/213837 mentions this issue: runtime: protect against external code calling ExitProcess

On Windows, we implement asynchronous preemption using SuspendThread to suspend other threads in our process. However, SuspendThread is itself actually asynchronous (it enqueues a kernel "asynchronous procedure call" and returns). Unfortunately, Windows' ExitProcess API kills all threads except the calling one and then runs APCs. As a result, if SuspendThread and ExitProcess are called simultaneously, the exiting thread can be suspended and the suspending thread can be exited, leaving behind a ghost process consisting of a single thread that's suspended. We've already protected against the runtime's own calls to ExitProcess, but if Go code calls external code, there's nothing stopping that code from calling ExitProcess. For example, in #35775, our own call to racefini leads to C code calling ExitProcess and occasionally causing a deadlock. This CL fixes this by introducing synchronization between calling external code on Windows and preemption. It adds an atomic field to the M that participates in a simple CAS-based synchronization protocol to prevent suspending a thread running external code. We use this to protect cgocall (which is used for both cgo calls and system calls on Windows) and racefini. Tested by running the flag package's TestParse test compiled in race mode in a loop. Before this change, this would reliably deadlock after a few minutes. Fixes #35775. Updates #10958, #24543. Change-Id: I50d847abcdc2688b4f71eee6a75eca0f2fee892c Reviewed-on: https://go-review.googlesource.com/c/go/+/213837 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>

networkimprov · 2020-01-23T18:46:40Z

List of todo items posted in #36365

ianlancetaylor · 2020-05-19T00:49:12Z

Is there a reason to leave this issue open, given the existence of #36365?

aclements · 2020-05-19T00:55:14Z

Nope! Closing.

szmcdull · 2020-07-03T03:40:33Z

Is there any way to disable preemption altogether? In most situations I want a simple single-threaded asynchronous model similar to node.js, where locks are mostly not needed.

Now even if I specify runtime.GOMAXPROCS(1), I have to protect against things like panic: concurrent map iteration and map write.

networkimprov · 2020-07-03T04:28:51Z

@szmcdull have you tried this runtime switch? Note that Go had preemption before 1.14...

$ GODEBUG=asyncpreemptoff=1 ./your_app arguments ...

szmcdull · 2020-07-03T04:41:16Z

@szmcdull have you tried this runtime switch? Note that Go had preemption before 1.14...
$ GODEBUG=asyncpreemptoff=1 ./your_app arguments ...

Yes I tried. But still got panic: concurrent map iteration and map write

networkimprov · 2020-07-03T04:49:53Z

I think you were just lucky that you didn't see that before 1.14 :-)

Try https://golang.org/pkg/sync/#Map

For further Q's, I refer you to golang-nuts. You'll get more & faster responses there, generally.
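As a minimal sketch of the sync.Map suggestion above: goroutines can still be switched at cooperative points even with GOMAXPROCS(1) and asyncpreemptoff=1, so a plain map shared between a writer and an iterating reader needs synchronization either way. The function name `fillAndCount` is made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// fillAndCount stores n keys from one goroutine while another goroutine
// iterates over the map, then returns how many keys are present at the end.
func fillAndCount(n int) int {
	var m sync.Map
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // writer
		defer wg.Done()
		for i := 0; i < n; i++ {
			m.Store(i, i*i)
		}
	}()

	go func() { // concurrent iteration is safe on a sync.Map
		defer wg.Done()
		m.Range(func(_, _ interface{}) bool { return true })
	}()

	wg.Wait()

	count := 0
	m.Range(func(_, _ interface{}) bool {
		count++
		return true
	})
	return count
}

func main() {
	fmt.Println(fillAndCount(1000))
}
```

A regular map guarded by a sync.RWMutex is another common choice when the access pattern does not fit sync.Map.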

The user notifier feature allows for filtering of seccomp notifications in userspace. While the user notifier is handling the syscall, the notifying process can be preempted, thus ending the notification. This has become a growing problem, as Golang has adopted signal-based async preemption[1]. In this, it will preempt every 10ms, thus leaving the supervisor less than 10ms to respond to a given notification. If the syscall requires I/O (mount, connect) on behalf of the process, it can easily take 10ms. This allows the supervisor to set a flag that moves the process into a state where it is only killable by terminating signals as opposed to all signals. The process can still be terminated before the supervisor receives the notification. Signed-off-by: Sargun Dhillon <sargun@sargun.me> [1]: golang/go#24543

aclements added this to the Go1.12 milestone Mar 26, 2018

aclements self-assigned this Mar 26, 2018

gopherbot added the Proposal label Mar 26, 2018

komuw mentioned this issue Apr 5, 2018

Add ability to safely call functions go-delve/delve#119

Closed

aclements added the Proposal-Accepted label Apr 9, 2018

chai2010 mentioned this issue Apr 23, 2018

Go 1.12 may support preemptive goroutine scheduling chai2010/advanced-go-programming-book#69

Closed

bradfitz mentioned this issue Apr 24, 2018

crypto/elliptic: hang in doubleJacobian with Curve P-521 #25054

Closed

toothrot modified the milestones: Go1.14, Go1.15 Feb 25, 2020

dumblob mentioned this issue Feb 28, 2020

V Concurrency Considerations vlang/v#3814

Closed

mknyszek mentioned this issue Mar 11, 2020

runtime: print all threads in GOTRACEBACK >= all #13161

Open

mgkanani mentioned this issue Mar 22, 2020

Optimize the performance of GetTS tikv/pd#1847

Open

jonhoo mentioned this issue Apr 1, 2020

blog: reducing tail latencies with auto yielding tokio-rs/website#422

Merged

OneOfOne mentioned this issue Apr 7, 2020

syscall: signal SIGURG(I/O condition) on go1.14.1, but no SIGURG signal before go 1.14 #38290

Closed

tri-adam mentioned this issue Apr 21, 2020

Fix SIGURG Handling apptainer/singularity#5233

Merged

aclements closed this as completed May 19, 2020

mknyszek mentioned this issue Nov 17, 2020

runtime: multi-ms sweep termination pauses (second edition) #42642

Closed

github-actions bot mentioned this issue Apr 2, 2021

Garbage Collection In Go : Part I - Semantics Wusuluren/website-snapshot#3

Closed

eikenb mentioned this issue Jun 22, 2021

ignore runtime premeption signals, SIGURG (was: receiving signal "urgent I/O condition") hashicorp/consul-template#1486

Closed

lithdew mentioned this issue Jul 2, 2021

perf: provide safepoints for pre-emptive scheduling oasisprotocol/curve25519-voi#72

Closed

golang locked and limited conversation to collaborators Jul 3, 2021

gopherbot added the FrozenDueToAge label Jul 3, 2021

rsc unassigned aclements Jun 23, 2022
