-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: cannot ReadMemStats during GC #19812
Comments
This issue is kind of a pain for CockroachDB because we use FWIW, we'd rather have more approximate heap stats that we can collect cheaply than more accurate heap stats that can only be collected at specific points. |
Note that there is a work around in client_golang now, see prometheus/client_golang#568 (check the final state of the PR, not the whole discussion). It will be released any moment as v0.9.3. |
Meaning: If you use prometheus/client_golang, you are good now. If you are doing it on your own, it might serve as inspiration. |
Thanks for the reports of this happening in real systems. I knew this was a theoretical problem, but this is the first data I've had on it happening in production. Making the stats approximate doesn't really help in this case. We're just doing some heavy-handed synchronization that we need to figure out how to break up. |
Is it acceptable to cache and return the stats before GC starts? |
I spoke with @aclements and there may be a simple way to do this. Most of the trickiness arises from dealing with GOMAXPROCS changing during the mark phase, but we can also ignore that case by just adding a new semaphore. Turns out we also have to retain the old behavior for trace start for other reasons. I have a quick and dirty change which seems to work OK in |
Change https://golang.org/cl/182657 mentions this issue: |
This change makes it so that worldsema isn't held across the mark phase. This means that various operations like ReadMemStats may now stop the world during the mark phase, reducing latency on such operations. Only three such operations are still no longer allowed to occur during marking: GOMAXPROCS, StartTrace, and StopTrace. For the former it's because any change to GOMAXPROCS impacts GC mark background worker scheduling and the details there are tricky. For the latter two it's because tracing needs to observe consistent GC start and GC end events, and if StartTrace or StopTrace may stop the world during marking, then it's possible for it to see a GC end event without a start or GC start event without an end, respectively. To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until marking is complete, the runtime now holds a new semaphore, gcsema, across the mark phase just like it used to with worldsema. Fixes golang#19812. Change-Id: I15d43ed184f711b3d104e8f267fb86e335f86bf9 Reviewed-on: https://go-review.googlesource.com/c/go/+/182657 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Change https://golang.org/cl/215157 mentions this issue: |
Change https://golang.org/cl/216358 mentions this issue: |
This reverts commit 7b294cd, CL 182657. Reason for revert: This change may be causing latency problems for applications which call ReadMemStats, because it may cause all goroutines to stop until the GC completes. https://golang.org/cl/215157 fixes this problem, but it's too late in the cycle to land that. Updates #19812. Change-Id: Iaa26f4dec9b06b9db2a771a44e45f58d0aa8f26d Reviewed-on: https://go-review.googlesource.com/c/go/+/216358 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reopening because of revert. |
We're pretty sure we know how to fix this properly now. I'll have a more complete patch ready early in the Go 1.15 cycle. |
Change https://golang.org/cl/216557 mentions this issue: |
This issue is currently labeled as early-in-cycle for Go 1.15. That time is now, so friendly ping. If it no longer needs to be done early in cycle, that label can be removed. |
Change https://golang.org/cl/220177 mentions this issue: |
@dmitshur I'm pinging my reviewers. This should go in ASAP given how much trouble it gave us last release. It should be a lot better this time though; the fix to this bug last time was a bit sloppy, and everything here has been re-done in a more principled manner (with a new benchmark). |
Currently, dedicated background mark workers are essentially always non-preemptible. This change makes it so that dedicated background mark workers park if their preemption flag is set and someone is trying to STW, allowing them to do so. This change prepares us for allowing a STW to happen (and happen promptly) during GC marking in a follow-up change. Updates #19812. Change-Id: I67fb6085bf0f0aebd18ca500172767818a1f15e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/215157 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
This change adds a benchmark to the runtime which measures ReadMemStats latencies. It generates allocations with lots of pointers to keep the GC busy while hitting ReadMemStats and measuring the time it takes to complete. Updates #19812. Change-Id: I7a76aaf497ba5324d3c7a7b3df32461b3e6c3ac8 Reviewed-on: https://go-review.googlesource.com/c/go/+/220177 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Change https://golang.org/cl/228337 mentions this issue: |
Currently, as a result of us releasing worldsema now to allow STW events during a mark phase, we release worldsema between starting the world and having the goroutine block in STW mode. This inserts preemption points which, if followed through, could lead to a deadlock. Specifically, because user goroutine scheduling is disabled in STW mode, the goroutine will block before properly releasing worldsema. The fix here is to prevent preemption while releasing the worldsema. Fixes #38404. Updates #19812. Change-Id: I8ed5b3aa108ab2e4680c38e77b0584fb75690e3d Reviewed-on: https://go-review.googlesource.com/c/go/+/228337 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
GC holds the worldsema across the entire garbage collection, which means nothing else can stop the world, even during concurrent marking. Anything that does try to stop the world will block until GC is done. Probably the most interesting such operation is ReadMemStats.
We should fix GC to only hold worldsema during sweep termination and mark termination. The trickiest part of this is probably dealing with GOMAXPROCS changing during GC, which will affect mark worker scheduling.
/cc @RLH
The text was updated successfully, but these errors were encountered: