Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SafePoint times to UpdateGraphProcessor cycle times logging. #2123

Merged
merged 17 commits into from
Apr 1, 2022

Conversation

jcferretti
Copy link
Member

@jcferretti jcferretti commented Mar 25, 2022

Given Charles comments (below), I've re-done the implementation based on HotspotRuntimeMBean instead of GarbageCollectorMXBean.

The information from HotspotRuntimeMBean contains count and times for stop the world pauses due to safepointing; every GC uses a safepoint, not every safepoint is GC.

See

--

Logging HotSpot Safepoint times using data from HotSpot introspection (see https://stackoverflow.com/questions/42514817/printgcapplicationstoppedtime/42515743#42515743)
Thanks to Devin for the help hooking up this in a way that
(1) Works for both jdk 11 and 17.
(2) Does not overly burden the system with additional dependencies: we are adding as an optional a service; if not present is not used.

Also format millis output to a fixed number of decimal places.

This is how it looked before:

server_1        | 2022-03-24T20:51:10.289Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Minimal Update Graph Processor cycle times: 18.871795ms / 27 cycles = 0.6989553703703704ms/cycle average
server_1        | 2022-03-24T20:51:10.289Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Update Graph Processor cycleTime=3341.797276ms, lockWaitTime=3112.12555ms, yieldTime=0.0ms, sleepTime=0.0ms

This is how it looks now:

server_1        | 2022-03-30T03:17:42.926Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Minimal Update Graph Processor cycle times: 18.203ms / 27 cycles = 0.674ms/cycle average), safePointTime=0ms
server_1        | 2022-03-30T03:17:42.926Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Update Graph Processor cycleTime=3296.973ms, safePointTime=136ms, safePointTimePct=4.12%, safePointSyncTime=4ms, lockWaitTime=3051.758ms, yieldTime=0ms, sleepTime=0ms
server_1        | 2022-03-30T03:17:43.940Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Minimal Update Graph Processor cycle times: 30.103ms / 2 cycles = 15.051ms/cycle average), safePointTime=0ms
server_1        | 2022-03-30T03:17:45.778Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Update Graph Processor cycleTime=852.066ms, safePointTime=86ms, safePointTimePct=10.09%, safePointSyncTime=3ms, lockWaitTime=0.132ms, yieldTime=0ms, sleepTime=0ms
server_1        | 2022-03-30T03:17:46.586Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Update Graph Processor cycleTime=659.362ms, safePointTime=70ms, safePointTimePct=10.62%, safePointSyncTime=3ms, lockWaitTime=0.188ms, yieldTime=0ms, sleepTime=0ms
server_1        | 2022-03-30T03:17:47.433Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Update Graph Processor cycleTime=507.558ms, safePointTime=22ms, safePointTimePct=4.33%, safePointSyncTime=0ms, lockWaitTime=0.198ms, yieldTime=0ms, sleepTime=0ms
server_1        | 2022-03-30T03:17:48.464Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Update Graph Processor cycleTime=538.352ms, safePointTime=13ms, safePointTimePct=2.41%, safePointSyncTime=0ms, lockWaitTime=0.103ms, yieldTime=0ms, sleepTime=0ms
server_1        | 2022-03-30T03:17:49.465Z | EFAULT.refreshThread |  INFO | .e.u.UpdateGraphProcessor | Update Graph Processor cycleTime=538.972ms, safePointTime=24ms, safePointTimePct=4.45%, safePointSyncTime=0ms, lockWaitTime=0.182ms, yieldTime=0ms, sleepTime=0ms

The idea for computing the number of base 10 digits for an integer came from here: https://stackoverflow.com/questions/25892665/performance-of-log10-function-returning-an-int

@jcferretti jcferretti added this to the Mar 2022 milestone Mar 25, 2022
@jcferretti jcferretti requested a review from cpwright March 25, 2022 05:12
@jcferretti jcferretti self-assigned this Mar 25, 2022
Copy link
Contributor

@cpwright cpwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decimal printout stuff looks nice and improves the output. What I am not sure that I have my head around is if we are actually reporting the right thing. We will report the amount of GC that occurred during a cycle. However, I don't know if that is the most useful metric or really what we should report at all.

I think that separating STW vs. other collection is important. I also think that we're ignoring everything outside the cycle; which may or may not matter. For non-STW collections, I don't know how the things will accumulate when something starts or ends cross a cycle boundary. Will we sometimes have zero if we start but don't finish; and other times have > 100% when we finish but had already started.

Tracking GC separately using the existing GC logs may be a better mechanism than what is being done here. @rcaudy is better at GC tuning and investigation than I am. Also Raffi.

@jcferretti jcferretti changed the title Add GC times to UpdateGraphProcessor cycle times logging. Add SafePoint times to UpdateGraphProcessor cycle times logging. Mar 27, 2022
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review any of the logical changes to UGP. I think we need some more polish here before merging.

hotspot/build.gradle Outdated Show resolved Hide resolved
hotspot/gradle.properties Outdated Show resolved Hide resolved
engine/updategraph/build.gradle Outdated Show resolved Hide resolved
settings.gradle Show resolved Hide resolved
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review is mostly at a high level. Contrary to expectations, I need not do a line-by-line review. I'll address overall context in the Slack channel.

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be going through this to try and support our hotspot-impl in docker use case.

hotspot/build.gradle Outdated Show resolved Hide resolved
server/jetty/build.gradle Outdated Show resolved Hide resolved
server/netty/build.gradle Outdated Show resolved Hide resolved
@jcferretti jcferretti requested a review from niloc132 March 30, 2022 22:32
niloc132
niloc132 previously approved these changes Apr 1, 2022
Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on instead renaming the existing server-jetty/netty to server-jetty-lib etc, and leaving server-jetty as the app itself?

I could see either way here, and have just about convinced myself it is better with this explicit rename (so that the libraries are easier to use downstream), only real complaint is that ./gradlew :server-jetty-app:run is long to get it running - but we really should just make a top-level task if we decide this is important and something people will do.

I also think we should re-add the readmes for netty/jetty, but seems like we should do that separately.

Approved with one comment that I think would improve readability.

server/jetty-app/build.gradle Outdated Show resolved Hide resolved
@jcferretti jcferretti merged commit 714ca9d into deephaven:main Apr 1, 2022
@jcferretti jcferretti deleted the cfs-ugp-cycle-log-gctimes-0 branch April 1, 2022 19:46
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants