-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GR-47109] Implement JFR Event Throttling. #6899
Conversation
…test of distribution
Should we add a changelog entry for the new event? |
Yup that's true, I'll add a note to the change log. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, I added a couple of questions and comments.
@@ -237,7 +237,7 @@ private static Object slowPathNewInstanceWithoutAllocating(DynamicHub hub) { | |||
AlignedHeader newTlab = HeapImpl.getChunkProvider().produceAlignedChunk(); | |||
return allocateInstanceInNewTlab(hub, newTlab); | |||
} finally { | |||
ObjectAllocationInNewTLABEvent.emit(startTicks, hub, LayoutEncoding.getPureInstanceAllocationSize(hub.getLayoutEncoding()), HeapParameters.getAlignedHeapChunkSize()); | |||
JfrAllocationEvents.emit(startTicks, DynamicHub.toClass(hub), LayoutEncoding.getPureInstanceAllocationSize(hub.getLayoutEncoding()), HeapParameters.getAlignedHeapChunkSize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is easier to do the conversion from DynamicHub
to Class
inside of JfrAllocationEvents.emit(...)
.
The size computation can be moved out of allocateInstanceInNewTlab
, then there is no need to call LayoutEncoding.getPureInstanceAllocationSize(hub.getLayoutEncoding())
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Done
/** | ||
* Each event that allows throttling should have its own throttler instance. | ||
*/ | ||
public class JfrThrottler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this class is based on HotSpot code? Please document the git revision, the JDK version, and the C++ source files that this is based on.
Please also do the same for JfrThrottlerWindow
.
*/ | ||
public class JfrThrottler { | ||
private static final long SECOND_IN_MS = 1000; | ||
private static final long SECOND_IN_NS = 1000000 * SECOND_IN_MS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already constants for that, see TimeUtils
.
if (HasJfrSupport.get()) { | ||
emitObjectAllocationInNewTLAB(startTicks, clazz, allocationSize, tlabSize); | ||
|
||
if (shouldEmitObjectAllocationSample() && SubstrateJVM.get().shouldCommit(JfrEvent.ObjectAllocationSample)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that JfrEvent.shouldEmit()
should always call SubstrateJVM.get().shouldCommit(...)
. Otherwise, it is easy to forget that throttling is necessary for certain events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. I've taken your suggestion and also made all the throttler code uninterruptible
. It would now have to be uninterruptible
anyway due to locking without transition (see other responses).
} | ||
|
||
/** | ||
* This method exists as a slight optimization to avoid entering the sampler code if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that any optimization is necessary in this case. This is only called by slow-path code and the overhead is just one extra method call. Note that it is also impossible to inline shouldEmitObjectAllocationSample()
because it is (and also needs to be) uninterruptible, so there is already a method call anyways.
Optimizations like that are primarily necessary in situations where we would start a VM operation (because that is truly expensive).
/** | ||
* A rotation of the active window could happen while in this method. If so, then this window | ||
* will be updated as usual, although it is now the "next" window. This results in some wasted | ||
* effort, but doesn't affect correctness because this window will be reset before it becomes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this assume that the active window never changes more than once while this method is executed. I don't think that there is anything is in place to guarantee that. How is this solved on HotSpot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like hotspot does anything to prevent such a double rotation. If a double rotation happens while a thread is busy sampling the following could happen:
- The sample gets counted towards the new window (increases the measured population). It could be argued that this is ok since the evaluation happened during the new window's time slice. Depends on when we define a sample as being "taken".
- Sampling races with the rotating thread writing new window parameters. This can cause it to either sample when it would otherwise not have, or not sample when it otherwise would have. Under sampling should be fine in this edge case. Oversampling is bad.
To avoid those problems and any ambiguity about which window a sample belongs to, I've added a read-write lock. The writer's lock must be acquired for window rotation. The reader's lock must be acquired to sample.
There is locking without transition so I've made the appropriate code uninterruptible
. This also required avoiding Math.Random
and instead copying the simple PRNG that JFR uses in hotspot.
// The following are set to match the values in OpenJDK | ||
private static final int WINDOW_DIVISOR = 5; | ||
private static final int LOW_RATE_UPPER_BOUND = 9; | ||
private static final int EVENT_THROTTLER_OFF = -2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to use @Alias
on jdk.jfr.internal.Utils.THROTTLE_OFF
.
} | ||
|
||
this.eventSampleSize = samplesPerPeriod; | ||
this.periodNs = (long) periodMs * 1000000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use TimeUtils.millisToNanos(...)
.
} | ||
|
||
/** Visible for testing. */ | ||
public volatile boolean isTest = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a big fan of whitebox tests (tight coupling with implementation, become useless pretty quickly if the implementation changes), especially if they add extra fields and logic to production code. If you really need some way to customize the behavior for the test cases, then I think it would be better to create a dedicated subclass that only exists in the test project and where you @Override
certain methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken your suggestion and moved the code needed for testing out of JfrThrottler
and JfrThrottlerWindow
classes and into dedicated subclasses in the testing code. Some private methods/fields I've had to make protected in order to @Override
.
samplesPerWindow = eventSampleSize; | ||
windowDurationNs = periodNs; | ||
} | ||
activeWindow.samplesPerWindow = samplesPerWindow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other threads may access the values of the active window, so isn't it always problematic when the values of the active window are changed (i.e., other threads may read inconsistent data)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, other threads busy sampling use the active window, but they never use the same fields we are changing here. The lock must be acquired before accessing those fields.
But either way, this shouldn't be a problem anymore because the read-write lock prevents samplers from working while we are meddling with the active window.
… fields by lock access. Restrict access in JfrThrottler.
minor clean up
clean up JFR random
6ae531e
to
41811ee
Compare
e6dcd98
to
b0960fe
Compare
Hi @christianhaeubl, just commenting here to keep this PR on your radar. |
@roberttoyonaga : I started integrating this PR. I will let you know if I run into any issues. |
Thank you Christian! |
Summary of Changes
These changes introduce JFR event throttling infrastructure. Throttling allows for an emission rate to be specified for a given event. This will allow for enabling allocation profiling via the
jdk.ObjectAllocationSample
event by default. Without throttling, this allocation profiling could produce too much overhead to enable by default. The approach taken in this PR is very similar to how the jfrAdaptiveSampler works in the OpenJDK.Related Issue #5729
More Details
Notes
VMMutex
for synchronization because it allows for locking with a state transition and can be used in interuptible code. The only issue is that there is notryLock()
method that could be used to optimize contention for window rotations.