Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-47109] Implement JFR Event Throttling. #6899

Merged
merged 39 commits into from
Feb 10, 2024

Conversation

roberttoyonaga
Copy link
Collaborator

@roberttoyonaga roberttoyonaga commented Jun 27, 2023

Summary of Changes

These changes introduce JFR event throttling infrastructure. Throttling allows for an emission rate to be specified for a given event. This will allow for enabling allocation profiling via thejdk.ObjectAllocationSample event by default. Without throttling, this allocation profiling could produce too much overhead to enable by default. The approach taken in this PR is very similar to how the jfrAdaptiveSampler works in the OpenJDK.

Related Issue #5729

More Details

  • The implementation uses an alternating window scheme where multiple windows make up a period ( which can be specified by the user).
  • Only one thread may rotate windows at a time.
  • Each event type that allows throttling will have it's own throttling instance. But, similar to OpenJDK, threads share the same throttler for any given event type.
  • There is a concept of sampling debt. If the sampler under-samples during a window, debt will accumulate, causing more samples to be taken in successive windows.
  • Sampling is done based on a "projected population size" for the next window. This is an exponential weighted moving average that depends on previous windows (the "lookback")
  • Throttler code is allocation free because it is along the allocation slow path.

Notes

  • I've opted to use VMMutex for synchronization because it allows for locking with a state transition and can be used in interuptible code. The only issue is that there is no tryLock() method that could be used to optimize contention for window rotations.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jun 27, 2023
@fniephaus fniephaus changed the title Implement JFR Event Throttling [GR-47109] Implement JFR Event Throttling. Jul 7, 2023
@fniephaus
Copy link
Member

Should we add a changelog entry for the new event?

@roberttoyonaga
Copy link
Collaborator Author

Should we add a changelog entry for the new event?

Yup that's true, I'll add a note to the change log.

Copy link
Member

@christianhaeubl christianhaeubl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I added a couple of questions and comments.

@@ -237,7 +237,7 @@ private static Object slowPathNewInstanceWithoutAllocating(DynamicHub hub) {
AlignedHeader newTlab = HeapImpl.getChunkProvider().produceAlignedChunk();
return allocateInstanceInNewTlab(hub, newTlab);
} finally {
ObjectAllocationInNewTLABEvent.emit(startTicks, hub, LayoutEncoding.getPureInstanceAllocationSize(hub.getLayoutEncoding()), HeapParameters.getAlignedHeapChunkSize());
JfrAllocationEvents.emit(startTicks, DynamicHub.toClass(hub), LayoutEncoding.getPureInstanceAllocationSize(hub.getLayoutEncoding()), HeapParameters.getAlignedHeapChunkSize());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is easier to do the conversion from DynamicHub to Class inside of JfrAllocationEvents.emit(...).

The size computation can be moved out of allocateInstanceInNewTlab, then there is no need to call LayoutEncoding.getPureInstanceAllocationSize(hub.getLayoutEncoding()) again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Done

/**
* Each event that allows throttling should have its own throttler instance.
*/
public class JfrThrottler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this class is based on HotSpot code? Please document the git revision, the JDK version, and the C++ source files that this is based on.

Please also do the same for JfrThrottlerWindow.

*/
public class JfrThrottler {
private static final long SECOND_IN_MS = 1000;
private static final long SECOND_IN_NS = 1000000 * SECOND_IN_MS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already constants for that, see TimeUtils.

if (HasJfrSupport.get()) {
emitObjectAllocationInNewTLAB(startTicks, clazz, allocationSize, tlabSize);

if (shouldEmitObjectAllocationSample() && SubstrateJVM.get().shouldCommit(JfrEvent.ObjectAllocationSample)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that JfrEvent.shouldEmit() should always call SubstrateJVM.get().shouldCommit(...). Otherwise, it is easy to forget that throttling is necessary for certain events.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I've taken your suggestion and also made all the throttler code uninterruptible. It would now have to be uninterruptible anyway due to locking without transition (see other responses).

}

/**
* This method exists as a slight optimization to avoid entering the sampler code if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that any optimization is necessary in this case. This is only called by slow-path code and the overhead is just one extra method call. Note that it is also impossible to inline shouldEmitObjectAllocationSample() because it is (and also needs to be) uninterruptible, so there is already a method call anyways.

Optimizations like that are primarily necessary in situations where we would start a VM operation (because that is truly expensive).

/**
* A rotation of the active window could happen while in this method. If so, then this window
* will be updated as usual, although it is now the "next" window. This results in some wasted
* effort, but doesn't affect correctness because this window will be reset before it becomes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this assume that the active window never changes more than once while this method is executed. I don't think that there is anything is in place to guarantee that. How is this solved on HotSpot?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like hotspot does anything to prevent such a double rotation. If a double rotation happens while a thread is busy sampling the following could happen:

  1. The sample gets counted towards the new window (increases the measured population). It could be argued that this is ok since the evaluation happened during the new window's time slice. Depends on when we define a sample as being "taken".
  2. Sampling races with the rotating thread writing new window parameters. This can cause it to either sample when it would otherwise not have, or not sample when it otherwise would have. Under sampling should be fine in this edge case. Oversampling is bad.

To avoid those problems and any ambiguity about which window a sample belongs to, I've added a read-write lock. The writer's lock must be acquired for window rotation. The reader's lock must be acquired to sample.

There is locking without transition so I've made the appropriate code uninterruptible. This also required avoiding Math.Random and instead copying the simple PRNG that JFR uses in hotspot.

// The following are set to match the values in OpenJDK
private static final int WINDOW_DIVISOR = 5;
private static final int LOW_RATE_UPPER_BOUND = 9;
private static final int EVENT_THROTTLER_OFF = -2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to use @Alias on jdk.jfr.internal.Utils.THROTTLE_OFF.

}

this.eventSampleSize = samplesPerPeriod;
this.periodNs = (long) periodMs * 1000000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use TimeUtils.millisToNanos(...).

}

/** Visible for testing. */
public volatile boolean isTest = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of whitebox tests (tight coupling with implementation, become useless pretty quickly if the implementation changes), especially if they add extra fields and logic to production code. If you really need some way to customize the behavior for the test cases, then I think it would be better to create a dedicated subclass that only exists in the test project and where you @Override certain methods.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken your suggestion and moved the code needed for testing out of JfrThrottler and JfrThrottlerWindow classes and into dedicated subclasses in the testing code. Some private methods/fields I've had to make protected in order to @Override.

samplesPerWindow = eventSampleSize;
windowDurationNs = periodNs;
}
activeWindow.samplesPerWindow = samplesPerWindow;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other threads may access the values of the active window, so isn't it always problematic when the values of the active window are changed (i.e., other threads may read inconsistent data)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, other threads busy sampling use the active window, but they never use the same fields we are changing here. The lock must be acquired before accessing those fields.

But either way, this shouldn't be a problem anymore because the read-write lock prevents samplers from working while we are meddling with the active window.

clean up JFR random
@roberttoyonaga
Copy link
Collaborator Author

Hi @christianhaeubl, just commenting here to keep this PR on your radar.

@christianhaeubl
Copy link
Member

@roberttoyonaga : I started integrating this PR. I will let you know if I run into any issues.

@roberttoyonaga
Copy link
Collaborator Author

@roberttoyonaga : I started integrating this PR. I will let you know if I run into any issues.

Thank you Christian!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature native-image native-image-jfr OCA Verified All contributors have signed the Oracle Contributor Agreement. redhat-interest
Projects
Status: Released
Development

Successfully merging this pull request may close these issues.

4 participants