Skip to content

Commit

Permalink
Sampling events (#11)
Browse files Browse the repository at this point in the history
# Summary
Added event sampling for high throughput stats collectors to reduce memory footprint, should hopefully resolve [issue #6](#6).

# Changes
- `JitStatsCollector`, `ThreadPoolSchedulingStatsCollector` and `ContentionStatsCollector` can now be configured to discard a ratio of start/ stop events, reducing the number of events we need to store in a `Cache` at any one time
- A new default sampling rate has been configured for the above collectors (see README)
- Updated README with example on how to configure sampling and details of the defaults for each stats collector
  • Loading branch information
djluck authored Jul 25, 2019
1 parent d9ec11a commit 3d688f9
Show file tree
Hide file tree
Showing 19 changed files with 400 additions and 85 deletions.
32 changes: 29 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,10 @@ These metrics are essential for understanding the peformance of any non-trivial
Add the packge from [nuget](https://www.nuget.org/packages/prometheus-net.DotNetRuntime):
```powershell
# If you're using v2 of prometheus-net
dotnet add package prometheus-net.DotNetRuntime --version 2.0.8-beta
dotnet add package prometheus-net.DotNetRuntime --version 2.1.0-beta
# If you're using v3 of prometheus-net
dotnet add package prometheus-net.DotNetRuntime --version 3.0.8-beta
dotnet add package prometheus-net.DotNetRuntime --version 3.1.0-beta
```

And then start the collector:
Expand Down Expand Up @@ -49,7 +49,33 @@ The metrics exposed can drive a rich dashboard, giving you a graphical insight i
The harder you work the .NET core runtime, the more events it generates. Event generation and processing costs can stack up, especially around these types of events:
- **JIT stats**: each method compiled by the JIT compiler emits two events. Most JIT compilation is performed at startup and depending on the size of your application, this could impact your startup performance.
- **GC stats**: every 100KB of allocations, an event is emitted. If you are consistently allocating memory at a rate > 1GB/sec, you might like to disable GC stats.
- **.NET thread pool scheduling stats**: For every work item scheduled on the thread pool, two events are emitted. If you are scheduling thousands of items per second on the thread pool, you might like to disable scheduling events.
- **.NET thread pool scheduling stats**: For every work item scheduled on the thread pool, two events are emitted. If you are scheduling thousands of items per second on the thread pool, you might like to disable scheduling events or decrease the sampling rate of these events.

### Sampling
To counteract some of the performance impacts of measuring .NET core runtime events, sampling can be configured on supported collectors:
```csharp
IDisposable collector = DotNetRuntimeStatsBuilder.Customize()
// Only 1 in 10 contention events will be sampled
.WithContentionStats(sampleRate: SampleEvery.TenEvents)
// Only 1 in 100 JIT events will be sampled
.WithJitStats(sampleRate: SampleEvery.HundredEvents)
// Every event will be sampled (disables sampling)
.WithThreadPoolSchedulingStats(sampleRate: SampleEvery.OneEvent)
.StartCollecting();
```

The default sample rates are listed below:

| Event collector | Default sample rate |
| ------------------------------ | ------------------------|
| `ThreadPoolSchedulingStats` | `SampleEvery.TenEvents` |
| `JitStats` | `SampleEvery.TenEvents` |
| `ContentionStats` | `SampleEvery.TwoEvents` |

While the default sampling rates provide a decent balance between accuracy and resource consumption if you're concerned with the accuracy of metrics at all costs,
then feel free to change the sampling rate to `SampleEvery.OneEvent`. If minimal resource consumption (especially memory), is your goal you might like to
reduce the sampling rate.


## Further reading
- The mechanism for listening to runtime events is outlined in the [.NET core 2.2 release notes](https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotnet-core-2-2#core).
Expand Down
9 changes: 2 additions & 7 deletions src/Benchmarks/Program.cs
Original file line number Diff line number Diff line change
Expand Up @@ -71,20 +71,15 @@ public class TestBenchmark
[MethodImpl(MethodImplOptions.NoOptimization)]
public long TestInterlockedIncLong() => Interlocked.Increment(ref l1);

private EventPairTimer<int> timer = new EventPairTimer<int>(1, 2, x => x.EventId);
private EventPairTimer<int> timer = new EventPairTimer<int>(1, 2, x => x.EventId, SampleEvery.OneEvent);

private EventWrittenEventArgs eventArgs;

public TestBenchmark()
{
eventArgs = (EventWrittenEventArgs)Activator.CreateInstance(typeof(EventWrittenEventArgs), BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.CreateInstance, (Binder) null, new object[] {null}, null);
}
[Benchmark]
public bool TestEvent()
{
return timer.TryGetEventPairDuration(eventArgs, out var duration);
}


}
}
}
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using NUnit.Framework;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ internal class ContentionStatsCollectorTests : StatsCollectorIntegrationTestBase
{
protected override ContentionStatsCollector CreateStatsCollector()
{
return new ContentionStatsCollector();
return new ContentionStatsCollector(SampleEvery.OneEvent);
}

[Test]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -99,8 +99,12 @@ public void When_a_garbage_collection_is_performed_then_the_gc_cpu_and_pause_rat
if (val > 0)
return val;

// To improve the reliability of the test, call UpdateMetrics here. Why? Process.TotalProcessorTime isn't very precise and this
// To improve the reliability of the test, do some CPU busy work + call UpdateMetrics here. Why? Process.TotalProcessorTime isn't very precise and this
// can lead to CpuRation believing that no CPU has been consumed
for (int i = 0; i < 10_000; i++)
{
}

StatsCollector.UpdateMetrics();

return 0;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;
using System.Threading;
using NUnit.Framework;
Expand All @@ -7,11 +10,11 @@

namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.IntegrationTests
{
internal class JitStatsCollectorTests : StatsCollectorIntegrationTestBase<JitStatsCollector>
internal class Given_A_JitStatsCollector_That_Samples_Every_Jit_Event : StatsCollectorIntegrationTestBase<JitStatsCollector>
{
protected override JitStatsCollector CreateStatsCollector()
{
return new JitStatsCollector();
return new JitStatsCollector(SampleEvery.OneEvent);
}

[Test]
Expand Down Expand Up @@ -68,4 +71,37 @@ private int ToJitDynamic()
return o.Length;
}
}

internal class Given_A_JitStatsCollector_That_Samples_Every_Fifth_Jit_Event : StatsCollectorIntegrationTestBase<JitStatsCollector>
{
protected override JitStatsCollector CreateStatsCollector()
{
return new JitStatsCollector(SampleEvery.FiveEvents);
}

[Test]
public void When_many_methods_are_jitted_then_their_compilation_is_measured()
{
// arrange
var methodsJitted = StatsCollector.MethodsJittedTotal.Labels("true").Value;
var methodsJittedSeconds = StatsCollector.MethodsJittedSecondsTotal.Labels("true").Value;

// act
var sp = Stopwatch.StartNew();
Compile100Methods(() => 1);
sp.Stop();

// assert
Assert.That(() => StatsCollector.MethodsJittedTotal.Labels("true").Value, Is.GreaterThanOrEqualTo(methodsJitted + 20).After(100, 10));
Assert.That(StatsCollector.MethodsJittedSecondsTotal.Labels("true").Value, Is.GreaterThan(methodsJittedSeconds + sp.Elapsed.TotalSeconds).Within(0.1));
}

private void Compile100Methods(Expression<Func<int>> toCompile)
{
for (int i = 0; i < 100; i++)
{
toCompile.Compile();
}
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,12 @@
using System.Threading.Tasks;
using NUnit.Framework;
using Prometheus.DotNetRuntime.StatsCollectors;
using Prometheus.DotNetRuntime.StatsCollectors.Util;

namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.IntegrationTests
{
[TestFixture]
internal class ThreadPoolSchedulingStatsCollectorTests : StatsCollectorIntegrationTestBase<ThreadPoolSchedulingStatsCollector>
internal class Given_A_ThreadPoolSchedulingStatsCollector_That_Samples_Every_Event : StatsCollectorIntegrationTestBase<ThreadPoolSchedulingStatsCollector>
{
protected override ThreadPoolSchedulingStatsCollector CreateStatsCollector()
{
Expand All @@ -34,4 +35,35 @@ public async Task When_work_is_queued_on_the_thread_pool_then_the_queued_and_sch
Assert.That(StatsCollector.ScheduleDelay.CollectAllSumValues().Single(), Is.EqualTo(sp.Elapsed.TotalSeconds).Within(0.01));
}
}

[TestFixture]
internal class Given_A_ThreadPoolSchedulingStatsCollector_That_Samples_Fifth_Event : StatsCollectorIntegrationTestBase<ThreadPoolSchedulingStatsCollector>
{
protected override ThreadPoolSchedulingStatsCollector CreateStatsCollector()
{
return new ThreadPoolSchedulingStatsCollector(Constants.DefaultHistogramBuckets, SampleEvery.FiveEvents);
}

[Test]
public async Task When_many_items_of_work_is_queued_on_the_thread_pool_then_the_queued_and_scheduled_work_is_measured()
{
Assert.That(StatsCollector.ScheduledCount.Value, Is.EqualTo(0));

// act (Task.Run will execute the function on the thread pool)
// There seems to be either a bug in the implementation of .NET core or a bug in my understanding...
// First call to Task.Run triggers a queued event but not a queue event. For now, call twice
await Task.Run(() => 1 );

var sp = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
sp.Start();
await Task.Run(() => sp.Stop());
}

Assert.That(() => StatsCollector.ScheduledCount.Value, Is.GreaterThanOrEqualTo(100).After(100, 10));
Assert.That(StatsCollector.ScheduleDelay.CollectAllCountValues().Single(), Is.GreaterThanOrEqualTo(100));
Assert.That(StatsCollector.ScheduleDelay.CollectAllSumValues().Single(), Is.EqualTo(sp.Elapsed.TotalSeconds).Within(0.01));
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -8,30 +8,30 @@
namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.Util
{
[TestFixture]
public class EventPairTimerTests
public class Given_An_EventPairTimer_That_Samples_Every_Event : EventPairTimerBaseClass
{
private const int EventIdStart = 1, EventIdEnd = 2;
private EventPairTimer<long> _eventPairTimer;

[SetUp]
public void SetUp()
{
_eventPairTimer = new EventPairTimer<long>(EventIdStart, EventIdEnd, x => (long)x.Payload[0]);
_eventPairTimer = new EventPairTimer<long>(EventIdStart, EventIdEnd, x => (long)x.Payload[0], SampleEvery.OneEvent);
}

[Test]
public void TryGetEventPairDuration_ignores_events_that_its_not_configured_to_look_for()
{
var nonMonitoredEvent = CreateEventWrittenEventArgs(3);
Assert.That(_eventPairTimer.TryGetEventPairDuration(nonMonitoredEvent, out var duration), Is.False);
Assert.That(_eventPairTimer.TryGetDuration(nonMonitoredEvent, out var duration), Is.EqualTo(DurationResult.Unrecognized));
Assert.That(duration, Is.EqualTo(TimeSpan.Zero));
}

[Test]
public void TryGetEventPairDuration_ignores_end_events_if_it_never_saw_the_start_event()
{
var nonMonitoredEvent = CreateEventWrittenEventArgs(EventIdEnd, payload: 1L);
Assert.That(_eventPairTimer.TryGetEventPairDuration(nonMonitoredEvent, out var duration), Is.False);
Assert.That(_eventPairTimer.TryGetDuration(nonMonitoredEvent, out var duration),Is.EqualTo(DurationResult.FinalWithoutDuration));
Assert.That(duration, Is.EqualTo(TimeSpan.Zero));
}

Expand All @@ -41,11 +41,11 @@ public void TryGetEventPairDuration_calculates_duration_between_configured_event
// arrange
var now = DateTime.UtcNow;
var startEvent = CreateEventWrittenEventArgs(EventIdStart, now, payload: 1L);
Assert.That(_eventPairTimer.TryGetEventPairDuration(startEvent, out var _), Is.False);
Assert.That(_eventPairTimer.TryGetDuration(startEvent, out var _), Is.EqualTo(DurationResult.Start));
var endEvent = CreateEventWrittenEventArgs(EventIdEnd, now.AddMilliseconds(100), payload: 1L);

// act
Assert.That(_eventPairTimer.TryGetEventPairDuration(endEvent, out var duration), Is.True);
Assert.That(_eventPairTimer.TryGetDuration(endEvent, out var duration), Is.EqualTo(DurationResult.FinalWithDuration));
Assert.That(duration.TotalMilliseconds, Is.EqualTo(100));
}

Expand All @@ -55,11 +55,11 @@ public void TryGetEventPairDuration_calculates_duration_between_configured_event
// arrange
var now = DateTime.UtcNow;
var startEvent = CreateEventWrittenEventArgs(EventIdStart, now, payload: 1L);
Assert.That(_eventPairTimer.TryGetEventPairDuration(startEvent, out var _), Is.False);
Assert.That(_eventPairTimer.TryGetDuration(startEvent, out var _), Is.EqualTo(DurationResult.Start));
var endEvent = CreateEventWrittenEventArgs(EventIdEnd, now, payload: 1L);

// act
Assert.That(_eventPairTimer.TryGetEventPairDuration(endEvent, out var duration), Is.True);
Assert.That(_eventPairTimer.TryGetDuration(endEvent, out var duration), Is.EqualTo(DurationResult.FinalWithDuration));
Assert.That(duration, Is.EqualTo(TimeSpan.Zero));
}

Expand All @@ -75,21 +75,67 @@ public void TryGetEventPairDuration_calculates_duration_between_multiple_out_of_
var startEvent3 = CreateEventWrittenEventArgs(EventIdStart, now, payload: 3L);
var endEvent3 = CreateEventWrittenEventArgs(EventIdEnd, now.AddMilliseconds(100), payload: 3L);

_eventPairTimer.TryGetEventPairDuration(startEvent1, out var _);
_eventPairTimer.TryGetEventPairDuration(startEvent2, out var _);
_eventPairTimer.TryGetEventPairDuration(startEvent3, out var _);
_eventPairTimer.TryGetDuration(startEvent1, out var _);
_eventPairTimer.TryGetDuration(startEvent2, out var _);
_eventPairTimer.TryGetDuration(startEvent3, out var _);

// act
Assert.That(_eventPairTimer.TryGetEventPairDuration(endEvent3, out var event3Duration), Is.True);
Assert.That(_eventPairTimer.TryGetEventPairDuration(endEvent2, out var event2Duration), Is.True);
Assert.That(_eventPairTimer.TryGetEventPairDuration(endEvent1, out var event1Duration), Is.True);
Assert.That(_eventPairTimer.TryGetDuration(endEvent3, out var event3Duration), Is.EqualTo(DurationResult.FinalWithDuration));
Assert.That(_eventPairTimer.TryGetDuration(endEvent2, out var event2Duration), Is.EqualTo(DurationResult.FinalWithDuration));
Assert.That(_eventPairTimer.TryGetDuration(endEvent1, out var event1Duration), Is.EqualTo(DurationResult.FinalWithDuration));

Assert.That(event1Duration.TotalMilliseconds, Is.EqualTo(300));
Assert.That(event2Duration.TotalMilliseconds, Is.EqualTo(200));
Assert.That(event3Duration.TotalMilliseconds, Is.EqualTo(100));
}
}


[TestFixture]
public class Given_An_EventPairTimer_That_Samples_Every_2nd_Event : EventPairTimerBaseClass
{

private EventPairTimer<long> _eventPairTimer;

[SetUp]
public void SetUp()
{
_eventPairTimer = new EventPairTimer<long>(EventIdStart, EventIdEnd, x => (long)x.Payload[0], SampleEvery.TwoEvents);
}

[Test]
public void TryGetEventPairDuration_recognizes_start_events_that_will_be_discarded()
{
var startEvent1 = CreateEventWrittenEventArgs(EventIdStart, DateTime.UtcNow, payload: 1L);
Assert.That(_eventPairTimer.TryGetDuration(startEvent1, out var duration),Is.EqualTo(DurationResult.Start));
Assert.That(duration, Is.EqualTo(TimeSpan.Zero));
}

[Test]
public void TryGetEventPairDuration_will_discard_1_event_and_calculate_duration_for_the_2nd_event()
{
// arrange
var now = DateTime.UtcNow;
var startEvent1 = CreateEventWrittenEventArgs(EventIdStart, now, payload: 1L);
var endEvent1 = CreateEventWrittenEventArgs(EventIdEnd, now.AddMilliseconds(300), payload: 1L);
var startEvent2 = CreateEventWrittenEventArgs(EventIdStart, now, payload: 2L);
var endEvent2 = CreateEventWrittenEventArgs(EventIdEnd, now.AddMilliseconds(200), payload: 2L);

private EventWrittenEventArgs CreateEventWrittenEventArgs(int eventId, DateTime? timestamp = null, params object[] payload)
_eventPairTimer.TryGetDuration(startEvent1, out var _);
_eventPairTimer.TryGetDuration(startEvent2, out var _);

// act
Assert.That(_eventPairTimer.TryGetDuration(endEvent1, out var _), Is.EqualTo(DurationResult.FinalWithoutDuration));
Assert.That(_eventPairTimer.TryGetDuration(endEvent2, out var _), Is.EqualTo(DurationResult.FinalWithDuration));
}

}

public class EventPairTimerBaseClass
{
protected const int EventIdStart = 1, EventIdEnd = 2;

protected EventWrittenEventArgs CreateEventWrittenEventArgs(int eventId, DateTime? timestamp = null, params object[] payload)
{
var args = (EventWrittenEventArgs)typeof(EventWrittenEventArgs).CreateInstance(new []{ typeof(EventSource)}, Flags.NonPublic | Flags.Instance, new object[] { null});
args.SetPropertyValue("EventId", eventId);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
using System;
using NUnit.Framework;
using Prometheus.DotNetRuntime.StatsCollectors.Util;

namespace Prometheus.DotNetRuntime.Tests.StatsCollectors.Util
{
[TestFixture]
public class SamplingRateTests
{
[Test]
[TestCase(SampleEvery.OneEvent, 1)]
[TestCase(SampleEvery.TwoEvents, 2)]
[TestCase(SampleEvery.FiveEvents, 5)]
[TestCase(SampleEvery.TenEvents, 10)]
[TestCase(SampleEvery.TwentyEvents, 20)]
[TestCase(SampleEvery.FiftyEvents, 50)]
[TestCase(SampleEvery.HundredEvents, 100)]
public void SampleEvery_Reflects_The_Ratio_Of_Every_100_Events_That_Will_Be_Sampled(SampleEvery samplingRate, int expected)
{
var sr = new SamplingRate(samplingRate);
Assert.That(sr.SampleEvery, Is.EqualTo(expected));
}

[Test]
[TestCase(SampleEvery.OneEvent, 1000)]
[TestCase(SampleEvery.TwoEvents, 500)]
[TestCase(SampleEvery.FiveEvents, 200)]
[TestCase(SampleEvery.TenEvents, 100)]
[TestCase(SampleEvery.TwentyEvents, 50)]
[TestCase(SampleEvery.FiftyEvents, 20)]
[TestCase(SampleEvery.HundredEvents, 10)]
public void Given_1000_Events_ShouldSampleEvent_Returns_True_Every_Nth_Event(SampleEvery samplingRate, int expectedEvents)
{
var eventsSampled = 0;
var sr = new SamplingRate(samplingRate);

for (int i = 0; i < 1_000; i++)
{
if (sr.ShouldSampleEvent())
{
eventsSampled++;
}
}

Assert.That(eventsSampled, Is.EqualTo(expectedEvents));
}
}
}
Loading

0 comments on commit 3d688f9

Please sign in to comment.