-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Profiler] Add timer_create
-based CPU profiling on Linux
#5476
[Profiler] Add timer_create
-based CPU profiling on Linux
#5476
Conversation
93658f6
to
88c285e
Compare
Datadog ReportBranch report: ✅ 0 Failed, 333689 Passed, 1801 Skipped, 43m 30.98s Wall Time ⌛ Performance Regressions vs Default Branch (1)
|
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing the following branches/commits: Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5476) - mean (71ms) : 64, 77
. : milestone, 71,
master - mean (74ms) : 61, 87
. : milestone, 74,
section CallTarget+Inlining+NGEN
This PR (5476) - mean (1,018ms) : 994, 1041
. : milestone, 1018,
master - mean (1,018ms) : 999, 1036
. : milestone, 1018,
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5476) - mean (108ms) : 106, 111
. : milestone, 108,
master - mean (109ms) : 106, 111
. : milestone, 109,
section CallTarget+Inlining+NGEN
This PR (5476) - mean (708ms) : 681, 734
. : milestone, 708,
master - mean (715ms) : 687, 744
. : milestone, 715,
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5476) - mean (92ms) : 89, 95
. : milestone, 92,
master - mean (92ms) : 90, 95
. : milestone, 92,
section CallTarget+Inlining+NGEN
This PR (5476) - mean (663ms) : 633, 692
. : milestone, 663,
master - mean (661ms) : 638, 684
. : milestone, 661,
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5476) - mean (190ms) : 187, 193
. : milestone, 190,
master - mean (191ms) : 188, 194
. : milestone, 191,
section CallTarget+Inlining+NGEN
This PR (5476) - mean (1,108ms) : 1085, 1130
. : milestone, 1108,
master - mean (1,108ms) : 1085, 1131
. : milestone, 1108,
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5476) - mean (277ms) : 272, 282
. : milestone, 277,
master - mean (275ms) : 270, 280
. : milestone, 275,
section CallTarget+Inlining+NGEN
This PR (5476) - mean (881ms) : 855, 907
. : milestone, 881,
master - mean (880ms) : 859, 900
. : milestone, 880,
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5476) - mean (265ms) : 261, 269
. : milestone, 265,
master - mean (265ms) : 261, 269
. : milestone, 265,
section CallTarget+Inlining+NGEN
This PR (5476) - mean (865ms) : 840, 891
. : milestone, 865,
master - mean (859ms) : 836, 882
. : milestone, 859,
|
de46062
to
b2a4323
Compare
Benchmarks Report for tracer 🐌Benchmarks for #5476 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored. Benchmark detailsBenchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.SpanBenchmark - Slower
|
Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net472 | 1.127 | 587.93 | 662.54 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | StartFinishSpan |
net6.0 | 456ns | 0.0971ns | 0.363ns | 0.00813 | 0 | 0 | 576 B |
master | StartFinishSpan |
netcoreapp3.1 | 554ns | 0.193ns | 0.748ns | 0.00774 | 0 | 0 | 576 B |
master | StartFinishSpan |
net472 | 587ns | 0.722ns | 2.8ns | 0.0917 | 0 | 0 | 578 B |
master | StartFinishScope |
net6.0 | 461ns | 0.17ns | 0.66ns | 0.0098 | 0 | 0 | 696 B |
master | StartFinishScope |
netcoreapp3.1 | 676ns | 0.367ns | 1.42ns | 0.00964 | 0 | 0 | 696 B |
master | StartFinishScope |
net472 | 847ns | 0.382ns | 1.43ns | 0.104 | 0 | 0 | 658 B |
#5476 | StartFinishSpan |
net6.0 | 426ns | 0.15ns | 0.582ns | 0.008 | 0 | 0 | 576 B |
#5476 | StartFinishSpan |
netcoreapp3.1 | 584ns | 0.238ns | 0.922ns | 0.00792 | 0 | 0 | 576 B |
#5476 | StartFinishSpan |
net472 | 662ns | 0.507ns | 1.96ns | 0.0918 | 0 | 0 | 578 B |
#5476 | StartFinishScope |
net6.0 | 468ns | 0.0669ns | 0.241ns | 0.00966 | 0 | 0 | 696 B |
#5476 | StartFinishScope |
netcoreapp3.1 | 666ns | 0.291ns | 1.09ns | 0.00939 | 0 | 0 | 696 B |
#5476 | StartFinishScope |
net472 | 863ns | 0.694ns | 2.69ns | 0.105 | 0 | 0 | 658 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | RunOnMethodBegin |
net6.0 | 668ns | 0.559ns | 2.17ns | 0.00969 | 0 | 0 | 696 B |
master | RunOnMethodBegin |
netcoreapp3.1 | 919ns | 0.313ns | 1.21ns | 0.00958 | 0 | 0 | 696 B |
master | RunOnMethodBegin |
net472 | 1.07μs | 0.353ns | 1.37ns | 0.104 | 0 | 0 | 658 B |
#5476 | RunOnMethodBegin |
net6.0 | 730ns | 0.621ns | 2.4ns | 0.00988 | 0 | 0 | 696 B |
#5476 | RunOnMethodBegin |
netcoreapp3.1 | 875ns | 0.413ns | 1.6ns | 0.00926 | 0 | 0 | 696 B |
#5476 | RunOnMethodBegin |
net472 | 1.13μs | 0.491ns | 1.9ns | 0.104 | 0 | 0 | 658 B |
Throughput/Crank Report:zap:Throughput results for AspNetCoreSimpleController comparing the following branches/commits: Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards! gantt
title Throughput Linux x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (5476) (11.939M) : 0, 11938686
master (11.881M) : 0, 11880597
benchmarks/2.9.0 (11.915M) : 0, 11914657
section Automatic
This PR (5476) (8.063M) : 0, 8062731
master (8.047M) : 0, 8046727
benchmarks/2.9.0 (8.513M) : 0, 8513066
section Trace stats
master (8.464M) : 0, 8463773
section Manual
This PR (5476) (10.252M) : 0, 10252228
master (10.247M) : 0, 10246957
section Manual + Automatic
This PR (5476) (7.512M) : 0, 7511836
master (7.578M) : 0, 7577515
section Version Conflict
master (6.707M) : 0, 6707065
gantt
title Throughput Linux arm64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (5476) (9.546M) : 0, 9546024
benchmarks/2.9.0 (9.711M) : 0, 9711350
section Automatic
This PR (5476) (6.550M) : 0, 6550413
section Manual
This PR (5476) (8.246M) : 0, 8245631
section Manual + Automatic
This PR (5476) (6.217M) : 0, 6217291
gantt
title Throughput Windows x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (5476) (10.032M) : 0, 10031840
master (10.099M) : 0, 10099485
section Automatic
This PR (5476) (7.157M) : 0, 7156646
master (7.144M) : 0, 7143660
section Trace stats
master (7.418M) : 0, 7418021
section Manual
This PR (5476) (8.794M) : 0, 8794122
master (8.900M) : 0, 8899902
section Manual + Automatic
This PR (5476) (6.921M) : 0, 6920921
master (6.862M) : 0, 6861859
section Version Conflict
master (6.305M) : 0, 6305249
|
b3db22f
to
46805c1
Compare
@@ -109,6 +109,31 @@ StackSnapshotResultBuffer* LinuxStackFramesCollector::CollectStackSampleImplemen | |||
return GetStackSnapshotResult(); | |||
} | |||
|
|||
// If there a timer associated to the managed thread, we have to disarm it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be moved to line 89 so that self collect cases (exceptions, contention, ...) do not account stack walking to the application
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
@@ -76,6 +76,8 @@ StackSamplerLoop::StackSamplerLoop( | |||
Log::Info("Wall time sampled threads = ", _walltimeThreadsThreshold); | |||
Log::Info("Max CodeHotspots sampled threads = ", _codeHotspotsThreadsThreshold); | |||
Log::Info("Max CPU sampled threads = ", _cpuThreadsThreshold); | |||
Log::Info("Cpu profiler is ", (_isCpuEnabled) ? "enabled" : "disabled"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log::Info("Cpu profiler is ", (_isCpuEnabled) ? "enabled" : "disabled"); | |
Log::Info("Manual Cpu profiler is ", (_isCpuEnabled) ? "enabled" : "disabled"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
return &signalManager; | ||
if (signal < 1 || signal > 31) | ||
{ | ||
throw std::invalid_argument(std::string("Signal argument is invalid ") + "(" + std::to_string(signal) + "). Value must be: 1 <= signal <= 31"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid crashing the app: should assert to more easily find our mistake while debugging + log + return nullptr to be checked (in LinuxStackFramesCollector constructor and CorProfilerCallback init)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp
Show resolved
Hide resolved
profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/TimerCreateCpuProfiler.cpp
Outdated
Show resolved
Hide resolved
profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/TimerCreateCpuProfiler.cpp
Outdated
Show resolved
Hide resolved
return true; | ||
} | ||
|
||
// If the signal is higjacked, what to do? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the signal is higjacked, what to do? | |
// If the signal is hijacked, what to do? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/TimerCreateCpuProfiler.cpp
Outdated
Show resolved
Hide resolved
46805c1
to
588ddc2
Compare
Datadog ReportBranch report: ✅ 0 Failed, 344440 Passed, 2033 Skipped, 16h 7m 33.32s Total Time New Flaky Tests (1)
|
a57f047
to
b7c3066
Compare
5f98128
to
58c7c7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp
Show resolved
Hide resolved
IManagedThreadList* _pManagedThreadsList; | ||
CpuTimeProvider* _pProvider; | ||
CallstackProvider _callstackProvider; | ||
std::atomic<ServiceState> _serviceState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed because it is not more used (done in ServiceBase now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/TimerCreateCpuProfiler.cpp
Show resolved
Hide resolved
nullptr | ||
} | ||
}; | ||
{WStr("Microsoft-Windows-DotNETRuntime"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a line between { and the first element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
db712f8
to
96174c9
Compare
96174c9
to
66f6868
Compare
Summary of changes
Add
timer_create
-based CPU profiling on linux.Reason for change
On Linux, at each profiling tick, the sampling thread goes over the thread
proc
information (reading files) to 1) check if it's currently running on a CPU 2) retrieve the CPU time.This is a significant overhead ( in addition to the stackwalking) and has a greater impact (relatively) in case of job application (applications that wait most of the time, and time to time, process requests/jobs)
With this (and the other improvements we've done in the past), this decrease the profiler overhead by at least 2x (2.46, the profiler had a 2.5s CPU overhead over a minute, with the various improvements, its goes to 1s CPU over a minute).
Implementation details
ProfilerSignalManager
to be able to get a manager by signalTest coverage
Other details