Crash tracking #5451

Merged
merged 102 commits into from
May 15, 2024

Collaborator

@kevingosse kevingosse commented Apr 16, 2024

Summary of changes

Monitor crashes in instrumented applications. If there is suspicion that the crash might be caused by us, send a crash report through telemetry logs.

We currently only support Linux.

The crash handler is enabled by setting the DD_TRACE_CRASH_HANDLER environment variable and making it point to dd-dotnet. The goal is to have SSI automatically set that environment variable.

Reason for change

When we crash customer applications, it can take them days/weeks before they narrow it down to Datadog and open a support ticket. With this, hopefully we can be more proactive.

Implementation details

On Linux, there is no good way to know if a .NET application is crashing (listening to signals is not enough, because .NET uses segfault signals even for NullReferenceExceptions that are going to be caught). The solution we found is to use LD_PRELOAD to hook the execve call and find out whether .NET is trying to call createdump. If that's the case, we redirect the call to dd-dotnet instead.
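To illustrate the redirection idea, here is a minimal sketch of the decision a hooked execve could make. It is not the code from this PR: `IsCreatedump`, `SelectTarget` and `RedirectingExecve` are hypothetical names, and the real function is passed in as a parameter so the logic can be exercised without preloading anything. In an actual preloaded library, the real function would typically be looked up with `dlsym(RTLD_NEXT, "execve")`.

```cpp
#include <cassert>
#include <cstring>

// Signature shared by execve and the stand-in used for forwarding.
using ExecveFn = int (*)(const char*, char* const[], char* const[]);

// Hypothetical helper: true when the last path component is exactly "createdump".
bool IsCreatedump(const char* path)
{
    if (path == nullptr)
    {
        return false;
    }

    const char* lastSlash = std::strrchr(path, '/');
    const char* fileName = (lastSlash != nullptr) ? lastSlash + 1 : path;
    return std::strcmp(fileName, "createdump") == 0;
}

// Hypothetical helper: redirect only when a handler is configured
// and the requested binary is createdump.
const char* SelectTarget(const char* requested, const char* handler)
{
    const bool handlerConfigured = handler != nullptr && handler[0] != '\0';
    return (handlerConfigured && IsCreatedump(requested)) ? handler : requested;
}

// Body of the hook. In a preloaded library this would live in an exported
// `execve`, with `realExecve` resolved through dlsym(RTLD_NEXT, "execve")
// and `handler` read from the DD_TRACE_CRASH_HANDLER environment variable.
int RedirectingExecve(ExecveFn realExecve, const char* handler,
                      const char* pathname, char* const argv[], char* const envp[])
{
    assert(realExecve != nullptr);
    return realExecve(SelectTarget(pathname, handler), argv, envp);
}
```

Every call that does not target createdump is forwarded untouched, which matters because the wrapper runs in all instrumented applications.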

In dd-dotnet, we use ClrMD to find the managed exception that caused the crash (if it's a managed crash). To resolve the callstack, we use some native code (written in the profiler) relying on libunwind. The native code calls back into the managed code to resolve the managed frames using ClrMD.
All the data we collect is sent to libdatadog, which is responsible for formatting and sending the crash report.

There are currently two heuristics implemented to try to figure out if we caused the crash:

  • If there's a managed exception, we check if it's one of the exception types that we usually get from instrumentation errors (InvalidProgramException, VerificationException, MissingMethodException, BadImageFormatException), or if the exception is in the Datadog.* namespace (for instance for ducktyping exceptions)
  • For each of the frames in the thread that caused the crash, we check if it's a Datadog function other than the BlockingMiddleware or the TaskContinuationGenerator (because they "stay" in the callstack even if they do nothing)
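The two heuristics can be sketched as follows. This is only an illustration under assumptions, not the dd-dotnet implementation (which is managed code): the function names are hypothetical, frames are represented as plain "Namespace.Type.Method" strings, and the exclusions are matched by substring.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

bool StartsWith(const std::string& text, const std::string& prefix)
{
    assert(!prefix.empty()); // an empty prefix would match every name
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Heuristic 1: exception types usually produced by instrumentation errors,
// or any exception declared in the Datadog.* namespace.
bool IsSuspiciousException(const std::string& fullTypeName)
{
    static const char* const kInstrumentationErrors[] = {
        "System.InvalidProgramException",
        "System.Security.VerificationException",
        "System.MissingMethodException",
        "System.BadImageFormatException",
    };

    if (StartsWith(fullTypeName, "Datadog."))
    {
        return true;
    }

    for (const char* known : kInstrumentationErrors)
    {
        if (fullTypeName == known)
        {
            return true;
        }
    }

    return false;
}

// Heuristic 2: a Datadog frame, except the ones that stay on the stack
// even when they do nothing (substring matching is an assumption).
bool IsSuspiciousFrame(const std::string& qualifiedMethod)
{
    if (!StartsWith(qualifiedMethod, "Datadog."))
    {
        return false;
    }

    const bool alwaysOnStack =
        qualifiedMethod.find("BlockingMiddleware") != std::string::npos ||
        qualifiedMethod.find("TaskContinuationGenerator") != std::string::npos;
    return !alwaysOnStack;
}

// Combines both checks; pass an empty exception type for a native crash.
bool LooksLikeOurCrash(const std::string& exceptionType,
                       const std::vector<std::string>& crashingThreadFrames)
{
    if (!exceptionType.empty() && IsSuspiciousException(exceptionType))
    {
        return true;
    }

    return std::any_of(crashingThreadFrames.begin(), crashingThreadFrames.end(),
                       IsSuspiciousFrame);
}
```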

The crash tracker tries to be discreet, and only displays information if we think we caused the crash (or if there's a really unexpected error).

We still want customers to be able to generate crash dumps by setting COMPlus_DbgEnableMiniDump. If it was set, dd-dotnet calls createdump after generating the crash report. Datadog.Linux.ApiWrapper always sets COMPlus_DbgEnableMiniDump (otherwise .NET won't call createdump, so there is no execve call to hook), so it also sets the DD_TRACE_CRASH_HANDLER_PASSTHROUGH environment variable. dd-dotnet uses that variable to know whether COMPlus_DbgEnableMiniDump was originally set or just added by us, and therefore whether it must forward the call to createdump.
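A rough sketch of that handshake, with the environment modelled as a map so it can be run in isolation. The "1"/"0" encoding of DD_TRACE_CRASH_HANDLER_PASSTHROUGH and the function names are assumptions made for illustration; the PR text does not specify them.

```cpp
#include <cassert>
#include <map>
#include <string>

using Env = std::map<std::string, std::string>;

const char* const kMiniDump = "COMPlus_DbgEnableMiniDump";
const char* const kPassthrough = "DD_TRACE_CRASH_HANDLER_PASSTHROUGH";

// Wrapper side: remember whether the customer asked for dumps, then force the
// flag on so that .NET spawns createdump (and the execve hook gets a chance to run).
void PrepareEnvironment(Env& env)
{
    // Record the original state only once, so a second call (for instance in a
    // child process that inherited the environment) does not mistake our own
    // flag for the customer's.
    if (env.count(kPassthrough) == 0)
    {
        const auto it = env.find(kMiniDump);
        const bool setByCustomer = it != env.end() && it->second == "1";
        env[kPassthrough] = setByCustomer ? "1" : "0"; // assumed encoding
    }

    env[kMiniDump] = "1";
    assert(env.count(kPassthrough) == 1);
}

// Handler side: forward to createdump only if the customer enabled dumps.
bool ShouldForwardToCreatedump(const Env& env)
{
    const auto it = env.find(kPassthrough);
    return it != env.end() && it->second == "1";
}
```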

When telemetry is explicitly disabled, we disable the crash report.

Test coverage

Added some tests to check that dd-dotnet is correctly invoked. I will add more tests to validate the report itself.

Other details

Please be very careful about reviewing the changes in Datadog.Linux.ApiWrapper. Be nitpicky, let nothing slide. If you're feeling lazy, think the PR is too big, and decide to review only one file, that's the one. It runs in all applications so it's paramount that this part of the code is perfect.

Things left before merging:

  • It doesn't work on ARM64 in the CI (there is a permission error). For now I disabled the test. I need to check why it fails, in case it hides a bigger issue
  • I need to add tests to validate the structure of the crash report
  • Libdatadog is supposed to automatically detect the URL of the agent, but it's not currently working. Waiting for the libdatadog people to publish an updated version

Member

andrewlock commented Apr 16, 2024

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.
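As a worked example of the second threshold only (the Welch test is not reproduced here), the sketch below checks whether a pair of means differs by more than both 5% and 5 ms. The function name is hypothetical, and comparing the absolute difference against the master mean is an assumption about how the report computes it.

```cpp
#include <cassert>
#include <cmath>

// True when the difference exceeds both reporting thresholds:
// more than 5 ms in absolute terms and more than 5% of the master mean.
bool ExceedsReportingThresholds(double masterMeanMs, double prMeanMs)
{
    assert(masterMeanMs > 0.0); // the relative difference needs a positive baseline

    const double absoluteDiffMs = std::fabs(prMeanMs - masterMeanMs);
    const double relativeDiff = absoluteDiffMs / masterMeanMs;
    return absoluteDiffMs > 5.0 && relativeDiff > 0.05;
}
```

With the HttpMessageHandler (.NET 6) CallTarget+Inlining+NGEN means from this report (845 ms on master, 855 ms on the PR), the difference is 10 ms but only about 1.2%, so under these assumptions it would not be flagged.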

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5451) - mean (75ms)  : 63, 87
     .   : milestone, 75,
    master - mean (75ms)  : 64, 86
     .   : milestone, 75,

    section CallTarget+Inlining+NGEN
    This PR (5451) - mean (989ms)  : 958, 1019
     .   : milestone, 989,
    master - mean (998ms)  : 959, 1036
     .   : milestone, 998,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5451) - mean (110ms)  : 108, 113
     .   : milestone, 110,
    master - mean (110ms)  : 106, 113
     .   : milestone, 110,

    section CallTarget+Inlining+NGEN
    This PR (5451) - mean (698ms)  : 668, 728
     .   : milestone, 698,
    master - mean (701ms)  : 673, 728
     .   : milestone, 701,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5451) - mean (92ms)  : 90, 94
     .   : milestone, 92,
    master - mean (94ms)  : 90, 98
     .   : milestone, 94,

    section CallTarget+Inlining+NGEN
    This PR (5451) - mean (653ms)  : 630, 677
     .   : milestone, 653,
    master - mean (655ms)  : 630, 679
     .   : milestone, 655,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5451) - mean (191ms)  : 185, 197
     .   : milestone, 191,
    master - mean (190ms)  : 186, 194
     .   : milestone, 190,

    section CallTarget+Inlining+NGEN
    This PR (5451) - mean (1,079ms)  : 1051, 1106
     .   : milestone, 1079,
    master - mean (1,079ms)  : 1052, 1106
     .   : milestone, 1079,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5451) - mean (277ms)  : 270, 283
     .   : milestone, 277,
    master - mean (276ms)  : 271, 281
     .   : milestone, 276,

    section CallTarget+Inlining+NGEN
    This PR (5451) - mean (862ms)  : 838, 886
     .   : milestone, 862,
    master - mean (864ms)  : 841, 888
     .   : milestone, 864,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5451) - mean (265ms)  : 260, 271
     .   : milestone, 265,
    master - mean (265ms)  : 261, 269
     .   : milestone, 265,

    section CallTarget+Inlining+NGEN
    This PR (5451) - mean (855ms)  : 822, 888
     .   : milestone, 855,
    master - mean (845ms)  : 824, 866
     .   : milestone, 845,



datadog-ddstaging bot commented Apr 16, 2024

Datadog Report

Branch report: kevin/report_crash_local
Commit report: 3d0f86b
Test service: dd-trace-dotnet

✅ 0 Failed, 331892 Passed, 1836 Skipped, 14h 56m 57.81s Total Time

Member

andrewlock commented Apr 17, 2024

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5451) (11.872M)   : 0, 11872182
    master (11.849M)   : 0, 11849466
    benchmarks/2.9.0 (11.905M)   : 0, 11905297

    section Automatic
    This PR (5451) (8.035M)   : 0, 8035235
    master (7.908M)   : 0, 7907936
    benchmarks/2.9.0 (8.378M)   : 0, 8378100

    section Trace stats
    master (8.335M)   : 0, 8334859

    section Manual
    This PR (5451) (10.208M)   : 0, 10207518
    master (10.049M)   : 0, 10048746

    section Manual + Automatic
    This PR (5451) (7.591M)   : 0, 7590568
    master (7.580M)   : 0, 7579532

    section Version Conflict
    master (6.733M)   : 0, 6733343

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5451) (9.651M)   : 0, 9651452
    master (9.620M)   : 0, 9619876
    benchmarks/2.9.0 (9.517M)   : 0, 9516587

    section Automatic
    This PR (5451) (6.637M)   : 0, 6637280
    master (6.446M)   : 0, 6446273

    section Trace stats
    master (6.992M)   : 0, 6992127

    section Manual
    This PR (5451) (8.223M)   : 0, 8222877
    master (8.285M)   : 0, 8285417

    section Manual + Automatic
    This PR (5451) (6.269M)   : 0, 6269138
    master (6.251M)   : 0, 6251406

    section Version Conflict
    master (5.629M)   : 0, 5629094

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5451) (9.697M)   : 0, 9697494
    master (9.953M)   : 0, 9953121
    benchmarks/2.9.0 (10.013M)   : 0, 10013461

    section Automatic
    This PR (5451) (6.990M)   : 0, 6989856
    master (7.141M)   : 0, 7141172
    benchmarks/2.9.0 (7.404M)   : 0, 7403989

    section Trace stats
    master (7.366M)   : 0, 7366104

    section Manual
    This PR (5451) (8.593M)   : 0, 8592975
    master (8.746M)   : 0, 8745815

    section Manual + Automatic
    This PR (5451) (6.698M)   : 0, 6698134
    master (6.776M)   : 0, 6776058

    section Version Conflict
    master (6.176M)   : 0, 6175582


Member

andrewlock commented Apr 18, 2024

Benchmarks Report for tracer 🐌

Benchmarks for #5451 compared to master:

  • All benchmarks have the same speed
  • All benchmarks have the same allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a significance level of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 8.88μs 47.1ns 235ns 0.0171 0.00855 0 7.55 KB
master StartStopWithChild netcoreapp3.1 10.9μs 59ns 339ns 0.0305 0.0153 0 7.65 KB
master StartStopWithChild net472 17.2μs 65.2ns 253ns 1.36 0.372 0.11 8.11 KB
#5451 StartStopWithChild net6.0 8.84μs 46.5ns 246ns 0.0214 0.00857 0 7.55 KB
#5451 StartStopWithChild netcoreapp3.1 11μs 59.1ns 313ns 0.032 0.016 0 7.64 KB
#5451 StartStopWithChild net472 17.1μs 36.2ns 140ns 1.36 0.368 0.109 8.1 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 475μs 192ns 691ns 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 646μs 441ns 1.71μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 841μs 440ns 1.65μs 0.419 0 0 3.3 KB
#5451 WriteAndFlushEnrichedTraces net6.0 465μs 411ns 1.54μs 0 0 0 2.7 KB
#5451 WriteAndFlushEnrichedTraces netcoreapp3.1 636μs 494ns 1.91μs 0 0 0 2.7 KB
#5451 WriteAndFlushEnrichedTraces net472 832μs 580ns 2.17μs 0.414 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 175μs 883ns 4.04μs 0.254 0 0 18.49 KB
master SendRequest netcoreapp3.1 189μs 231ns 865ns 0.19 0 0 20.65 KB
master SendRequest net472 5.29E‑05ns 3.69E‑05ns 0.000138ns 0 0 0 0 b
#5451 SendRequest net6.0 170μs 186ns 720ns 0.254 0 0 18.49 KB
#5451 SendRequest netcoreapp3.1 193μs 278ns 1.08μs 0.191 0 0 20.65 KB
#5451 SendRequest net472 0.00028ns 0.000115ns 0.000429ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 556μs 637ns 2.47μs 0.278 0 0 41.77 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 653μs 1.54μs 5.76μs 0.326 0 0 41.53 KB
master WriteAndFlushEnrichedTraces net472 826μs 2.72μs 10.2μs 8.22 2.47 0.411 53.24 KB
#5451 WriteAndFlushEnrichedTraces net6.0 554μs 343ns 1.33μs 0.548 0 0 41.8 KB
#5451 WriteAndFlushEnrichedTraces netcoreapp3.1 657μs 2.23μs 8.62μs 0.324 0 0 41.59 KB
#5451 WriteAndFlushEnrichedTraces net472 863μs 4.16μs 17.2μs 8.08 2.55 0.425 53.27 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.15μs 3.52ns 13.2ns 0.0115 0 0 808 B
master ExecuteNonQuery netcoreapp3.1 1.57μs 2.63ns 10.2ns 0.0107 0 0 808 B
master ExecuteNonQuery net472 1.81μs 0.511ns 1.91ns 0.122 0 0 770 B
#5451 ExecuteNonQuery net6.0 1.16μs 0.487ns 1.89ns 0.0111 0 0 808 B
#5451 ExecuteNonQuery netcoreapp3.1 1.45μs 0.836ns 3.01ns 0.0111 0 0 808 B
#5451 ExecuteNonQuery net472 1.79μs 0.41ns 1.54ns 0.122 0 0 770 B
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.25μs 0.424ns 1.59ns 0.0145 0 0 1.03 KB
master CallElasticsearch netcoreapp3.1 1.7μs 0.642ns 2.4ns 0.0136 0 0 1.03 KB
master CallElasticsearch net472 2.47μs 1.12ns 4.18ns 0.166 0.00124 0 1.04 KB
master CallElasticsearchAsync net6.0 1.36μs 0.679ns 2.63ns 0.0137 0 0 1.01 KB
master CallElasticsearchAsync netcoreapp3.1 1.73μs 1.26ns 4.89ns 0.0148 0 0 1.08 KB
master CallElasticsearchAsync net472 2.64μs 1.16ns 4.17ns 0.174 0 0 1.1 KB
#5451 CallElasticsearch net6.0 1.29μs 0.664ns 2.48ns 0.0143 0 0 1.03 KB
#5451 CallElasticsearch netcoreapp3.1 1.58μs 1.03ns 3.86ns 0.0137 0 0 1.03 KB
#5451 CallElasticsearch net472 2.72μs 2.64ns 9.52ns 0.166 0 0 1.04 KB
#5451 CallElasticsearchAsync net6.0 1.42μs 0.941ns 3.64ns 0.0143 0 0 1.01 KB
#5451 CallElasticsearchAsync netcoreapp3.1 1.78μs 0.979ns 3.66ns 0.0143 0 0 1.08 KB
#5451 CallElasticsearchAsync net472 2.58μs 3.09ns 12ns 0.174 0 0 1.1 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.32μs 0.884ns 3.42ns 0.0133 0 0 952 B
master ExecuteAsync netcoreapp3.1 1.74μs 0.758ns 2.94ns 0.0128 0 0 952 B
master ExecuteAsync net472 1.79μs 1.37ns 5.13ns 0.145 0 0 915 B
#5451 ExecuteAsync net6.0 1.4μs 0.743ns 2.78ns 0.0132 0 0 952 B
#5451 ExecuteAsync netcoreapp3.1 1.69μs 1.24ns 4.81ns 0.0126 0 0 952 B
#5451 ExecuteAsync net472 1.77μs 1.05ns 3.93ns 0.145 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.29μs 2.3ns 8.3ns 0.0319 0 0 2.27 KB
master SendAsync netcoreapp3.1 5.16μs 3.41ns 12.8ns 0.0364 0 0 2.81 KB
master SendAsync net472 8.02μs 4.85ns 18.8ns 0.504 0 0 3.18 KB
#5451 SendAsync net6.0 4.32μs 1.77ns 6.84ns 0.0308 0 0 2.27 KB
#5451 SendAsync netcoreapp3.1 5.14μs 3.57ns 13.8ns 0.0387 0 0 2.81 KB
#5451 SendAsync net472 7.93μs 2.93ns 10.9ns 0.503 0 0 3.18 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.52μs 0.892ns 3.34ns 0.0236 0 0 1.7 KB
master EnrichedLog netcoreapp3.1 2.28μs 0.653ns 2.35ns 0.0227 0 0 1.7 KB
master EnrichedLog net472 2.75μs 1.45ns 5.43ns 0.257 0 0 1.62 KB
#5451 EnrichedLog net6.0 1.59μs 0.865ns 3.35ns 0.0241 0 0 1.7 KB
#5451 EnrichedLog netcoreapp3.1 2.23μs 1.38ns 5.17ns 0.0224 0 0 1.7 KB
#5451 EnrichedLog net472 2.67μs 5.05ns 17.5ns 0.257 0 0 1.62 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 113μs 260ns 1.01μs 0.0565 0 0 4.28 KB
master EnrichedLog netcoreapp3.1 119μs 121ns 468ns 0.0584 0 0 4.28 KB
master EnrichedLog net472 148μs 241ns 934ns 0.668 0.223 0 4.46 KB
#5451 EnrichedLog net6.0 113μs 155ns 599ns 0 0 0 4.28 KB
#5451 EnrichedLog netcoreapp3.1 118μs 209ns 781ns 0 0 0 4.28 KB
#5451 EnrichedLog net472 151μs 197ns 764ns 0.674 0.225 0 4.46 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.98μs 1.08ns 4.03ns 0.0313 0 0 2.25 KB
master EnrichedLog netcoreapp3.1 4.26μs 2.62ns 10.1ns 0.0299 0 0 2.25 KB
master EnrichedLog net472 4.81μs 1.43ns 4.95ns 0.328 0 0 2.07 KB
#5451 EnrichedLog net6.0 3.12μs 3.02ns 11.7ns 0.0307 0 0 2.25 KB
#5451 EnrichedLog netcoreapp3.1 4.2μs 1.82ns 7.04ns 0.0316 0 0 2.25 KB
#5451 EnrichedLog net472 4.78μs 1.65ns 6.41ns 0.327 0 0 2.07 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.45μs 0.569ns 2.13ns 0.0165 0 0 1.2 KB
master SendReceive netcoreapp3.1 1.95μs 2.81ns 10.9ns 0.0164 0 0 1.2 KB
master SendReceive net472 2.24μs 1.15ns 4.44ns 0.19 0 0 1.2 KB
#5451 SendReceive net6.0 1.56μs 1.12ns 4.19ns 0.0172 0 0 1.2 KB
#5451 SendReceive netcoreapp3.1 1.84μs 1.19ns 4.45ns 0.0166 0 0 1.2 KB
#5451 SendReceive net472 2.18μs 1.91ns 6.89ns 0.191 0.00109 0 1.2 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.85μs 1.02ns 3.81ns 0.0215 0 0 1.6 KB
master EnrichedLog netcoreapp3.1 3.86μs 2.26ns 8.45ns 0.0212 0 0 1.65 KB
master EnrichedLog net472 4.46μs 3ns 11.2ns 0.322 0 0 2.04 KB
#5451 EnrichedLog net6.0 2.81μs 0.507ns 1.9ns 0.0227 0 0 1.6 KB
#5451 EnrichedLog netcoreapp3.1 4.02μs 1.45ns 5.23ns 0.0221 0 0 1.65 KB
#5451 EnrichedLog net472 4.51μs 2.91ns 11.3ns 0.324 0 0 2.04 KB
Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 538ns 0.274ns 1.06ns 0.00812 0 0 576 B
master StartFinishSpan netcoreapp3.1 774ns 0.398ns 1.49ns 0.00754 0 0 576 B
master StartFinishSpan net472 798ns 0.556ns 2.15ns 0.0916 0 0 578 B
master StartFinishScope net6.0 565ns 0.152ns 0.569ns 0.00994 0 0 696 B
master StartFinishScope netcoreapp3.1 906ns 0.958ns 3.71ns 0.00946 0 0 696 B
master StartFinishScope net472 1.02μs 0.645ns 2.5ns 0.105 0 0 658 B
#5451 StartFinishSpan net6.0 555ns 0.182ns 0.682ns 0.00811 0 0 576 B
#5451 StartFinishSpan netcoreapp3.1 741ns 0.494ns 1.85ns 0.00792 0 0 576 B
#5451 StartFinishSpan net472 816ns 1.23ns 4.75ns 0.0915 0 0 578 B
#5451 StartFinishScope net6.0 568ns 0.209ns 0.782ns 0.00978 0 0 696 B
#5451 StartFinishScope netcoreapp3.1 971ns 0.637ns 2.47ns 0.00925 0 0 696 B
#5451 StartFinishScope net472 989ns 1.1ns 4.26ns 0.104 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 705ns 0.294ns 1.14ns 0.00957 0 0 696 B
master RunOnMethodBegin netcoreapp3.1 947ns 0.566ns 2.12ns 0.00954 0 0 696 B
master RunOnMethodBegin net472 1.14μs 0.672ns 2.6ns 0.104 0 0 658 B
#5451 RunOnMethodBegin net6.0 744ns 0.257ns 0.996ns 0.00967 0 0 696 B
#5451 RunOnMethodBegin netcoreapp3.1 963ns 0.861ns 3.33ns 0.0092 0 0 696 B
#5451 RunOnMethodBegin net472 1.19μs 0.44ns 1.7ns 0.104 0 0 658 B

Member

andrewlock left a comment

:blindfold:


private static bool ShouldRedactFrame(string? assemblyName)
{
// It would be nice to get those names directly from the source-generated InstrumentationDefinitions.IsInstrumentedAssembly
Member

Maybe we should move this out of the source generation into a Nuke step? 🤔 That would make it easier to share, but adds a bit of extra noise dev time when adding an integration (like the trimming/nullability files that we all forget to update). Not this PR obviously, but this is going to go out of date as soon as a new integration is added, and we can't expect people to just remember to update this, especially if there's no explicit tests for it...

Collaborator Author

Sounds like a job for future-one-of-us

{
var value = Environment.GetEnvironmentVariable(ConfigurationKeys.Telemetry.Enabled);

if (string.Equals(value, "false", StringComparison.OrdinalIgnoreCase) || value == "0")
Member

heh, surprised we don't have a helper for this on our IConfigurationSource in dd_dotnet


private static bool IsTelemetryEnabled()
{
var value = Environment.GetEnvironmentVariable(ConfigurationKeys.Telemetry.Enabled);
Member

I'm wondering if we should be using the full ConfigurationSource here, in case the variable is set elsewhere (e.g. in json). But maybe that's too much overhead/risk to be thinking about at this point?

Collaborator Author

Yeah I'm not sure it's worth the complexity

// DD_TRACE_CRASH_HANDLER_PASSTHROUGH environment variable, which codifies the result of the
// "was COMPlus_DbgEnableMiniDump set?" check.

SkipOn.Platform(SkipOn.PlatformValue.MacOs);
Member

None of this works on Windows either, right? If it does, we should add the RunOnWindows trait so that we test it

Collaborator Author

Yeah, it's Linux only.


File.Exists(reportFile.Path).Should().BeTrue();

var report = JObject.Parse(reportFile.GetContent());
Member

I wonder if it's worth snapshot testing this? Probably not due to too much variation in runtimes, but just a thought 🤷‍♂️

Collaborator Author

Almost everything will be different from one run to the other, so yuck

];
}

private class TemporaryFile : IDisposable
Member

nit: Seems like something genuinely useful to put into TestHelpers

return Path.Combine(EnvironmentHelper.GetMonitoringHomePath(), rid, RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "dd-dotnet.exe" : "dd-dotnet");
}

internal static bool IsAlpine()
Member

In CI, we also just set a variable that you can check 😃

Collaborator Author

Honestly I prefer this way. At least it doesn't break in mysterious ways when you run it locally and forget to set the environment variable.


auto resolved = resolveManagedCallstack(tid, context, &managedCallstack, &numberOfManagedFrames);

std::vector<StackFrame> managedFrames;
Collaborator

you can call reserve(numberOfManagedFrames) on managedFrames to preallocate and avoid temporary allocations.

Collaborator Author

Done in 4c07518

{
for (int i = 0; i < numberOfManagedFrames; i++)
{
auto managedFrame = managedCallstack[i];
Collaborator

you can use auto const& managedFrame to avoid the copy.

Collaborator Author

Done in 4c07518

stackFrame.symbolAddress = managedFrame.symbolAddress;
stackFrame.isSuspicious = managedFrame.isSuspicious;

managedFrames.push_back(stackFrame);
Collaborator

you can use std::move(stackFrame)

Collaborator Author

Done in 4c07518
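For readers following the three suggestions in this thread (reserve, `auto const&`, `std::move`), here is how they might look once combined. It is a sketch built around the fragments quoted above, not the code from commit 4c07518: the `ManagedFrame` and `StackFrame` layouts and the `ToStackFrames` name are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape of the frames returned by the managed resolver.
struct ManagedFrame
{
    std::uint64_t ip;
    std::uint64_t symbolAddress;
    std::string method;
    bool isSuspicious;
};

// Hypothetical shape of the frames stored in the crash report.
struct StackFrame
{
    std::uint64_t ip = 0;
    std::uint64_t symbolAddress = 0;
    std::string method;
    bool isSuspicious = false;
};

std::vector<StackFrame> ToStackFrames(const ManagedFrame* managedCallstack, int numberOfManagedFrames)
{
    assert(numberOfManagedFrames >= 0);
    assert(numberOfManagedFrames == 0 || managedCallstack != nullptr);

    std::vector<StackFrame> managedFrames;
    // Preallocate once instead of growing the vector while pushing.
    managedFrames.reserve(static_cast<std::size_t>(numberOfManagedFrames));

    for (int i = 0; i < numberOfManagedFrames; i++)
    {
        // Bind by const reference: no copy of the source frame (and its string).
        auto const& managedFrame = managedCallstack[i];

        StackFrame stackFrame;
        stackFrame.ip = managedFrame.ip;
        stackFrame.symbolAddress = managedFrame.symbolAddress;
        stackFrame.method = managedFrame.method; // one copy: the source buffer is only borrowed
        stackFrame.isSuspicious = managedFrame.isSuspicious;

        // Move the local into the vector instead of copying it a second time.
        managedFrames.push_back(std::move(stackFrame));
    }

    return managedFrames;
}
```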

}

// TODO: Check if the stacktrace is from the tracer or the profiler
stackFrame.isSuspicious = false;

Collaborator

on line 238, you can use std::move(stackFrame)

Collaborator Author

Done in 4c07518


std::vector<StackFrame> CrashReportingLinux::MergeFrames(const std::vector<StackFrame>& nativeFrames, const std::vector<StackFrame>& managedFrames)
{
std::vector<StackFrame> result;
Collaborator

you can call reserve(std::max(nativeFrames.size(), managedFrames.size())) to preallocate and avoid temporary allocations.

Collaborator Author

Done in 4c07518

@kevingosse kevingosse requested a review from a team as a code owner May 13, 2024 14:18
Contributor

daniel-romano-DD left a comment

Superb job

@kevingosse kevingosse merged commit 0597c72 into master May 15, 2024
65 checks passed
@kevingosse kevingosse deleted the kevin/report_crash_local branch May 15, 2024 10:45
@github-actions github-actions bot added this to the vNext-v2 milestone May 15, 2024
@andrewlock andrewlock added area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations) type:new-feature area:native-library Automatic instrumentation native C++ code (Datadog.Trace.ClrProfiler.Native) labels May 23, 2024