Crash tracking #5451

kevingosse · 2024-04-16T15:30:19Z

Summary of changes

Monitor crashes in instrumented applications. If there is suspicion that the crash might be caused by us, send a crash report through telemetry logs.

We currently only support Linux.

The crash handler is enabled by setting the DD_TRACE_CRASH_HANDLER environment variable and making it point to dd-dotnet. The goal is to have SSI automatically set that environment variable.

Reason for change

When we crash customer applications, it can take them days/weeks before they narrow it down to Datadog and open a support ticket. With this, hopefully we can be more proactive.

Implementation details

On Linux, there is no good way to know if a .NET application is crashing (listening to signals is not enough because .NET uses segfault signals even for NullReferenceExceptions that are going to be caught). The solution we found is to use LD_PRELOAD to hook the execve syscall and find out if .NET is trying to call createdump. If that's the case, we redirect the call to dd-dotnet instead.
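A minimal sketch of the decision such a hook has to make, assuming the hook recognizes createdump by the last component of the path and reads the handler path from DD_TRACE_CRASH_HANDLER. The function names are hypothetical, and the interposition itself is left out: in an LD_PRELOAD library, an exported execve would typically look up the libc implementation (for example with dlsym(RTLD_NEXT, "execve")) and call it with the selected target.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the last path component is exactly "createdump".
   Matching on the file name only is an assumption of this sketch. */
int is_createdump(const char *path) {
    if (path == NULL) {
        return 0;
    }
    const char *slash = strrchr(path, '/');
    const char *name = slash != NULL ? slash + 1 : path;
    return strcmp(name, "createdump") == 0;
}

/* Chooses the program the hooked execve should start.
   crash_handler stands for the value of DD_TRACE_CRASH_HANDLER;
   when it is missing or empty, the original path is left untouched. */
const char *select_exec_target(const char *path, const char *crash_handler) {
    if (is_createdump(path) && crash_handler != NULL && crash_handler[0] != '\0') {
        return crash_handler;
    }
    return path;
}
```

Keeping the decision in a pure function like this makes it easy to test without preloading anything; every other execve call passes through unchanged.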

In dd-dotnet, we use ClrMD to find the managed exception that caused the crash (if it's a managed crash). To resolve the callstack, we use some native code (written in the profiler) relying on libunwind. The native code calls back into the managed code to resolve the managed frames using ClrMD.
All the data we collect is sent to libdatadog, which is responsible for formatting and sending the crash report.

There are currently two heuristics implemented to try to figure out if we caused the crash:

If there's a managed exception, we check whether it's one of the exception types that we usually get from instrumentation errors (InvalidProgramException, VerificationException, MissingMethodException, BadImageFormatException), or whether the exception is in the Datadog.* namespace (for instance, duck-typing exceptions).
For each of the frames in the thread that caused the crash, we check whether it's a Datadog function other than the BlockingMiddleware or the TaskContinuationGenerator (because they "stay" in the callstack even if they do nothing).
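The two heuristics can be sketched as follows. The real checks live in dd-dotnet; this C version only mirrors the logic described above. The function names, the use of fully qualified type names, and the prefix/substring matching rules are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

static int starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Heuristic 1: exception types typical of instrumentation errors,
   or any exception type in the Datadog.* namespace. */
int is_suspicious_exception(const char *type_name) {
    static const char *const known[] = {
        "System.InvalidProgramException",
        "System.Security.VerificationException",
        "System.MissingMethodException",
        "System.BadImageFormatException",
    };
    if (type_name == NULL) {
        return 0;
    }
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(type_name, known[i]) == 0) {
            return 1;
        }
    }
    return starts_with(type_name, "Datadog.");
}

/* Heuristic 2: a Datadog frame, ignoring the two helpers that stay
   in the callstack even when they do nothing. */
int is_suspicious_frame(const char *frame) {
    if (frame == NULL || !starts_with(frame, "Datadog.")) {
        return 0;
    }
    if (strstr(frame, "BlockingMiddleware") != NULL) {
        return 0;
    }
    if (strstr(frame, "TaskContinuationGenerator") != NULL) {
        return 0;
    }
    return 1;
}

/* The crash is attributed to the tracer if either heuristic matches.
   exception_type may be NULL for a purely native crash. */
int crash_looks_like_ours(const char *exception_type,
                          const char *const *frames, size_t frame_count) {
    if (is_suspicious_exception(exception_type)) {
        return 1;
    }
    for (size_t i = 0; i < frame_count; i++) {
        if (is_suspicious_frame(frames[i])) {
            return 1;
        }
    }
    return 0;
}
```

With this shape, a NullReferenceException whose crashing thread only contains application frames and a BlockingMiddleware frame would not be flagged.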

The crash tracker tries to be discreet, and only displays information if we think we caused the crash (or if there's a really unexpected error).

We still want customers to be able to generate crash dumps by setting COMPlus_DbgEnableMiniDump. If it was set, dd-dotnet calls createdump after generating the crash report. However, Datadog.Linux.ApiWrapper always sets COMPlus_DbgEnableMiniDump itself, because otherwise .NET won't call createdump and there is no execve call to hook. To tell the two cases apart, we also set the DD_TRACE_CRASH_HANDLER_PASSTHROUGH environment variable: dd-dotnet uses it to know whether COMPlus_DbgEnableMiniDump was originally set or just added by us, and therefore whether it must forward the call to createdump.
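As a rough illustration of that handshake, the sketch below separates the two sides: the wrapper records the customer's original setting before forcing minidumps on, and dd-dotnet later reads that record. The "1"/"0" encoding of DD_TRACE_CRASH_HANDLER_PASSTHROUGH and both function names are assumptions; the PR does not spell out the actual values.

```c
#include <stddef.h>
#include <string.h>

/* Wrapper side: value to store in DD_TRACE_CRASH_HANDLER_PASSTHROUGH,
   computed from the value COMPlus_DbgEnableMiniDump had BEFORE the
   wrapper forced it on. NULL means the variable was not set. */
const char *passthrough_value(const char *original_minidump_setting) {
    int customer_enabled = original_minidump_setting != NULL
        && strcmp(original_minidump_setting, "1") == 0;
    return customer_enabled ? "1" : "0";
}

/* dd-dotnet side: forward to the real createdump only when the
   customer asked for dumps, not when the wrapper added the setting. */
int should_forward_to_createdump(const char *passthrough) {
    return passthrough != NULL && strcmp(passthrough, "1") == 0;
}
```

The point of the extra variable is that, by the time dd-dotnet runs, COMPlus_DbgEnableMiniDump is always set, so it can no longer be used to tell what the customer wanted.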

When telemetry is explicitly disabled, we disable the crash report.

Test coverage

Added some tests to check that dd-dotnet is correctly invoked. I will add more tests to validate the report itself.

Other details

Please be very careful about reviewing the changes in Datadog.Linux.ApiWrapper. Be nitpicky, let nothing slide. If you're feeling lazy, think the PR is too big, and decide to review only one file, that's the one. It runs in all applications so it's paramount that this part of the code is perfect.

Things left before merging:

It doesn't work on ARM64 in the CI (there is a permission error). For now I disabled the test. I need to check why it fails, in case it hides a bigger issue.
I need to add tests to validate the structure of the crash report.
Libdatadog is supposed to automatically detect the URL of the agent, but it's not currently working. Waiting for the libdatadog people to publish an updated version.

andrewlock · 2024-04-16T16:02:51Z

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

Welch test with a significance level of 5%
Only results indicating a difference greater than 5% and 5 ms are considered.
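The magnitude filter in the second threshold can be sketched like this. It covers only the "greater than 5% and 5 ms" part and leaves out the Welch test; computing the percentage relative to the master mean is an assumption, and the function name is hypothetical.

```c
/* Returns 1 when the two means differ by more than 5% of the master
   mean AND by more than 5 ms; both conditions must hold. */
int difference_is_considered(double master_ms, double pr_ms) {
    if (master_ms <= 0.0) {
        return 0; /* no meaningful baseline to compare against */
    }
    double diff_ms = pr_ms - master_ms;
    if (diff_ms < 0.0) {
        diff_ms = -diff_ms;
    }
    return diff_ms > 5.0 && diff_ms / master_ms > 0.05;
}
```

For example, a run at 855 ms against a master mean of 845 ms differs by 10 ms but only about 1.2%, so it would not be reported under this filter.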

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Each row shows the mean value of the run and the p99 interval based on the mean and StdDev of the test run.

Sample	Runtime	Scenario	Branch	Mean (ms)	p99 interval (ms)
FakeDbCommand	.NET Framework 4.6.2	Baseline	This PR (5451)	75	63-87
FakeDbCommand	.NET Framework 4.6.2	Baseline	master	75	64-86
FakeDbCommand	.NET Framework 4.6.2	CallTarget+Inlining+NGEN	This PR (5451)	989	958-1019
FakeDbCommand	.NET Framework 4.6.2	CallTarget+Inlining+NGEN	master	998	959-1036
FakeDbCommand	.NET Core 3.1	Baseline	This PR (5451)	110	108-113
FakeDbCommand	.NET Core 3.1	Baseline	master	110	106-113
FakeDbCommand	.NET Core 3.1	CallTarget+Inlining+NGEN	This PR (5451)	698	668-728
FakeDbCommand	.NET Core 3.1	CallTarget+Inlining+NGEN	master	701	673-728
FakeDbCommand	.NET 6	Baseline	This PR (5451)	92	90-94
FakeDbCommand	.NET 6	Baseline	master	94	90-98
FakeDbCommand	.NET 6	CallTarget+Inlining+NGEN	This PR (5451)	653	630-677
FakeDbCommand	.NET 6	CallTarget+Inlining+NGEN	master	655	630-679
HttpMessageHandler	.NET Framework 4.6.2	Baseline	This PR (5451)	191	185-197
HttpMessageHandler	.NET Framework 4.6.2	Baseline	master	190	186-194
HttpMessageHandler	.NET Framework 4.6.2	CallTarget+Inlining+NGEN	This PR (5451)	1079	1051-1106
HttpMessageHandler	.NET Framework 4.6.2	CallTarget+Inlining+NGEN	master	1079	1052-1106
HttpMessageHandler	.NET Core 3.1	Baseline	This PR (5451)	277	270-283
HttpMessageHandler	.NET Core 3.1	Baseline	master	276	271-281
HttpMessageHandler	.NET Core 3.1	CallTarget+Inlining+NGEN	This PR (5451)	862	838-886
HttpMessageHandler	.NET Core 3.1	CallTarget+Inlining+NGEN	master	864	841-888
HttpMessageHandler	.NET 6	Baseline	This PR (5451)	265	260-271
HttpMessageHandler	.NET 6	Baseline	master	265	261-269
HttpMessageHandler	.NET 6	CallTarget+Inlining+NGEN	This PR (5451)	855	822-888
HttpMessageHandler	.NET 6	CallTarget+Inlining+NGEN	master	845	824-866

datadog-ddstaging · 2024-04-16T16:07:28Z

Datadog Report

Branch report: kevin/report_crash_local
Commit report: 3d0f86b
Test service: dd-trace-dotnet

✅ 0 Failed, 331892 Passed, 1836 Skipped, 14h 56m 57.81s Total Time

andrewlock · 2024-04-17T17:13:54Z

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

Platform	Scenario	Branch	Total requests
Linux x64	Baseline	This PR (5451)	11872182
Linux x64	Baseline	master	11849466
Linux x64	Baseline	benchmarks/2.9.0	11905297
Linux x64	Automatic	This PR (5451)	8035235
Linux x64	Automatic	master	7907936
Linux x64	Automatic	benchmarks/2.9.0	8378100
Linux x64	Trace stats	master	8334859
Linux x64	Manual	This PR (5451)	10207518
Linux x64	Manual	master	10048746
Linux x64	Manual + Automatic	This PR (5451)	7590568
Linux x64	Manual + Automatic	master	7579532
Linux x64	Version Conflict	master	6733343
Linux arm64	Baseline	This PR (5451)	9651452
Linux arm64	Baseline	master	9619876
Linux arm64	Baseline	benchmarks/2.9.0	9516587
Linux arm64	Automatic	This PR (5451)	6637280
Linux arm64	Automatic	master	6446273
Linux arm64	Trace stats	master	6992127
Linux arm64	Manual	This PR (5451)	8222877
Linux arm64	Manual	master	8285417
Linux arm64	Manual + Automatic	This PR (5451)	6269138
Linux arm64	Manual + Automatic	master	6251406
Linux arm64	Version Conflict	master	5629094
Windows x64	Baseline	This PR (5451)	9697494
Windows x64	Baseline	master	9953121
Windows x64	Baseline	benchmarks/2.9.0	10013461
Windows x64	Automatic	This PR (5451)	6989856
Windows x64	Automatic	master	7141172
Windows x64	Automatic	benchmarks/2.9.0	7403989
Windows x64	Trace stats	master	7366104
Windows x64	Manual	This PR (5451)	8592975
Windows x64	Manual	master	8745815
Windows x64	Manual + Automatic	This PR (5451)	6698134
Windows x64	Manual + Automatic	master	6776058
Windows x64	Version Conflict	master	6175582

andrewlock · 2024-04-18T18:58:22Z

Benchmarks Report for tracer 🐌

Benchmarks for #5451 compared to master:

All benchmarks have the same speed
All benchmarks have the same allocations

The following thresholds were used for comparing the benchmark speeds:

Mann–Whitney U test with a significance level of 5%
Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Gen 2	Allocated
master	`StartStopWithChild`	net6.0	8.88μs	47.1ns	235ns	0.0171	0.00855	0	7.55 KB
master	`StartStopWithChild`	netcoreapp3.1	10.9μs	59ns	339ns	0.0305	0.0153	0	7.65 KB
master	`StartStopWithChild`	net472	17.2μs	65.2ns	253ns	1.36	0.372	0.11	8.11 KB
#5451	`StartStopWithChild`	net6.0	8.84μs	46.5ns	246ns	0.0214	0.00857	0	7.55 KB
#5451	`StartStopWithChild`	netcoreapp3.1	11μs	59.1ns	313ns	0.032	0.016	0	7.64 KB
#5451	`StartStopWithChild`	net472	17.1μs	36.2ns	140ns	1.36	0.368	0.109	8.1 KB

Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`WriteAndFlushEnrichedTraces`	net6.0	475μs	192ns	691ns	0	2.7 KB
master	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	646μs	441ns	1.71μs	0	2.7 KB
master	`WriteAndFlushEnrichedTraces`	net472	841μs	440ns	1.65μs	0.419	3.3 KB
#5451	`WriteAndFlushEnrichedTraces`	net6.0	465μs	411ns	1.54μs	0	2.7 KB
#5451	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	636μs	494ns	1.91μs	0	2.7 KB
#5451	`WriteAndFlushEnrichedTraces`	net472	832μs	580ns	2.17μs	0.414	3.3 KB

Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`SendRequest`	net6.0	175μs	883ns	4.04μs	0.254	18.49 KB
master	`SendRequest`	netcoreapp3.1	189μs	231ns	865ns	0.19	20.65 KB
master	`SendRequest`	net472	5.29E‑05ns	3.69E‑05ns	0.000138ns	0	0 b
#5451	`SendRequest`	net6.0	170μs	186ns	720ns	0.254	18.49 KB
#5451	`SendRequest`	netcoreapp3.1	193μs	278ns	1.08μs	0.191	20.65 KB
#5451	`SendRequest`	net472	0.00028ns	0.000115ns	0.000429ns	0	0 b

Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Gen 2	Allocated
master	`WriteAndFlushEnrichedTraces`	net6.0	556μs	637ns	2.47μs	0.278	0	0	41.77 KB
master	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	653μs	1.54μs	5.76μs	0.326	0	0	41.53 KB
master	`WriteAndFlushEnrichedTraces`	net472	826μs	2.72μs	10.2μs	8.22	2.47	0.411	53.24 KB
#5451	`WriteAndFlushEnrichedTraces`	net6.0	554μs	343ns	1.33μs	0.548	0	0	41.8 KB
#5451	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	657μs	2.23μs	8.62μs	0.324	0	0	41.59 KB
#5451	`WriteAndFlushEnrichedTraces`	net472	863μs	4.16μs	17.2μs	8.08	2.55	0.425	53.27 KB

Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`ExecuteNonQuery`	net6.0	1.15μs	3.52ns	13.2ns	0.0115	808 B
master	`ExecuteNonQuery`	netcoreapp3.1	1.57μs	2.63ns	10.2ns	0.0107	808 B
master	`ExecuteNonQuery`	net472	1.81μs	0.511ns	1.91ns	0.122	770 B
#5451	`ExecuteNonQuery`	net6.0	1.16μs	0.487ns	1.89ns	0.0111	808 B
#5451	`ExecuteNonQuery`	netcoreapp3.1	1.45μs	0.836ns	3.01ns	0.0111	808 B
#5451	`ExecuteNonQuery`	net472	1.79μs	0.41ns	1.54ns	0.122	770 B

Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`CallElasticsearch`	net6.0	1.25μs	0.424ns	1.59ns	0.0145	0	1.03 KB
master	`CallElasticsearch`	netcoreapp3.1	1.7μs	0.642ns	2.4ns	0.0136	0	1.03 KB
master	`CallElasticsearch`	net472	2.47μs	1.12ns	4.18ns	0.166	0.00124	1.04 KB
master	`CallElasticsearchAsync`	net6.0	1.36μs	0.679ns	2.63ns	0.0137	0	1.01 KB
master	`CallElasticsearchAsync`	netcoreapp3.1	1.73μs	1.26ns	4.89ns	0.0148	0	1.08 KB
master	`CallElasticsearchAsync`	net472	2.64μs	1.16ns	4.17ns	0.174	0	1.1 KB
#5451	`CallElasticsearch`	net6.0	1.29μs	0.664ns	2.48ns	0.0143	0	1.03 KB
#5451	`CallElasticsearch`	netcoreapp3.1	1.58μs	1.03ns	3.86ns	0.0137	0	1.03 KB
#5451	`CallElasticsearch`	net472	2.72μs	2.64ns	9.52ns	0.166	0	1.04 KB
#5451	`CallElasticsearchAsync`	net6.0	1.42μs	0.941ns	3.64ns	0.0143	0	1.01 KB
#5451	`CallElasticsearchAsync`	netcoreapp3.1	1.78μs	0.979ns	3.66ns	0.0143	0	1.08 KB
#5451	`CallElasticsearchAsync`	net472	2.58μs	3.09ns	12ns	0.174	0	1.1 KB

Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`ExecuteAsync`	net6.0	1.32μs	0.884ns	3.42ns	0.0133	952 B
master	`ExecuteAsync`	netcoreapp3.1	1.74μs	0.758ns	2.94ns	0.0128	952 B
master	`ExecuteAsync`	net472	1.79μs	1.37ns	5.13ns	0.145	915 B
#5451	`ExecuteAsync`	net6.0	1.4μs	0.743ns	2.78ns	0.0132	952 B
#5451	`ExecuteAsync`	netcoreapp3.1	1.69μs	1.24ns	4.81ns	0.0126	952 B
#5451	`ExecuteAsync`	net472	1.77μs	1.05ns	3.93ns	0.145	915 B

Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`SendAsync`	net6.0	4.29μs	2.3ns	8.3ns	0.0319	2.27 KB
master	`SendAsync`	netcoreapp3.1	5.16μs	3.41ns	12.8ns	0.0364	2.81 KB
master	`SendAsync`	net472	8.02μs	4.85ns	18.8ns	0.504	3.18 KB
#5451	`SendAsync`	net6.0	4.32μs	1.77ns	6.84ns	0.0308	2.27 KB
#5451	`SendAsync`	netcoreapp3.1	5.14μs	3.57ns	13.8ns	0.0387	2.81 KB
#5451	`SendAsync`	net472	7.93μs	2.93ns	10.9ns	0.503	3.18 KB

Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net6.0	1.52μs	0.892ns	3.34ns	0.0236	1.7 KB
master	`EnrichedLog`	netcoreapp3.1	2.28μs	0.653ns	2.35ns	0.0227	1.7 KB
master	`EnrichedLog`	net472	2.75μs	1.45ns	5.43ns	0.257	1.62 KB
#5451	`EnrichedLog`	net6.0	1.59μs	0.865ns	3.35ns	0.0241	1.7 KB
#5451	`EnrichedLog`	netcoreapp3.1	2.23μs	1.38ns	5.17ns	0.0224	1.7 KB
#5451	`EnrichedLog`	net472	2.67μs	5.05ns	17.5ns	0.257	1.62 KB

Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`EnrichedLog`	net6.0	113μs	260ns	1.01μs	0.0565	0	4.28 KB
master	`EnrichedLog`	netcoreapp3.1	119μs	121ns	468ns	0.0584	0	4.28 KB
master	`EnrichedLog`	net472	148μs	241ns	934ns	0.668	0.223	4.46 KB
#5451	`EnrichedLog`	net6.0	113μs	155ns	599ns	0	0	4.28 KB
#5451	`EnrichedLog`	netcoreapp3.1	118μs	209ns	781ns	0	0	4.28 KB
#5451	`EnrichedLog`	net472	151μs	197ns	764ns	0.674	0.225	4.46 KB

Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net6.0	2.98μs	1.08ns	4.03ns	0.0313	2.25 KB
master	`EnrichedLog`	netcoreapp3.1	4.26μs	2.62ns	10.1ns	0.0299	2.25 KB
master	`EnrichedLog`	net472	4.81μs	1.43ns	4.95ns	0.328	2.07 KB
#5451	`EnrichedLog`	net6.0	3.12μs	3.02ns	11.7ns	0.0307	2.25 KB
#5451	`EnrichedLog`	netcoreapp3.1	4.2μs	1.82ns	7.04ns	0.0316	2.25 KB
#5451	`EnrichedLog`	net472	4.78μs	1.65ns	6.41ns	0.327	2.07 KB

Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`SendReceive`	net6.0	1.45μs	0.569ns	2.13ns	0.0165	0	1.2 KB
master	`SendReceive`	netcoreapp3.1	1.95μs	2.81ns	10.9ns	0.0164	0	1.2 KB
master	`SendReceive`	net472	2.24μs	1.15ns	4.44ns	0.19	0	1.2 KB
#5451	`SendReceive`	net6.0	1.56μs	1.12ns	4.19ns	0.0172	0	1.2 KB
#5451	`SendReceive`	netcoreapp3.1	1.84μs	1.19ns	4.45ns	0.0166	0	1.2 KB
#5451	`SendReceive`	net472	2.18μs	1.91ns	6.89ns	0.191	0.00109	1.2 KB

Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net6.0	2.85μs	1.02ns	3.81ns	0.0215	1.6 KB
master	`EnrichedLog`	netcoreapp3.1	3.86μs	2.26ns	8.45ns	0.0212	1.65 KB
master	`EnrichedLog`	net472	4.46μs	3ns	11.2ns	0.322	2.04 KB
#5451	`EnrichedLog`	net6.0	2.81μs	0.507ns	1.9ns	0.0227	1.6 KB
#5451	`EnrichedLog`	netcoreapp3.1	4.02μs	1.45ns	5.23ns	0.0221	1.65 KB
#5451	`EnrichedLog`	net472	4.51μs	2.91ns	11.3ns	0.324	2.04 KB

Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`StartFinishSpan`	net6.0	538ns	0.274ns	1.06ns	0.00812	576 B
master	`StartFinishSpan`	netcoreapp3.1	774ns	0.398ns	1.49ns	0.00754	576 B
master	`StartFinishSpan`	net472	798ns	0.556ns	2.15ns	0.0916	578 B
master	`StartFinishScope`	net6.0	565ns	0.152ns	0.569ns	0.00994	696 B
master	`StartFinishScope`	netcoreapp3.1	906ns	0.958ns	3.71ns	0.00946	696 B
master	`StartFinishScope`	net472	1.02μs	0.645ns	2.5ns	0.105	658 B
#5451	`StartFinishSpan`	net6.0	555ns	0.182ns	0.682ns	0.00811	576 B
#5451	`StartFinishSpan`	netcoreapp3.1	741ns	0.494ns	1.85ns	0.00792	576 B
#5451	`StartFinishSpan`	net472	816ns	1.23ns	4.75ns	0.0915	578 B
#5451	`StartFinishScope`	net6.0	568ns	0.209ns	0.782ns	0.00978	696 B
#5451	`StartFinishScope`	netcoreapp3.1	971ns	0.637ns	2.47ns	0.00925	696 B
#5451	`StartFinishScope`	net472	989ns	1.1ns	4.26ns	0.104	658 B

Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`RunOnMethodBegin`	net6.0	705ns	0.294ns	1.14ns	0.00957	696 B
master	`RunOnMethodBegin`	netcoreapp3.1	947ns	0.566ns	2.12ns	0.00954	696 B
master	`RunOnMethodBegin`	net472	1.14μs	0.672ns	2.6ns	0.104	658 B
#5451	`RunOnMethodBegin`	net6.0	744ns	0.257ns	0.996ns	0.00967	696 B
#5451	`RunOnMethodBegin`	netcoreapp3.1	963ns	0.861ns	3.33ns	0.0092	696 B
#5451	`RunOnMethodBegin`	net472	1.19μs	0.44ns	1.7ns	0.104	658 B

andrewlock

:blindfold:

tracer/src/Datadog.Trace.Tools.dd_dotnet/CreatedumpCommand.cs

andrewlock · 2024-05-03T08:54:54Z

tracer/src/Datadog.Trace.Tools.dd_dotnet/CreatedumpCommand.cs

+
+    private static bool ShouldRedactFrame(string? assemblyName)
+    {
+        // It would be nice to get those names directly from the source-generated InstrumentationDefinitions.IsInstrumentedAssembly


Maybe we should move this out of the source generation into a Nuke step? 🤔 That would make it easier to share, but adds a bit of extra noise at dev time when adding an integration (like the trimming/nullability files that we all forget to update). Not this PR obviously, but this is going to go out of date as soon as a new integration is added, and we can't expect people to just remember to update this, especially if there are no explicit tests for it...

Sounds like a job for future-one-of-us

andrewlock · 2024-05-03T08:56:47Z

tracer/src/Datadog.Trace.Tools.dd_dotnet/CreatedumpCommand.cs

+    {
+        var value = Environment.GetEnvironmentVariable(ConfigurationKeys.Telemetry.Enabled);
+
+        if (string.Equals(value, "false", StringComparison.OrdinalIgnoreCase) || value == "0")


heh, surprised we don't have a helper for this on our IConfigurationSource in dd_dotnet

andrewlock · 2024-05-03T08:57:31Z

tracer/src/Datadog.Trace.Tools.dd_dotnet/CreatedumpCommand.cs

+
+    private static bool IsTelemetryEnabled()
+    {
+        var value = Environment.GetEnvironmentVariable(ConfigurationKeys.Telemetry.Enabled);


I'm wondering if we should be using the full ConfigurationSource here, in case the variable is set elsewhere (e.g. in json). But maybe that's too much overhead/risk to be thinking about at this point?

Yeah I'm not sure it's worth the complexity

andrewlock · 2024-05-03T09:20:02Z

tracer/test/Datadog.Trace.Tools.dd_dotnet.ArtifactTests/CreatedumpTests.cs

+        // DD_TRACE_CRASH_HANDLER_PASSTHROUGH environment variable, which codifies the result of the
+        // "was COMPlus_DbgEnableMiniDump set?" check.
+
+        SkipOn.Platform(SkipOn.PlatformValue.MacOs);


None of this works on Windows either, right? If it does, we should add the RunOnWindows trait so that we test it

Yeah it's Linux only.

andrewlock · 2024-05-03T09:22:46Z

tracer/test/Datadog.Trace.Tools.dd_dotnet.ArtifactTests/CreatedumpTests.cs

+
+        File.Exists(reportFile.Path).Should().BeTrue();
+
+        var report = JObject.Parse(reportFile.GetContent());


I wonder if it's worth snapshot testing this? Probably not due to too much variation in runtimes, but just a thought 🤷‍♂️

Almost everything will be different from one run to the other, so yuck

andrewlock · 2024-05-03T09:23:42Z

tracer/test/Datadog.Trace.Tools.dd_dotnet.ArtifactTests/CreatedumpTests.cs

+        ];
+    }
+
+    private class TemporaryFile : IDisposable


nit: Seems like something genuinely useful to put into TestHelpers

tracer/test/Datadog.Trace.Tools.dd_dotnet.ArtifactTests/Utils.cs

andrewlock · 2024-05-03T09:25:20Z

tracer/test/Datadog.Trace.Tools.dd_dotnet.ArtifactTests/Utils.cs

+        return Path.Combine(EnvironmentHelper.GetMonitoringHomePath(), rid, RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? "dd-dotnet.exe" : "dd-dotnet");
+    }
+
+    internal static bool IsAlpine()


In CI, we also just set a variable that you can check 😃

Honestly I prefer this way. At least it doesn't break in mysterious ways when you run it locally and forget to set the environment variable.

gleocadie · 2024-05-06T13:03:38Z