Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASM] New marshalling system for Waf.Run calls to improve speed and reduce allocations #4302

Merged
merged 4 commits into from
Sep 16, 2023

Conversation

tonyredondo
Copy link
Member

@tonyredondo tonyredondo commented Jun 23, 2023

Summary of changes

Replace the marshalling system using in AppSec Context.Run waf call with a new one to avoid allocations and improve performance.
Basically, allocations are made on the tracer side and we don't rely on the waf's helpers method to create the structs.
Introduction of a mechanism to recycle memory faster.

Reason for change

Current Waf calls allocates a lot due to the marshalling of strings and conversion from UTF16 to UTF8.

Benchmarks are much improved (see below)
Reducing a lot of pressure to the garbage collector.

Implementation details

  • Not using the waf helpers anymore
  • Allocating our own memory
  • Using an unmanaged memory pool to recycle memory

@tonyredondo tonyredondo self-assigned this Jun 23, 2023
@tonyredondo tonyredondo changed the title New marshalling system for Waf.Run call to improve speed and reduce allocations New marshalling system for Waf.Run calls to improve speed and reduce allocations Jun 23, 2023
@tonyredondo tonyredondo marked this pull request as ready for review June 26, 2023 13:18
@tonyredondo tonyredondo requested review from a team as code owners June 26, 2023 13:18
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 26, 2023
@DataDog DataDog deleted a comment from datadog-ddstaging bot Jun 26, 2023
private static readonly IDatadogLogger Log = DatadogLogging.GetLoggerFor(typeof(Encoder));
private static readonly int ObjectStructSize = Marshal.SizeOf(typeof(DdwafObjectStruct));

[ThreadStatic]
Copy link
Contributor

@anna-git anna-git Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my understanding, why is 'ThreadStatic' necessary? 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Unmanaged Pool is not thread safe for performance reasons, so the best way to avoid leaking memory in a concurrent scenario like ASP.NET is to create and mantain a pool for each thread (and just dispose the pool if the thread get's killed), this pool instance is copied to the Encoder stack so we avoid accessing a thread static twice in the same processing.

Copy link
Contributor

@anna-git anna-git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot!

@DataDog DataDog deleted a comment from andrewlock Jun 29, 2023
@DataDog DataDog deleted a comment from andrewlock Jun 29, 2023
@andrewlock

This comment has been minimized.

@andrewlock

This comment has been minimized.

@andrewlock

This comment has been minimized.

@andrewlock

This comment has been minimized.

@andrewlock
Copy link
Member

andrewlock commented Jul 18, 2023

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4302) - mean (69ms)  : 64, 73
     .   : milestone, 69,
    master - mean (69ms)  : 63, 74
     .   : milestone, 69,

    section CallTarget+Inlining+NGEN
    This PR (4302) - mean (1,037ms)  : 1005, 1070
     .   : milestone, 1037,
    master - mean (1,028ms)  : 1011, 1045
     .   : milestone, 1028,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4302) - mean (106ms)  : 101, 112
     .   : milestone, 106,
    master - mean (107ms)  : 94, 120
     .   : milestone, 107,

    section CallTarget+Inlining+NGEN
    This PR (4302) - mean (749ms)  : 734, 764
     .   : milestone, 749,
    master - mean (746ms)  : 732, 760
     .   : milestone, 746,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4302) - mean (91ms)  : 84, 97
     .   : milestone, 91,
    master - mean (90ms)  : 83, 96
     .   : milestone, 90,

    section CallTarget+Inlining+NGEN
    This PR (4302) - mean (709ms)  : 693, 726
     .   : milestone, 709,
    master - mean (710ms)  : 694, 727
     .   : milestone, 710,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4302) - mean (188ms)  : 182, 195
     .   : milestone, 188,
    master - mean (187ms)  : 179, 196
     .   : milestone, 187,

    section CallTarget+Inlining+NGEN
    This PR (4302) - mean (1,141ms)  : 1095, 1187
     .   : milestone, 1141,
    master - mean (1,124ms)  : 1098, 1149
     .   : milestone, 1124,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4302) - mean (276ms)  : 266, 285
     .   : milestone, 276,
    master - mean (270ms)  : 261, 280
     .   : milestone, 270,

    section CallTarget+Inlining+NGEN
    This PR (4302) - mean (1,099ms)  : 1065, 1132
     .   : milestone, 1099,
    master - mean (1,104ms)  : 1084, 1125
     .   : milestone, 1104,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4302) - mean (261ms)  : 250, 272
     .   : milestone, 261,
    master - mean (260ms)  : 247, 273
     .   : milestone, 260,

    section CallTarget+Inlining+NGEN
    This PR (4302) - mean (1,054ms)  : 1034, 1074
     .   : milestone, 1054,
    master - mean (1,053ms)  : 1024, 1081
     .   : milestone, 1053,

Loading

@andrewlock
Copy link
Member

andrewlock commented Jul 18, 2023

Benchmarks Report 🐌

Benchmarks for #4302 compared to master:

  • 20 benchmarks are faster, with geometric mean 2.095
  • 1 benchmarks are slower, with geometric mean 1.134
  • 21 benchmarks have fewer allocations
  • 3 benchmarks have more allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with statistical test for significance of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 8.21μs 41.7ns 225ns 0.0237 0.0119 0 7.29 KB
master StartStopWithChild netcoreapp3.1 10.3μs 58ns 376ns 0.0258 0.0103 0 7.39 KB
master StartStopWithChild net472 15.6μs 55.7ns 216ns 1.28 0.308 0.0923 7.66 KB
#4302 StartStopWithChild net6.0 8.29μs 47.4ns 345ns 0.0252 0.0126 0 7.29 KB
#4302 StartStopWithChild netcoreapp3.1 9.91μs 49.9ns 269ns 0.0404 0.0202 0.00505 7.38 KB
#4302 StartStopWithChild net472 15.6μs 32.9ns 123ns 1.3 0.341 0.116 7.66 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 470μs 279ns 1.08μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 660μs 254ns 951ns 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 803μs 373ns 1.34μs 0.401 0 0 3.3 KB
#4302 WriteAndFlushEnrichedTraces net6.0 504μs 197ns 763ns 0 0 0 2.7 KB
#4302 WriteAndFlushEnrichedTraces netcoreapp3.1 633μs 273ns 1.02μs 0 0 0 2.7 KB
#4302 WriteAndFlushEnrichedTraces net472 794μs 217ns 782ns 0.396 0 0 3.3 KB
Benchmarks.Trace.Asm.AppSecBodyBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #4302

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody‑netcoreapp3.1 1.75 KB 2.01 KB 256 B 14.61%
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody‑net6.0 1.78 KB 2.03 KB 256 B 14.41%
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody‑net472 1.82 KB 2.08 KB 256 B 14.05%

Fewer allocations 🎉 in #4302

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody‑net472 9.42 KB 8.7 KB -723 B -7.67%
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody‑net6.0 9.35 KB 8.63 KB -720 B -7.70%
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody‑netcoreapp3.1 9.24 KB 8.52 KB -721 B -7.80%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master AllCycleSimpleBody net6.0 42.8μs 79.7ns 287ns 0.0215 0 0 1.78 KB
master AllCycleSimpleBody netcoreapp3.1 45.6μs 84.1ns 315ns 0 0 0 1.75 KB
master AllCycleSimpleBody net472 47.9μs 15.7ns 60.7ns 0.283 0 0 1.82 KB
master AllCycleMoreComplexBody net6.0 235μs 88.1ns 341ns 0.118 0 0 9.35 KB
master AllCycleMoreComplexBody netcoreapp3.1 251μs 368ns 1.43μs 0 0 0 9.24 KB
master AllCycleMoreComplexBody net472 261μs 282ns 1.09μs 1.44 0 0 9.42 KB
master ObjectExtractorSimpleBody net6.0 116ns 0.0364ns 0.131ns 0.00395 0 0 280 B
master ObjectExtractorSimpleBody netcoreapp3.1 175ns 0.178ns 0.691ns 0.00367 0 0 272 B
master ObjectExtractorSimpleBody net472 144ns 0.128ns 0.496ns 0.0446 0 0 281 B
master ObjectExtractorMoreComplexBody net6.0 2.94μs 0.56ns 2.1ns 0.0543 0 0 3.88 KB
master ObjectExtractorMoreComplexBody netcoreapp3.1 4.06μs 1.71ns 6.4ns 0.0508 0 0 3.78 KB
master ObjectExtractorMoreComplexBody net472 4.15μs 4.27ns 16ns 0.617 0.00615 0 3.89 KB
#4302 AllCycleSimpleBody net6.0 42.3μs 15.9ns 59.6ns 0.021 0 0 2.03 KB
#4302 AllCycleSimpleBody netcoreapp3.1 44.1μs 65ns 234ns 0.022 0 0 2.01 KB
#4302 AllCycleSimpleBody net472 45.4μs 62.8ns 243ns 0.316 0 0 2.08 KB
#4302 AllCycleMoreComplexBody net6.0 228μs 56.2ns 210ns 0 0 0 8.63 KB
#4302 AllCycleMoreComplexBody netcoreapp3.1 233μs 142ns 550ns 0 0 0 8.52 KB
#4302 AllCycleMoreComplexBody net472 236μs 140ns 542ns 1.3 0 0 8.7 KB
#4302 ObjectExtractorSimpleBody net6.0 128ns 0.0487ns 0.182ns 0.00396 0 0 280 B
#4302 ObjectExtractorSimpleBody netcoreapp3.1 174ns 0.0408ns 0.158ns 0.00374 0 0 272 B
#4302 ObjectExtractorSimpleBody net472 146ns 0.0931ns 0.361ns 0.0446 0 0 281 B
#4302 ObjectExtractorMoreComplexBody net6.0 2.98μs 0.785ns 2.94ns 0.0551 0 0 3.88 KB
#4302 ObjectExtractorMoreComplexBody netcoreapp3.1 4.05μs 2.01ns 7.8ns 0.0506 0 0 3.78 KB
#4302 ObjectExtractorMoreComplexBody net472 4.13μs 1.35ns 5.23ns 0.617 0.00619 0 3.89 KB
Benchmarks.Trace.Asm.AppSecWafBenchmark - Faster 🎉 Fewer allocations 🎉

Faster 🎉 in #4302

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (10))‑net6.0 3.884 49,313.75 12,695.19
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (10))‑net472 3.434 96,104.83 27,989.58
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (10))‑netcoreapp3.1 3.403 67,385.13 19,800.35 bimodal
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (1000))‑net6.0 2.918 130,238.34 44,640.39
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (100))‑net6.0 2.801 126,958.86 45,321.28
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (100))‑net472 2.729 222,391.87 81,492.96
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (1000))‑netcoreapp3.1 2.723 171,701.07 63,055.73
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (100))‑netcoreapp3.1 2.690 167,989.34 62,448.69
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (1000))‑net472 2.557 213,375.31 83,437.91
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (100, attack))‑net472 1.803 311,935.10 172,964.97

Fewer allocations 🎉 in #4302

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (10, attack))‑net472 22.71 KB 16.04 KB -6.67 KB -29.37%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (10, attack))‑net6.0 22.42 KB 15.77 KB -6.65 KB -29.66%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (10, attack))‑netcoreapp3.1 22.37 KB 15.72 KB -6.65 KB -29.72%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (100, attack))‑net472 42.89 KB 29.84 KB -13.05 KB -30.42%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (1000, attack))‑net472 42.9 KB 29.85 KB -13.05 KB -30.42%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (1000, attack))‑netcoreapp3.1 41.69 KB 28.77 KB -12.91 KB -30.98%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (100, attack))‑netcoreapp3.1 41.7 KB 28.77 KB -12.93 KB -31.00%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (100, attack))‑net6.0 41.09 KB 27.99 KB -13.1 KB -31.89%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafWithAttack(args: NestedMap (1000, attack))‑net6.0 41.09 KB 27.98 KB -13.11 KB -31.90%
Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWaf(args: NestedMap (100))‑net472 36.33 KB 23.29 KB -13.05 KB -35.91%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunWaf(args=NestedMap (10)) net6.0 49.5μs 182ns 629ns 0.219 0 0 16.07 KB
master RunWaf(args=NestedMap (10)) netcoreapp3.1 67.7μs 381ns 2.64μs 0.2 0 0 16.06 KB
master RunWaf(args=NestedMap (10)) net472 96.1μs 35ns 121ns 2.54 0.0957 0 16.14 KB
master RunWafWithAttack(args=Neste(...)tack) [22]) net6.0 116μs 38.8ns 145ns 0.291 0 0 22.42 KB
master RunWafWithAttack(args=Neste(...)tack) [22]) netcoreapp3.1 138μs 527ns 1.9μs 0.276 0 0 22.37 KB
master RunWafWithAttack(args=Neste(...)tack) [22]) net472 166μs 75ns 291ns 3.54 0.165 0 22.71 KB
master RunWaf(args=NestedMap (100)) net6.0 126μs 604ns 2.34μs 0.447 0 0 34.74 KB
master RunWaf(args=NestedMap (100)) netcoreapp3.1 165μs 864ns 4.14μs 0.469 0 0 35.39 KB
master RunWaf(args=NestedMap (100)) net472 222μs 154ns 596ns 5.73 0.424 0 36.33 KB
master RunWafWithAttack(args=Neste(...)tack) [23]) net6.0 207μs 1.12μs 6.33μs 0.492 0 0 41.09 KB
master RunWafWithAttack(args=Neste(...)tack) [23]) netcoreapp3.1 247μs 955ns 3.7μs 0.497 0 0 41.7 KB
master RunWafWithAttack(args=Neste(...)tack) [23]) net472 311μs 538ns 2.08μs 6.76 0.601 0 42.89 KB
master RunWaf(args=NestedMap (1000)) net6.0 130μs 110ns 426ns 0.471 0 0 34.74 KB
master RunWaf(args=NestedMap (1000)) netcoreapp3.1 170μs 613ns 2.38μs 0.414 0 0 35.39 KB
master RunWaf(args=NestedMap (1000)) net472 213μs 65ns 252ns 5.76 0.427 0 36.34 KB
master RunWafWithAttack(args=Neste(...)tack) [24]) net6.0 204μs 185ns 718ns 0.52 0 0 41.09 KB
master RunWafWithAttack(args=Neste(...)tack) [24]) netcoreapp3.1 250μs 975ns 3.78μs 0.488 0 0 41.69 KB
master RunWafWithAttack(args=Neste(...)tack) [24]) net472 306μs 1.04μs 4.03μs 6.79 0.453 0 42.9 KB
#4302 RunWaf(args=NestedMap (10)) net6.0 12.7μs 3.02ns 10.9ns 0.128 0 0 9.42 KB
#4302 RunWaf(args=NestedMap (10)) netcoreapp3.1 19.8μs 7.04ns 25.4ns 0.128 0 0 9.42 KB
#4302 RunWaf(args=NestedMap (10)) net472 28μs 21.6ns 80.8ns 1.5 0 0 9.48 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [22]) net6.0 77.4μs 46.9ns 181ns 0.195 0 0 15.77 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [22]) netcoreapp3.1 84.9μs 132ns 513ns 0.211 0 0 15.72 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [22]) net472 98.7μs 56.6ns 212ns 2.52 0 0 16.04 KB
#4302 RunWaf(args=NestedMap (100)) net6.0 45.4μs 57.1ns 221ns 0.294 0 0 21.64 KB
#4302 RunWaf(args=NestedMap (100)) netcoreapp3.1 62.5μs 64.3ns 240ns 0.31 0 0 22.47 KB
#4302 RunWaf(args=NestedMap (100)) net472 81.5μs 51.7ns 193ns 3.7 0.0406 0 23.29 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [23]) net6.0 125μs 123ns 460ns 0.371 0 0 27.99 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [23]) netcoreapp3.1 146μs 266ns 1.03μs 0.367 0 0 28.77 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [23]) net472 173μs 153ns 593ns 4.74 0.0862 0 29.84 KB
#4302 RunWaf(args=NestedMap (1000)) net6.0 44.7μs 47.4ns 184ns 0.289 0 0 21.64 KB
#4302 RunWaf(args=NestedMap (1000)) netcoreapp3.1 63μs 71.7ns 278ns 0.284 0 0 22.47 KB
#4302 RunWaf(args=NestedMap (1000)) net472 83.5μs 53.5ns 207ns 3.66 0.0416 0 23.28 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [24]) net6.0 127μs 157ns 608ns 0.384 0 0 27.98 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [24]) netcoreapp3.1 146μs 292ns 1.09μs 0.367 0 0 28.77 KB
#4302 RunWafWithAttack(args=Neste(...)tack) [24]) net472 173μs 185ns 718ns 4.73 0.086 0 29.85 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 169μs 88.3ns 342ns 0.254 0 0 18.09 KB
master SendRequest netcoreapp3.1 189μs 327ns 1.27μs 0.189 0 0 20.25 KB
master SendRequest net472 0.000182ns 0.000101ns 0.000391ns 0 0 0 0 b
#4302 SendRequest net6.0 168μs 158ns 611ns 0.249 0 0 18.09 KB
#4302 SendRequest netcoreapp3.1 192μs 311ns 1.2μs 0.193 0 0 20.25 KB
#4302 SendRequest net472 0.00243ns 0.000367ns 0.00137ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 537μs 2.4μs 9.28μs 0.519 0 0 41.46 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 624μs 1.7μs 5.9μs 0.316 0 0 41.77 KB
master WriteAndFlushEnrichedTraces net472 815μs 3.81μs 17.9μs 8.58 2.45 0.408 53.26 KB
#4302 WriteAndFlushEnrichedTraces net6.0 532μs 1.42μs 5.12μs 0.506 0 0 41.48 KB
#4302 WriteAndFlushEnrichedTraces netcoreapp3.1 613μs 388ns 1.5μs 0.312 0 0 41.64 KB
#4302 WriteAndFlushEnrichedTraces net472 799μs 2.35μs 8.78μs 8.31 2.37 0.396 53.24 KB
Benchmarks.Trace.DbCommandBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #4302

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑net6.0 1.198 1,174.96 981.11

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.18μs 0.715ns 2.77ns 0.0106 0 0 768 B
master ExecuteNonQuery netcoreapp3.1 1.35μs 0.616ns 2.39ns 0.0101 0 0 768 B
master ExecuteNonQuery net472 1.63μs 0.372ns 1.44ns 0.116 0 0 730 B
#4302 ExecuteNonQuery net6.0 981ns 0.369ns 1.43ns 0.0104 0 0 768 B
#4302 ExecuteNonQuery netcoreapp3.1 1.42μs 0.453ns 1.7ns 0.00997 0 0 768 B
#4302 ExecuteNonQuery net472 1.58μs 0.447ns 1.67ns 0.116 0 0 730 B
Benchmarks.Trace.ElasticsearchBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #4302

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑net472 1.134 2,231.82 2,530.40

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.2μs 1.37ns 5.32ns 0.0138 0 0 992 B
master CallElasticsearch netcoreapp3.1 1.46μs 0.463ns 1.79ns 0.0131 0 0 992 B
master CallElasticsearch net472 2.23μs 0.631ns 2.36ns 0.159 0.00112 0 1 KB
master CallElasticsearchAsync net6.0 1.31μs 1.64ns 6.12ns 0.0137 0 0 968 B
master CallElasticsearchAsync netcoreapp3.1 1.51μs 0.815ns 3.05ns 0.0143 0 0 1.04 KB
master CallElasticsearchAsync net472 2.49μs 0.776ns 3ns 0.167 0 0 1.06 KB
#4302 CallElasticsearch net6.0 1.19μs 1.47ns 5.52ns 0.0137 0 0 992 B
#4302 CallElasticsearch netcoreapp3.1 1.41μs 0.795ns 2.98ns 0.0135 0 0 992 B
#4302 CallElasticsearch net472 2.53μs 0.522ns 2.02ns 0.158 0 0 1 KB
#4302 CallElasticsearchAsync net6.0 1.34μs 0.947ns 3.67ns 0.0134 0 0 968 B
#4302 CallElasticsearchAsync netcoreapp3.1 1.57μs 0.73ns 2.83ns 0.014 0 0 1.04 KB
#4302 CallElasticsearchAsync net472 2.52μs 0.571ns 2.14ns 0.168 0.00126 0 1.06 KB
Benchmarks.Trace.GraphQLBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #4302

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑net6.0 1.134 1,335.74 1,177.50

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.34μs 0.695ns 2.6ns 0.0129 0 0 912 B
master ExecuteAsync netcoreapp3.1 1.41μs 0.595ns 2.22ns 0.0121 0 0 912 B
master ExecuteAsync net472 1.73μs 1.34ns 5.19ns 0.139 0.000862 0 875 B
#4302 ExecuteAsync net6.0 1.18μs 0.634ns 2.37ns 0.0124 0 0 912 B
#4302 ExecuteAsync netcoreapp3.1 1.4μs 0.473ns 1.77ns 0.0119 0 0 912 B
#4302 ExecuteAsync net472 1.7μs 0.448ns 1.62ns 0.138 0.000853 0 875 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 3.8μs 1.7ns 6.35ns 0.0267 0 0 1.94 KB
master SendAsync netcoreapp3.1 4.33μs 2.17ns 8.39ns 0.0325 0 0 2.48 KB
master SendAsync net472 7.23μs 4.65ns 18ns 0.482 0 0 3.05 KB
#4302 SendAsync net6.0 3.77μs 1.54ns 5.77ns 0.0264 0 0 1.94 KB
#4302 SendAsync netcoreapp3.1 4.48μs 1.85ns 6.94ns 0.0336 0 0 2.48 KB
#4302 SendAsync net472 7.08μs 2.28ns 8.53ns 0.481 0 0 3.05 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.34μs 3.19ns 12.4ns 0.0231 0 0 1.62 KB
master EnrichedLog netcoreapp3.1 2.01μs 0.971ns 3.63ns 0.0221 0 0 1.62 KB
master EnrichedLog net472 2.38μs 4.33ns 16.8ns 0.244 0 0 1.54 KB
#4302 EnrichedLog net6.0 1.32μs 0.569ns 2.13ns 0.0226 0 0 1.62 KB
#4302 EnrichedLog netcoreapp3.1 1.94μs 0.675ns 2.43ns 0.0218 0 0 1.62 KB
#4302 EnrichedLog net472 2.42μs 2.59ns 9.33ns 0.245 0 0 1.54 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 114μs 192ns 742ns 0.0571 0 0 4.21 KB
master EnrichedLog netcoreapp3.1 118μs 179ns 669ns 0 0 0 4.21 KB
master EnrichedLog net472 150μs 159ns 594ns 0.671 0.224 0 4.38 KB
#4302 EnrichedLog net6.0 115μs 117ns 423ns 0.0582 0 0 4.21 KB
#4302 EnrichedLog netcoreapp3.1 119μs 273ns 1.02μs 0.0594 0 0 4.21 KB
#4302 EnrichedLog net472 150μs 96ns 346ns 0.674 0.225 0 4.38 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.96μs 1ns 3.75ns 0.0298 0 0 2.18 KB
master EnrichedLog netcoreapp3.1 3.77μs 1.09ns 4.07ns 0.0282 0 0 2.18 KB
master EnrichedLog net472 4.59μs 2.28ns 8.84ns 0.314 0 0 1.99 KB
#4302 EnrichedLog net6.0 2.89μs 1.41ns 5.29ns 0.03 0 0 2.18 KB
#4302 EnrichedLog netcoreapp3.1 3.94μs 1.4ns 5.24ns 0.0295 0 0 2.18 KB
#4302 EnrichedLog net472 4.59μs 3.4ns 12.7ns 0.316 0 0 1.99 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.24μs 0.844ns 3.16ns 0.0162 0 0 1.16 KB
master SendReceive netcoreapp3.1 1.62μs 2.34ns 8.1ns 0.0155 0 0 1.16 KB
master SendReceive net472 2.12μs 2.2ns 8.53ns 0.185 0 0 1.16 KB
#4302 SendReceive net6.0 1.27μs 1.1ns 4.26ns 0.0159 0 0 1.16 KB
#4302 SendReceive netcoreapp3.1 1.49μs 0.771ns 2.88ns 0.0159 0 0 1.16 KB
#4302 SendReceive net472 2.14μs 2.57ns 9.95ns 0.184 0 0 1.16 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.62μs 1.77ns 6.86ns 0.0209 0 0 1.53 KB
master EnrichedLog netcoreapp3.1 3.66μs 1.16ns 4.33ns 0.0201 0 0 1.58 KB
master EnrichedLog net472 3.97μs 3.62ns 13.5ns 0.309 0 0 1.96 KB
#4302 EnrichedLog net6.0 2.5μs 0.846ns 3.28ns 0.0213 0 0 1.53 KB
#4302 EnrichedLog netcoreapp3.1 3.61μs 1.76ns 6.81ns 0.0216 0 0 1.58 KB
#4302 EnrichedLog net472 4.11μs 1.79ns 6.95ns 0.31 0 0 1.96 KB
Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 441ns 0.694ns 2.69ns 0.00751 0 0 536 B
master StartFinishSpan netcoreapp3.1 575ns 0.192ns 0.693ns 0.00721 0 0 536 B
master StartFinishSpan net472 621ns 0.285ns 1.07ns 0.0852 0 0 538 B
master StartFinishScope net6.0 551ns 0.176ns 0.657ns 0.00916 0 0 656 B
master StartFinishScope netcoreapp3.1 773ns 0.459ns 1.78ns 0.00888 0 0 656 B
master StartFinishScope net472 847ns 0.321ns 1.24ns 0.098 0 0 618 B
#4302 StartFinishSpan net6.0 455ns 0.304ns 1.18ns 0.00747 0 0 536 B
#4302 StartFinishSpan netcoreapp3.1 579ns 0.308ns 1.19ns 0.00712 0 0 536 B
#4302 StartFinishSpan net472 597ns 0.136ns 0.528ns 0.0851 0 0 538 B
#4302 StartFinishScope net6.0 546ns 0.301ns 1.17ns 0.00931 0 0 656 B
#4302 StartFinishScope netcoreapp3.1 751ns 0.324ns 1.26ns 0.00868 0 0 656 B
#4302 StartFinishScope net472 868ns 0.341ns 1.32ns 0.0978 0 0 618 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 563ns 0.244ns 0.946ns 0.00937 0 0 656 B
master RunOnMethodBegin netcoreapp3.1 811ns 0.538ns 2.01ns 0.00888 0 0 656 B
master RunOnMethodBegin net472 887ns 0.283ns 1.02ns 0.098 0 0 618 B
#4302 RunOnMethodBegin net6.0 558ns 0.173ns 0.649ns 0.0093 0 0 656 B
#4302 RunOnMethodBegin netcoreapp3.1 787ns 0.605ns 2.26ns 0.00876 0 0 656 B
#4302 RunOnMethodBegin net472 926ns 0.233ns 0.903ns 0.0979 0 0 618 B


if (enumerableDic is IDictionary)
{
var typeKVP = typeof(KeyValuePair<TKey, TValue>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd cache the typeofs in static fields to avoid the Reflection slugginess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeof is already performant, in IL this is transformed into a ldtoken and then a GetTypeFromHandle that is really fast because is only a getter to the runtime handle field. https://source.dot.net/#System.Private.CoreLib/src/System/Type.CoreCLR.cs,97

So I don't think caching is required here.

@andrewlock
Copy link
Member

andrewlock commented Jul 18, 2023

Throughput/Crank Report:zap:

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4302) (11.636M)   : 0, 11635977
    master (11.433M)   : 0, 11433406
    benchmarks/2.37.0 (11.517M)   : 0, 11517370
    benchmarks/2.9.0 (11.506M)   : 0, 11506345

    section Automatic
    This PR (4302) (8.180M)   : 0, 8180319
    master (8.114M)   : 0, 8114165
    benchmarks/2.37.0 (8.143M)   : 0, 8142506
    benchmarks/2.9.0 (8.280M)   : 0, 8279564

    section Trace stats
    master (8.118M)   : 0, 8118262
    benchmarks/2.37.0 (8.044M)   : 0, 8044129

    section Manual
    This PR (4302) (10.253M)   : 0, 10253315
    master (10.339M)   : 0, 10338672
    benchmarks/2.37.0 (10.278M)   : 0, 10278226

    section Manual + Automatic
    This PR (4302) (7.828M)   : 0, 7827523
    master (7.711M)   : 0, 7710676
    benchmarks/2.37.0 (7.786M)   : 0, 7785678

    section Version Conflict
    master (7.078M)   : 0, 7077875
    benchmarks/2.37.0 (7.067M)   : 0, 7066760

Loading
gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4302) (9.637M)   : 0, 9636754
    master (9.576M)   : 0, 9576012
    benchmarks/2.37.0 (9.619M)   : 0, 9618848
    benchmarks/2.9.0 (9.542M)   : 0, 9542061

    section Automatic
    This PR (4302) (6.699M)   : 0, 6698584
    master (6.687M)   : 0, 6687360
    benchmarks/2.37.0 (6.844M)   : 0, 6843850

    section Trace stats
    master (6.893M)   : 0, 6893377
    benchmarks/2.37.0 (6.801M)   : 0, 6801072

    section Manual
    This PR (4302) (8.542M)   : 0, 8541678
    master (8.538M)   : 0, 8538336
    benchmarks/2.37.0 (8.587M)   : 0, 8587283

    section Manual + Automatic
    This PR (4302) (6.617M)   : 0, 6616821
    master (6.448M)   : 0, 6448420
    benchmarks/2.37.0 (6.537M)   : 0, 6536939

    section Version Conflict
    master (5.788M)   : 0, 5788108
    benchmarks/2.37.0 (5.769M)   : 0, 5768881

Loading
gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4302) (9.126M)   : 0, 9125559
    master (9.569M)   : 0, 9569061
    benchmarks/2.37.0 (9.309M)   : 0, 9308797
    benchmarks/2.9.0 (9.366M)   : 0, 9366476

    section Automatic
    This PR (4302) (6.739M)   : 0, 6739382
    master (6.663M)   : 0, 6662702
    benchmarks/2.37.0 (6.552M)   : 0, 6552099
    benchmarks/2.9.0 (6.864M)   : 0, 6864235

    section Trace stats
    master (6.622M)   : 0, 6622086
    benchmarks/2.37.0 (6.685M)   : 0, 6685340

    section Manual
    This PR (4302) (8.200M)   : 0, 8199967
    master (8.316M)   : 0, 8316117
    benchmarks/2.37.0 (8.493M)   : 0, 8492505

    section Manual + Automatic
    This PR (4302) (6.391M)   : 0, 6391465
    master (6.415M)   : 0, 6414915
    benchmarks/2.37.0 (6.373M)   : 0, 6372588

    section Version Conflict
    master (5.818M)   : 0, 5817640
    benchmarks/2.37.0 (5.938M)   : 0, 5937795

Loading
gantt
    title Throughput Linux x64 (ASM) (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4302) (7.369M)   : 0, 7369404
    master (7.389M)   : 0, 7388844
    benchmarks/2.37.0 (7.239M)   : 0, 7239363
    benchmarks/2.9.0 (7.508M)   : 0, 7508023

    section No attack
    This PR (4302) (2.140M)   : 0, 2139796
    master (2.036M)   : 0, 2036034
    benchmarks/2.37.0 (2.105M)   : 0, 2105389
    benchmarks/2.9.0 (3.275M)   : 0, 3275098

    section Attack
    This PR (4302) (1.762M)   : 0, 1762035
    master (1.644M)   : 0, 1643534
    benchmarks/2.37.0 (1.686M)   : 0, 1685550
    benchmarks/2.9.0 (2.570M)   : 0, 2570112

    section Blocking
    This PR (4302) (3.456M)   : 0, 3456302
    master (3.180M)   : 0, 3179669
    benchmarks/2.37.0 (3.048M)   : 0, 3048473

Loading

void EnumerateItems<TKeySource, TValueSource>()
where TKeySource : notnull
{
var itemData = childrenData;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's better to pass captured vars as params in order to avoid extra heap allocations?
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/local-functions#heap-allocations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, but if we want to avoid any captured vars in this one, it really damages readability, we would have to pass childrenData, enumerableDic, getKey,getValue, argToFree, applySafetyLimits, remainingDepth, pool`...
and wouldnt the copy of all these arguments balance out the capture? I dont know..
but as we're already gaining a lot in allocations/speed in this PR (see waf benchmark above), maybe this could be kept as is, for the sake of readability?

Copy link
Contributor

@daniel-romano-DD daniel-romano-DD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work

@anna-git anna-git force-pushed the tony/improve-marshalling-for-waf branch from c3a2411 to 7260172 Compare September 14, 2023 10:21
@datadog-ddstaging
Copy link

datadog-ddstaging bot commented Sep 14, 2023

Datadog Report

Branch report: tony/improve-marshalling-for-waf
Commit report: a6c44b7

dd-trace-dotnet: 0 Failed, 0 New Flaky, 302174 Passed, 1154 Skipped, 29m 25.6s Wall Time

@anna-git anna-git force-pushed the tony/improve-marshalling-for-waf branch 3 times, most recently from 345f76f to c5e5822 Compare September 14, 2023 13:10
@anna-git anna-git force-pushed the tony/improve-marshalling-for-waf branch from c5e5822 to 97b27b5 Compare September 14, 2023 15:36
@anna-git anna-git force-pushed the tony/improve-marshalling-for-waf branch from 97b27b5 to 51b4a45 Compare September 15, 2023 06:20
Copy link
Member

@andrewlock andrewlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on the tracer side, but I didn't look at the ASM side

var pool = Encoder.Pool;
var pwArgs = Encoder.Encode(addresses, applySafetyLimits: true, argToFree: _argCache, pool: pool);
code = _waf.Run(_contextHandle, ref pwArgs, ref retNative, timeoutMicroSeconds);
pool.Return(_argCache);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should pool.Return() (and _argCache.Clear();) be in a try{}finally{} to be on the safe side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in a6c44b7

Comment on lines 50 to 51
var items = _items;
var length = _length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, given these variables are both readonly, is their much benefit to the local copy here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes thanks!, Tony checked, and removing the copy makes the generated code shorter, fixed in a6c44b7


namespace Datadog.Trace.Util;

internal unsafe class UnmanagedMemoryPool : IDisposable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add comments to this type and the Rent() methods etc (at the very least) about the fact that this type is not thread safe, and should only be used as such with [ThreadStatic]. Because the equivalent BCL types are thread safe, so this could cause some interesting issues if used incorrectly 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! added in a6c44b7

@anna-git anna-git changed the title New marshalling system for Waf.Run calls to improve speed and reduce allocations [ASM] New marshalling system for Waf.Run calls to improve speed and reduce allocations Sep 16, 2023
@anna-git anna-git merged commit 096f106 into master Sep 16, 2023
11 checks passed
@anna-git anna-git deleted the tony/improve-marshalling-for-waf branch September 16, 2023 11:45
@github-actions github-actions bot added this to the vNext milestone Sep 16, 2023
pull bot pushed a commit to astradot/dd-trace-dotnet that referenced this pull request Nov 20, 2023
DataDog#4891)

* Revert "[ASM] New marshalling system for Waf.Run calls to improve speed and reduce allocations (DataDog#4302)"

This reverts commit 096f106.

* fix schema test by syncing with master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants