Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent the native loader from being unloaded while sending telemetry (#5944 => V2) #5957

Merged
merged 1 commit into from
Aug 28, 2024

Conversation

kevingosse
Copy link
Collaborator

This is a backport of #5944 to v2.

Summary of changes

Prevent the native loader from being unloaded while sending telemetry.

Reason for change

We send telemetry when we decide not to instrument a process (for instance because of an EOL runtime). To send the telemetry, we spawn the telemetry forwarder. This is done in a background thread to avoid blocking the startup of the process. However, .NET unloads the profiler when the CorProfilerInfo::Initialize method returns, even if the background thread is still running, causing a segfault.

Implementation details

We increment the reference count of the module to prevent it from being unloaded by the .NET runtime.

Two different implementations:

  • On Windows, we use GetModuleHandleEx to increment the reference count. Then, when we're done sending the telemetry, we use FreeLibraryAndExitThread to safely unload the module.
  • On Linux, we use dlopen to increment the reference count. Unfortunately there is no safe way to unload the module from within itself, so we keep it in memory.

If for some reason we failed to increment the reference count, we give up on sending telemetry.

Test coverage

OnEolFrameworkInSsi_WhenForwarderPathExists_CallsForwarderWithExpectedTelemetry segfaults on 3.0 without this change.

…#5944)

## Summary of changes

Prevent the native loader from being unloaded while sending telemetry.

## Reason for change

We send telemetry when we decide not to instrument a process (for
instance because of an EOL runtime). To send the telemetry, we spawn the
telemetry forwarder. This is done in a background thread to avoid
blocking the startup of the process. However, .NET unloads the profiler
when the `CorProfilerInfo::Initialize` method returns, even if the
background thread is still running, causing a segfault.

## Implementation details

We increment the reference count of the module to prevent it from being
unloaded by the .NET runtime.

Two different implementations:
- On Windows, we use `GetModuleHandleEx` to increment the reference
count. Then, when we're done sending the telemetry, we use
`FreeLibraryAndExitThread` to safely unload the module.
- On Linux, we use `dlopen` to increment the reference count.
Unfortunately there is no safe way to unload the module from within
itself, so we keep it in memory.

If for some reason we failed to increment the reference count, we give
up on sending telemetry.

## Test coverage


`OnEolFrameworkInSsi_WhenForwarderPathExists_CallsForwarderWithExpectedTelemetry`
segfaults on 3.0 without this change.
@datadog-ddstaging
Copy link

datadog-ddstaging bot commented Aug 28, 2024

Datadog Report

Branch report: kevin/safe_unload_v2
Commit report: d30842a
Test service: dd-trace-dotnet

✅ 0 Failed, 346867 Passed, 2080 Skipped, 15h 44m 58.52s Total Time
⌛ 14 Performance Regressions

⌛ Performance Regressions vs Default Branch (14)

This report shows up to 5 performance regressions.

  • Profiler_cpu_walltime_old_stackwalk - scenarios 3.22s (+876.27ms, +37%) - Details
  • Profiler_cpu_walltime - scenarios 3.22s (+877.41ms, +37%) - Details
  • Profiler_walltime - scenarios 3.23s (+889.51ms, +38%) - Details
  • Profiler_cpu_walltime_timer_create - scenarios 3.24s (+902.34ms, +39%) - Details
  • Profiler_liveheap - scenarios 4.22s (+1.04s, +33%) - Details

@andrewlock
Copy link
Member

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5957) - mean (76ms)  : 67, 86
     .   : milestone, 76,

    section CallTarget+Inlining+NGEN
    This PR (5957) - mean (1,027ms)  : 1009, 1045
     .   : milestone, 1027,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5957) - mean (109ms)  : 105, 113
     .   : milestone, 109,

    section CallTarget+Inlining+NGEN
    This PR (5957) - mean (707ms)  : 686, 728
     .   : milestone, 707,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5957) - mean (93ms)  : 90, 96
     .   : milestone, 93,

    section CallTarget+Inlining+NGEN
    This PR (5957) - mean (664ms)  : 643, 684
     .   : milestone, 664,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5957) - mean (190ms)  : 187, 193
     .   : milestone, 190,

    section CallTarget+Inlining+NGEN
    This PR (5957) - mean (1,109ms)  : 1090, 1129
     .   : milestone, 1109,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5957) - mean (276ms)  : 271, 282
     .   : milestone, 276,

    section CallTarget+Inlining+NGEN
    This PR (5957) - mean (883ms)  : 864, 902
     .   : milestone, 883,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5957) - mean (264ms)  : 259, 269
     .   : milestone, 264,

    section CallTarget+Inlining+NGEN
    This PR (5957) - mean (863ms)  : 843, 883
     .   : milestone, 863,

Loading

@andrewlock andrewlock merged commit 466b242 into release/2.x Aug 28, 2024
68 of 73 checks passed
@andrewlock andrewlock deleted the kevin/safe_unload_v2 branch August 28, 2024 14:19
@github-actions github-actions bot added this to the vNext-v2 milestone Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants