[CI] Push native debug symbols to Datadog Symbolication server #6081

Merged
gleocadie merged 2 commits into master from andrew/symbols-gitlab-job
Sep 26, 2024

Conversation

gleocadie
Collaborator

@gleocadie gleocadie commented Sep 25, 2024

Summary of changes

Push native debug symbols to Datadog infrastructure.

Reason for change

Reports sent by the crash tracker (when the application crashes) contain unresolved call stacks. Once every piece of the puzzle is in place, the backend will be able to resolve the instruction pointers. By pushing our native debug symbols to Datadog infrastructure, we make sure that instruction pointers belonging to our native binaries are resolved correctly.

Implementation details

  • Create a GitHub action that sets up everything needed to send the symbols to the backend (set up datadog-ci and call it)
  • Add a manual workflow so we can push symbols for older versions
  • Update the create release workflow to use the action
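
The upload step described above can be sketched in bash roughly as follows. This is an illustrative sketch, not the action's actual script: the function names `push_symbols` and `push_to_all` are hypothetical, `datadog-ci` is assumed to be installed and on `PATH` already, and how the preprod environment is targeted is not shown in this conversation, so it is left out. The `datadog-ci elf-symbols upload` command and the `DD_BETA_COMMANDS_ENABLED=1` setting come from the run step quoted later in the review.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from the PR.

# Upload the symbols found under folder $1 using the API key in $2.
push_symbols() {
  local folder="$1" api_key="$2"
  if [ ! -d "$folder" ]; then
    echo "symbols folder '$folder' not found" >&2
    return 1
  fi
  # Command and beta flag as they appear in the PR's run step.
  DATADOG_API_KEY="$api_key" DD_BETA_COMMANDS_ENABLED=1 \
    datadog-ci elf-symbols upload "$folder"
}

# Push to every environment whose key is non-empty; stop at the first failure.
push_to_all() {
  local folder="$1" preprod_key="$2" prod_key="$3"
  if [ -n "$preprod_key" ]; then
    echo "Push symbols to preprod env"
    push_symbols "$folder" "$preprod_key" || return 1
  fi
  if [ -n "$prod_key" ]; then
    echo "Push symbols to prod env"
    push_symbols "$folder" "$prod_key" || return 1
  fi
}
```

Passing an empty key skips that environment, which mirrors the `-n` check in the reviewed step; any upload failure is returned to the caller so the job can be marked as failed.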

Test coverage

Ran multiple jobs successfully: https://github.com/DataDog/dd-trace-dotnet/actions/runs/11035801336

Other details

@datadog-ddstaging

datadog-ddstaging bot commented Sep 25, 2024

Datadog Report

Branch report: andrew/symbols-gitlab-job
Commit report: bdbc161
Test service: dd-trace-dotnet

✅ 0 Failed, 150169 Passed, 448 Skipped, 1h 58m 59.9s Total Time

@andrewlock
Member

andrewlock commented Sep 25, 2024

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6081) - mean (71ms)  : 68, 73
     .   : milestone, 71,
    master - mean (71ms)  : 68, 73
     .   : milestone, 71,

    section CallTarget+Inlining+NGEN
    This PR (6081) - mean (1,107ms)  : 1087, 1126
     .   : milestone, 1107,
    master - mean (1,110ms)  : 1087, 1132
     .   : milestone, 1110,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6081) - mean (110ms)  : 106, 113
     .   : milestone, 110,
    master - mean (110ms)  : 107, 112
     .   : milestone, 110,

    section CallTarget+Inlining+NGEN
    This PR (6081) - mean (772ms)  : 755, 789
     .   : milestone, 772,
    master - mean (775ms)  : 759, 791
     .   : milestone, 775,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6081) - mean (93ms)  : 90, 96
     .   : milestone, 93,
    master - mean (94ms)  : 90, 97
     .   : milestone, 94,

    section CallTarget+Inlining+NGEN
    This PR (6081) - mean (730ms)  : 714, 746
     .   : milestone, 730,
    master - mean (734ms)  : 713, 756
     .   : milestone, 734,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6081) - mean (191ms)  : 188, 194
     .   : milestone, 191,
    master - mean (191ms)  : 188, 194
     .   : milestone, 191,

    section CallTarget+Inlining+NGEN
    This PR (6081) - mean (1,205ms)  : 1182, 1227
     .   : milestone, 1205,
    master - mean (1,199ms)  : 1176, 1222
     .   : milestone, 1199,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6081) - mean (278ms)  : 272, 283
     .   : milestone, 278,
    master - mean (277ms)  : 272, 282
     .   : milestone, 277,

    section CallTarget+Inlining+NGEN
    This PR (6081) - mean (942ms)  : 922, 963
     .   : milestone, 942,
    master - mean (941ms)  : 920, 962
     .   : milestone, 941,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6081) - mean (266ms)  : 262, 271
     .   : milestone, 266,
    master - mean (266ms)  : 263, 270
     .   : milestone, 266,

    section CallTarget+Inlining+NGEN
    This PR (6081) - mean (930ms)  : 906, 955
     .   : milestone, 930,
    master - mean (930ms)  : 910, 951
     .   : milestone, 930,

@gleocadie gleocadie force-pushed the andrew/symbols-gitlab-job branch 15 times, most recently from bdbc161 to cdde660 September 25, 2024 15:28
@github-actions github-actions bot added the area:builds project files, build scripts, pipelines, versioning, releases, packages label Sep 25, 2024
@gleocadie gleocadie changed the title from "Andrew/symbols gitlab job" to "[CI] Push native debug symbols to Datadog Symbolication server" Sep 25, 2024
Member

@andrewlock andrewlock left a comment

LGTM! Just a couple of small questions about verifying that the action has succeeded.

Once we've established that this does report failures correctly and/or we know how to verify it worked, we should add a note about this to the release notes (just after the "run the create_draft_release" section) to describe what happens and how to remediate (running the workflow manually).

Comment on lines +109 to +120
- name: Prepare debug symbols for publishing
  shell: bash
  run: |
    mkdir debug_symbols
    tar -xzf linux-native-symbols.tar.gz -C debug_symbols

- name: 'Push debug symbols to datadog'
  uses: ./.github/actions/publish-debug-symbols
  with:
    symbols_folder: "debug_symbols"
    preprod_key: "${{ secrets.DD_PREPROD_API_KEY }}"
    prod_key: "${{ secrets.DD_API_KEY }}"
Member

Question: is it better to do this before or after we publish to NuGet/create the release? 🤔

If before, then the good news is that we can correct any issues first, and we can't create a release without symbols. The bad news is that it would block the release if there's a problem, or if Datadog is down or something 🤔

I think I'm inclined to go with your approach, especially as we can also do this manually using the other action if there is a problem later 🤔

Collaborator Author

👍

workflow_dispatch:
  inputs:
    version:
      description: 'An already released version of .NET APM'
Member

Suggested change
description: 'An already released version of .NET APM'
description: 'An already released version of .NET APM, without the "v" prefix, e.g. "3.3.1"'

Collaborator Author

✔️
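
The version format discussed in this thread (no leading "v", e.g. "3.3.1") could also be checked in the workflow itself rather than relying on the input description alone. A minimal bash sketch, assuming a hypothetical helper named `normalize_version` that is not part of the PR; accepting an optional pre-release suffix such as `-prerelease` is also an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strips one leading "v" and validates the result.
normalize_version() {
  local version="${1#v}"   # drop a single leading "v", if present
  # major.minor.patch with an optional pre-release suffix (assumed format)
  local pattern='^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?$'
  if [[ ! "$version" =~ $pattern ]]; then
    echo "invalid version: '$1' (expected e.g. 3.3.1)" >&2
    return 1
  fi
  echo "$version"
}
```

With this helper, `normalize_version v3.3.1` and `normalize_version 3.3.1` both print `3.3.1`, while an input such as `latest` is rejected with a non-zero exit code.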

run: |
  if [ -n "${{ inputs.prod_key }}" ]; then
    echo "Push symbols to prod env"
    DATADOG_API_KEY="${{ inputs.prod_key }}" DD_BETA_COMMANDS_ENABLED=1 datadog-ci elf-symbols upload ${{inputs.symbols_folder}}
Member

If this fails, does it fail the action? We should try to test/make sure that it does, so that we know if/when this fails (and can remediate by running the action manually).

Also, is there any way for us to verify inside datadog that these have been uploaded correctly, whether that's automatic or manual?

Collaborator Author

@gleocadie gleocadie Sep 26, 2024

Yes, if it fails, the action/job is marked as failed: https://github.com/DataDog/dd-trace-dotnet/actions/runs/11052646040

Also, is there any way for us to verify inside datadog that these have been uploaded correctly, whether that's automatic or manual?

Not that I know of. I have been told that if datadog-ci succeeds, the symbols will be available within ~5-10 min.
Besides, it requires specific rights to query the service (AWS bucket), and I'm not sure it's possible outside of Datadog. I'll check with Nicolas Savoir on that topic.
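
On the question of whether a failed upload fails the job: when a step sets `shell: bash`, GitHub Actions runs the script with `bash --noprofile --norc -eo pipefail`, so the first failing command ends the step with a non-zero exit code. A minimal sketch that reproduces this behaviour locally; `run_step` is a hypothetical helper, and `false` stands in for a failing `datadog-ci elf-symbols upload` call:

```shell
#!/usr/bin/env bash
# Hypothetical helper that mimics how GitHub Actions invokes a `run:` script
# when `shell: bash` is set.
run_step() {
  bash --noprofile --norc -eo pipefail -c "$1"
}

# `false` simulates a failing upload. Because of -e, the script stops there,
# the echo never runs, and the step exits with a non-zero status.
if run_step 'false; echo "upload finished"'; then
  echo "step succeeded"
else
  echo "step failed"
fi
```

This only illustrates the shell semantics; it says nothing about whether datadog-ci itself returns a non-zero exit code for every kind of upload problem, which is what the linked run was used to check.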

@gleocadie gleocadie marked this pull request as ready for review September 26, 2024 10:33
@gleocadie gleocadie requested a review from a team as a code owner September 26, 2024 10:33
@gleocadie gleocadie merged commit f64cde1 into master Sep 26, 2024
93 of 96 checks passed
@gleocadie gleocadie deleted the andrew/symbols-gitlab-job branch September 26, 2024 13:59
@github-actions github-actions bot added this to the vNext-v3 milestone Sep 26, 2024