-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CWS] make use of the remote workloadmeta in the system-probe #28070
Conversation
Go Package Import DifferencesBaseline: 33facf6
|
c736171
to
a6fc50b
Compare
Test changes on VMUse this command from test-infra-definitions to manually test this PR changes on a VM: inv create-vm --pipeline-id=43006925 --os-family=ubuntu Note: This applies to commit e06bd03 |
c61bc98
to
29092de
Compare
Regression DetectorRegression Detector ResultsRun ID: 1bc095e7-64a4-43fc-a350-69f29116af9a Metrics dashboard Target profiles Baseline: 33facf6 Performance changes are noted in the perf column of each table:
Significant changes in experiment optimization goalsConfidence level: 90.00%
|
perf | experiment | goal | Δ mean % | Δ mean % CI | links |
---|---|---|---|---|---|
➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | +1.45 | [+0.49, +2.42] | Logs |
➖ | pycheck_lots_of_tags | % cpu utilization | +1.13 | [-1.43, +3.70] | Logs |
➖ | file_tree | memory utilization | +1.01 | [+0.95, +1.08] | Logs |
➖ | idle | memory utilization | +0.51 | [+0.47, +0.54] | Logs |
➖ | basic_py_check | % cpu utilization | +0.49 | [-2.18, +3.16] | Logs |
➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.00, +0.00] | Logs |
➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.01, +0.01] | Logs |
➖ | otel_to_otel_logs | ingress throughput | -0.91 | [-1.71, -0.10] | Logs |
❌ | tcp_syslog_to_blackhole | ingress throughput | -12.91 | [-24.87, -0.95] | Logs |
Bounds Checks
perf | experiment | bounds_check_name | replicates_passed |
---|---|---|---|
✅ | idle | memory_usage | 10/10 |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
04645dc
to
56b736b
Compare
56b736b
to
76bf59e
Compare
Does this mean the workloadmeta component will be created, even if not used? Is it only CWS that requires this component? |
76bf59e
to
e3dc307
Compare
@brycekahle yes.. sadly it means that it will always be there. But since it's a dependency of the event monitor (not just CWS), I think it might be fine since it's enabled by default for other major system probe products. I would be happy to hear a solution to load it only if need honestly, I just didn't found one. My testing on system-probe, but also on the security agent side where it's already in use shows that the impact at runtime is really low. |
35d0549
to
3cc346d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 LGTM for files owned by @DataDog/container-integrations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for @DataDog/apm-onboarding
7b017a2
to
56bcfa7
Compare
56bcfa7
to
a8d764e
Compare
/merge |
🚂 MergeQueue: pull request added to the queue The median merge time in Use |
❌ MergeQueue: The build pipeline contains failing jobs for this merge request Build pipeline has failing jobs for 25b8f70: What to do next?
DetailsSince those jobs are not marked as being allowed to fail, the pipeline will most likely fail. If you need support, contact us on Slack #devflow with those details! |
a8d764e
to
2c5af03
Compare
b277b9b
to
e06bd03
Compare
/merge |
🚂 MergeQueue: waiting for PR to be ready This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals. Use |
🚂 MergeQueue: pull request added to the queue The median merge time in Use |
What does this PR do?
This PR lays the foundation of being able to use the remote workloadmeta store in the system probe. This PR only allows the remote workloadmeta store to be used, since in most deployments the system probe cannot really maintain a working local store (cannot access the containerd socket, some configuration flags are only applied to the core agent etc).
Most of the changes from this PR are related to the transition from an optional workloadmeta component to a non-optional one.
The main motivations behind this change are:
containers_running
CWS telemetry in the system-probe instead of the security agentMotivation
Additional Notes
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes