Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NDM][NDMII-2748] test snmp tags on agent restart #26003

Merged
merged 11 commits into from
May 31, 2024

Conversation

jedupau
Copy link
Contributor

@jedupau jedupau commented May 28, 2024

What does this PR do?

Add an e2e following this PR
It tests that tags are stored on disk and so are still there after a restart of the agent

Motivation

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

@jedupau jedupau added changelog/no-changelog team/network-device-monitoring qa/no-code-change Skip QA week as there is no code change in Agent code labels May 28, 2024
@jedupau jedupau requested review from a team as code owners May 28, 2024 09:50
.gitlab-ci.yml Outdated Show resolved Hide resolved
test/new-e2e/tests/ndm/snmp/snmp_test.go Outdated Show resolved Hide resolved
test/new-e2e/tests/ndm/snmp/snmp_test.go Outdated Show resolved Hide resolved
@pr-commenter
Copy link

pr-commenter bot commented May 28, 2024

Regression Detector

Regression Detector Results

Run ID: 8b342fca-5016-43a0-8974-69dbe3ddb794
Baseline: 7debf30
Comparison: 0cbe423

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI
basic_py_check % cpu utilization +2.26 [-0.52, +5.03]
pycheck_1000_100byte_tags % cpu utilization +0.62 [-3.99, +5.23]
uds_dogstatsd_to_api_cpu % cpu utilization +0.34 [-2.55, +3.22]
idle memory utilization +0.18 [+0.14, +0.22]
otel_to_otel_logs ingress throughput +0.02 [-0.34, +0.38]
trace_agent_json ingress throughput +0.00 [-0.01, +0.01]
trace_agent_msgpack ingress throughput +0.00 [-0.00, +0.00]
tcp_dd_logs_filter_exclude ingress throughput -0.01 [-0.04, +0.03]
uds_dogstatsd_to_api ingress throughput -0.01 [-0.21, +0.19]
file_tree memory utilization -0.77 [-0.88, -0.66]
tcp_syslog_to_blackhole ingress throughput -3.19 [-24.12, +17.75]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

@jedupau jedupau force-pushed the jed/test-tags-on-agent-restart branch from 5e1c2ea to 69c9d21 Compare May 28, 2024 12:08
@agent-platform-auto-pr
Copy link
Contributor

agent-platform-auto-pr bot commented May 28, 2024

[Fast Unit Tests Report]

On pipeline 35364259 (CI Visibility). The following jobs did not run any unit tests:

Jobs:
  • tests_deb-arm64-py3
  • tests_deb-x64-py3
  • tests_flavor_dogstatsd_deb-x64
  • tests_flavor_heroku_deb-x64
  • tests_flavor_iot_deb-x64
  • tests_rpm-arm64-py3
  • tests_rpm-x64-py3
  • tests_windows-x64

If you modified Go files and expected unit tests to run in these jobs, please double check the job logs. If you think tests should have been executed reach out to #agent-developer-experience

@jedupau jedupau removed the request for review from a team May 28, 2024 12:40
@pr-commenter
Copy link

pr-commenter bot commented May 28, 2024

Test changes on VM

Use this command from test-infra-definitions to manually test this PR changes on a VM:

inv create-vm --pipeline-id=35364259 --os-family=ubuntu

@jedupau jedupau requested review from a team as code owners May 28, 2024 13:33
@@ -322,3 +322,12 @@ new-e2e-ndm-netflow:
variables:
TARGETS: ./tests/ndm/netflow
TEAM: network-device-monitoring

new-e2e-ndm-snmp:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 praise

@@ -4,4 +4,4 @@ init_config:
instances:
- ip_address: dd-snmp
port: 1161
community_string: "public"
community_string: "cisco-nexus"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥜 nitpick
You could update the config file name accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed b80d861

require.NoError(s.T(), err)

initialTags := initialMetrics[0].Tags
_, err = s.Env().RemoteHost.Execute("docker stop dd-snmp")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💬 suggestion
Should be possible to run the request using the Docker client in

Docker *components.RemoteHostDocker

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be tested:

// stop
s.Env().Docker.GetClient().ContainerStop(context.Background(), "dd-snmp", container.StopOptions{})

// restart
s.Env().Docker.GetClient().ContainerRestart(context.Background(), s.Env().Agent.ContainerName, container.StopOptions{})

But I think you need to get the ID from the container name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked and indeed you would need to get the container ID, will create a card to provide this through the Docker client, we can generate a list of container ids by their name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I tried before to get the list of container ids filtered by the name and then get the first one from the list but it is maybe easier like this

Comment on lines 165 to 167
initialMetrics, err := fakeintake.FilterMetrics("snmp.device.reachable",
client.WithTags[*aggregator.MetricSeries]([]string{}),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

options are optional.
An empty list of filtering tags should be equivalent to not specifying any list of filtering tags at all.

Suggested change
initialMetrics, err := fakeintake.FilterMetrics("snmp.device.reachable",
client.WithTags[*aggregator.MetricSeries]([]string{}),
)
initialMetrics, err := fakeintake.FilterMetrics("snmp.device.reachable")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed 45ae6cc

Comment on lines 186 to 188
metrics, err = fakeintake.FilterMetrics("snmp.device.reachable",
client.WithTags[*aggregator.MetricSeries]([]string{}),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark as above.

Suggested change
metrics, err = fakeintake.FilterMetrics("snmp.device.reachable",
client.WithTags[*aggregator.MetricSeries]([]string{}),
)
metrics, err = fakeintake.FilterMetrics("snmp.device.reachable")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed 45ae6cc


func (s *snmpDockerSuite) TestSnmpTagsAreStoredOnRestart() {
fakeintake := s.Env().FakeIntake.Client()
initialMetrics, err := fakeintake.FilterMetrics("snmp.device.reachable",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be safe from potentially flaky test failures I think FIlterMetrics() should be inside an EventuallyWithT().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed d8de7c0
also changed in that commit the require to assert to prevent the test from failing if the fakeintake responds an error


require.Zero(s.T(), metrics[0].Points[0].Value)
require.ElementsMatch(s.T(), tags, initialTags)
require.Len(s.T(), tags, 9)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing this check since the number of tags could change and it probably doesn't check anything beyond what was already checked by the previous ElementsMatch().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed 45ae6cc

_, err = s.Env().RemoteHost.Execute("docker stop dd-snmp")
require.NoError(s.T(), err)

err = fakeintake.FlushServerAndResetAggregators()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed both before and after the agent restart?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the first one ! 45ae6cc


require.EventuallyWithT(s.T(), func(t *assert.CollectT) {
initialMetrics, err = fakeintake.FilterMetrics("snmp.device.reachable")
assert.NoError(s.T(), err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the whole TestSnmpTagsAreStoredOnRestart test fail instead of retrying inside the EventuallyWithT block.
We need to target the CollectT of the EventuallyWithT function.

Suggested change
assert.NoError(s.T(), err)
assert.NoError(t, err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed 0cbe423

@jedupau
Copy link
Contributor Author

jedupau commented May 31, 2024

/merge

@dd-devflow
Copy link

dd-devflow bot commented May 31, 2024

🚂 MergeQueue

Pull request added to the queue.

This build is going to start soon! (estimated merge in less than 28m)

Use /merge -c to cancel this operation!

@dd-mergequeue dd-mergequeue bot merged commit d946fb7 into main May 31, 2024
200 checks passed
@dd-mergequeue dd-mergequeue bot deleted the jed/test-tags-on-agent-restart branch May 31, 2024 08:22
@github-actions github-actions bot added this to the 7.55.0 milestone May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog qa/no-code-change Skip QA week as there is no code change in Agent code team/network-device-monitoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants