Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the containers new-e2e tests send datadog events to help debugging #20508

Conversation

L3n41c
Copy link
Member

@L3n41c L3n41c commented Oct 30, 2023

What does this PR do?

Emit a datadog event each time a metric is asserted in the new-e2e test framework.

Motivation

Knowing that the agent is broken because an e2e test is failing is nice.
But knowing what it broken and why because e2e test failures are giving a maximum of information regarding the failure is better.

In that context, the containers e2e tests are already providing, at the end of their execution, a link to a dashboard containing all the metrics that are produced and asserted by the tests.

This PR adds an event overlay on top of those graph so that we can visually see when the assertion ran.

It is useful to visualize when a metric eventually show up after the retry loop ended, or when we do too many useless retries, etc.

Additional Notes

Here is what the dashboard looks like:
image

The events contain information about what was looked for and what was missing:
image

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Use the major_change label if your change either has a major impact on the code base, is impacting multiple teams or is changing important well-established internals of the Agent. This label will be use during QA to make sure each team pay extra attention to the changed behavior. For any customer facing change use a releasenote.
  • A release note has been added or the changelog/no-changelog label has been applied.
  • Changed code has automated tests for its functionality.
  • Adequate QA/testing plan information is provided if the qa/skip-qa label is not applied.
  • At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • If applicable, the need-change/operator and need-change/helm labels have been applied.
  • If applicable, the k8s/<min-version> label, indicating the lowest Kubernetes version compatible with this feature.
  • If applicable, the config template has been updated.

@L3n41c L3n41c added team/containers dev/testing changelog/no-changelog [deprecated] qa/skip-qa - use other qa/ labels [DEPRECATED] Please use qa/done or qa/no-code-change to skip creating a QA card labels Oct 30, 2023
@L3n41c L3n41c added this to the 7.50.0 milestone Oct 30, 2023
@L3n41c L3n41c requested review from a team as code owners October 30, 2023 15:35
Copy link
Contributor

@clamoriniere clamoriniere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome 🥇

@L3n41c L3n41c force-pushed the lenaic/CONTINT-52_CONTINT-258_CONTINT-3310_CONTINT-3314_new-e2e_send_events branch from da3a6be to 7180b8e Compare October 30, 2023 16:31
CONTINT-52, CONTINT-258, CONTINT-3310, CONTINT-3314
@L3n41c L3n41c force-pushed the lenaic/CONTINT-52_CONTINT-258_CONTINT-3310_CONTINT-3314_new-e2e_send_events branch from 7180b8e to 1fb9d12 Compare October 30, 2023 17:53
Copy link
Contributor

@pducolin pducolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing. Helps a lot with visibility of tests and debugging. Can we link it easily to CI Visibility ? I can investigate which tags the event need to share with the ci visibility payload to get the link under a test execution.

Currently test stack is cut which makes it hard to investigate issues from CI Visibility

suite.EventuallyWithTf(func(collect *assert.CollectT) {
myCollect := &myCollectT{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 thought
Could possibly be skipped after stretchr/testify#1481 is merged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not, I see we would not have access to collect errors.

@L3n41c L3n41c merged commit 0645a64 into main Oct 31, 2023
169 of 171 checks passed
@L3n41c L3n41c deleted the lenaic/CONTINT-52_CONTINT-258_CONTINT-3310_CONTINT-3314_new-e2e_send_events branch October 31, 2023 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog [deprecated] qa/skip-qa - use other qa/ labels [DEPRECATED] Please use qa/done or qa/no-code-change to skip creating a QA card dev/testing team/containers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants