Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TestDockerStart flaky test #21681

Merged
merged 7 commits into from
Oct 19, 2020

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Oct 8, 2020

Some changes are done to give more resilience to the test:

  • Wait till image pull is finished, and retry in case of failure.
  • Checked events are filtered by container id instead of image name,
    so tests are not affected by other containers that may be running
    in the system.
  • Check timeout is for all events now, instead of being reset after an
    event is received.
  • Container is removed after test is finished.

Fixes #20360.

Some changes are done to give more resilience to the test:
* Wait till image pull is finished, and retry in case of failure.
* Checked events are filtered by container id instead of image name,
  so tests are not affected by other containers that may be running
  in the system.
* Check timeout is for all events now, instead of being reset after an
  event is received.
* Container is removed after test is finished.
@jsoriano jsoriano added flaky-test Unstable or unreliable test cases. Team:Platforms Label for the Integrations - Platforms team labels Oct 8, 2020
@jsoriano jsoriano self-assigned this Oct 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 8, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 8, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #21681 updated]

  • Start Time: 2020-10-19T13:45:45.130+0000

  • Duration: 93 min 54 sec

Test stats 🧪

Test Results
Failed 0
Passed 16369
Skipped 1342
Total 17711

@jsoriano
Copy link
Member Author

jsoriano commented Oct 8, 2020

jenkins run the tests again please

@jsoriano jsoriano added needs_backport PR is waiting to be backported to other branches. v7.11.0 labels Oct 9, 2020
Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Just left a suggestion/question to evaluate. No blocking though.

// Image already available, do nothing
return nil
}
for retry := 0; retry < 3; retry++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we could make it configurable like having this number of retries to be a function variable. Also maybe to add sleep time between each of the retries, however not sure if this is needed at all since not familiar with the nature of the errors that may occur here.

Suggestion would look like:
imagePullWithRetry(image string, retries int, interval int)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, but I think this wouldn't be needed by now, and in any case we wouldn't expose these settings in the public methods.
I would prefer to add them if we need it at some moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jsoriano jsoriano merged commit a79dddc into elastic:master Oct 19, 2020
@jsoriano jsoriano deleted the fix-flaky-test-docker-start branch October 19, 2020 15:44
jsoriano added a commit to jsoriano/beats that referenced this pull request Oct 19, 2020
Some changes are done to give more resilience to the test:
* Wait till image pull is finished, and retry in case of failure.
* Checked events are filtered by container id instead of image name,
  so tests are not affected by other containers that may be running
  in the system.
* Check timeout is for all events now, instead of being reset after an
  event is received.
* Container is removed after test is finished.

(cherry picked from commit a79dddc)
@jsoriano jsoriano removed the needs_backport PR is waiting to be backported to other branches. label Oct 19, 2020
jsoriano added a commit that referenced this pull request Oct 19, 2020
Some changes are done to give more resilience to the test:
* Wait till image pull is finished, and retry in case of failure.
* Checked events are filtered by container id instead of image name,
  so tests are not affected by other containers that may be running
  in the system.
* Check timeout is for all events now, instead of being reset after an
  event is received.
* Container is removed after test is finished.

(cherry picked from commit a79dddc)
v1v added a commit to v1v/beats that referenced this pull request Oct 21, 2020
* upstream/master:
  feat: package aliases for snapshots (elastic#21960)
  [DOC] Add firewall as possible troubleshooting issue (elastic#21743)
  [Filebeat] Add max_number_of_messages config parameter for S3 input (elastic#21993)
  [Elastic Agent] Fix missing elastic_agent event data  (elastic#21994)
  Document auditbeat system process module config (elastic#21766)
  Update links (elastic#22012)
  dynamically find librpm (elastic#21936)
  Fix Istio docs (elastic#22019)
  [beats-tester][packaging] store packages in another location (elastic#21903)
  [Kubernetes] Remove redundant dockersock volume mount (elastic#22009)
  [Ingest Manager] Always try snapshot repo for agent upgrade (elastic#21951)
  Azure storage metricset values not inside the metricset name (elastic#21845)
  fix diskio and memory bugs under windows (elastic#21992)
  Fix TestDockerStart flaky test (elastic#21681)
  filebeat: add SSL options to checkpoint module (elastic#19560)
  Stop storing stateless kubernetes keystores (elastic#21880)
  [Elastic Agent] Fix named pipe communication on Windows 7 (elastic#21931)
  [Elastic Agent] Fix index for Agent monitoring to to elastic_agent. (elastic#21932)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Unstable or unreliable test cases. Team:Platforms Label for the Integrations - Platforms team v7.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestDockerStart flaky test
3 participants