Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip flaky TestActions on MacOSx #23966

Merged
merged 5 commits into from
Feb 23, 2021

Conversation

ChrsMark
Copy link
Member

What does this PR do?

This PR skips flaky test case reported at #23965.

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark ChrsMark added flaky-test Unstable or unreliable test cases. Team:SIEM labels Feb 10, 2021
@ChrsMark ChrsMark requested a review from a team February 10, 2021 14:51
@elasticmachine
Copy link
Collaborator

Pinging @elastic/siem (Team:SIEM)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 10, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 10, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #23966 updated

  • Start Time: 2021-02-22T21:26:02.761+0000

  • Duration: 32 min 16 sec

  • Commit: 9bc1d4f

Test stats 🧪

Test Results
Failed 0
Passed 1003
Skipped 217
Total 1220

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 1003
Skipped 217
Total 1220

@ChrsMark ChrsMark requested a review from urso February 11, 2021 10:21
@ChrsMark
Copy link
Member Author

@urso this one affects master build, do you mind giving a +1 so as to proceed?

@@ -71,6 +71,10 @@ func TestActions(t *testing.T) {
t.Skip("Skipping flaky test: https://github.com/elastic/beats/issues/22518")
}

if runtime.GOOS == "darwin" {
t.Skip("Skipping flaky test: https://github.com/elastic/beats/issues/23965")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a skip on windows with a different github issue. The two issues seem to be duplicates.

Skimming the test code I noticed that a go-routine is started that is supposed to write the files. The test seems to be sensitive to timing + there is a chance of the test leaking a go-routine.

Does the test fail due to timing constraints and slow IO? What happens to the test if we add a time.Sleep(2 * time.Minute) in the helper go-routine (on sleep before and after each line in the go-routine)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso I wonder why we should add the sleep within the goroutine and not just right after it so as to give it time to do the IO ops and then let the main test to check for the events. I actually applied this on the latest commit(b10fe6c) and seems to work.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know how long we have to wait for the IO. The time.Sleep was for you to test if the test is too time sensitive... as evidence, not to fix the test.

e.g.

	go func() {
        time.Sleep(2 * time.Minute)
		ioutil.WriteFile(createdFilepath, []byte("hello world"), 0600)
        time.Sleep(2 * time.Minute)
		ioutil.WriteFile(updatedFilepath, []byte("hello world"), 0600)
        time.Sleep(2 * time.Minute)
	}()

If the test fails with these timeouts, it is a good indicator that the test is too sensitive to IO scheduler timings (which can easily block write operations for a minute if the system is overloaed).

If adding a time.Sleep after the go-routine really stabilizes the test the question is: why do we need to have the go-routine in the first place?

@andrewkroh Any idea if the go-routine is required for the test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why there is a goroutine with only WriteFiles. I expected there to be a time.Sleep in the goroutine followed by the WriteFile similar to https://github.com/elastic/beats/blame/b10fe6cf95f19881705c8ac0899e4f1aad9db3bf/auditbeat/module/file_integrity/metricset_test.go#L49-L53. I think it was a basic way to schedule some activity to happen after the MetricSet is running.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. In that we should do the WriteFile without the go-routine.

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@@ -143,6 +147,8 @@ func TestActions(t *testing.T) {
ioutil.WriteFile(updatedFilepath, []byte("hello world"), 0600)
}()

time.Sleep(2 * time.Minute)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better replace the go-routine with a code block:

		ioutil.WriteFile(createdFilepath, []byte("hello world"), 0600)
		ioutil.WriteFile(updatedFilepath, []byte("hello world"), 0600)

WriteFile returns when the file is written... no need to block the test suite for an extra 2 minutes.

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@urso
Copy link

urso commented Feb 16, 2021

The test is still skipped on windows. Shall we unskip and see how it goes?

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@urso
Copy link

urso commented Feb 22, 2021

@ChrsMark What's the status of this change? This is still one of the most flaky tests we need to address.

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark
Copy link
Member Author

@ChrsMark What's the status of this change? This is still one of the most flaky tests we need to address.

Seems there were some lint issues, I updated again, let's see if CI goes green now.

ChrsMark added a commit to ChrsMark/beats that referenced this pull request Feb 23, 2021
v1v added a commit to v1v/beats that referenced this pull request Feb 23, 2021
…dows-7

* upstream/master:
  Remove OSS reference for kibana and elasticsearch (elastic#24164)
  Skip flaky TestActions on MacOSx (elastic#23966)
  [Filebeat][AWS] Fix vpcflow pipeline exception: Cannot invoke "Object.getClass()" because "receiver" is null (elastic#24167)
  [Elastic Agent] Fix docker entrypoint for elastic-agent. (elastic#24155)
  [PACKAGING] Push docker images with the architecture in the version (elastic#24121)
  [Agent] Add agent standalone manifests for system module & Pod's log collection (elastic#23938)
  indicator type url is in upper case (elastic#24152)
  [Filebeat] Document netflow internal_networks and set default (elastic#24110)
  [Filebeat] Adding fixes to the TI module (elastic#24133)
  [Enhancement] Add RotateOnStartup feature flag for file output (elastic#19347)
  [Ingest Manager] Fix: Successfully installed and enrolled agent running standalone (elastic#24128)
  Set Elastic licence type for APM server Beats update job (elastic#24122)
  Add logrotation section on Running Filebeat on k8s (elastic#24120)
  [CI] Run if manual UI (elastic#24116)
  [CI] enable x-pack/heartbeat in the CI (elastic#23873)
  chore: comment out the E2E (elastic#24109)
  chore: add-backport-next (elastic#24098)
  Adjust the position of the architecture name in Dockerlogbeat tarball (elastic#24095)
  Update dependencies for M1 support in System (elastic#24019)
ChrsMark added a commit that referenced this pull request Feb 23, 2021
ChrsMark added a commit that referenced this pull request Feb 23, 2021
v1v added a commit to v1v/beats that referenced this pull request Feb 23, 2021
…-arm

* upstream/master: (24 commits)
  Add example input autodsicover config (elastic#24157)
  Empty configuration options generate `<no value>` string for azure-eventhub input (elastic#24156)
  Remove OSS reference for kibana and elasticsearch (elastic#24164)
  Skip flaky TestActions on MacOSx (elastic#23966)
  [Filebeat][AWS] Fix vpcflow pipeline exception: Cannot invoke "Object.getClass()" because "receiver" is null (elastic#24167)
  [Elastic Agent] Fix docker entrypoint for elastic-agent. (elastic#24155)
  [PACKAGING] Push docker images with the architecture in the version (elastic#24121)
  [Agent] Add agent standalone manifests for system module & Pod's log collection (elastic#23938)
  indicator type url is in upper case (elastic#24152)
  [Filebeat] Document netflow internal_networks and set default (elastic#24110)
  [Filebeat] Adding fixes to the TI module (elastic#24133)
  [Enhancement] Add RotateOnStartup feature flag for file output (elastic#19347)
  [Ingest Manager] Fix: Successfully installed and enrolled agent running standalone (elastic#24128)
  Set Elastic licence type for APM server Beats update job (elastic#24122)
  Add logrotation section on Running Filebeat on k8s (elastic#24120)
  [CI] Run if manual UI (elastic#24116)
  [CI] enable x-pack/heartbeat in the CI (elastic#23873)
  chore: comment out the E2E (elastic#24109)
  chore: add-backport-next (elastic#24098)
  Adjust the position of the architecture name in Dockerlogbeat tarball (elastic#24095)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Unstable or unreliable test cases. v7.11.2 v7.12.0 v7.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants