[Detection Engine] Addresses Flakiness in ML FTR tests #188155
The flakiness here ends up being caused by sporadic unavailability of shards during module setup. The underlying cause of that unavailability is likely a race condition between ML, ES, and/or FTR, but luckily we don't need to worry about that because simply retrying the API call causes it to eventually succeed. In those cases, some of the jobs will report a 4xx status, but that's expected. This is the result of a lot of prodding and CPU cycles on CI; see elastic#182183 for the full details.
Now that we have the robustness provided by the `setupMlModulesWithRetry` helper, these tests should no longer be flaky.
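The retry approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual `setupMlModulesWithRetry` implementation: the `setupWithRetry` function, the `JobResult` and `SetupResponse` types, and the `maxAttempts` and `delayMs` parameters are all hypothetical names. It assumes a job counts as settled when it either succeeded or failed with a 4xx status, since a 4xx on a later attempt is expected.

```typescript
// Sketch only: names and shapes here are illustrative assumptions,
// not the real `setupMlModulesWithRetry` helper.
interface JobResult {
  id: string;
  success: boolean;
  error?: { status?: number };
}

interface SetupResponse {
  jobs: JobResult[];
}

// A job is "settled" if it succeeded, or if it failed with a 4xx status
// (expected on retries, per the description above).
const isSettled = (job: JobResult): boolean => {
  const status = job.error?.status;
  return job.success || (status !== undefined && status >= 400 && status < 500);
};

// Calls `setup` until every job is settled, up to `maxAttempts` times.
async function setupWithRetry(
  setup: () => Promise<SetupResponse>,
  maxAttempts = 5,
  delayMs = 1000
): Promise<SetupResponse> {
  let unsettled: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await setup();
    unsettled = response.jobs.filter((job) => !isSettled(job)).map((job) => job.id);
    if (unsettled.length === 0) {
      return response;
    }
    // Wait before the next attempt so the shards have a chance to become available.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Jobs still failing after ${maxAttempts} attempts: ${unsettled.join(', ')}`);
}
```

A caller would pass a function that performs the module setup request and resolves with its parsed response body.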
Pinging @elastic/security-detection-engine (Team:Detection Engine) |
This call was found to be sporadically failing in elastic#182183. This applies the same changes made in elastic#188155, but for Cypress tests instead of FTR.
Flaky Test Runner Stats
🟠 Some tests failed. - kibana-flaky-test-suite-runner#6529
[❌] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/serverless.config.ts: 174/200 tests passed.
Flaky Test Runner Stats
🎉 All tests passed! - kibana-flaky-test-suite-runner#6528
[✅] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/ess.config.ts: 200/200 tests passed.
The serverless tests timed out on run 175/200, meaning the remaining jobs were cancelled. I'll try kicking off another round, but since all failures subsequent to these fixes have been environmental timeouts, I don't think this should be blocked from merging. |
💚 Build Succeeded
cc @rylnd |
Flaky Test Runner Stats
🟠 Some tests failed. - kibana-flaky-test-suite-runner#6533
[❌] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/serverless.config.ts: 173/200 tests passed.
There was a similar result on the last serverless run x 200: 25/100 failed, but they were all due to timeouts or I still haven't seen any failures specific to our ML tests on this branch 👍, but if a reviewer would like more confidence we can figure that out. |
Thanks for the investigation and going over the solution with us in the sync yesterday. LGTM!
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Questions? Please refer to the Backport tool documentation.
## Summary

The full chronicle of this endeavor can be found [here](elastic#182183), but [this comment](elastic#182183 (comment)) summarizes the identified issue:

> I [finally found](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368) the cause of these failures in the response to our "setup modules" request to ML. Attaching here for posterity:
>
> <details>
> <summary>Setup Modules Failure Response</summary>
>
> ```json
> {
>   "jobs": [
>     { "id": "v3_linux_anomalous_network_port_activity", "success": true },
>     {
>       "id": "v3_linux_anomalous_network_activity",
>       "success": false,
>       "error": {
>         "error": {
>           "root_cause": [
>             {
>               "type": "no_shard_available_action_exception",
>               "reason": "[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]"
>             }
>           ],
>           "type": "search_phase_execution_exception",
>           "reason": "all shards failed",
>           "phase": "query",
>           "grouped": true,
>           "failed_shards": [
>             {
>               "shard": 0,
>               "index": ".ml-anomalies-custom-v3_linux_network_configuration_discovery",
>               "node": "dKzpvp06ScO0OxqHilETEA",
>               "reason": {
>                 "type": "no_shard_available_action_exception",
>                 "reason": "[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]"
>               }
>             }
>           ]
>         },
>         "status": 503
>       }
>     }
>   ],
>   "datafeeds": [
>     {
>       "id": "datafeed-v3_linux_anomalous_network_port_activity",
>       "success": true,
>       "started": false,
>       "awaitingMlNodeAllocation": false
>     },
>     {
>       "id": "datafeed-v3_linux_anomalous_network_activity",
>       "success": false,
>       "started": false,
>       "awaitingMlNodeAllocation": false,
>       "error": {
>         "error": {
>           "root_cause": [
>             {
>               "type": "resource_not_found_exception",
>               "reason": "No known job with id 'v3_linux_anomalous_network_activity'"
>             }
>           ],
>           "type": "resource_not_found_exception",
>           "reason": "No known job with id 'v3_linux_anomalous_network_activity'"
>         },
>         "status": 404
>       }
>     }
>   ],
>   "kibana": {}
> }
> ```
> </details>

This branch, then, fixes said issue by (relatively simply) retrying the failed API call until it succeeds.

### Related Issues

Addresses:
- elastic#171426
- elastic#187478
- elastic#187614
- elastic#182009
- elastic#171426

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [x] [ESS Rule Execution FTR x 200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)
- [x] [Serverless Rule Execution FTR x 200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

(cherry picked from commit 3df635e)
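To make the failure response above easier to read, here is a small sketch that separates the root failure (a job that failed with a 5xx status) from the failure that follows from it (the datafeed whose job was never created). The `summarizeFailures` function and its types are hypothetical and model only the fields visible in the response. The `datafeed-` prefix convention is an assumption taken from the ids in that response, not a documented guarantee.

```typescript
// Sketch only: these types model just the fields visible in the response above.
interface ModuleItem {
  id: string;
  success: boolean;
  error?: { status?: number };
}

interface ModuleSetupResponse {
  jobs: ModuleItem[];
  datafeeds: ModuleItem[];
}

interface FailureSummary {
  rootFailures: string[]; // jobs that failed with a 5xx status
  cascadedFailures: string[]; // datafeeds that failed because their job failed
}

// Assumption: a datafeed id is `datafeed-` followed by its job id,
// as seen in the response above.
const DATAFEED_PREFIX = 'datafeed-';

function summarizeFailures(response: ModuleSetupResponse): FailureSummary {
  const rootFailures = response.jobs
    .filter((job) => !job.success && (job.error?.status ?? 0) >= 500)
    .map((job) => job.id);

  const cascadedFailures = response.datafeeds
    .filter((feed) => {
      if (feed.success || feed.id.indexOf(DATAFEED_PREFIX) !== 0) {
        return false;
      }
      const jobId = feed.id.slice(DATAFEED_PREFIX.length);
      return rootFailures.indexOf(jobId) !== -1;
    })
    .map((feed) => feed.id);

  return { rootFailures, cascadedFailures };
}

// Trimmed version of the response above.
const trimmedResponse: ModuleSetupResponse = {
  jobs: [
    { id: 'v3_linux_anomalous_network_port_activity', success: true },
    { id: 'v3_linux_anomalous_network_activity', success: false, error: { status: 503 } },
  ],
  datafeeds: [
    { id: 'datafeed-v3_linux_anomalous_network_port_activity', success: true },
    { id: 'datafeed-v3_linux_anomalous_network_activity', success: false, error: { status: 404 } },
  ],
};

console.log(summarizeFailures(trimmedResponse));
```

Read this way, only the 503 on the job needs a retry; the datafeed's 404 should resolve once the job itself is created.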
… (#188259)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Detection Engine] Addresses Flakiness in ML FTR tests (#188155)](#188155)

<!--- Backport version: 8.9.8 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Source commit: 3df635ef4a8c86c41c91ac5f59198a9b67d1dc8b (Ryland Herrick, committed 2024-07-12T19:10:25Z), merged to `main` as v8.16.0 via https://github.com/elastic/kibana/pull/188155.
This API call was found to be sporadically failing in #182183. This applies the same changes made in #188155, but for Cypress tests instead of FTR.

Since none of the Cypress tests are currently skipped, this PR just serves to add robustness to the suite, which performs nearly identical setup to that of the FTR tests. I think the biggest difference is how often these tests are run vs FTRs. Combined with the low failure rate for the underlying issue, Cypress's auto-retrying may smooth over many of these failures when they occur.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed
- [ ] [Detection Engine Cypress - ESS x 200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6530)
- [ ] [Detection Engine Cypress - Serverless x 200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6531)
(cherry picked from commit ed934e3)
#188483)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Detection Engine] Fix flake in ML Rule Cypress tests (#188164)](#188164)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Source commit: ed934e3253b47a6902904633530ec181037d4946 (Ryland Herrick, committed 2024-07-16T19:21:13Z), merged to `main` as v8.16.0 via https://github.com/elastic/kibana/pull/188164.

Co-authored-by: Ryland Herrick <ryalnd@gmail.com>