Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Detection Engine] Addresses Flakiness in ML FTR tests #188155

Merged
merged 4 commits into from
Jul 12, 2024

Conversation

rylnd
Copy link
Contributor

@rylnd rylnd commented Jul 11, 2024

Summary

The full chronicle of this endeavor can be found here, but this comment summarizes the identified issue:

I finally found the cause of these failures in the response to our "setup modules" request to ML. Attaching here for posterity:

Setup Modules Failure Response
{
  "jobs": [
    { "id": "v3_linux_anomalous_network_port_activity", "success": true },
    {
      "id": "v3_linux_anomalous_network_activity",
      "success": false,
      "error": {
        "error": {
          "root_cause": [
            {
              "type": "no_shard_available_action_exception",
              "reason": "[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]"
            }
          ],
          "type": "search_phase_execution_exception",
          "reason": "all shards failed",
          "phase": "query",
          "grouped": true,
          "failed_shards": [
            {
              "shard": 0,
              "index": ".ml-anomalies-custom-v3_linux_network_configuration_discovery",
              "node": "dKzpvp06ScO0OxqHilETEA",
              "reason": {
                "type": "no_shard_available_action_exception",
                "reason": "[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]"
              }
            }
          ]
        },
        "status": 503
      }
    }
  ],
  "datafeeds": [
    {
      "id": "datafeed-v3_linux_anomalous_network_port_activity",
      "success": true,
      "started": false,
      "awaitingMlNodeAllocation": false
    },
    {
      "id": "datafeed-v3_linux_anomalous_network_activity",
      "success": false,
      "started": false,
      "awaitingMlNodeAllocation": false,
      "error": {
        "error": {
          "root_cause": [
            {
              "type": "resource_not_found_exception",
              "reason": "No known job with id 'v3_linux_anomalous_network_activity'"
            }
          ],
          "type": "resource_not_found_exception",
          "reason": "No known job with id 'v3_linux_anomalous_network_activity'"
        },
        "status": 404
      }
    }
  ],
  "kibana": {}
}

This branch, then, fixes said issue by (relatively simply) retrying the failed API call until it succeeds.

Related Issues

Addresses:

Checklist

For maintainers

rylnd added 2 commits July 11, 2024 14:09
The flakiness here ends up being caused by sporadic unavailability of
shards during module setup. The underlying cause of that unavailability
is likely a race condition between ML, ES, and/or FTR, but luckily we
don't need to worry about that because simply retrying the API call
causes it to eventually succeed.

In those cases, some of the jobs will report a 4xx status, but that's
expected.

This is the result of a lot of prodding and CPU cycles on CI; see elastic#182183 for
the full details.
Now that we have the robustness provided by the
`setupMlModulesWithRetry` helper, these test should no longer be flaky.
@rylnd rylnd added release_note:skip Skip the PR/issue when compiling release notes Feature:ML Rule Security Solution Machine Learning rule type Team:Detection Engine Security Solution Detection Engine Area labels Jul 11, 2024
@rylnd rylnd self-assigned this Jul 11, 2024
@rylnd rylnd marked this pull request as ready for review July 11, 2024 19:32
@rylnd rylnd requested review from a team as code owners July 11, 2024 19:32
@rylnd rylnd requested a review from dhurley14 July 11, 2024 19:32
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-engine (Team:Detection Engine)

rylnd added a commit to rylnd/kibana that referenced this pull request Jul 11, 2024
This call was found to be sporadically failing in elastic#182183. This applies
the same changes made in elastic#188155, but for Cypress tests instead of FTR.
@rylnd rylnd added Feature:Detection Rules Security Solution rules and Detection Engine Feature:Security ML Jobs Security Solution ML Jobs Feature:Rule Creation Security Solution Detection Rule Creation workflow Feature:Rule Edit Security Solution Detection Rule Editing workflow labels Jul 11, 2024
@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6529

[❌] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/serverless.config.ts: 174/200 tests passed.

see run history

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#6528

[✅] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/ess.config.ts: 200/200 tests passed.

see run history

@rylnd
Copy link
Contributor Author

rylnd commented Jul 12, 2024

The serverless tests timed out on run 175/200, meaning the remaining jobs were cancelled. I'll try kicking off another round, but since all failures subsequent to these fixes have been environmental timeouts, I don't think this should be blocked from merging.

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

cc @rylnd

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6533

[❌] x-pack/test/security_solution_api_integration/test_suites/detections_response/detection_engine/rule_execution_logic/trial_license_complete_tier/configs/serverless.config.ts: 173/200 tests passed.

see run history

@rylnd
Copy link
Contributor Author

rylnd commented Jul 12, 2024

There was a similar result on the last serverless run x 200: 25/100 failed, but they were all due to timeouts or socket hang up errors.

I still haven't seen any failures specific to our ML tests on this branch 👍, but if a reviewer would like more confidence we can figure that out.

Copy link
Contributor

@dhurley14 dhurley14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation and going over the solution with us in the sync yesterday. LGTM!

@rylnd rylnd merged commit 3df635e into elastic:main Jul 12, 2024
19 checks passed
@rylnd rylnd deleted the rylnd/fix_ml_ftr_flakiness branch July 12, 2024 19:10
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jul 12, 2024
This was referenced Jul 12, 2024
@rylnd
Copy link
Contributor Author

rylnd commented Jul 12, 2024

💚 All backports created successfully

Status Branch Result
8.15

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

rylnd added a commit to rylnd/kibana that referenced this pull request Jul 12, 2024
## Summary

The full chronicle of this endeavor can be found
[here](elastic#182183), but [this
comment](elastic#182183 (comment))
summarizes the identified issue:

> I [finally
found](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368)
the cause of these failures in the response to our "setup modules"
request to ML. Attaching here for posterity:
>
> <details>
> <summary>Setup Modules Failure Response</summary>
>
> ```json
> {
>   "jobs": [
> { "id": "v3_linux_anomalous_network_port_activity", "success": true },
>     {
>       "id": "v3_linux_anomalous_network_activity",
>       "success": false,
>       "error": {
>         "error": {
>           "root_cause": [
>             {
>               "type": "no_shard_available_action_exception",
> "reason":
"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]"
>             }
>           ],
>           "type": "search_phase_execution_exception",
>           "reason": "all shards failed",
>           "phase": "query",
>           "grouped": true,
>           "failed_shards": [
>             {
>               "shard": 0,
> "index":
".ml-anomalies-custom-v3_linux_network_configuration_discovery",
>               "node": "dKzpvp06ScO0OxqHilETEA",
>               "reason": {
>                 "type": "no_shard_available_action_exception",
> "reason":
"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]"
>               }
>             }
>           ]
>         },
>         "status": 503
>       }
>     }
>   ],
>   "datafeeds": [
>     {
>       "id": "datafeed-v3_linux_anomalous_network_port_activity",
>       "success": true,
>       "started": false,
>       "awaitingMlNodeAllocation": false
>     },
>     {
>       "id": "datafeed-v3_linux_anomalous_network_activity",
>       "success": false,
>       "started": false,
>       "awaitingMlNodeAllocation": false,
>       "error": {
>         "error": {
>           "root_cause": [
>             {
>               "type": "resource_not_found_exception",
> "reason": "No known job with id 'v3_linux_anomalous_network_activity'"
>             }
>           ],
>           "type": "resource_not_found_exception",
> "reason": "No known job with id 'v3_linux_anomalous_network_activity'"
>         },
>         "status": 404
>       }
>     }
>   ],
>   "kibana": {}
> }
>
> ```
> </details>

This branch, then, fixes said issue by (relatively simply) retrying the
failed API call until it succeeds.

### Related Issues
Addresses:
- elastic#171426
- elastic#187478
- elastic#187614
- elastic#182009
- elastic#171426

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] [ESS Rule Execution FTR x
200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)
- [x] [Serverless Rule Execution FTR x
200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)

### For maintainers

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

(cherry picked from commit 3df635e)
rylnd added a commit that referenced this pull request Jul 12, 2024
… (#188259)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Detection Engine] Addresses Flakiness in ML FTR tests
(#188155)](#188155)

<!--- Backport version: 8.9.8 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Ryland
Herrick","email":"ryalnd@gmail.com"},"sourceCommit":{"committedDate":"2024-07-12T19:10:25Z","message":"[Detection
Engine] Addresses Flakiness in ML FTR tests (#188155)\n\n##
Summary\r\n\r\nThe full chronicle of this endeavor can be
found\r\n[here](#182183), but
[this\r\ncomment](https://github.com/elastic/kibana/pull/182183#issuecomment-2221517519)\r\nsummarizes
the identified issue:\r\n\r\n> I
[finally\r\nfound](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368)\r\nthe
cause of these failures in the response to our \"setup
modules\"\r\nrequest to ML. Attaching here for posterity:\r\n>\r\n>
<details>\r\n> <summary>Setup Modules Failure Response</summary>\r\n>
\r\n> ```json\r\n> {\r\n> \"jobs\": [\r\n> { \"id\":
\"v3_linux_anomalous_network_port_activity\", \"success\": true },\r\n>
{\r\n> \"id\": \"v3_linux_anomalous_network_activity\",\r\n>
\"success\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n>
\"root_cause\": [\r\n> {\r\n> \"type\":
\"no_shard_available_action_exception\",\r\n>
\"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n>
}\r\n> ],\r\n> \"type\": \"search_phase_execution_exception\",\r\n>
\"reason\": \"all shards failed\",\r\n> \"phase\": \"query\",\r\n>
\"grouped\": true,\r\n> \"failed_shards\": [\r\n> {\r\n> \"shard\":
0,\r\n>
\"index\":\r\n\".ml-anomalies-custom-v3_linux_network_configuration_discovery\",\r\n>
\"node\": \"dKzpvp06ScO0OxqHilETEA\",\r\n> \"reason\": {\r\n> \"type\":
\"no_shard_available_action_exception\",\r\n>
\"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n>
}\r\n> }\r\n> ]\r\n> },\r\n> \"status\": 503\r\n> }\r\n> }\r\n> ],\r\n>
\"datafeeds\": [\r\n> {\r\n> \"id\":
\"datafeed-v3_linux_anomalous_network_port_activity\",\r\n> \"success\":
true,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\":
false\r\n> },\r\n> {\r\n> \"id\":
\"datafeed-v3_linux_anomalous_network_activity\",\r\n> \"success\":
false,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\":
false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n>
{\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No
known job with id 'v3_linux_anomalous_network_activity'\"\r\n> }\r\n>
],\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\":
\"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n>
},\r\n> \"status\": 404\r\n> }\r\n> }\r\n> ],\r\n> \"kibana\": {}\r\n>
}\r\n> \r\n> ```\r\n> </details>\r\n\r\nThis branch, then, fixes said
issue by (relatively simply) retrying the\r\nfailed API call until it
succeeds.\r\n\r\n### Related Issues\r\nAddresses:\r\n-
https://github.com/elastic/kibana/issues/171426\r\n-
https://github.com/elastic/kibana/issues/187478\r\n-
https://github.com/elastic/kibana/issues/187614\r\n-
https://github.com/elastic/kibana/issues/182009\r\n-
https://github.com/elastic/kibana/issues/171426\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [x] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [x] [ESS Rule Execution FTR
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)\r\n-
[x] [Serverless Rule Execution FTR
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)\r\n\r\n\r\n###
For maintainers\r\n\r\n- [x] This was checked for breaking API changes
and was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"3df635ef4a8c86c41c91ac5f59198a9b67d1dc8b","branchLabelMapping":{"^v8.16.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport:skip","Feature:Detection
Rules","Feature:ML Rule","Feature:Security ML Jobs","Feature:Rule
Creation","Team:Detection Engine","Feature:Rule
Edit","v8.16.0"],"number":188155,"url":"https://github.com/elastic/kibana/pull/188155","mergeCommit":{"message":"[Detection
Engine] Addresses Flakiness in ML FTR tests (#188155)\n\n##
Summary\r\n\r\nThe full chronicle of this endeavor can be
found\r\n[here](#182183), but
[this\r\ncomment](https://github.com/elastic/kibana/pull/182183#issuecomment-2221517519)\r\nsummarizes
the identified issue:\r\n\r\n> I
[finally\r\nfound](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368)\r\nthe
cause of these failures in the response to our \"setup
modules\"\r\nrequest to ML. Attaching here for posterity:\r\n>\r\n>
<details>\r\n> <summary>Setup Modules Failure Response</summary>\r\n>
\r\n> ```json\r\n> {\r\n> \"jobs\": [\r\n> { \"id\":
\"v3_linux_anomalous_network_port_activity\", \"success\": true },\r\n>
{\r\n> \"id\": \"v3_linux_anomalous_network_activity\",\r\n>
\"success\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n>
\"root_cause\": [\r\n> {\r\n> \"type\":
\"no_shard_available_action_exception\",\r\n>
\"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n>
}\r\n> ],\r\n> \"type\": \"search_phase_execution_exception\",\r\n>
\"reason\": \"all shards failed\",\r\n> \"phase\": \"query\",\r\n>
\"grouped\": true,\r\n> \"failed_shards\": [\r\n> {\r\n> \"shard\":
0,\r\n>
\"index\":\r\n\".ml-anomalies-custom-v3_linux_network_configuration_discovery\",\r\n>
\"node\": \"dKzpvp06ScO0OxqHilETEA\",\r\n> \"reason\": {\r\n> \"type\":
\"no_shard_available_action_exception\",\r\n>
\"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n>
}\r\n> }\r\n> ]\r\n> },\r\n> \"status\": 503\r\n> }\r\n> }\r\n> ],\r\n>
\"datafeeds\": [\r\n> {\r\n> \"id\":
\"datafeed-v3_linux_anomalous_network_port_activity\",\r\n> \"success\":
true,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\":
false\r\n> },\r\n> {\r\n> \"id\":
\"datafeed-v3_linux_anomalous_network_activity\",\r\n> \"success\":
false,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\":
false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n>
{\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No
known job with id 'v3_linux_anomalous_network_activity'\"\r\n> }\r\n>
],\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\":
\"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n>
},\r\n> \"status\": 404\r\n> }\r\n> }\r\n> ],\r\n> \"kibana\": {}\r\n>
}\r\n> \r\n> ```\r\n> </details>\r\n\r\nThis branch, then, fixes said
issue by (relatively simply) retrying the\r\nfailed API call until it
succeeds.\r\n\r\n### Related Issues\r\nAddresses:\r\n-
https://github.com/elastic/kibana/issues/171426\r\n-
https://github.com/elastic/kibana/issues/187478\r\n-
https://github.com/elastic/kibana/issues/187614\r\n-
https://github.com/elastic/kibana/issues/182009\r\n-
https://github.com/elastic/kibana/issues/171426\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [x] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [x] [ESS Rule Execution FTR
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)\r\n-
[x] [Serverless Rule Execution FTR
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)\r\n\r\n\r\n###
For maintainers\r\n\r\n- [x] This was checked for breaking API changes
and was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"3df635ef4a8c86c41c91ac5f59198a9b67d1dc8b"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.16.0","labelRegex":"^v8.16.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/188155","number":188155,"mergeCommit":{"message":"[Detection
Engine] Addresses Flakiness in ML FTR tests (#188155)\n\n##
Summary\r\n\r\nThe full chronicle of this endeavor can be
found\r\n[here](#182183), but
[this\r\ncomment](https://github.com/elastic/kibana/pull/182183#issuecomment-2221517519)\r\nsummarizes
the identified issue:\r\n\r\n> I
[finally\r\nfound](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368)\r\nthe
cause of these failures in the response to our \"setup
modules\"\r\nrequest to ML. Attaching here for posterity:\r\n>\r\n>
<details>\r\n> <summary>Setup Modules Failure Response</summary>\r\n>
\r\n> ```json\r\n> {\r\n> \"jobs\": [\r\n> { \"id\":
\"v3_linux_anomalous_network_port_activity\", \"success\": true },\r\n>
{\r\n> \"id\": \"v3_linux_anomalous_network_activity\",\r\n>
\"success\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n>
\"root_cause\": [\r\n> {\r\n> \"type\":
\"no_shard_available_action_exception\",\r\n>
\"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n>
}\r\n> ],\r\n> \"type\": \"search_phase_execution_exception\",\r\n>
\"reason\": \"all shards failed\",\r\n> \"phase\": \"query\",\r\n>
\"grouped\": true,\r\n> \"failed_shards\": [\r\n> {\r\n> \"shard\":
0,\r\n>
\"index\":\r\n\".ml-anomalies-custom-v3_linux_network_configuration_discovery\",\r\n>
\"node\": \"dKzpvp06ScO0OxqHilETEA\",\r\n> \"reason\": {\r\n> \"type\":
\"no_shard_available_action_exception\",\r\n>
\"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n>
}\r\n> }\r\n> ]\r\n> },\r\n> \"status\": 503\r\n> }\r\n> }\r\n> ],\r\n>
\"datafeeds\": [\r\n> {\r\n> \"id\":
\"datafeed-v3_linux_anomalous_network_port_activity\",\r\n> \"success\":
true,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\":
false\r\n> },\r\n> {\r\n> \"id\":
\"datafeed-v3_linux_anomalous_network_activity\",\r\n> \"success\":
false,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\":
false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n>
{\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No
known job with id 'v3_linux_anomalous_network_activity'\"\r\n> }\r\n>
],\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\":
\"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n>
},\r\n> \"status\": 404\r\n> }\r\n> }\r\n> ],\r\n> \"kibana\": {}\r\n>
}\r\n> \r\n> ```\r\n> </details>\r\n\r\nThis branch, then, fixes said
issue by (relatively simply) retrying the\r\nfailed API call until it
succeeds.\r\n\r\n### Related Issues\r\nAddresses:\r\n-
https://github.com/elastic/kibana/issues/171426\r\n-
https://github.com/elastic/kibana/issues/187478\r\n-
https://github.com/elastic/kibana/issues/187614\r\n-
https://github.com/elastic/kibana/issues/182009\r\n-
https://github.com/elastic/kibana/issues/171426\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [x] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [x] [ESS Rule Execution FTR
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)\r\n-
[x] [Serverless Rule Execution FTR
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)\r\n\r\n\r\n###
For maintainers\r\n\r\n- [x] This was checked for breaking API changes
and was
[labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"3df635ef4a8c86c41c91ac5f59198a9b67d1dc8b"}}]}]
BACKPORT-->
rylnd added a commit that referenced this pull request Jul 16, 2024
This API call was found to be sporadically failing in #182183. This
applies the same changes made in #188155, but for Cypress tests instead
of FTR.

Since none of the cypress tests are currently skipped, this PR just
serves to add robustness to the suite, which performs nearly identical
setup to that of the FTR tests. I think the biggest difference is how
often these tests are run vs FTRs. Combined with the low failure rate
for the underlying issue, cypress's auto-retrying may smooth over many
of these failures when they occur.


### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] [Detection Engine Cypress - ESS x
200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6530)
- [ ] [Detection Engine Cypress - Serverless x
200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6531)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jul 16, 2024
This API call was found to be sporadically failing in elastic#182183. This
applies the same changes made in elastic#188155, but for Cypress tests instead
of FTR.

Since none of the cypress tests are currently skipped, this PR just
serves to add robustness to the suite, which performs nearly identical
setup to that of the FTR tests. I think the biggest difference is how
often these tests are run vs FTRs. Combined with the low failure rate
for the underlying issue, cypress's auto-retrying may smooth over many
of these failures when they occur.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] [Detection Engine Cypress - ESS x
200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6530)
- [ ] [Detection Engine Cypress - Serverless x
200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6531)

(cherry picked from commit ed934e3)
kibanamachine added a commit that referenced this pull request Jul 16, 2024
#188483)

# Backport

This will backport the following commits from `main` to `8.15`:
- [[Detection Engine] Fix flake in ML Rule Cypress tests
(#188164)](#188164)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Ryland
Herrick","email":"ryalnd@gmail.com"},"sourceCommit":{"committedDate":"2024-07-16T19:21:13Z","message":"[Detection
Engine] Fix flake in ML Rule Cypress tests (#188164)\n\nThis API call
was found to be sporadically failing in #182183. This\r\napplies the
same changes made in #188155, but for Cypress tests instead\r\nof
FTR.\r\n\r\nSince none of the cypress tests are currently skipped, this
PR just\r\nserves to add robustness to the suite, which performs nearly
identical\r\nsetup to that of the FTR tests. I think the biggest
difference is how\r\noften these tests are run vs FTRs. Combined with
the low failure rate\r\nfor the underlying issue, cypress's
auto-retrying may smooth over many\r\nof these failures when they
occur.\r\n\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [ ] [Detection Engine Cypress -
ESS
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6530)\r\n-
[ ] [Detection Engine Cypress - Serverless
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6531)","sha":"ed934e3253b47a6902904633530ec181037d4946","branchLabelMapping":{"^v8.16.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Feature:Detection
Rules","Feature:ML Rule","Feature:Security ML Jobs","Feature:Rule
Creation","backport:prev-minor","Team:Detection Engine","Feature:Rule
Edit","v8.16.0"],"title":"[Detection Engine] Fix flake in ML Rule
Cypress
tests","number":188164,"url":"https://github.com/elastic/kibana/pull/188164","mergeCommit":{"message":"[Detection
Engine] Fix flake in ML Rule Cypress tests (#188164)\n\nThis API call
was found to be sporadically failing in #182183. This\r\napplies the
same changes made in #188155, but for Cypress tests instead\r\nof
FTR.\r\n\r\nSince none of the cypress tests are currently skipped, this
PR just\r\nserves to add robustness to the suite, which performs nearly
identical\r\nsetup to that of the FTR tests. I think the biggest
difference is how\r\noften these tests are run vs FTRs. Combined with
the low failure rate\r\nfor the underlying issue, cypress's
auto-retrying may smooth over many\r\nof these failures when they
occur.\r\n\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [ ] [Detection Engine Cypress -
ESS
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6530)\r\n-
[ ] [Detection Engine Cypress - Serverless
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6531)","sha":"ed934e3253b47a6902904633530ec181037d4946"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/188164","number":188164,"mergeCommit":{"message":"[Detection
Engine] Fix flake in ML Rule Cypress tests (#188164)\n\nThis API call
was found to be sporadically failing in #182183. This\r\napplies the
same changes made in #188155, but for Cypress tests instead\r\nof
FTR.\r\n\r\nSince none of the cypress tests are currently skipped, this
PR just\r\nserves to add robustness to the suite, which performs nearly
identical\r\nsetup to that of the FTR tests. I think the biggest
difference is how\r\noften these tests are run vs FTRs. Combined with
the low failure rate\r\nfor the underlying issue, cypress's
auto-retrying may smooth over many\r\nof these failures when they
occur.\r\n\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [ ] [Detection Engine Cypress -
ESS
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6530)\r\n-
[ ] [Detection Engine Cypress - Serverless
x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6531)","sha":"ed934e3253b47a6902904633530ec181037d4946"}}]}]
BACKPORT-->

Co-authored-by: Ryland Herrick <ryalnd@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Detection Rules Security Solution rules and Detection Engine Feature:ML Rule Security Solution Machine Learning rule type Feature:Rule Creation Security Solution Detection Rule Creation workflow Feature:Rule Edit Security Solution Detection Rule Editing workflow Feature:Security ML Jobs Security Solution ML Jobs release_note:skip Skip the PR/issue when compiling release notes Team:Detection Engine Security Solution Detection Engine Area v8.15.0 v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants