Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flakiness on custom time range saved searches #165454

Conversation

gsoldevila
Copy link
Contributor

@gsoldevila gsoldevila commented Sep 1, 2023

A bunch of tests on dashboards are customising some of the panels settings and providing custom time ranges:

image

Currently, the logic is not waiting for the quick toggle animation to complete, before proceeding to select a time range.
This can cause a flaky behavior if the logic tries to customize the range before the button is actually available, as seen on this failed test. (part of this failed CI build)

The goal of this PR is to add a small waiting period, to make sure the toggle animation has completed, and that the time range controls are visible and clickable.

I used the opportunity to cleanup some "await delay millis" calls, reusing existing logic instead.

@gsoldevila gsoldevila requested review from a team as code owners September 1, 2023 11:08
@gsoldevila gsoldevila added bug Fixes for quality problems that affect the customer experience Team:QA Team label for QA Team release_note:skip Skip the PR/issue when compiling release notes test-failure-flaky backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) v8.11.0 labels Sep 1, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-qa (Team:QA)

@ThomThomson
Copy link
Contributor

If I remember correctly, animations are disabled for CI runs, so there really shouldn't be any flakiness caused by an animation. Additionally, adding timed sleeps is usually flakier than adding a retry or awaiting the presence of some element, so IMO it might be better to restructure some of these calls to do that rather than just sleeping.

@gsoldevila
Copy link
Contributor Author

gsoldevila commented Sep 4, 2023

I checked with QA folks and it seems that they are indeed disabled.

Interestingly, I have a bunch of flaky test runner failures that show a screenshot of the welcome page interstitial fading in (half way through): https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2851

Additionally, adding timed sleeps is usually flakier than adding a retry or awaiting the presence of some element, so IMO it might be better to restructure some of these calls to do that rather than just sleeping

Fair point, I'll update the PR accordingly.

@gsoldevila gsoldevila requested review from a team as code owners September 5, 2023 10:35
@gsoldevila
Copy link
Contributor Author

@ThomThomson you were right, I confirmed animations are disabled (both on CI an locally).
A detailed look at the debug logs showed that the custom time range was actually not opening at all (either missed click or clicked twice 🤷🏼‍♂️ ).

Either way, I added an optional boolean to these toggles, so that when we intend to enable them, we will only click if they are disabled (and viceversa).

Copy link
Contributor

@stratoula stratoula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visualizations team changes LGTM

Copy link
Member

@nickpeihl nickpeihl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dashboard functional test changes lgtm!

thanks for fixing the flakiness!

@@ -55,10 +56,12 @@ export function DashboardCustomizePanelProvider({ getService }: FtrProviderConte
return await this.findFlyoutTestSubject('superDatePickerToggleQuickMenuButton');
}

public async clickToggleQuickMenuButton() {
public async clickToggleQuickMenuButton(open?: boolean) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of toggle methods with an open parameter, how about create separate enable/disable methods? So something like enableCustomTimeRange and disableCustomTimeRange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can do that too!

@@ -115,10 +122,18 @@ export function DashboardCustomizePanelProvider({ getService, getPageObject }: F
});
}

public async clickToggleShowCustomTimeRange(enable?: boolean) {
log.debug('clickToggleShowCustomTimeRange');
public async enableCustomTimeRange() {
Copy link
Contributor

@nreese nreese Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. One more thing, since it sounds like missing clicks is one cause of flakyness, how about adding a retry

const toggle = await testSubjects.find(this.TOGGLE_TIME_RANGE_TEST_SUBJ);
await retry.try(async () => {
  if (!(await toggle.isSelected())) {
    await toggle.click();
    if (!(await toggle.isSelected())) {
      throw new Error('Switch click missed');
    }
  }
});

@gsoldevila
Copy link
Contributor Author

gsoldevila commented Sep 14, 2023

Triggered 150x runs with the current changes: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3128

Copy link
Contributor

@davismcphee davismcphee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsoldevila Thanks for the effort on this! Unfortunately it looks like the Discover test is the only one still failing in your flaky test run... @jughosta recently looked into this failure too and opened a draft PR #166239 that seems to fix the flakiness (she also did a 150x run). If you prefer, feel free to leave the Discover test skipped in this PR so you can get it merged, and we can follow up with the fix from #166239 instead.

@gsoldevila
Copy link
Contributor Author

gsoldevila commented Sep 14, 2023

@davismcphee Frankly, I don't think that PR tackles the actual source of flakiness, which IMO is a lost click on the "Enable custom time range toggle".
I've just checked the new CI failure stack trace in my runs and came to the exact same conclusion.

As for my PR, I believe I had a working solution with the retry.try() but changed it to retry.waitFor() instead, thinking that it would re-attempt the click if I passed an onFailureBlock handler (it seems it does not).

But since it's your ownership, feel free to go ahead and merge it.
I'll keep mine around just in case the flakiness strikes back (I'll rebase / solve conflicts).

@davismcphee
Copy link
Contributor

davismcphee commented Sep 15, 2023

@gsoldevila Thanks for the explanation! In that case we'll hold off on merging #166239. I didn't want to block the PR if you were hoping to merge, but we can definitely continue with this one instead if you think you've got a reliable fix for it. Looks like the test has been an issue for quite a while so it would be great to see it gone for good.

@gsoldevila
Copy link
Contributor Author

gsoldevila commented Sep 21, 2023

I updated the branch to use a retry mechanism.
Triggered 150x runs of the flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3207
@jughosta I also updated the opening of the date picker quick menu (2nd action in the spot) using the logic in your PR.

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #10 / serverless common UI Index Management Index Templates "before all" hook for "renders the index templates tab"
  • [job] [logs] FTR Configs #1 / serverless examples UI Partial Results Example "before all" hook for "should trace mouse events"

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@gsoldevila gsoldevila enabled auto-merge (squash) September 22, 2023 09:20
Copy link
Contributor

@davismcphee davismcphee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM and the flakiness seems to be gone! 🎉 @jughosta can you give the final approval on this one since you're most familiar with these tests?

@gsoldevila gsoldevila merged commit b98b6d0 into elastic:main Sep 25, 2023
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 25, 2023
A bunch of tests on dashboards are customising some of the panels
settings and providing custom time ranges:

<img width="409" alt="image"
src="https://github.com/elastic/kibana/assets/25349407/c869c1a3-f7db-4ccd-ad00-c5403f2b4201">

Currently, the logic is not waiting for the quick toggle animation to
complete, before proceeding to select a time range.
This can cause a flaky behavior if the logic tries to customize the
range before the button is actually available, as seen on [this failed
test](https://s3.amazonaws.com/buildkiteartifacts.com/e0f3970e-3a75-4621-919f-e6c773e2bb12/0fda5127-f57f-42fb-8e5a-146b3d535916/018a4c44-5497-48b6-8521-a8a79a2f63b4/018a4c46-0e7a-4b69-9a3d-9c54c27165b0/target/test_failures/018a4c46-0e7a-4b69-9a3d-9c54c27165b0_4fcbc47e71644919129e320eea8bb3bc.html?response-content-type=text%2Fhtml&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQPCP3C7LZWZ5UB5F%2F20230901%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230901T094837Z&X-Amz-Expires=600&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEB0aCXVzLWVhc3QtMSJGMEQCIGCyKcVLGPUawZubNzZdt5oZNb5v0saiIuPqXwI7rmwlAiAsOj%2Fiep94v%2BYZJtLY3Gw0m%2FmK5mJw2IcIBdNKFXgK%2BCr6Awjm%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAAaDDAzMjM3OTcwNTMwMyIMXOd1Hm6ks%2FNE37V0Ks4DgMUso7syv87hnPcC%2BB1soxvFFnj4JnNZc6ZgkLUe93z99iPFBUsqH%2BRbUTfSbjVOEJYBKGYuvp32xvSWsYNVPXKmcej18LC0yNi%2BBzoG2X%2Bj80g%2BbGMm6YfTncjPhOE0CHHqOWXts9nQ8WpDy8XOl0zfMtuiPjzOXHo9lvw2mgYDZIJIMV72FYB9JGg8FPbLQtD3rysLGNE0VDKgl5LCnYwhY1pwRCRHnVW41QfV0pwK%2FbjNf9HjdK31LQvMY%2FGPuB3M6O2CUZLsvLGfWBeGYHtkqb0hrL9ijO1Uo28ZSS1FytPftEdF0e1kAC9C5zD56HtYm55aktOWtaaC0XPWLdWWGUq%2FKQzhxSCiXK6ovATU3zI3yPNoZs92YBYmIPMOpEI40dCCpksjPwAMCiQd%2F9gMNKP5Qp5CbYd2Khy%2FeXaT8J7HOZCueN63O0j%2FtX1tbwfznhbr74lAcRQjueRYmwboZaGSDZUQ33lSSmyZk1V9WF9eJyt88oHvIx0q9bIjvOlW05DiNKfEFWYwfBywdGuvRU6eGMs1QcDNu33Lb%2BhymudM2JZmQKIjZOcb2l3Fzctp614owH4JcRlmF4%2BIa4xHeBdRlTMysS8bTIsgMK7axacGOqYBzIpC1wgZWJ1kZ0agLWCNaMIdUl%2B4xrr7w%2Fz0843WWMhRrvbJhDTHqk5UclF%2FSROAMe0FH2XEXiQ65ILyUPlrUMels5tfQ3Pp%2FJWPi9NsQJUQ1n9uLN%2BFPDOoMo8Uxg4%2FkG2O7yTkrIdArfA6pWN9I21gFMW%2BFZy9BMYltt5T65ZKOyYAIFGpLhgfBySIBCUMgwR1kusfDhf1%2FRTvtDKD2sJKN5a0IA%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=35fabe908aa7514e4a92de0ed12973af85ccfb439984fc3bdd7ef3bb8fe3419b).
(part of this [failed CI
build](https://buildkite.com/elastic/kibana-pull-request/builds/155285#018a4c46-0e7a-4b69-9a3d-9c54c27165b0))

The goal of this PR is to add a small waiting period, to make sure the
toggle animation has completed, and that the time range controls are
visible and clickable.

I used the opportunity to cleanup some "await delay millis" calls,
reusing existing logic instead.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit b98b6d0)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.10

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

joemcelroy pushed a commit to joemcelroy/kibana that referenced this pull request Sep 25, 2023
A bunch of tests on dashboards are customising some of the panels
settings and providing custom time ranges:

<img width="409" alt="image"
src="https://github.com/elastic/kibana/assets/25349407/c869c1a3-f7db-4ccd-ad00-c5403f2b4201">

Currently, the logic is not waiting for the quick toggle animation to
complete, before proceeding to select a time range.
This can cause a flaky behavior if the logic tries to customize the
range before the button is actually available, as seen on [this failed
test](https://s3.amazonaws.com/buildkiteartifacts.com/e0f3970e-3a75-4621-919f-e6c773e2bb12/0fda5127-f57f-42fb-8e5a-146b3d535916/018a4c44-5497-48b6-8521-a8a79a2f63b4/018a4c46-0e7a-4b69-9a3d-9c54c27165b0/target/test_failures/018a4c46-0e7a-4b69-9a3d-9c54c27165b0_4fcbc47e71644919129e320eea8bb3bc.html?response-content-type=text%2Fhtml&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQPCP3C7LZWZ5UB5F%2F20230901%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230901T094837Z&X-Amz-Expires=600&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEB0aCXVzLWVhc3QtMSJGMEQCIGCyKcVLGPUawZubNzZdt5oZNb5v0saiIuPqXwI7rmwlAiAsOj%2Fiep94v%2BYZJtLY3Gw0m%2FmK5mJw2IcIBdNKFXgK%2BCr6Awjm%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F8BEAAaDDAzMjM3OTcwNTMwMyIMXOd1Hm6ks%2FNE37V0Ks4DgMUso7syv87hnPcC%2BB1soxvFFnj4JnNZc6ZgkLUe93z99iPFBUsqH%2BRbUTfSbjVOEJYBKGYuvp32xvSWsYNVPXKmcej18LC0yNi%2BBzoG2X%2Bj80g%2BbGMm6YfTncjPhOE0CHHqOWXts9nQ8WpDy8XOl0zfMtuiPjzOXHo9lvw2mgYDZIJIMV72FYB9JGg8FPbLQtD3rysLGNE0VDKgl5LCnYwhY1pwRCRHnVW41QfV0pwK%2FbjNf9HjdK31LQvMY%2FGPuB3M6O2CUZLsvLGfWBeGYHtkqb0hrL9ijO1Uo28ZSS1FytPftEdF0e1kAC9C5zD56HtYm55aktOWtaaC0XPWLdWWGUq%2FKQzhxSCiXK6ovATU3zI3yPNoZs92YBYmIPMOpEI40dCCpksjPwAMCiQd%2F9gMNKP5Qp5CbYd2Khy%2FeXaT8J7HOZCueN63O0j%2FtX1tbwfznhbr74lAcRQjueRYmwboZaGSDZUQ33lSSmyZk1V9WF9eJyt88oHvIx0q9bIjvOlW05DiNKfEFWYwfBywdGuvRU6eGMs1QcDNu33Lb%2BhymudM2JZmQKIjZOcb2l3Fzctp614owH4JcRlmF4%2BIa4xHeBdRlTMysS8bTIsgMK7axacGOqYBzIpC1wgZWJ1kZ0agLWCNaMIdUl%2B4xrr7w%2Fz0843WWMhRrvbJhDTHqk5UclF%2FSROAMe0FH2XEXiQ65ILyUPlrUMels5tfQ3Pp%2FJWPi9NsQJUQ1n9uLN%2BFPDOoMo8Uxg4%2FkG2O7yTkrIdArfA6pWN9I21gFMW%2BFZy9BMYltt5T65ZKOyYAIFGpLhgfBySIBCUMgwR1kusfDhf1%2FRTvtDKD2sJKN5a0IA%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=35fabe908aa7514e4a92de0ed12973af85ccfb439984fc3bdd7ef3bb8fe3419b).
(part of this [failed CI
build](https://buildkite.com/elastic/kibana-pull-request/builds/155285#018a4c46-0e7a-4b69-9a3d-9c54c27165b0))

The goal of this PR is to add a small waiting period, to make sure the
toggle animation has completed, and that the time range controls are
visible and clickable.

I used the opportunity to cleanup some "await delay millis" calls,
reusing existing logic instead.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Mar 9, 2024
@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

4 similar comments
@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

@kibanamachine
Copy link
Contributor

Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync.

@jbudz jbudz added the backport:skip This commit does not require backporting label Sep 30, 2024
@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) backport:skip This commit does not require backporting bug Fixes for quality problems that affect the customer experience release_note:skip Skip the PR/issue when compiling release notes Team:QA Team label for QA Team test-failure-flaky v8.11.0
Projects
None yet