[Obs AI Assistant] Attempt to resolve flaky knowledge based user instructions test #196026

viduni94 · 2024-10-13T14:44:20Z

Closes #192222

Summary

Problem

The "when creating private and public user instructions" test has been marked as flaky and has been skipped.
Based on the error recorded in the ticket, two scenarios are possible:

Race Conditions: When multiple instructions are created asynchronously, the instructions might not be assigned to the right user or role. Data could be overwritten.
Data Fetching Issues: The API might return inconsistent data if the knowledge base is not properly cleared between tests, or if the instructions are not properly isolated per user.

Solution

When running the test locally, the actual output and expected outcome are the same, therefore the test passes. The flaky test runner didn't output anything meaningful either.

However, in order to resolve any missing entries, the before hook was updated to retry adding only the missing entries. Hopefully, this will help resolve the flakiness.

Checklist

Unit or functional tests were updated to match the most common scenarios
Flaky Test Runner was used on any tests changed

elasticmachine · 2024-10-13T14:44:26Z

Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant)

kibanamachine · 2024-10-13T15:25:10Z

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#7134

[✅] x-pack/test/observability_ai_assistant_api_integration/enterprise/config.ts: 25/25 tests passed.

see run history

kibanamachine · 2024-10-13T17:18:36Z

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#7135

[✅] x-pack/test/observability_ai_assistant_api_integration/enterprise/config.ts: 200/200 tests passed.

see run history

dgieselaar · 2024-10-17T07:31:50Z

...y_ai_assistant_api_integration/tests/knowledge_base/knowledge_base_user_instructions.spec.ts

@@ -88,6 +88,7 @@ export default function ApiTest({ getService }: FtrProviderContext) {
        });

        await Promise.all(promises);
+        await new Promise((resolve) => setTimeout(resolve, 500));


Ideally we can avoid sleeps. Can you reproduce the flakiness? Usually things like this are related to Elasticsearch not having refreshed the shard yet. Can you figure out if we are waiting for a refresh in the endpoint? (Look for refresh: 'wait_for'.)
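To illustrate the refresh point above, here is a toy in-memory model of near-real-time search. It is not the Elasticsearch client: `ToyIndex`, `demo`, and the 20 ms refresh timer are all hypothetical names and values chosen for the sketch. It only shows why a read issued right after an acknowledged write can miss the document, and why a write that waits for the refresh does not need a fixed sleep.

```typescript
// Toy model only. ToyIndex is a hypothetical stand-in, NOT the Elasticsearch API.
type Doc = { doc_id: string; text: string };

class ToyIndex {
  private pending: Doc[] = []; // written but not yet searchable
  private searchable: Doc[] = []; // visible to search()
  private waiters: Array<() => void> = [];

  // Write a document. With refresh: 'wait_for' the returned promise
  // resolves only after the next refresh has made the document searchable.
  index(doc: Doc, opts: { refresh?: 'wait_for' } = {}): Promise<void> {
    this.pending.push(doc);
    if (opts.refresh !== 'wait_for') {
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Simulates the periodic shard refresh.
  refresh(): void {
    this.searchable.push(...this.pending);
    this.pending = [];
    this.waiters.forEach((resolve) => resolve());
    this.waiters = [];
  }

  search(): Doc[] {
    return [...this.searchable];
  }
}

async function demo(): Promise<{ withoutWait: number; withWait: number }> {
  const idx = new ToyIndex();
  setTimeout(() => idx.refresh(), 20); // assumed refresh interval for the sketch

  // The write is acknowledged, but the document is not searchable yet.
  await idx.index({ doc_id: 'a', text: 'public instruction' });
  const withoutWait = idx.search().length;

  // This write resolves only after the refresh, so both documents are visible.
  await idx.index({ doc_id: 'b', text: 'private instruction' }, { refresh: 'wait_for' });
  const withWait = idx.search().length;

  return { withoutWait, withWait };
}
```

In this model the first search sees no documents and the second sees both, without any sleep in the caller.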

Hi @dgieselaar
Thank you for the review.
I wasn't able to reproduce the flakiness. I tried reproducing it locally and via the flaky test runner, but all tests passed.

In the code for these tests, I don't see refresh: 'wait_for'.

This is the output I get after locally running the tests:

@viduni94, Dario was referring to checking whether the actual endpoint we're using queries the knowledge base with refresh: wait_for. We can see that it does, here https://github.com/elastic/kibana/blob/main/x-pack/plugins/observability_solution/observability_ai_assistant/server/service/knowledge_base_service/index.ts#L662.

Looking at the docs, they say:

Never start multiple refresh=wait_for requests in a row. Instead batch them into a single bulk request with refresh=wait_for and Elasticsearch will start them all in parallel and return only when they have all finished.

Perhaps this could be causing some flakiness. Bulk indexing would be more reliable. I was thinking we could instead use POST /internal/observability_ai_assistant/kb/entries/import when we want to index multiple documents for testing, but it looks like this is not actually bulk indexing; it creates multiple requests with concurrency control.

Perhaps for now we can use the retry service, but think about improving the API to use bulk requests. CC @dgieselaar
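As a rough sketch of what batching into a single bulk request could look like, the helper below builds the alternating action/document array that the Elasticsearch bulk API expects. The `KbEntry` shape, the `buildBulkOperations` name, and the index name in the usage note are assumptions made for illustration; they are not taken from the Kibana code.

```typescript
// Hypothetical entry shape for illustration; the real knowledge base entry has more fields.
interface KbEntry {
  doc_id: string;
  text: string;
  public: boolean;
}

type BulkAction = { index: { _index: string; _id: string } };
type BulkOperation = BulkAction | KbEntry;

// Builds the alternating [action, document, action, document, ...] array
// used by the bulk API, one "index" action per entry.
function buildBulkOperations(indexName: string, entries: KbEntry[]): BulkOperation[] {
  const operations: BulkOperation[] = [];
  for (const entry of entries) {
    operations.push({ index: { _index: indexName, _id: entry.doc_id } });
    operations.push(entry);
  }
  return operations;
}
```

With the official JavaScript client, the result would presumably be sent once, along the lines of `esClient.bulk({ refresh: 'wait_for', operations })`, so that only one request waits for the refresh instead of several in a row. Whether the import endpoint should work this way is a design question for the API, not something this sketch settles.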

Thank you @neptunian
I've updated the PR with the suggested changes.

Sorry for the misunderstanding, but when I suggested a retry, I meant retrying the fetch if one of the entries was missing, not adding the entries again. The idea was that the entries may not be available yet, potentially because of the multiple requests. Since we didn't get an error when indexing them to begin with, I'm not sure how adding them again would help.
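A minimal sketch of the "retry the fetch, not the write" idea follows. `retryUntil` is a hypothetical stand-alone helper standing in for the test framework's retry service, and `fakeFetchInstructions` simulates an endpoint whose results become complete only on the third call; neither exists in the PR.

```typescript
// Hypothetical helper: re-run a read until the predicate holds or attempts run out.
async function retryUntil<T>(
  fetchFn: () => Promise<T>,
  isReady: (value: T) => boolean,
  { attempts = 5, delayMs = 10 }: { attempts?: number; delayMs?: number } = {}
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    const value = await fetchFn();
    if (isReady(value)) {
      return value;
    }
    // Short pause before the next read; nothing is written again.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`condition not met after ${attempts} attempts`);
}

// Simulated endpoint: the second instruction only shows up on the third call.
let calls = 0;
const fakeFetchInstructions = async (): Promise<string[]> => {
  calls += 1;
  return calls < 3 ? ['public-doc'] : ['public-doc', 'private-doc-editor'];
};

const expectedCount = 2;
const fetched = retryUntil(fakeFetchInstructions, (list) => list.length === expectedCount);
```

The predicate checks only that the expected number of instructions came back; the assertions on their contents would run once, after the retry resolves.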

Thank you @neptunian
I've updated the PR.

sorenlouv

Retries are always a band-aid over another problem, but if they do the job reliably I'm good with this approach for now.

dgieselaar · 2024-10-31T15:47:44Z

...y_ai_assistant_api_integration/tests/knowledge_base/knowledge_base_user_instructions.spec.ts

+
+          const instructions = res.body.userInstructions;
+
+          const sortByDocId = (data: any) => sortBy(data, 'doc_id');


can we avoid any here?

ah I noticed this is not your change, but imho we can very easily not use any here, so hopefully that is a small change

Sure @dgieselaar
I updated it.
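One way to drop the `any` is a generic constraint on the field being sorted. The sketch below is an assumption about how that could look, not the change that was pushed: `HasDocId` is a hypothetical minimal shape, and a native sort is swapped in for lodash `sortBy` so the snippet has no dependencies.

```typescript
// Hypothetical minimal shape; the real instruction type may carry more fields.
interface HasDocId {
  doc_id: string;
}

// Returns a new array sorted by doc_id; the input is left untouched.
// Native sort is used here in place of lodash sortBy to keep the sketch dependency-free.
function sortByDocId<T extends HasDocId>(data: readonly T[]): T[] {
  return [...data].sort((a, b) => {
    if (a.doc_id < b.doc_id) return -1;
    if (a.doc_id > b.doc_id) return 1;
    return 0;
  });
}
```

Because the function is generic, the element type of the input is preserved in the result, so later assertions keep their typing.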

dgieselaar · 2024-10-31T15:49:05Z

...y_ai_assistant_api_integration/tests/knowledge_base/knowledge_base_user_instructions.spec.ts

@@ -91,59 +92,64 @@ export default function ApiTest({ getService }: FtrProviderContext) {
      });

      it('"editor" can retrieve their own private instructions and the public instruction', async () => {
-        const res = await observabilityAIAssistantAPIClient.editorUser({
-          endpoint: 'GET /internal/observability_ai_assistant/kb/user_instructions',
+        await retry.try(async () => {


Can we remove the retry call here and re-run the flaky tests? If we can do without it, that would be my preference, otherwise we'll end up chasing flakiness everywhere (there is already a retry at the test suite level, for instance).

Thanks for the review @dgieselaar

Apologies if this is a dumb question.
How do I re-run the flaky tests? Do you mean using the flaky test runner?

I'll send you a link!

elasticmachine · 2024-10-31T17:37:35Z

💚 Build Succeeded

Buildkite Build
Commit: 833db6d
Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-196026-833db6d25879

Metrics [docs]

✅ unchanged

History

💛 Build #246997 was flaky 7e6094f
💚 Build #246144 succeeded f513303
💛 Build #246134 was flaky 7b172e4
💔 Build #246133 failed 41ea7e9
💚 Build #246108 succeeded 9552435

cc @viduni94

kibanamachine · 2024-10-31T19:17:28Z

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11618429376

…ructions test (elastic#196026) Closes elastic#192222 ## Summary ### Problem The "when creating private and public user instructions" test has been marked as flaky and has been skipped. Based on the error recorded in the ticket, 2 possible scenarios could be - **Race Conditions**: When multiple instructions are created asynchronously, the instructions might not be assigned to the right user or role. Data could be overwritten. - **Data Fetching Issues**: The API might return inconsistent data if the knowledge base is not properly cleared between tests, or if the instructions are not properly isolated per user. ### Solution When running the test locally, the actual output and expected outcome are the same, therefore the test passes. The flaky test runner didn't output anything meaningful either. However, in order to resolve any missing entries, the before hook was updated to retry adding only the missing entries. Hopefully, this will help resolve the flakiness. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated to match the most common scenarios - [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed (cherry picked from commit a7a09f7)

kibanamachine · 2024-10-31T19:22:23Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

…r instructions test (#196026) (#198612) # Backport This will backport the following commits from `main` to `8.x`: - [[Obs AI Assistant] Attempt to resolve flaky knowledge based user instructions test (#196026)](#196026)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)  Co-authored-by: Viduni Wickramarachchi <viduni.wickramarachchi@elastic.co>

…y tests (#200022)

Closes #192222

Summary

Problem

The test appears to be flaky, potentially because the entries are not available at the time of retrieval. This cannot be reproduced locally or via the flaky test runner. (more details [here](#196026 (comment)))

Solution

Add a retry when fetching the instructions and check whether the number of instructions returned by the API endpoint matches the number of instructions expected.

Checklist

Flaky Test Runner (https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed

…y tests (elastic#200022) Closes elastic#192222 ## Summary ### Problem The test appears to be flaky, potentially because the entries are not available at the time of retrieval. This cannot be reproduced locally or via the flaky test runner. (more details [here](elastic#196026 (comment))) ### Solution Add a retry when fetching the instructions and check whether the number of instructions returned by the API endpoint is the same number of instructions expected. ### Checklist - [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed (cherry picked from commit 53c05a3)

viduni94 self-assigned this Oct 13, 2024

viduni94 requested a review from a team as a code owner October 13, 2024 14:44

botelastic bot added ci:project-deploy-observability Create an Observability project Team:Obs AI Assistant Observability AI Assistant labels Oct 13, 2024

viduni94 added release_note:fix backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) labels Oct 14, 2024

viduni94 force-pushed the resolve-flaky-obs-ai-assistant-test-knowledge-based-user-instructions branch from fa6e12b to 7021ce1 Compare October 14, 2024 13:17

viduni94 changed the title ~~[Obs AI Assistant] Attempt to resolve flaky knowledge based user instructions test (#192222)~~ [Obs AI Assistant] Attempt to resolve flaky knowledge based user instructions test Oct 15, 2024

viduni94 force-pushed the resolve-flaky-obs-ai-assistant-test-knowledge-based-user-instructions branch from 7021ce1 to a0f9183 Compare October 16, 2024 22:01

dgieselaar reviewed Oct 17, 2024

viduni94 force-pushed the resolve-flaky-obs-ai-assistant-test-knowledge-based-user-instructions branch 3 times, most recently from b3895c4 to 941dddd Compare October 21, 2024 11:45

viduni94 requested a review from dgieselaar October 21, 2024 20:44

viduni94 force-pushed the resolve-flaky-obs-ai-assistant-test-knowledge-based-user-instructions branch 2 times, most recently from 3efcc81 to 479ebea Compare October 25, 2024 20:01

viduni94 requested a review from neptunian October 26, 2024 01:08

viduni94 added 5 commits October 30, 2024 08:36

[Obs AI Assistant] Unskip test (elastic#192222)

8082223

[Obs AI Assistant] Add a delay to let the data settle before running …

4dba274

…the tests to avoid any data inconsistency (elastic#192222)

[Obs AI Assistant] Retry creating entries if any entry is missing (el…

21888c0

…astic#192222)

[Obs AI Assistant] Address PR comments (elastic#192222)

d59a1f6

[Obs AI Assistant] Fix type (elastic#192222)

7e6094f

viduni94 force-pushed the resolve-flaky-obs-ai-assistant-test-knowledge-based-user-instructions branch from f513303 to 7e6094f Compare October 30, 2024 12:36

sorenlouv approved these changes Oct 31, 2024

dgieselaar reviewed Oct 31, 2024

viduni94 added 2 commits October 31, 2024 12:35

[Obs AI Assistant] Update type any (elastic#192222)

89a864e

[Obs AI Assistant] Remove the retry statements (elastic#192222)

042b9ea

viduni94 requested a review from dgieselaar October 31, 2024 16:41

[Obs AI Assistant] Remove the retry statements (elastic#192222)

833db6d

dgieselaar approved these changes Oct 31, 2024

viduni94 merged commit a7a09f7 into elastic:main Oct 31, 2024
26 checks passed

kibanamachine added the v9.0.0 label Oct 31, 2024

kibanamachine mentioned this pull request Oct 31, 2024

[8.x] [Obs AI Assistant] Attempt to resolve flaky knowledge based user instructions test (#196026) #198612

Merged

kibanamachine added the v8.17.0 label Oct 31, 2024
