[ML] AIOps: Functional/API integration tests for text field support for log rate analysis #168177
Flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3397 🔴 1/50 runs failed
Flaky test runner: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3411 ✅ 100/100 runs passed
Pinging @elastic/ml-ui (:ml)
fieldValue,
})),
category.fieldName,
{ key: `${category.key}`, count: category.doc_count, examples: [] },
I recently discovered a bug where the query created from the category key does not match any documents.
I will create an issue for it, but I don't immediately know how to fix it.
Basically, from what I have seen, if the document contains a string like `foo:bar`, the category key might look like `foo bar`. These will not match in the query.
There is a risk that changing the query will make it greedier, so that it matches documents which aren't in the category.
Using the regex from the category would probably work, but it could be very expensive and may not work at all if the cluster disallows expensive queries.
This comment is just a heads-up that you may get a count of 0 here, depending on the data.
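To make the mismatch concrete, here is a minimal sketch of what a query built from a category key could look like. The helper name `buildCategoryQuery`, the `message` field, and the choice of a `match` query with `operator: 'and'` are assumptions for illustration only; they are not taken from this PR's code.

```typescript
// Hypothetical query shape: a `match` query that requires all terms of the key.
interface CategoryMatchQuery {
  match: Record<string, { query: string; operator: 'and' }>;
}

// Hypothetical helper, not the actual implementation in the aiops plugin.
function buildCategoryQuery(fieldName: string, categoryKey: string): CategoryMatchQuery {
  return {
    match: {
      [fieldName]: { query: categoryKey, operator: 'and' },
    },
  };
}

// A document containing `foo:bar` may produce the category key `foo bar`.
const categoryQuery = buildCategoryQuery('message', 'foo bar');
console.log(JSON.stringify(categoryQuery));
```

As the comment above notes, a query derived from the key `foo bar` may not match the original `foo:bar` documents, so callers should be prepared for a count of 0.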
x-pack/plugins/aiops/server/routes/queries/get_simple_hierarchical_tree.ts
LGTM
Code changes LGTM 🎉 Minor nit comment to rename
💛 Build succeeded, but was flaky
…or log rate analysis (elastic#168177)

This updates the artificial dataset generator for log rate analysis to allow creating variants that include text fields. The artificial dataset is now used for 4 variants of functional and API integration tests: testing spike and dip, each with and without a text field.

The new tests surfaced some issues that were fixed as part of this PR:

- Getting the counts of log patterns in combination with individual significant terms ended up with too granular groups. This PR adds additional queries to get counts for log patterns in combination with item sets already derived from significant terms.
- The `support` value is returned by the frequent item sets agg and is used as a threshold for whether to include an item set for grouping. This was missing from significant log patterns and is fixed by this PR.
- Adds a check to not get frequent item sets for log patterns if there are no significant terms.
- The way we fetched log patterns, using a time filter that spans the whole range from the baseline start to the deviation end, caused problems with analysing dips. This PR updates those queries to only fetch the actual baseline and deviation time ranges.
- The integration tests caught an issue where we'd still fetch the histogram for log patterns even if we'd requested grouping information only.

(cherry picked from commit 9259f48)
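The time range fix in the commit message above can be sketched as follows. This is a minimal illustration, assuming the baseline window precedes the deviation window; the interface `WindowParameters`, the helper names, and the use of a single `bool`/`should` filter are assumptions for this sketch and not necessarily how the PR implements it.

```typescript
// All names below are hypothetical and only illustrate the idea.
interface WindowParameters {
  baselineMin: number;
  baselineMax: number;
  deviationMin: number;
  deviationMax: number;
}

interface RangeFilter {
  range: Record<string, { gte: number; lte: number; format: 'epoch_millis' }>;
}

interface BoolShouldFilter {
  bool: { should: RangeFilter[]; minimum_should_match: number };
}

function rangeFilter(timeField: string, gte: number, lte: number): RangeFilter {
  return { range: { [timeField]: { gte, lte, format: 'epoch_millis' } } };
}

// One range from baseline start to deviation end.
// This also covers any gap between the two windows.
function spanningFilter(timeField: string, wp: WindowParameters): RangeFilter {
  return rangeFilter(timeField, wp.baselineMin, wp.deviationMax);
}

// Two ranges that cover only the baseline and the deviation windows.
function baselineAndDeviationFilter(timeField: string, wp: WindowParameters): BoolShouldFilter {
  return {
    bool: {
      should: [
        rangeFilter(timeField, wp.baselineMin, wp.baselineMax),
        rangeFilter(timeField, wp.deviationMin, wp.deviationMax),
      ],
      minimum_should_match: 1,
    },
  };
}

// Example values in epoch milliseconds, chosen arbitrarily.
const windowParameters: WindowParameters = {
  baselineMin: 1000,
  baselineMax: 2000,
  deviationMin: 3000,
  deviationMax: 4000,
};
console.log(JSON.stringify(baselineAndDeviationFilter('@timestamp', windowParameters)));
```

With the spanning filter, documents that fall between the end of the baseline and the start of the deviation are included as well; the second variant excludes them.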
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
…pport for log rate analysis (#168177) (#168516)

# Backport

This will backport the following commits from `main` to `8.11`:

- [[ML] AIOps: Functional/API integration tests for text field support for log rate analysis (#168177)](#168177)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Walter Rafelsberger <walter.rafelsberger@elastic.co>
Summary
Part of #167467 and #164562.
This updates the artificial dataset generator for log rate analysis to allow creating variants that include text fields.
The artificial dataset is now used for 4 variants of functional and API integration tests: testing spike and dip, each with and without a text field.
The new tests surfaced some issues that were fixed as part of this PR:
- The `support` value is returned by the frequent item sets agg and is used as a threshold for whether to include an item set for grouping. This was missing from significant log patterns and is fixed by this PR.

[Screenshots: previous and after]
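The role of `support` as an inclusion threshold can be sketched as below. The bucket shape is loosely modelled on the `key`, `doc_count`, and `support` fields of frequent item sets buckets; the helper `filterBySupport`, the sample data, and the threshold value `0.1` are hypothetical and only illustrate the idea.

```typescript
// Assumed bucket shape for this sketch.
interface ItemSetBucket {
  key: Record<string, string[]>;
  doc_count: number;
  support: number;
}

// Hypothetical helper: keep only item sets whose support reaches the threshold.
function filterBySupport(buckets: ItemSetBucket[], minSupport: number): ItemSetBucket[] {
  return buckets.filter((bucket) => bucket.support >= minSupport);
}

// Made-up sample data for illustration.
const sampleBuckets: ItemSetBucket[] = [
  { key: { response_code: ['500'], url: ['home.php'] }, doc_count: 792, support: 0.4 },
  { key: { user: ['Peter'] }, doc_count: 20, support: 0.01 },
];

const groupingCandidates = filterBySupport(sampleBuckets, 0.1);
console.log(groupingCandidates.map((bucket) => JSON.stringify(bucket.key)));
```

If an item set derived from a log pattern carries no `support` value, a filter like this cannot treat it consistently with the other item sets, which is the gap the bullet above describes.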
Checklist