Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metrics UI] Add integration tests for Metric Threshold Rule and refactor to fire correctly #109971

Merged
merged 9 commits into from
Sep 2, 2021

Conversation

simianhacker
Copy link
Member

@simianhacker simianhacker commented Aug 24, 2021

Summary

This PR adds integration tests along with a refactor to the Metric Threshold Alerts so that they fire as expected. Fixes #110912

Data and Expectations

For this PR I've added a new data set under x-pack/test/functional/es_archives/infra/alerts_test_data that contains 2 real world examples. The first is a gauge style example where the data is indexed every 5 minutes for two different environments, dev and prod. The values for dev are all set to 0 and the values for prod range from 0 to 3 randomly. The very last event looks like this:

{ "@timestamp": "2021-01-01T01:00:00Z", "value": 1, "env": "prod" }

The expectation is that if the rule ran at 2021-01-01T01:00:00Z with the following criteria:

{
  timeSize: 5,
  timeUnit: 'm',
  threshold: [1],
  comparator: Comparator.GT_OR_EQ,
  aggType: Aggregators.SUM,
  metric: 'value',
}

It should trigger a shouldFire event for the prod environment. Here is the same data in a chart with the threshold line.

image

This failed on the current implementation because it was adding a timeUnit to the end value of the time frame, then moving the end to the start of the timeUnit. Also, the date_range bucket was being offset by 60s which also contributed the situation where it would miss the bucket.

This PR removed the end modification for both gauge and counter style queries. For gauge style queries, the date_range bucketing has been removed. The date_range bucketing was problematic because it uses the following logic @timestamp >= from && @timestamp < to which excluded the last event (which is included in the range query), which we expect to trigger the rule.

The implementation in this PR now relies on the range in the query along with a basic metric aggregation. For gauge style rules, it will only query the same amount of data that is expressed with timeUnit and timeSize; for the example above, that would be the last 5 minutes.

For counter style rules that uses Aggregators.RATE, everything should work the same as before and only trigger on the last "full" bucket. The chart below is from the second data set and has been confirmed with an integration test.

image

Checklist

Glossary

  • gauge style metrics are queries that use basic metric aggregations like sum, avg, max, min, percentile to aggregate values across multiple documents. What makes a metric a gauge is that the value is a snapshot in time. For example system.cpu.total.pct is the total CPU usage being used at that point in time.
  • counter style metrics are monotonically increasing numbers where the value is a cumulative sum over time. To find the value at a point in time you have to typically use derivative pipeline aggregation or rate (which is derivative(max(value))).

@simianhacker simianhacker changed the title [Metrics UI] Add integration tests for Metric Threshold Alerts and refactor to correctly [Metrics UI] Add integration tests for Metric Threshold Alerts and refactor to fire correctly Aug 24, 2021
@simianhacker simianhacker changed the title [Metrics UI] Add integration tests for Metric Threshold Alerts and refactor to fire correctly [Metrics UI] Add integration tests for Metric Threshold Rule and refactor to fire correctly Aug 24, 2021
@simianhacker simianhacker added Feature:Metrics UI Metrics UI feature Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v8.0.0 v7.16.0 auto-backport Deprecated - use backport:version if exact versions are needed release_note:fix labels Aug 25, 2021
@simianhacker simianhacker marked this pull request as ready for review August 26, 2021 14:43
@simianhacker simianhacker requested a review from a team as a code owner August 26, 2021 14:43
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@jasonrhodes jasonrhodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for writing tests

@neptunian
Copy link
Contributor

Tested the branch and was able to get metric threshold alerts to trigger for both metric types.

@simianhacker simianhacker merged commit 0452483 into elastic:master Sep 2, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 2, 2021
…ctor to fire correctly (elastic#109971)

* [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly

* Removing unused variables

* Fixing tests for metric_threshold_executor

* Fixing test for metric_query

* fixing test

* Changing type guard
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 2, 2021
…ctor to fire correctly (elastic#109971)

* [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly

* Removing unused variables

* Fixing tests for metric_threshold_executor

* Fixing test for metric_query

* fixing test

* Changing type guard
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.15
7.x

The backport PRs will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Sep 2, 2021
…ctor to fire correctly (#109971) (#111012)

* [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly

* Removing unused variables

* Fixing tests for metric_threshold_executor

* Fixing test for metric_query

* fixing test

* Changing type guard

Co-authored-by: Chris Cowan <chris@chriscowan.us>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Sep 3, 2021
…eporting-to-v2

* 'master' of github.com:elastic/kibana: (65 commits)
  Move to vis_types folder part 2 (elastic#110574)
  [SOR] use initialNamespaces when checking for conflict for `create` and `bulkCreate` (elastic#111023)
  [Discover] Remove export* syntax (elastic#110934)
  [Event log][7.x] Updated event log client to search across legacy IDs (elastic#109365)
  [Security Solution][Detection Rules] Changes 'activated' text on rule details page  (elastic#111044)
  [Metrics UI] Filter out APM nodes from the inventory view (elastic#110300)
  [package testing] Update logging and pid configuration (elastic#111059)
  [Dashboard] Read App State from URL on Soft Refresh (elastic#109354)
  Add correct roles to test user for functional tests in dashboard (elastic#110880)
  [DOCS] Adds Lens Inspector and minor edits (elastic#109736)
  [DOCS] Updates Spaces page (elastic#111005)
  normalize initialNamespaces (elastic#110936)
  [Reporting] Clean up `any` usage, reorganize server route files (elastic#110740)
  [Security Solution] [CTI] Fixes bug that caused Threshold and Indicator Match rules to ignore custom rule filters if a saved query was used in the rule definition. (elastic#109253)
  skip flaky suites: elastic#111001, elastic#111022
  [Security Solution][RAC] - Update reason field text (elastic#110308)
  [RAC][Security Solution] Make analyzer work with EuiDataGrid full screen (elastic#110913)
  [Metrics UI] Add integration tests for Metric Threshold Rule and refactor to fire correctly (elastic#109971)
  [DOCS] Updates Discover docs (elastic#110346)
  [RAC] Persistent timeline fields fix (elastic#110685)
  ...
@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Sep 3, 2021
@kibanamachine
Copy link
Contributor

Looks like this PR has backport PRs but they still haven't been merged. Please merge them ASAP to keep the branches relatively in sync.

@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Sep 3, 2021
kibanamachine added a commit that referenced this pull request Sep 3, 2021
…ctor to fire correctly (#109971) (#111011)

* [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly

* Removing unused variables

* Fixing tests for metric_threshold_executor

* Fixing test for metric_query

* fixing test

* Changing type guard

Co-authored-by: Chris Cowan <chris@chriscowan.us>
@simianhacker simianhacker deleted the metric-threshold-test branch April 17, 2024 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Metrics UI Metrics UI feature release_note:fix Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.15.0 v7.16.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Metrics UI] Metrics Threshold Alert fails to trigger
5 participants