Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom equation editor for Metric Threshold Rule #148732

Conversation

simianhacker
Copy link
Member

@simianhacker simianhacker commented Jan 11, 2023

Summary

This PR closes #145444 by adding a custom equation editor to the Metric Threshold rule. I also added support for custom metrics to the Metric Explorer API which powers the preview chart on the rule editor. Eventually we could do a follow up PR to the Metrics Explorer UI to expose this new functionality; which is outside the scope of this PR.

Notable changes with this PR

I changed the reason message for Metric Threshold rules which do not have a group by. The original message would say something like system.cpu.user.pct is 82% in the last 1 min for all hosts. Alert when > 81%. I removed the for all hosts portion because the Metric Threshold rule is not limited to just the concept of hosts, our users rely on this rule as their "Swiss Army Knife" rule for all types of data.

I also had to change the format of the currentPeriod bucket for the Metric Threshold aggregation to support the "document count with KQL filter" use case. One of the requirements of a filter aggregation is that it must be a child of a multi-bucket aggregation. This is why I converted it from a 'filter' aggregation to a filters aggregation with an all key for the time range query.

I added basic validation for the equations with a regular expression that just limits the characters to the allowable: A-Z, +, -, /, *, (, ), ?, !, &, :, |, >, <, =. I feel like for now this is good enough. If we want to expose some of the Painless Math.* libraries then we can follow up in a later release with a PegJS parser which would do some syntax validation as well.

Rule with custom equation

image

Rule with custom ratio equation

image

Reason message with custom label

image

@simianhacker simianhacker changed the title Adding Custom Metrics (with math) to Metric Threshold Rule Custom Metrics (with math) for Metric Threshold Rule Jan 11, 2023
@StephanErb
Copy link

Would you intend that this shall have the full power of Lens Formular's or do yo have a different scope in mind? (https://www.elastic.co/guide/en/kibana/current/lens.html#lens-formulas )

@simianhacker simianhacker changed the title Custom Metrics (with math) for Metric Threshold Rule Custom equation editor for Metric Threshold Rule Jan 19, 2023
@simianhacker
Copy link
Member Author

simianhacker commented Jan 19, 2023

@StephanErb We are exploring ideas for an expressive rule type in the future which will be based on something like the Len's expressions or ESQL. If and when we deliver that, it will be a separate rule type.

@simianhacker simianhacker marked this pull request as ready for review January 25, 2023 22:44
@simianhacker simianhacker requested a review from a team as a code owner January 25, 2023 22:44
@simianhacker
Copy link
Member Author

@maryam-saeidi @crespocarlos Yes, it's expected that Document count's filter does not need a value. That way you can defined 2 "Document count" aggregation, one with a filter and one without. This enables the user to do "filter ratios" where they might want to alert on error ratios where the numerator is event.type: error and the denominator is "everything" (blank filter).

@simianhacker
Copy link
Member Author

I had a Zoom with @maryam-saeidi and we figured out how to reproduce the validation error she was seeing:

  1. Create a new rule, give it a title and select "Metric Threshold"
  2. Change the condition to `WHEN average OF system.cpu.user.pct IS ABOVE 80"
  3. Change average to custom equation
  4. Configure a custom equation
  5. Save
  6. You should see a validation error.

Copy link
Contributor

@crespocarlos crespocarlos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks for applying the suggestions and fixing the bugs.

Copy link
Member

@maryam-saeidi maryam-saeidi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation when changing condition from average to custom works but it does not work the other way around
image

I added some comments as well.

import { validateMetricThreshold } from '../validation';

export default {
title: 'infra/alerting/CustomEquationEditor',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

x-pack/plugins/infra/server/lib/metrics/index.ts Outdated Show resolved Hide resolved
@simianhacker
Copy link
Member Author

@maryam-saeidi Thanks for the review! I've addressed all the issue you found. Please give it another review, thanks in advance.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
infra 1242 1248 +6

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
infra 1.3MB 1.3MB +7.3KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
infra 85.9KB 87.0KB +1.0KB
Unknown metric groups

async chunk count

id before after diff
infra 18 19 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@maryam-saeidi maryam-saeidi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, and it worked as expected; fantastic job! 🎉

Just to double-check something here, when I test on a fresh instance everything is working fine, but when I use my oblt instance, I get the following error:

image

Is it because of some recent changes in this PR, so the rule that I created with the earlier version of this PR is not valid anymore? What can cause this issue? (Maybe it is not related to this PR at all ¯_(ツ)_/¯)

@simianhacker
Copy link
Member Author

@maryam-saeidi That's an error in a different rule:

image

export const LOG_DOCUMENT_COUNT_RULE_TYPE_ID = 'logs.alert.document.count';

@simianhacker simianhacker merged commit 6e70bdb into elastic:main Feb 1, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Feb 1, 2023
kqualters-elastic pushed a commit to kqualters-elastic/kibana that referenced this pull request Feb 6, 2023
## Summary

This PR closes elastic#145444 by adding a custom equation editor to the Metric
Threshold rule. I also added support for custom metrics to the Metric
Explorer API which powers the preview chart on the rule editor.
Eventually we could do a follow up PR to the Metrics Explorer UI to
expose this new functionality; which is outside the scope of this PR.

### Notable changes with this PR

I changed the reason message for Metric Threshold rules which do not
have a group by. The original message would say something like
`system.cpu.user.pct is 82% in the last 1 min for all hosts. Alert when
> 81%.` I removed the `for all hosts` portion because the Metric
Threshold rule is not limited to just the concept of hosts, our users
rely on this rule as their "Swiss Army Knife" rule for all types of
data.

I also had to change the format of the `currentPeriod` bucket for the
Metric Threshold aggregation to support the "document count with KQL
filter" use case. One of the requirements of a `filter` aggregation is
that it must be a child of a multi-bucket aggregation. This is why I
converted it from a 'filter' aggregation to a `filters` aggregation with
an `all` key for the time range query.

I added basic validation for the equations with a regular expression
that just limits the characters to the allowable: `A-Z, +, -, /, *, (,
), ?, !, &, :, |, >, <, =`. I feel like for now this is good enough. If
we want to expose some of the Painless `Math.*` libraries then we can
follow up in a later release with a PegJS parser which would do some
syntax validation as well.

### Rule with custom equation

<img width="538" alt="image"
src="https://user-images.githubusercontent.com/41702/213583128-1adbc405-828e-4571-aeb4-9900baeaabee.png">

### Rule with custom ratio equation

<img width="538" alt="image"
src="https://user-images.githubusercontent.com/41702/213583239-a39d15d2-7023-4daf-af97-cb25a9965433.png">


### Reason message with custom label


![image](https://user-images.githubusercontent.com/41702/211936062-4b696f0c-dfec-4e48-b89c-b0462fb5f7f0.png)

---------

Co-authored-by: Carlos Crespo <crespocarlos@users.noreply.github.com>
Co-authored-by: Maryam Saeidi <maryam.saeidi@elastic.co>
@simianhacker simianhacker deleted the issue-145444-custom-metric-editor-for-mtr branch April 17, 2024 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Metrics UI Metrics UI feature release_note:enhancement Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alerting framework: Ability to do multiple queries and arithmetics between Metric threshold's metric values
7 participants