[Security Solution][Alerts] Alert suppression per rule execution #142686

Merged: 89 commits, Nov 15, 2022

@marshallmain marshallmain commented Oct 4, 2022

Summary

Addresses #130699

This PR implements alert throttling per rule execution for query and saved query rules. The implementation is very similar in concept to threshold rules. We allow users to pick one or more fields to group source documents by and use a composite aggregation to collect documents bucketed by those fields. We create 1 alert for each bucket based on the first document in the bucket and add metadata to the alert that represents how to retrieve the rest of the documents in the bucket.
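
As a rough illustration of the grouping step described above, the following TypeScript sketch builds an Elasticsearch composite aggregation body from a list of group-by fields. The helper name `buildGroupByAggregation`, the aggregation names, the default page size, and the use of `missing_bucket` and an ascending `@timestamp` sort are assumptions made for this sketch; they are not taken from the PR's code. Only the general composite aggregation shape is standard Elasticsearch query DSL.

```typescript
// Hypothetical helper, not the PR's actual implementation.
interface CompositeSource {
  [name: string]: { terms: { field: string; missing_bucket: boolean } };
}

function buildGroupByAggregation(groupBy: string[], size = 100) {
  const sources: CompositeSource[] = groupBy.map((field) => ({
    // Assumption: documents lacking the field are kept and grouped under null.
    [field]: { terms: { field, missing_bucket: true } },
  }));

  return {
    eventGroups: {
      composite: { sources, size },
      aggs: {
        // One representative document per bucket (assumed here: the earliest).
        topHits: { top_hits: { size: 1, sort: [{ '@timestamp': { order: 'asc' } }] } },
        // Bounds of the bucket, usable for the suppression start/end fields.
        minTimestamp: { min: { field: '@timestamp' } },
        maxTimestamp: { max: { field: '@timestamp' } },
      },
    },
  };
}
```

Each bucket returned by such an aggregation would correspond to one alert, with the bucket's document count and timestamp bounds feeding the metadata fields.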

The metadata fields are:

  • kibana.alert.suppression.terms (Array<{field: string; value: string | number | null}>): an array of objects, each representing one of the terms used to group these alerts
  • kibana.alert.suppression.start (Date): the timestamp of the first document in the bucket
  • kibana.alert.suppression.end (Date): the timestamp of the last document in the bucket
  • kibana.alert.suppression.docs_count (number): the number of suppressed alerts
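
To make the shape of these fields concrete, here is a small TypeScript sketch. The field names and types come from the list above; the type names `SuppressionTerm` and `SuppressionFields` and all example values are invented for illustration.

```typescript
// Type names are hypothetical; field names and types follow the list above.
type SuppressionTerm = { field: string; value: string | number | null };

interface SuppressionFields {
  'kibana.alert.suppression.terms': SuppressionTerm[];
  'kibana.alert.suppression.start': Date;
  'kibana.alert.suppression.end': Date;
  'kibana.alert.suppression.docs_count': number;
}

// Invented example: an alert for a bucket grouped by host.name and user.name.
const exampleSuppressionFields: SuppressionFields = {
  'kibana.alert.suppression.terms': [
    { field: 'host.name', value: 'host-1' },
    { field: 'user.name', value: null }, // null when the document has no value
  ],
  'kibana.alert.suppression.start': new Date('2022-11-15T10:00:00.000Z'),
  'kibana.alert.suppression.end': new Date('2022-11-15T10:04:30.000Z'),
  'kibana.alert.suppression.docs_count': 4,
};
```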

There is one new rule parameter, currently implemented at the solution level, to enable this feature: alertSuppression.groupBy: string[].

Similar to threshold rules, the throttled query rules keep track of created alerts in the rule state in order to filter out duplicate documents in subsequent rule executions. When a throttled alert is created, we store the bucket information including field names, values, and end date in the rule state. Subsequent rule executions convert this state into a filter that excludes documents that have already been covered by existing alerts. This is necessary because consecutive rule executions will typically query overlapping time ranges.
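
The state-to-filter conversion described above could look roughly like the following TypeScript sketch. The state shape, the function names, and the exact query clauses are assumptions for illustration rather than the PR's actual code: each stored bucket becomes a `must_not` clause that matches documents with the same term values and a timestamp at or before the bucket's end date.

```typescript
type TermValue = string | number | null;

// Hypothetical shape of one bucket remembered in rule state.
interface SuppressedBucketState {
  terms: Array<{ field: string; value: TermValue }>;
  endDate: string; // ISO timestamp of the last document already covered
}

type QueryClause = Record<string, unknown>;

function matchTerm(field: string, value: TermValue): QueryClause {
  // Assumption: a null value means the field was missing on the documents.
  return value === null
    ? { bool: { must_not: [{ exists: { field } }] } }
    : { term: { [field]: value } };
}

// Builds a filter that excludes documents already covered by earlier alerts.
function buildExclusionFilter(buckets: SuppressedBucketState[]): QueryClause {
  return {
    bool: {
      must_not: buckets.map((bucket) => ({
        bool: {
          filter: [
            ...bucket.terms.map(({ field, value }) => matchTerm(field, value)),
            { range: { '@timestamp': { lte: bucket.endDate } } },
          ],
        },
      })),
    },
  };
}
```

Documents in the same group that arrive after the stored end date are not excluded, so they can still produce a new alert on a later execution.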

Licensing

Alert suppression is licensed as a platinum-level feature. However, the rule management APIs do not prevent the suppression configuration fields from being set without a platinum license. Instead, the license is checked at rule execution time: if the license is insufficient, then the suppression configuration is ignored and the rule executes without suppressing alerts. This avoids potential issues with rule management and license downgrades - if a customer has rules enabled with suppression and the license expires/downgrades, they are not prevented from exporting/importing/editing their rules in any way, but the suppression will no longer be applied until the license is reinstated.
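
A minimal sketch of the execution-time check described above, in TypeScript. The constant `minimumLicenseForSuppression` and its value appear in this PR's diff; everything else here (the `LicenseLike` interface, `makeLicense`, the simplified license ordering, and `getEffectiveGroupBy`) is a hypothetical stand-in and not the real executor code.

```typescript
// Value taken from the constant shown in this PR's diff.
const minimumLicenseForSuppression = 'platinum';

// Assumption: a simplified license ordering used only for this sketch.
const LICENSE_ORDER = ['basic', 'gold', 'platinum', 'enterprise'];

// Hypothetical stand-in for a license object.
interface LicenseLike {
  hasAtLeast(level: string): boolean;
}

function makeLicense(current: string): LicenseLike {
  return {
    hasAtLeast: (level) => LICENSE_ORDER.indexOf(current) >= LICENSE_ORDER.indexOf(level),
  };
}

interface RuleParams {
  alertSuppression?: { groupBy: string[] };
}

// Returns the group-by fields to apply for this execution. With an
// insufficient license the stored configuration is ignored, not rejected,
// so the rule params themselves are left untouched.
function getEffectiveGroupBy(params: RuleParams, license: LicenseLike): string[] {
  const groupBy = params.alertSuppression?.groupBy ?? [];
  return license.hasAtLeast(minimumLicenseForSuppression) ? groupBy : [];
}
```

Because the check only affects what the executor does, the same rule object can be exported, imported, and edited regardless of the current license.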

We check the license in the UI and display the suppression configuration accordingly. If the license is insufficient, then in rule creation the suppression configuration is disabled. When editing a rule, if the license is insufficient and the rule does not already have suppression configured, then suppression configuration is also disabled. If the license is insufficient but a rule does have suppression configured already, then we allow editing the suppression configuration in the UI.

The rule details page shows an indicator if suppression is configured and the license is insufficient to notify users that the suppression configuration will not be applied during rule execution.
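
The UI rules in the two paragraphs above reduce to two small predicates. The TypeScript sketch below is illustrative only; the input shape and function names are assumptions, not identifiers from the PR.

```typescript
// Hypothetical input describing the current UI situation.
interface SuppressionUiInput {
  hasRequiredLicense: boolean;
  isEditingExistingRule: boolean;
  ruleHasSuppressionConfigured: boolean;
}

// Whether the suppression fields are editable in the create/edit form.
function isSuppressionConfigEnabled(input: SuppressionUiInput): boolean {
  if (input.hasRequiredLicense) {
    return true;
  }
  // Without a license, editing stays possible only so that an existing
  // configuration can be changed or removed.
  return input.isEditingExistingRule && input.ruleHasSuppressionConfigured;
}

// Whether the rule details page should flag that suppression is configured
// but will not be applied during execution.
function showInsufficientLicenseIndicator(input: SuppressionUiInput): boolean {
  return input.ruleHasSuppressionConfigured && !input.hasRequiredLicense;
}
```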

Screenshots

Rule Create/Edit With License

image

Rule Details With License

image

Rule Create, or Rule Edit of a rule without existing suppression configuration, Without License

image

Editing a rule that has existing suppression configuration, but without the correct license, still allows changing the configuration (to allow removing the params)

image

Rule Details Without License

image

Alerts table

image

Known issues

  • The layers icon in the rule name for suppressed alerts does not show up in the rule preview table

madirey added 30 commits July 26, 2022 20:24

@dhurley14 dhurley14 left a comment

platform changes look good

@vitaliidm vitaliidm left a comment

Great work, @marshallmain. This feature will bring more capabilities to our customers!
Thank you for addressing my comments.

I left a few minor comments as well that can be addressed later. The only big question I have is about licensing. What's the plan for it? Will it be added in the 8.6 release? Currently it's only handled in the UI.

<>
{label}&nbsp;
<EuiToolTip position="top" content={i18n.ALERT_SUPPRESSION_INSUFFICIENT_LICENSE}>
<EuiIcon type={'alert'} size="l" color="#BD271E" />
Contributor

Can we use a color value from the EUI theme?

Contributor Author

Yeah, this should come from the theme

})
);

export const minimumLicenseForSuppression = 'platinum';
Contributor

I can see the license check is only present in the UI.
What about having it at the API level in Security and Platform? Is that being considered for 8.6?

Contributor Author

I switched from validating the params on create/edit of a rule to checking the license at runtime in the executor to avoid edge cases around license expiration/degradation. E.g. if the license expires, rules would be exportable but then would fail to import, patch would allow editing a rule while leaving suppression params untouched but update would not, etc.

The behavior now is that if a license is insufficient and suppression is enabled then rules will continue to work, but the suppression configuration will be ignored. If the platinum license is restored, then suppression will start to work again on those rules automatically. This way we can keep the functionality behind the license appropriately but not cause issues with rule management if the license level does drop.

Contributor

Thanks for explaining, that makes sense 👍

I switched from validating the params on create/edit of a rule to checking the license at runtime in the executor to avoid edge cases around license expiration/degradation

Btw, can we use minimumLicenseForSuppression there instead of the hardcoded platinum value?

Contributor Author

Oops, yeah that should be minimumLicenseForSuppression as well. I'll add it to the follow up list.

[ALERT_SUPPRESSION_TERMS]: bucket.terms,
[ALERT_SUPPRESSION_START]: bucket.start,
[ALERT_SUPPRESSION_END]: bucket.end,
[ALERT_SUPPRESSION_DOCS_COUNT]: bucket.count - 1,
Contributor

Why does bucket.count need to be decremented?

Screenshot 2022-11-15 at 10 00 37

Is the idea behind it that if there is only one document, we still generate an alert but display the count as 0, because no other documents exist to actually be suppressed?

Contributor Author

Yeah the idea is to display here the number of alerts that were not created due to suppression, so we subtract 1 because the created alert is included in bucket.count as well.
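
As a worked example of that arithmetic, here is a tiny TypeScript sketch. The function name is hypothetical; the subtraction mirrors the `bucket.count - 1` expression in the diff above.

```typescript
// bucketDocCount includes the document that the created alert is based on,
// so the number of suppressed documents is one less than the bucket size.
function suppressedDocsCount(bucketDocCount: number): number {
  return bucketDocCount - 1;
}

// A bucket with a single document still produces an alert, with nothing suppressed.
const singleDocBucket = suppressedDocsCount(1); // 0
// A bucket with five documents produces one alert and suppresses four.
const fiveDocBucket = suppressedDocsCount(5); // 4
```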

});
});

it('should not count documents that were covered by previous alerts', async () => {
Contributor

I would suggest testing executions that involve state through an actual rule run, since that involves storing state in task manager and operating on it.
In preview, state is stored in memory during the preview execution, so we don't have full coverage of this functionality.

Contributor Author

Good idea, will add in a follow up

@xcrzx xcrzx requested review from xcrzx and removed request for maximpn November 15, 2022 12:55

@xcrzx xcrzx left a comment

Code changes in the rules area LGTM! While testing locally, I found a couple of minor UI bugs, though.

Values should be comma separated:
Screenshot 2022-11-15 at 15 46 52

Long values overflow outside of the screen:
Screenshot 2022-11-15 at 15 44 11

The total number of alerts displayed on the chart (11) doesn't match the total number of alerts in the table (9). Not sure where that difference comes from. Probably not related to this PR at all.

Screenshot 2022-11-15 at 15 27 01

@marshallmain

Values should be comma separated:

Good point, updated to handle the array of values correctly (and updated screenshots in the PR description to match).

The total number of alerts displayed on the chart (11) doesn't match the total number of alerts in the table (9). Not sure where that difference comes from.

I've seen this before when one event has multiple event.category values, since it then gets counted multiple times in the histogram. Was that the case for your test data?
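
The effect can be reproduced with a small TypeScript sketch. The data below is invented purely for illustration (it is not the reviewer's test data), and the function names are hypothetical: a table counts each event once, while a histogram stacked by category counts an event once per category value.

```typescript
interface AlertEvent {
  id: string;
  category: string[]; // stands in for a multi-valued event.category field
}

// A table lists each event once.
function tableTotal(events: AlertEvent[]): number {
  return events.length;
}

// A histogram stacked by category counts an event once per category value.
function histogramTotal(events: AlertEvent[]): number {
  const perCategory: Record<string, number> = {};
  for (const event of events) {
    for (const category of event.category) {
      perCategory[category] = (perCategory[category] ?? 0) + 1;
    }
  }
  return Object.keys(perCategory).reduce((sum, key) => sum + perCategory[key], 0);
}

// Invented data: 7 events with one category and 2 events with two categories.
const sampleEvents: AlertEvent[] = [];
for (let i = 0; i < 7; i++) {
  sampleEvents.push({ id: `single-${i}`, category: ['process'] });
}
for (let i = 0; i < 2; i++) {
  sampleEvents.push({ id: `multi-${i}`, category: ['process', 'network'] });
}
```

With this sample, `tableTotal(sampleEvents)` is 9 and `histogramTotal(sampleEvents)` is 11.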

kibana-ci commented Nov 15, 2022

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 3286 3290 +4

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/rule-data-utils 83 89 +6

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
observability 487.7KB 488.3KB +582.0B
securitySolution 9.6MB 9.6MB +20.1KB
triggersActionsUi 663.6KB 665.6KB +2.0KB
total +22.7KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
apm 29.9KB 30.1KB +169.0B
cases 126.7KB 126.8KB +169.0B
infra 85.6KB 85.7KB +169.0B
synthetics 25.1KB 25.3KB +173.0B
timelines 138.8KB 139.0KB +169.0B
total +849.0B

Unknown metric groups

API count

id before after diff
@kbn/rule-data-utils 86 92 +6

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 59 65 +6
osquery 108 113 +5
securitySolution 441 447 +6
total +19

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 67 73 +6
osquery 109 115 +6
securitySolution 518 524 +6
total +20

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

xcrzx commented Nov 15, 2022

I've seen this before when one event has multiple event.category values, since it then gets counted multiple times in the histogram. Was that the case for your test data?

Oh, yeah, that was the reason. Then I think it is okay to have one event counted multiple times.

And thanks for fixing the layout issues 👍

@mikecote mikecote left a comment

Rule registry changes look good to me at this time! We'll meet later this week with y'all to discuss proposed changes but we won't block this PR from merging.

@marshallmain marshallmain merged commit a2647ab into elastic:main Nov 15, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Nov 15, 2022
marshallmain added a commit that referenced this pull request Nov 28, 2022
…ression and new terms multi fields (#145775)

## Summary

Adds new tour highlighting new rule capabilities in 8.6 - new terms
multi fields (#143943) and alert
suppression (#142686).

I tried using the generic `RulesFeatureTour` again
(main...marshallmain:kibana:failed-tour)
but it still crashes the page.

I also looked at integrating this tour with the Guided onboarding tour
for rules management (#145223),
but concluded that they should be separate since guided onboarding is
experimental and this tour should be displayed to users even if they are
not new users.

This PR is essentially a copy of the new terms tour in 8.4
(#138469).
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 28, 2022
…ression and new terms multi fields (elastic#145775)
(cherry picked from commit 13c1b0b)
kibanamachine referenced this pull request Nov 29, 2022
…r suppression and new terms multi fields (#145775) (#146479)

# Backport

This will backport the following commits from `main` to `8.6`:
- [[Security Solution][Alerts] Add tour to rule management page for
suppression and new terms multi fields
(#145775)](#145775)

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

Co-authored-by: Marshall Main <55718608+marshallmain@users.noreply.github.com>
Labels
  • backport:skip (This commit does not require backporting)
  • ci:cloud-deploy (Create or update a Cloud deployment)
  • release_note:feature (Makes this part of the condensed release notes)
  • Team:Detection Alerts (Security Detection Alerts Area Team)
  • Team:Detections and Resp (Security Detection Response Team)
  • Team: SecuritySolution (Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.)
  • v8.6.0