Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIEM] Check ML Job status on ML Rule execution #61715

Merged
merged 14 commits into from
Mar 30, 2020

Conversation

rylnd
Copy link
Contributor

@rylnd rylnd commented Mar 28, 2020

Summary

The first leg of this PR (diff) adds the actual enhancement: checking of ML Jobs during rule execution, and setting of rule status appropriately. Things required to get that to work:

  • Add explicit, optional dependency on ML plugin
  • Expose mlClient from ML plugin in order to use e.g. ml.jobsServiceProvider

If necessary, I'm happy to open a separate PR against 7.7 with just that work.

However, our rule executor function is large (and contains multitudes?), and the additional ML logic pushed us over the eslint complexity limit, so the second half of this PR is moving logic out of that function, notably:

  • Extracting a ruleStatusSavedObjectsClient to handle SO calls
  • Extracting a ruleStatusService to encapsulate logic of creating/updating/GCing RuleStatuses
  • Adds consistent rule metadata to every status/log message

Checklist

For maintainers

rylnd added 12 commits March 27, 2020 19:05
And use it during rule execution as well.
This was unintentionally removed in a previous merge commit.
This allows dependent plugins to leverage the exposed services without
having to define their own ml paths, e.g. "ml.jobs"
These are pure functions and used on both the client and server.
This works, but unfortunately it pushes this executor function to a
complexity of 25. We're gonna refactor this next.
These are used on both the frontend and the backend, and can be shared.
RuleStatusService holds the logic for updating the current status as
well as adding an error status. It leverages a simple
RuleStatusSavedObjectClient to handle the communication with
SavedObjects.

This removes the need for our specialized 'writeError', 'writeGap', and
'writeSuccess' functions, which duplicated much of the rule status
logic and code. It also fixes a bug with gap failures, with should have
been treated the same as other failures.

NB that an error does not necessarily prevent the rule from running, as
in the case of a gap or an ML Job not running.

This also adds a buildRuleMessage helper to reduce the noise of
generating logs/messages, and to make them more consistent.
We're not awaiting here, so we can just return the promise.
We weren't doing anything async here, and in fact the returning of a
promise was causing a bug when we tried to spread it into our attributes
object.
This mapping could be done within the ruleStatusService, but it
lives outside it for now.

Also renames the object holding these values to the more general
'result,' as creationSuccess implies it always succeeds.
Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the
other functions receive the ruleStatusSavedObjectsClient

* pluralizes savedObjects in ruleStatusSavedObjectsClient
* Backfills tests
@rylnd rylnd added Team:SIEM v8.0.0 release_note:skip Skip the PR/issue when compiling release notes v7.7.0 v7.8.0 labels Mar 28, 2020
@rylnd rylnd self-assigned this Mar 28, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/siem (Team:SIEM)

@rylnd
Copy link
Contributor Author

rylnd commented Mar 29, 2020

Opened #61753 against 7.7. Once that's merged, this can go into 7.x.

Copy link
Contributor

@dhurley14 dhurley14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - this cleans things up very nicely. Appreciate the wrappers around saved objects and creating a clean api for the rule status SO.

We were storing state in our RuleStatusClient, and consequently could
get into a situation where that state did not reflect reality, and we
would incorrectly try to delete a SavedObject that had already been
deleted.

Rather than try to store the _correct_ state in the service, we remove
state entirely and just fetch our statuses on each action.
@rylnd rylnd added the v7.7.0 label Mar 30, 2020
@rylnd rylnd marked this pull request as ready for review March 30, 2020 19:46
@rylnd rylnd requested review from a team as code owners March 30, 2020 19:46
@rylnd
Copy link
Contributor Author

rylnd commented Mar 30, 2020

@elasticmachine merge upstream

@rylnd
Copy link
Contributor Author

rylnd commented Mar 30, 2020

@dhurley14 I added a bugfix/simplification in 7cb5c2c if you could give this one more pass.

I encountered a bug where you can't actually create multiple error statuses if you've already got 5 failures, and the fix was much easier to do here than it would be on #61753, so I think we should get this in 7.7 instead.

Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⚡️

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@rylnd rylnd merged commit 8b31ce0 into elastic:master Mar 30, 2020
@rylnd rylnd deleted the check_job_status_on_ml_rule_execution branch March 30, 2020 21:35
rylnd added a commit to rylnd/kibana that referenced this pull request Mar 30, 2020
* Move isMlRule helper to a more general location

And use it during rule execution as well.

* Add error message back to rule error status

This was unintentionally removed in a previous merge commit.

* Expose mlClient as part of ML's Setup contract

This allows dependent plugins to leverage the exposed services without
having to define their own ml paths, e.g. "ml.jobs"

* Move ML Job predicates to common folder

These are pure functions and used on both the client and server.

* WIP: Check ML Job status on ML Rule execution

This works, but unfortunately it pushes this executor function to a
complexity of 25. We're gonna refactor this next.

* Move isMlRule and RuleType to common

These are used on both the frontend and the backend, and can be shared.

* Refactor Signal Rule executor to use RuleStatusService

RuleStatusService holds the logic for updating the current status as
well as adding an error status. It leverages a simple
RuleStatusSavedObjectClient to handle the communication with
SavedObjects.

This removes the need for our specialized 'writeError', 'writeGap', and
'writeSuccess' functions, which duplicated much of the rule status
logic and code. It also fixes a bug with gap failures, with should have
been treated the same as other failures.

NB that an error does not necessarily prevent the rule from running, as
in the case of a gap or an ML Job not running.

This also adds a buildRuleMessage helper to reduce the noise of
generating logs/messages, and to make them more consistent.

* Remove unneeded 'async' keywords

We're not awaiting here, so we can just return the promise.

* Make buildRuleStatusAttributes synchronous

We weren't doing anything async here, and in fact the returning of a
promise was causing a bug when we tried to spread it into our attributes
object.

* Fix incorrectly-named RuleStatus attributes

This mapping could be done within the ruleStatusService, but it
lives outside it for now.

Also renames the object holding these values to the more general
'result,' as creationSuccess implies it always succeeds.

* Move our rule message helpers to a separate file

Adds some tests, as well.

* Refactor how rule status objects interact

Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the
other functions receive the ruleStatusSavedObjectsClient

* pluralizes savedObjects in ruleStatusSavedObjectsClient
* Backfills tests

* Handle adding multiple errors during a single rule execution

We were storing state in our RuleStatusClient, and consequently could
get into a situation where that state did not reflect reality, and we
would incorrectly try to delete a SavedObject that had already been
deleted.

Rather than try to store the _correct_ state in the service, we remove
state entirely and just fetch our statuses on each action.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
rylnd added a commit to rylnd/kibana that referenced this pull request Mar 30, 2020
* Move isMlRule helper to a more general location

And use it during rule execution as well.

* Add error message back to rule error status

This was unintentionally removed in a previous merge commit.

* Expose mlClient as part of ML's Setup contract

This allows dependent plugins to leverage the exposed services without
having to define their own ml paths, e.g. "ml.jobs"

* Move ML Job predicates to common folder

These are pure functions and used on both the client and server.

* WIP: Check ML Job status on ML Rule execution

This works, but unfortunately it pushes this executor function to a
complexity of 25. We're gonna refactor this next.

* Move isMlRule and RuleType to common

These are used on both the frontend and the backend, and can be shared.

* Refactor Signal Rule executor to use RuleStatusService

RuleStatusService holds the logic for updating the current status as
well as adding an error status. It leverages a simple
RuleStatusSavedObjectClient to handle the communication with
SavedObjects.

This removes the need for our specialized 'writeError', 'writeGap', and
'writeSuccess' functions, which duplicated much of the rule status
logic and code. It also fixes a bug with gap failures, with should have
been treated the same as other failures.

NB that an error does not necessarily prevent the rule from running, as
in the case of a gap or an ML Job not running.

This also adds a buildRuleMessage helper to reduce the noise of
generating logs/messages, and to make them more consistent.

* Remove unneeded 'async' keywords

We're not awaiting here, so we can just return the promise.

* Make buildRuleStatusAttributes synchronous

We weren't doing anything async here, and in fact the returning of a
promise was causing a bug when we tried to spread it into our attributes
object.

* Fix incorrectly-named RuleStatus attributes

This mapping could be done within the ruleStatusService, but it
lives outside it for now.

Also renames the object holding these values to the more general
'result,' as creationSuccess implies it always succeeds.

* Move our rule message helpers to a separate file

Adds some tests, as well.

* Refactor how rule status objects interact

Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the
other functions receive the ruleStatusSavedObjectsClient

* pluralizes savedObjects in ruleStatusSavedObjectsClient
* Backfills tests

* Handle adding multiple errors during a single rule execution

We were storing state in our RuleStatusClient, and consequently could
get into a situation where that state did not reflect reality, and we
would incorrectly try to delete a SavedObject that had already been
deleted.

Rather than try to store the _correct_ state in the service, we remove
state entirely and just fetch our statuses on each action.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
rylnd added a commit that referenced this pull request Mar 31, 2020
* Move isMlRule helper to a more general location

And use it during rule execution as well.

* Add error message back to rule error status

This was unintentionally removed in a previous merge commit.

* Expose mlClient as part of ML's Setup contract

This allows dependent plugins to leverage the exposed services without
having to define their own ml paths, e.g. "ml.jobs"

* Move ML Job predicates to common folder

These are pure functions and used on both the client and server.

* WIP: Check ML Job status on ML Rule execution

This works, but unfortunately it pushes this executor function to a
complexity of 25. We're gonna refactor this next.

* Move isMlRule and RuleType to common

These are used on both the frontend and the backend, and can be shared.

* Refactor Signal Rule executor to use RuleStatusService

RuleStatusService holds the logic for updating the current status as
well as adding an error status. It leverages a simple
RuleStatusSavedObjectClient to handle the communication with
SavedObjects.

This removes the need for our specialized 'writeError', 'writeGap', and
'writeSuccess' functions, which duplicated much of the rule status
logic and code. It also fixes a bug with gap failures, with should have
been treated the same as other failures.

NB that an error does not necessarily prevent the rule from running, as
in the case of a gap or an ML Job not running.

This also adds a buildRuleMessage helper to reduce the noise of
generating logs/messages, and to make them more consistent.

* Remove unneeded 'async' keywords

We're not awaiting here, so we can just return the promise.

* Make buildRuleStatusAttributes synchronous

We weren't doing anything async here, and in fact the returning of a
promise was causing a bug when we tried to spread it into our attributes
object.

* Fix incorrectly-named RuleStatus attributes

This mapping could be done within the ruleStatusService, but it
lives outside it for now.

Also renames the object holding these values to the more general
'result,' as creationSuccess implies it always succeeds.

* Move our rule message helpers to a separate file

Adds some tests, as well.

* Refactor how rule status objects interact

Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the
other functions receive the ruleStatusSavedObjectsClient

* pluralizes savedObjects in ruleStatusSavedObjectsClient
* Backfills tests

* Handle adding multiple errors during a single rule execution

We were storing state in our RuleStatusClient, and consequently could
get into a situation where that state did not reflect reality, and we
would incorrectly try to delete a SavedObject that had already been
deleted.

Rather than try to store the _correct_ state in the service, we remove
state entirely and just fetch our statuses on each action.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
rylnd added a commit that referenced this pull request Mar 31, 2020
* Move isMlRule helper to a more general location

And use it during rule execution as well.

* Add error message back to rule error status

This was unintentionally removed in a previous merge commit.

* Expose mlClient as part of ML's Setup contract

This allows dependent plugins to leverage the exposed services without
having to define their own ml paths, e.g. "ml.jobs"

* Move ML Job predicates to common folder

These are pure functions and used on both the client and server.

* WIP: Check ML Job status on ML Rule execution

This works, but unfortunately it pushes this executor function to a
complexity of 25. We're gonna refactor this next.

* Move isMlRule and RuleType to common

These are used on both the frontend and the backend, and can be shared.

* Refactor Signal Rule executor to use RuleStatusService

RuleStatusService holds the logic for updating the current status as
well as adding an error status. It leverages a simple
RuleStatusSavedObjectClient to handle the communication with
SavedObjects.

This removes the need for our specialized 'writeError', 'writeGap', and
'writeSuccess' functions, which duplicated much of the rule status
logic and code. It also fixes a bug with gap failures, with should have
been treated the same as other failures.

NB that an error does not necessarily prevent the rule from running, as
in the case of a gap or an ML Job not running.

This also adds a buildRuleMessage helper to reduce the noise of
generating logs/messages, and to make them more consistent.

* Remove unneeded 'async' keywords

We're not awaiting here, so we can just return the promise.

* Make buildRuleStatusAttributes synchronous

We weren't doing anything async here, and in fact the returning of a
promise was causing a bug when we tried to spread it into our attributes
object.

* Fix incorrectly-named RuleStatus attributes

This mapping could be done within the ruleStatusService, but it
lives outside it for now.

Also renames the object holding these values to the more general
'result,' as creationSuccess implies it always succeeds.

* Move our rule message helpers to a separate file

Adds some tests, as well.

* Refactor how rule status objects interact

Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the
other functions receive the ruleStatusSavedObjectsClient

* pluralizes savedObjects in ruleStatusSavedObjectsClient
* Backfills tests

* Handle adding multiple errors during a single rule execution

We were storing state in our RuleStatusClient, and consequently could
get into a situation where that state did not reflect reality, and we
would incorrectly try to delete a SavedObject that had already been
deleted.

Rather than try to store the _correct_ state in the service, we remove
state entirely and just fetch our statuses on each action.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@MindyRS MindyRS added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Sep 23, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:SIEM v7.7.0 v7.8.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants