-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIEM] Check ML Job status on ML Rule execution #61715
[SIEM] Check ML Job status on ML Rule execution #61715
Conversation
And use it during rule execution as well.
This was unintentionally removed in a previous merge commit.
This allows dependent plugins to leverage the exposed services without having to define their own ml paths, e.g. "ml.jobs"
These are pure functions and used on both the client and server.
This works, but unfortunately it pushes this executor function to a complexity of 25. We're gonna refactor this next.
These are used on both the frontend and the backend, and can be shared.
RuleStatusService holds the logic for updating the current status as well as adding an error status. It leverages a simple RuleStatusSavedObjectClient to handle the communication with SavedObjects. This removes the need for our specialized 'writeError', 'writeGap', and 'writeSuccess' functions, which duplicated much of the rule status logic and code. It also fixes a bug with gap failures, with should have been treated the same as other failures. NB that an error does not necessarily prevent the rule from running, as in the case of a gap or an ML Job not running. This also adds a buildRuleMessage helper to reduce the noise of generating logs/messages, and to make them more consistent.
We're not awaiting here, so we can just return the promise.
We weren't doing anything async here, and in fact the returning of a promise was causing a bug when we tried to spread it into our attributes object.
This mapping could be done within the ruleStatusService, but it lives outside it for now. Also renames the object holding these values to the more general 'result,' as creationSuccess implies it always succeeds.
Adds some tests, as well.
Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the other functions receive the ruleStatusSavedObjectsClient * pluralizes savedObjects in ruleStatusSavedObjectsClient * Backfills tests
Pinging @elastic/siem (Team:SIEM) |
Opened #61753 against 7.7. Once that's merged, this can go into 7.x. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - this cleans things up very nicely. Appreciate the wrappers around saved objects and creating a clean api for the rule status SO.
We were storing state in our RuleStatusClient, and consequently could get into a situation where that state did not reflect reality, and we would incorrectly try to delete a SavedObject that had already been deleted. Rather than try to store the _correct_ state in the service, we remove state entirely and just fetch our statuses on each action.
@elasticmachine merge upstream |
@dhurley14 I added a bugfix/simplification in 7cb5c2c if you could give this one more pass. I encountered a bug where you can't actually create multiple error statuses if you've already got 5 failures, and the fix was much easier to do here than it would be on #61753, so I think we should get this in 7.7 instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ⚡️
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
* Move isMlRule helper to a more general location And use it during rule execution as well. * Add error message back to rule error status This was unintentionally removed in a previous merge commit. * Expose mlClient as part of ML's Setup contract This allows dependent plugins to leverage the exposed services without having to define their own ml paths, e.g. "ml.jobs" * Move ML Job predicates to common folder These are pure functions and used on both the client and server. * WIP: Check ML Job status on ML Rule execution This works, but unfortunately it pushes this executor function to a complexity of 25. We're gonna refactor this next. * Move isMlRule and RuleType to common These are used on both the frontend and the backend, and can be shared. * Refactor Signal Rule executor to use RuleStatusService RuleStatusService holds the logic for updating the current status as well as adding an error status. It leverages a simple RuleStatusSavedObjectClient to handle the communication with SavedObjects. This removes the need for our specialized 'writeError', 'writeGap', and 'writeSuccess' functions, which duplicated much of the rule status logic and code. It also fixes a bug with gap failures, with should have been treated the same as other failures. NB that an error does not necessarily prevent the rule from running, as in the case of a gap or an ML Job not running. This also adds a buildRuleMessage helper to reduce the noise of generating logs/messages, and to make them more consistent. * Remove unneeded 'async' keywords We're not awaiting here, so we can just return the promise. * Make buildRuleStatusAttributes synchronous We weren't doing anything async here, and in fact the returning of a promise was causing a bug when we tried to spread it into our attributes object. * Fix incorrectly-named RuleStatus attributes This mapping could be done within the ruleStatusService, but it lives outside it for now. Also renames the object holding these values to the more general 'result,' as creationSuccess implies it always succeeds. * Move our rule message helpers to a separate file Adds some tests, as well. * Refactor how rule status objects interact Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the other functions receive the ruleStatusSavedObjectsClient * pluralizes savedObjects in ruleStatusSavedObjectsClient * Backfills tests * Handle adding multiple errors during a single rule execution We were storing state in our RuleStatusClient, and consequently could get into a situation where that state did not reflect reality, and we would incorrectly try to delete a SavedObject that had already been deleted. Rather than try to store the _correct_ state in the service, we remove state entirely and just fetch our statuses on each action. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Move isMlRule helper to a more general location And use it during rule execution as well. * Add error message back to rule error status This was unintentionally removed in a previous merge commit. * Expose mlClient as part of ML's Setup contract This allows dependent plugins to leverage the exposed services without having to define their own ml paths, e.g. "ml.jobs" * Move ML Job predicates to common folder These are pure functions and used on both the client and server. * WIP: Check ML Job status on ML Rule execution This works, but unfortunately it pushes this executor function to a complexity of 25. We're gonna refactor this next. * Move isMlRule and RuleType to common These are used on both the frontend and the backend, and can be shared. * Refactor Signal Rule executor to use RuleStatusService RuleStatusService holds the logic for updating the current status as well as adding an error status. It leverages a simple RuleStatusSavedObjectClient to handle the communication with SavedObjects. This removes the need for our specialized 'writeError', 'writeGap', and 'writeSuccess' functions, which duplicated much of the rule status logic and code. It also fixes a bug with gap failures, with should have been treated the same as other failures. NB that an error does not necessarily prevent the rule from running, as in the case of a gap or an ML Job not running. This also adds a buildRuleMessage helper to reduce the noise of generating logs/messages, and to make them more consistent. * Remove unneeded 'async' keywords We're not awaiting here, so we can just return the promise. * Make buildRuleStatusAttributes synchronous We weren't doing anything async here, and in fact the returning of a promise was causing a bug when we tried to spread it into our attributes object. * Fix incorrectly-named RuleStatus attributes This mapping could be done within the ruleStatusService, but it lives outside it for now. Also renames the object holding these values to the more general 'result,' as creationSuccess implies it always succeeds. * Move our rule message helpers to a separate file Adds some tests, as well. * Refactor how rule status objects interact Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the other functions receive the ruleStatusSavedObjectsClient * pluralizes savedObjects in ruleStatusSavedObjectsClient * Backfills tests * Handle adding multiple errors during a single rule execution We were storing state in our RuleStatusClient, and consequently could get into a situation where that state did not reflect reality, and we would incorrectly try to delete a SavedObject that had already been deleted. Rather than try to store the _correct_ state in the service, we remove state entirely and just fetch our statuses on each action. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Move isMlRule helper to a more general location And use it during rule execution as well. * Add error message back to rule error status This was unintentionally removed in a previous merge commit. * Expose mlClient as part of ML's Setup contract This allows dependent plugins to leverage the exposed services without having to define their own ml paths, e.g. "ml.jobs" * Move ML Job predicates to common folder These are pure functions and used on both the client and server. * WIP: Check ML Job status on ML Rule execution This works, but unfortunately it pushes this executor function to a complexity of 25. We're gonna refactor this next. * Move isMlRule and RuleType to common These are used on both the frontend and the backend, and can be shared. * Refactor Signal Rule executor to use RuleStatusService RuleStatusService holds the logic for updating the current status as well as adding an error status. It leverages a simple RuleStatusSavedObjectClient to handle the communication with SavedObjects. This removes the need for our specialized 'writeError', 'writeGap', and 'writeSuccess' functions, which duplicated much of the rule status logic and code. It also fixes a bug with gap failures, with should have been treated the same as other failures. NB that an error does not necessarily prevent the rule from running, as in the case of a gap or an ML Job not running. This also adds a buildRuleMessage helper to reduce the noise of generating logs/messages, and to make them more consistent. * Remove unneeded 'async' keywords We're not awaiting here, so we can just return the promise. * Make buildRuleStatusAttributes synchronous We weren't doing anything async here, and in fact the returning of a promise was causing a bug when we tried to spread it into our attributes object. * Fix incorrectly-named RuleStatus attributes This mapping could be done within the ruleStatusService, but it lives outside it for now. Also renames the object holding these values to the more general 'result,' as creationSuccess implies it always succeeds. * Move our rule message helpers to a separate file Adds some tests, as well. * Refactor how rule status objects interact Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the other functions receive the ruleStatusSavedObjectsClient * pluralizes savedObjects in ruleStatusSavedObjectsClient * Backfills tests * Handle adding multiple errors during a single rule execution We were storing state in our RuleStatusClient, and consequently could get into a situation where that state did not reflect reality, and we would incorrectly try to delete a SavedObject that had already been deleted. Rather than try to store the _correct_ state in the service, we remove state entirely and just fetch our statuses on each action. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Move isMlRule helper to a more general location And use it during rule execution as well. * Add error message back to rule error status This was unintentionally removed in a previous merge commit. * Expose mlClient as part of ML's Setup contract This allows dependent plugins to leverage the exposed services without having to define their own ml paths, e.g. "ml.jobs" * Move ML Job predicates to common folder These are pure functions and used on both the client and server. * WIP: Check ML Job status on ML Rule execution This works, but unfortunately it pushes this executor function to a complexity of 25. We're gonna refactor this next. * Move isMlRule and RuleType to common These are used on both the frontend and the backend, and can be shared. * Refactor Signal Rule executor to use RuleStatusService RuleStatusService holds the logic for updating the current status as well as adding an error status. It leverages a simple RuleStatusSavedObjectClient to handle the communication with SavedObjects. This removes the need for our specialized 'writeError', 'writeGap', and 'writeSuccess' functions, which duplicated much of the rule status logic and code. It also fixes a bug with gap failures, with should have been treated the same as other failures. NB that an error does not necessarily prevent the rule from running, as in the case of a gap or an ML Job not running. This also adds a buildRuleMessage helper to reduce the noise of generating logs/messages, and to make them more consistent. * Remove unneeded 'async' keywords We're not awaiting here, so we can just return the promise. * Make buildRuleStatusAttributes synchronous We weren't doing anything async here, and in fact the returning of a promise was causing a bug when we tried to spread it into our attributes object. * Fix incorrectly-named RuleStatus attributes This mapping could be done within the ruleStatusService, but it lives outside it for now. Also renames the object holding these values to the more general 'result,' as creationSuccess implies it always succeeds. * Move our rule message helpers to a separate file Adds some tests, as well. * Refactor how rule status objects interact Only ruleStatusSavedObjectsClient receives a savedObjectsClient, the other functions receive the ruleStatusSavedObjectsClient * pluralizes savedObjects in ruleStatusSavedObjectsClient * Backfills tests * Handle adding multiple errors during a single rule execution We were storing state in our RuleStatusClient, and consequently could get into a situation where that state did not reflect reality, and we would incorrectly try to delete a SavedObject that had already been deleted. Rather than try to store the _correct_ state in the service, we remove state entirely and just fetch our statuses on each action. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Pinging @elastic/security-solution (Team: SecuritySolution) |
Summary
The first leg of this PR (diff) adds the actual enhancement: checking of ML Jobs during rule execution, and setting of rule status appropriately. Things required to get that to work:
ml.jobsServiceProvider
If necessary, I'm happy to open a separate PR against 7.7 with just that work.
However, our rule executor function is large (and contains multitudes?), and the additional ML logic pushed us over the eslint complexity limit, so the second half of this PR is moving logic out of that function, notably:
ruleStatusSavedObjectsClient
to handle SO callsruleStatusService
to encapsulate logic of creating/updating/GCing RuleStatusesChecklist
For maintainers