-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIEM] Create ML Rules #58053
[SIEM] Create ML Rules #58053
Conversation
Not sure what was causing issues here, but it's gone now.
This will be integrated into the regular rule creation workflow, but for now this simple form should allow us to exercise the full ML rule workflow.
The rest of this is going to be easier if I have actual data. For now this is mostly copy/pasted and simplified ML code. I've hardcoded time ranges to a period I know has data for a particular job.
It's a bummer that we normalize our rule alert params across all rule types currently, but that's the deal.
Next step: generate signals
This uses as much of the existing signal-creation code as possible. I skipped the search_after stuff for now because it would require us recreating the anomalies query which we really shouldn't own. For now, here's how it works: * Adds a separate branch of the rule executor for machine_learning rules * In that branch, we call our new bulkCreateMlSignal function * This function first transforms the anomaly document into ECS fields * We then pass the transformed documents to singleBulkCreate, which does the rest * After both branches, we update the rule's status appropriately. We need to do some more work on the anomaly transformation, but this works!
We were doing this identically in three places.
This adds most of the markup and logic to allow an ML rule type to be selected. We still need to add things like license-checking and showing/hiding of fields based on type.
These are still getting set on the form. We'll need to filter these fields before we send off the data, and not show them on the readonly display either. ALso, edit is majorly broken.
TIL that isEmpty returns false for numbers and other non-iterable values. I don't think it's exactly what we want here, but until I figure out the intention this gets our anomalyThreshold showing up without a separate logic branch here. Removes the unnecessary branch that was redundant with the 'else' clause.
This is not the same as the mockups and lacks some functionality, but it'll allow us to select a job for now.
So that we don't get rejected due to snake case vs camelcase.
It was previously hardcoded to a time period I knew had anomalies.
In that we don't initialize them like we do the query (default) fields.
This makes any query- or ML-specific fields optional on a Rule, and performs some logic on the frontend to group and include these fieldsets conditionally based on the user's selection. The one place we don't handle this well is on the readonly view of a completed step in the rules creation, but we'll address that.
It's no longer tabular data. If we need that, we can use the ML client.
This response isn't going to HTTP, which is where this option would matter.
I made a happy accident and flipped the logic here, which meant we weren't capping the signals we created.
Value is a little more ambiguous than data, here: this is our step data.
When sending off to the backend, or displaying on the readonly view, we inspect which rule type we've currently selected, and filter our form values appropriately.
We need to inherit the field value from our form on initial render, and everything works as expected.
Mainly leverages ML typings instead of our placeholder types. This required handling a null case in our formatting of anomalies.
We were notably lacking this ECS field in our post-conversion anomalies, and typescript was rightly complaining about it.
* Stricter threshold type * More robust date parsing * More informative log/error messages * Remove redundant runtime checks
* Fix types on our Rest types * Use less ambiguous machineLearningJobId over mlJobId * Declare our ML params as required keys, and ensure we pass them around everywhere we might need them (creating, importing, updating rules).
FormSchema has a very generic index signature such that our filterRuleFieldsForType helper cannot infer that it has our necessary rule fields (when in fact it does). By removing the FormSchema hint we get the actual keys of our schema, and things work as expected. All other uses of schema continue to work because they're expecting FormSchema, which is effectively { [key: string]: any }.
Rather than setting a null and then never using it, let's just make it truly optional in terms of default values.
For ML Rules, we don't use them.
We don't need to specify this, and we should continue not to for backwards compatibility.
The concern is that not typing our schemae as FormSchema could break our form if there are upstream changes. For now, we simply use the intersection of FormSchema and our generic parameter to satisfy our use within the function.
* threshold and job id are conditional on type * makes query and language mutually exclusive with above
We were sending invalid parameters.
@@ -45,6 +45,8 @@ export const output_index = t.string; | |||
export const saved_id = t.string; | |||
export const timeline_id = t.string; | |||
export const timeline_title = t.string; | |||
export const anomaly_threshold = PositiveInteger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later we can change this to be more strict, like RiskScore codec type which ensures things are between two values such as 0 and 100. No changes needed. Just a pointer.
We support not having an index here, as getInputIndex will return the current UI setting if none is specified.
We were previously able to create a rule without an input index; we should continue to support that, as verified by this test!
…h_after If a rule does not specify an input index pattern on creation, we use the current UI default when the rule is evaluated. This ensures that any subsequent searches use that same index. We're not currently persisting that runtime index to the generated signal, but we should.
We added a new argument, but didn't update the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! Really impressive to see you learn the ends and outs of how to thread something as complex as Machine Learning through something that was originally only query based.
Awesome job coordinating with the ML team, the intel people and setting them up early, and learning all the different schema's and the weird warts but managing to make the code cleaner and more robust than it was before.
LGTM
We will fix any fallout and tests and other small things after this. So much goodness here, people are going to be really excited to be able to integrate their Machine Learning Jobs with signals.
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
* master: (35 commits) [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997) Introduce search interceptor (elastic#60523) [ML] Add functional tests for file data visualizer (elastic#60413) [APM] Optimize service map query (elastic#60412) [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces Enhancement/update esdocs datasource (elastic#59512) [junit] only include stdout in report for failures (elastic#60530) Update dependency nock to v12 (elastic#60422) upgrade execa to get stdout/stderr in error messages (elastic#60537) skip flaky suite (elastic#60471) [Ingest] Agent Config Details - Data sources list ui (elastic#60429) [SIEM] Create ML Rules (elastic#58053) skip flaky suite (elastic#60559) fix agent type (elastic#60554) Fixed default message for index threshold includes both threshold values (elastic#60545) [Ingest] Add support for `yaml` field types (elastic#60440) Solved the issue for a GROUP BY expression validation (elastic#60558) [Maps] Mark instance state as readonly (elastic#60557) Move ui/indices into es_ui_shared plugin. (elastic#60186) ServiceNow action improvements (elastic#60052) ...
* master: (64 commits) [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997) Introduce search interceptor (elastic#60523) [ML] Add functional tests for file data visualizer (elastic#60413) [APM] Optimize service map query (elastic#60412) [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces Enhancement/update esdocs datasource (elastic#59512) [junit] only include stdout in report for failures (elastic#60530) Update dependency nock to v12 (elastic#60422) upgrade execa to get stdout/stderr in error messages (elastic#60537) skip flaky suite (elastic#60471) [Ingest] Agent Config Details - Data sources list ui (elastic#60429) [SIEM] Create ML Rules (elastic#58053) skip flaky suite (elastic#60559) fix agent type (elastic#60554) Fixed default message for index threshold includes both threshold values (elastic#60545) [Ingest] Add support for `yaml` field types (elastic#60440) Solved the issue for a GROUP BY expression validation (elastic#60558) [Maps] Mark instance state as readonly (elastic#60557) Move ui/indices into es_ui_shared plugin. (elastic#60186) ServiceNow action improvements (elastic#60052) ...
…-cluster-replication * 'master' of github.com:elastic/kibana: (89 commits) Sort by name when fetching alerts and connectors (elastic#60506) Make slack param validation handle empty messages (elastic#60468) [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997) Introduce search interceptor (elastic#60523) [ML] Add functional tests for file data visualizer (elastic#60413) [APM] Optimize service map query (elastic#60412) [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces Enhancement/update esdocs datasource (elastic#59512) [junit] only include stdout in report for failures (elastic#60530) Update dependency nock to v12 (elastic#60422) upgrade execa to get stdout/stderr in error messages (elastic#60537) skip flaky suite (elastic#60471) [Ingest] Agent Config Details - Data sources list ui (elastic#60429) [SIEM] Create ML Rules (elastic#58053) skip flaky suite (elastic#60559) fix agent type (elastic#60554) Fixed default message for index threshold includes both threshold values (elastic#60545) [Ingest] Add support for `yaml` field types (elastic#60440) Solved the issue for a GROUP BY expression validation (elastic#60558) [Maps] Mark instance state as readonly (elastic#60557) ... # Conflicts: # x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/components/auto_follow_pattern_form.js # x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/components/follower_index_form/follower_index_form.js # x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/components/follower_index_form/follower_index_form.test.js # x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/services/auto_follow_pattern_validators.js # x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/services/input_validation.js
…ole/cleanup * 'master' of github.com:elastic/kibana: (47 commits) [Remote clusters] Update copy (elastic#60382) Sort by name when fetching alerts and connectors (elastic#60506) Make slack param validation handle empty messages (elastic#60468) [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997) Introduce search interceptor (elastic#60523) [ML] Add functional tests for file data visualizer (elastic#60413) [APM] Optimize service map query (elastic#60412) [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces Enhancement/update esdocs datasource (elastic#59512) [junit] only include stdout in report for failures (elastic#60530) Update dependency nock to v12 (elastic#60422) upgrade execa to get stdout/stderr in error messages (elastic#60537) skip flaky suite (elastic#60471) [Ingest] Agent Config Details - Data sources list ui (elastic#60429) [SIEM] Create ML Rules (elastic#58053) skip flaky suite (elastic#60559) fix agent type (elastic#60554) Fixed default message for index threshold includes both threshold values (elastic#60545) [Ingest] Add support for `yaml` field types (elastic#60440) Solved the issue for a GROUP BY expression validation (elastic#60558) ...
* master: (39 commits) [SIEM][CASE] Configuration page action bar (elastic#60608) [Remote clusters] Update copy (elastic#60382) Sort by name when fetching alerts and connectors (elastic#60506) Make slack param validation handle empty messages (elastic#60468) [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997) Introduce search interceptor (elastic#60523) [ML] Add functional tests for file data visualizer (elastic#60413) [APM] Optimize service map query (elastic#60412) [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces Enhancement/update esdocs datasource (elastic#59512) [junit] only include stdout in report for failures (elastic#60530) Update dependency nock to v12 (elastic#60422) upgrade execa to get stdout/stderr in error messages (elastic#60537) skip flaky suite (elastic#60471) [Ingest] Agent Config Details - Data sources list ui (elastic#60429) [SIEM] Create ML Rules (elastic#58053) skip flaky suite (elastic#60559) fix agent type (elastic#60554) Fixed default message for index threshold includes both threshold values (elastic#60545) [Ingest] Add support for `yaml` field types (elastic#60440) ...
* Remove unnecessary linter exceptions Not sure what was causing issues here, but it's gone now. * WIP: Simple form to test creation of ML rules This will be integrated into the regular rule creation workflow, but for now this simple form should allow us to exercise the full ML rule workflow. * WIP: Adds POST to backend, and type/payload changes necessary to make that work * Simplify logic with Math.min * WIP: Failed spike of making an http call * WIP: Hacking together an ML client The rest of this is going to be easier if I have actual data. For now this is mostly copy/pasted and simplified ML code. I've hardcoded time ranges to a period I know has data for a particular job. * Threading through our new ML Rule params It's a bummer that we normalize our rule alert params across all rule types currently, but that's the deal. * Retrieve our anomalies during rule execution Next step: generate signals * WIP: Generate ECS-compatible ML Signals This uses as much of the existing signal-creation code as possible. I skipped the search_after stuff for now because it would require us recreating the anomalies query which we really shouldn't own. For now, here's how it works: * Adds a separate branch of the rule executor for machine_learning rules * In that branch, we call our new bulkCreateMlSignal function * This function first transforms the anomaly document into ECS fields * We then pass the transformed documents to singleBulkCreate, which does the rest * After both branches, we update the rule's status appropriately. We need to do some more work on the anomaly transformation, but this works! * Extract setting of rule failure to helper function We were doing this identically in three places. * Remove unused import * Define a field for our Rule Type selection This adds most of the markup and logic to allow an ML rule type to be selected. We still need to add things like license-checking and showing/hiding of fields based on type. * Hide Query Fields when ML is selected These are still getting set on the form. We'll need to filter these fields before we send off the data, and not show them on the readonly display either. ALso, edit is majorly broken. * Add input field for anomaly threshold * Display numberic values in the readonly view of a step TIL that isEmpty returns false for numbers and other non-iterable values. I don't think it's exactly what we want here, but until I figure out the intention this gets our anomalyThreshold showing up without a separate logic branch here. Removes the unnecessary branch that was redundant with the 'else' clause. * Add field for selecting an ML job This is not the same as the mockups and lacks some functionality, but it'll allow us to select a job for now. * Format our new ML Fields when sending them to the server So that we don't get rejected due to snake case vs camelcase. * Put back code that respects a rule's schedule It was previously hardcoded to a time period I knew had anomalies. * ML fields are optional in our creation step In that we don't initialize them like we do the query (default) fields. * Only send along type-specific Rule fields from form This makes any query- or ML-specific fields optional on a Rule, and performs some logic on the frontend to group and include these fieldsets conditionally based on the user's selection. The one place we don't handle this well is on the readonly view of a completed step in the rules creation, but we'll address that. * Rename anomalies query It's no longer tabular data. If we need that, we can use the ML client. * Remove spike page with simple form * Remove unneeded ES option This response isn't going to HTTP, which is where this option would matter. * Fix bulk create logic I made a happy accident and flipped the logic here, which meant we weren't capping the signals we created. * Rename argument Value is a little more ambiguous than data, here: this is our step data. * Create Rule form stores all values, but filters by type for use When sending off to the backend, or displaying on the readonly view, we inspect which rule type we've currently selected, and filter our form values appropriately. * Fix editing of ML fields on Rule Create We need to inherit the field value from our form on initial render, and everything works as expected. * Clear form errors when switching between rule types Validation errors prevent us from moving to the next step, so it was previously possible to get an error for Query fields, switch to an ML rule, and be unable to continue because the form had Query errors. This also adds a helper for checking whether a ruleType is ML, to prevent having to change all these references if the type string changes. * Validate the selection of an ML Job * Fix type errors on frontend According to the types, this is essentially the opposite of formatRule, so we need to reinflate all potential form values from the rule. * Don't set defaults for query-specific rules For ML rules these types should not be included. * Return ML Fields in Rule responses This adds these fields to our rule serialization, and then adds conditional validation around those fields if the rule type is ML. Conversely, we moved the 'language' and 'query' fields to be conditionally validated if the rule is a query/saved_query rule. * Fix editing of ML rules by changing who controls the field values The source of truth for their state is the parent form object; these inputs should not have local state. * Fix type errors related to new ML fields In adding the new ML fields, some other fields (e.g. `query` and `index`) that were previously required but implicitly part of Query Rules are now marked as optional. Consequently, any downstream code that actually required these fields started to complain. In general, the fix was to verify that those fields exist, and throw an error otherwise as to appease the linter. Runtime-wise, the new ML rules/signals follow a separate code path and both branches should be unaffected by these changes; the issue is simply that our conditional types don't work well with Typescript. * Fix failing route tests Error message changed. * Fix integration tests We were not sending required properties when creating a rule(index and language). * Fix non-ML Rule creation I was accidentally dropping this parameter for our POST payload. Whoops. * More informative logging during ML signal generation The messaging diverged from the normal path here because we don't have index patterns to display. However, we have the rest of the rule context, and should report it appropriately. * Prefer keyof for string union types * Tidy up our new form components * Type them as React.FCs * Remove unnecessary use of styled-components * Prefer destructuring to lodash's omit * Fix mock params for helper functions These were updated to take simpler parameters. * Remove any type This could have been a boolean all along, whoops * Fix mock types * Update outdated tests These were added on master, but behavior has been changed on my branch. * Add some tests around our helper function I need to refactor it, so this is as good a time as any to pin down the behavior. * Remove uses of any in favor of actual types Mainly leverages ML typings instead of our placeholder types. This required handling a null case in our formatting of anomalies. * Annotate our anomalies with @timestamp field We were notably lacking this ECS field in our post-conversion anomalies, and typescript was rightly complaining about it. * ml_job_id -> machine_learning_job_id * PR Feedback * Stricter threshold type * More robust date parsing * More informative log/error messages * Remove redundant runtime checks * Cleaning up our new ML types * Fix types on our Rest types * Use less ambiguous machineLearningJobId over mlJobId * Declare our ML params as required keys, and ensure we pass them around everywhere we might need them (creating, importing, updating rules). * Use implicit type to avoid the need for a ts-ignore FormSchema has a very generic index signature such that our filterRuleFieldsForType helper cannot infer that it has our necessary rule fields (when in fact it does). By removing the FormSchema hint we get the actual keys of our schema, and things work as expected. All other uses of schema continue to work because they're expecting FormSchema, which is effectively { [key: string]: any }. * New ML params are not nullable Rather than setting a null and then never using it, let's just make it truly optional in terms of default values. * Query and language are conditional based on rule type For ML Rules, we don't use them. * Remove defaulted parameter in API test We don't need to specify this, and we should continue not to for backwards compatibility. * Use explicit types over implicit ones The concern is that not typing our schemae as FormSchema could break our form if there are upstream changes. For now, we simply use the intersection of FormSchema and our generic parameter to satisfy our use within the function. * Add integration test for creation of ML Rule * Add ML fields to route schemae * threshold and job id are conditional on type * makes query and language mutually exclusive with above * Fix router test for creating an ML rule We were sending invalid parameters. * Remove null check against index for query rules We support not having an index here, as getInputIndex will return the current UI setting if none is specified. * Add regression test for API compatibility We were previously able to create a rule without an input index; we should continue to support that, as verified by this test! * Respect the index pattern determined at runtime when performing search_after If a rule does not specify an input index pattern on creation, we use the current UI default when the rule is evaluated. This ensures that any subsequent searches use that same index. We're not currently persisting that runtime index to the generated signal, but we should. * Fix type errors in our bulk create tests We added a new argument, but didn't update the tests.
Summary
#60301
This adds the ability to create an ML Rule for a specific ML job, and generate signals from that job's anomalies.
filters
,index
,query
,language
) optionalanomalyThreshold
andmlJobId
Screenshots:
Known issues
Checklist
Delete any items that are not applicable to this PR.
For maintainers