Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIEM] Create ML Rules #58053

Merged
merged 66 commits into from
Mar 19, 2020
Merged

[SIEM] Create ML Rules #58053

merged 66 commits into from
Mar 19, 2020

Conversation

rylnd
Copy link
Contributor

@rylnd rylnd commented Feb 19, 2020

Summary

#60301

This adds the ability to create an ML Rule for a specific ML job, and generate signals from that job's anomalies.

  • Makes Query Rule fields (filters, index, query, language) optional
  • Adds optional ML Rule fields: anomalyThreshold and mlJobId
  • Adds new form fields for creating an ML Rule
    • Adds a simplified ML Job picker, for now
  • Adds frontend logic to display/send only type-specific rule fields
  • Adds server logic to create ML Rule
  • Adds alerting execution logic (new code path) to query ML anomalies and create signals from them
    • Does not use the ML Client

Screenshots:
Detections_-_Kibana
a36b3013-9e54-4eb1-85b8-e36e8e0f6b36_-_Kibana

Known issues

  • ML Rule creation does not perform search_after; all anomaly signals are bulk-inserted at once
  • Stopped/closed jobs are not listed on the form, and not opened/started automatically

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@rylnd rylnd requested a review from spong February 19, 2020 21:19
@rylnd rylnd force-pushed the ml_signals_spike branch from 0d5cb57 to 31a8c4c Compare March 12, 2020 02:49
rylnd added 13 commits March 14, 2020 14:55
Not sure what was causing issues here, but it's gone now.
This will be integrated into the regular rule creation workflow, but for
now this simple form should allow us to exercise the full ML rule
workflow.
The rest of this is going to be easier if I have actual data. For now
this is mostly copy/pasted and simplified ML code. I've hardcoded time
ranges to a period I know has data for a particular job.
It's a bummer that we normalize our rule alert params across all rule
types currently, but that's the deal.
This uses as much of the existing signal-creation code as possible. I
skipped the search_after stuff for now because it would require us
recreating the anomalies query which we really shouldn't own. For now,
here's how it works:

* Adds a separate branch of the rule executor for machine_learning rules
* In that branch, we call our new bulkCreateMlSignal function
  * This function first transforms the anomaly document into ECS fields
  * We then pass the transformed documents to singleBulkCreate, which
  does the rest
* After both branches, we update the rule's status appropriately.

We need to do some more work on the anomaly transformation, but this
works!
We were doing this identically in three places.
This adds most of the markup and logic to allow an ML rule type to be
selected. We still need to add things like license-checking and
showing/hiding of fields based on type.
These are still getting set on the form. We'll need to filter these
fields before we send off the data, and not show them on the readonly
display either.

ALso, edit is majorly broken.
@rylnd rylnd force-pushed the ml_signals_spike branch from 255690c to bf2b391 Compare March 14, 2020 20:05
rylnd added 14 commits March 14, 2020 18:56
TIL that isEmpty returns false for numbers and other non-iterable
values. I don't think it's exactly what we want here, but until I figure
out the intention this gets our anomalyThreshold showing up without a
separate logic branch here. Removes the unnecessary branch that was
redundant with the 'else' clause.
This is not the same as the mockups and lacks some functionality, but
it'll allow us to select a job for now.
So that we don't get rejected due to snake case vs camelcase.
It was previously hardcoded to a time period I knew had anomalies.
In that we don't initialize them like we do the query (default) fields.
This makes any query- or ML-specific fields optional on a Rule, and
performs some logic on the frontend to group and include these fieldsets
conditionally based on the user's selection. The one place we don't
handle this well is on the readonly view of a completed step in the
rules creation, but we'll address that.
It's no longer tabular data. If we need that, we can use the ML client.
This response isn't going to HTTP, which is where this option would
matter.
I made a happy accident and flipped the logic here, which meant we
weren't capping the signals we created.
Value is a little more ambiguous than data, here: this is our step data.
When sending off to the backend, or displaying on the readonly view, we
inspect which rule type we've currently selected, and filter our form
values appropriately.
We need to inherit the field value from our form on initial render, and
everything works as expected.
rylnd added 14 commits March 17, 2020 18:05
Mainly leverages ML typings instead of our placeholder types. This
required handling a null case in our formatting of anomalies.
We were notably lacking this ECS field in our post-conversion anomalies,
and typescript was rightly complaining about it.
* Stricter threshold type
* More robust date parsing
* More informative log/error messages
* Remove redundant runtime checks
* Fix types on our Rest types
* Use less ambiguous machineLearningJobId over mlJobId
* Declare our ML params as required keys, and ensure we pass them around
everywhere we might need them (creating, importing, updating rules).
FormSchema has a very generic index signature such that our
filterRuleFieldsForType helper cannot infer that it has our necessary
rule fields (when in fact it does). By removing the FormSchema hint we
get the actual keys of our schema, and things work as expected.

All other uses of schema continue to work because they're expecting
FormSchema, which is effectively { [key: string]: any }.
Rather than setting a null and then never using it, let's just make it
truly optional in terms of default values.
We don't need to specify this, and we should continue not to for
backwards compatibility.
The concern is that not typing our schemae as FormSchema could break our
form if there are upstream changes. For now, we simply use the
intersection of FormSchema and our generic parameter to satisfy our use
within the function.
* threshold and job id are conditional on type
* makes query and language mutually exclusive with above
We were sending invalid parameters.
@@ -45,6 +45,8 @@ export const output_index = t.string;
export const saved_id = t.string;
export const timeline_id = t.string;
export const timeline_title = t.string;
export const anomaly_threshold = PositiveInteger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later we can change this to be more strict, like RiskScore codec type which ensures things are between two values such as 0 and 100. No changes needed. Just a pointer.

rylnd added 4 commits March 18, 2020 16:58
We support not having an index here, as getInputIndex will return the
current UI setting if none is specified.
We were previously able to create a rule without an input index; we
should continue to support that, as verified by this test!
…h_after

If a rule does not specify an input index pattern on creation, we use
the current UI default when the rule is evaluated. This ensures that any
subsequent searches use that same index.

We're not currently persisting that runtime index to the generated
signal, but we should.
We added a new argument, but didn't update the tests.
Copy link
Contributor

@FrankHassanabad FrankHassanabad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Really impressive to see you learn the ends and outs of how to thread something as complex as Machine Learning through something that was originally only query based.

Awesome job coordinating with the ML team, the intel people and setting them up early, and learning all the different schema's and the weird warts but managing to make the code cleaner and more robust than it was before.

LGTM

We will fix any fallout and tests and other small things after this. So much goodness here, people are going to be really excited to be able to integrate their Machine Learning Jobs with signals.

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@rylnd rylnd merged commit a05a612 into elastic:master Mar 19, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Mar 19, 2020
* master: (35 commits)
  [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997)
  Introduce search interceptor (elastic#60523)
  [ML] Add functional tests for file data visualizer (elastic#60413)
  [APM] Optimize service map query (elastic#60412)
  [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces
  Enhancement/update esdocs datasource (elastic#59512)
  [junit] only include stdout in report for failures (elastic#60530)
  Update dependency nock to v12 (elastic#60422)
  upgrade execa to get stdout/stderr in error messages (elastic#60537)
  skip flaky suite (elastic#60471)
  [Ingest] Agent Config Details - Data sources list ui (elastic#60429)
  [SIEM] Create ML Rules (elastic#58053)
  skip flaky suite (elastic#60559)
  fix agent type (elastic#60554)
  Fixed default message for index threshold includes both threshold values (elastic#60545)
  [Ingest] Add support for `yaml` field types (elastic#60440)
  Solved the issue for a GROUP BY expression validation (elastic#60558)
  [Maps] Mark instance state as readonly (elastic#60557)
  Move ui/indices into es_ui_shared plugin. (elastic#60186)
  ServiceNow action improvements (elastic#60052)
  ...
gmmorris added a commit to gmmorris/kibana that referenced this pull request Mar 19, 2020
* master: (64 commits)
  [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997)
  Introduce search interceptor (elastic#60523)
  [ML] Add functional tests for file data visualizer (elastic#60413)
  [APM] Optimize service map query (elastic#60412)
  [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces
  Enhancement/update esdocs datasource (elastic#59512)
  [junit] only include stdout in report for failures (elastic#60530)
  Update dependency nock to v12 (elastic#60422)
  upgrade execa to get stdout/stderr in error messages (elastic#60537)
  skip flaky suite (elastic#60471)
  [Ingest] Agent Config Details - Data sources list ui (elastic#60429)
  [SIEM] Create ML Rules (elastic#58053)
  skip flaky suite (elastic#60559)
  fix agent type (elastic#60554)
  Fixed default message for index threshold includes both threshold values (elastic#60545)
  [Ingest] Add support for `yaml` field types (elastic#60440)
  Solved the issue for a GROUP BY expression validation (elastic#60558)
  [Maps] Mark instance state as readonly (elastic#60557)
  Move ui/indices into es_ui_shared plugin. (elastic#60186)
  ServiceNow action improvements (elastic#60052)
  ...
jloleysens added a commit to jloleysens/kibana that referenced this pull request Mar 19, 2020
…-cluster-replication

* 'master' of github.com:elastic/kibana: (89 commits)
  Sort by name when fetching alerts and connectors (elastic#60506)
  Make slack param validation handle empty messages (elastic#60468)
  [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997)
  Introduce search interceptor (elastic#60523)
  [ML] Add functional tests for file data visualizer (elastic#60413)
  [APM] Optimize service map query (elastic#60412)
  [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces
  Enhancement/update esdocs datasource (elastic#59512)
  [junit] only include stdout in report for failures (elastic#60530)
  Update dependency nock to v12 (elastic#60422)
  upgrade execa to get stdout/stderr in error messages (elastic#60537)
  skip flaky suite (elastic#60471)
  [Ingest] Agent Config Details - Data sources list ui (elastic#60429)
  [SIEM] Create ML Rules (elastic#58053)
  skip flaky suite (elastic#60559)
  fix agent type (elastic#60554)
  Fixed default message for index threshold includes both threshold values (elastic#60545)
  [Ingest] Add support for `yaml` field types (elastic#60440)
  Solved the issue for a GROUP BY expression validation (elastic#60558)
  [Maps] Mark instance state as readonly (elastic#60557)
  ...

# Conflicts:
#	x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/components/auto_follow_pattern_form.js
#	x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/components/follower_index_form/follower_index_form.js
#	x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/components/follower_index_form/follower_index_form.test.js
#	x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/services/auto_follow_pattern_validators.js
#	x-pack/legacy/plugins/cross_cluster_replication/public/np_ready/app/services/input_validation.js
jloleysens added a commit to jloleysens/kibana that referenced this pull request Mar 19, 2020
…ole/cleanup

* 'master' of github.com:elastic/kibana: (47 commits)
  [Remote clusters] Update copy (elastic#60382)
  Sort by name when fetching alerts and connectors (elastic#60506)
  Make slack param validation handle empty messages (elastic#60468)
  [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997)
  Introduce search interceptor (elastic#60523)
  [ML] Add functional tests for file data visualizer (elastic#60413)
  [APM] Optimize service map query (elastic#60412)
  [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces
  Enhancement/update esdocs datasource (elastic#59512)
  [junit] only include stdout in report for failures (elastic#60530)
  Update dependency nock to v12 (elastic#60422)
  upgrade execa to get stdout/stderr in error messages (elastic#60537)
  skip flaky suite (elastic#60471)
  [Ingest] Agent Config Details - Data sources list ui (elastic#60429)
  [SIEM] Create ML Rules (elastic#58053)
  skip flaky suite (elastic#60559)
  fix agent type (elastic#60554)
  Fixed default message for index threshold includes both threshold values (elastic#60545)
  [Ingest] Add support for `yaml` field types (elastic#60440)
  Solved the issue for a GROUP BY expression validation (elastic#60558)
  ...
gmmorris added a commit to gmmorris/kibana that referenced this pull request Mar 19, 2020
* master: (39 commits)
  [SIEM][CASE]  Configuration page action bar (elastic#60608)
  [Remote clusters] Update copy (elastic#60382)
  Sort by name when fetching alerts and connectors (elastic#60506)
  Make slack param validation handle empty messages (elastic#60468)
  [Alerting] Adds navigation by consumer and alert type to alerting (elastic#58997)
  Introduce search interceptor (elastic#60523)
  [ML] Add functional tests for file data visualizer (elastic#60413)
  [APM] Optimize service map query (elastic#60412)
  [SIEM][Detection Engine] Adds lists feature flag and list values to the REST interfaces
  Enhancement/update esdocs datasource (elastic#59512)
  [junit] only include stdout in report for failures (elastic#60530)
  Update dependency nock to v12 (elastic#60422)
  upgrade execa to get stdout/stderr in error messages (elastic#60537)
  skip flaky suite (elastic#60471)
  [Ingest] Agent Config Details - Data sources list ui (elastic#60429)
  [SIEM] Create ML Rules (elastic#58053)
  skip flaky suite (elastic#60559)
  fix agent type (elastic#60554)
  Fixed default message for index threshold includes both threshold values (elastic#60545)
  [Ingest] Add support for `yaml` field types (elastic#60440)
  ...
rylnd added a commit that referenced this pull request Mar 19, 2020
* Remove unnecessary linter exceptions

Not sure what was causing issues here, but it's gone now.

* WIP: Simple form to test creation of ML rules

This will be integrated into the regular rule creation workflow, but for
now this simple form should allow us to exercise the full ML rule
workflow.

* WIP: Adds POST to backend, and type/payload changes necessary to make that work

* Simplify logic with Math.min

* WIP: Failed spike of making an http call

* WIP: Hacking together an ML client

The rest of this is going to be easier if I have actual data. For now
this is mostly copy/pasted and simplified ML code. I've hardcoded time
ranges to a period I know has data for a particular job.

* Threading through our new ML Rule params

It's a bummer that we normalize our rule alert params across all rule
types currently, but that's the deal.

* Retrieve our anomalies during rule execution

Next step: generate signals

* WIP: Generate ECS-compatible ML Signals

This uses as much of the existing signal-creation code as possible. I
skipped the search_after stuff for now because it would require us
recreating the anomalies query which we really shouldn't own. For now,
here's how it works:

* Adds a separate branch of the rule executor for machine_learning rules
* In that branch, we call our new bulkCreateMlSignal function
  * This function first transforms the anomaly document into ECS fields
  * We then pass the transformed documents to singleBulkCreate, which
  does the rest
* After both branches, we update the rule's status appropriately.

We need to do some more work on the anomaly transformation, but this
works!

* Extract setting of rule failure to helper function

We were doing this identically in three places.

* Remove unused import

* Define a field for our Rule Type selection

This adds most of the markup and logic to allow an ML rule type to be
selected. We still need to add things like license-checking and
showing/hiding of fields based on type.

* Hide Query Fields when ML is selected

These are still getting set on the form. We'll need to filter these
fields before we send off the data, and not show them on the readonly
display either.

ALso, edit is majorly broken.

* Add input field for anomaly threshold

* Display numberic values in the readonly view of a step

TIL that isEmpty returns false for numbers and other non-iterable
values. I don't think it's exactly what we want here, but until I figure
out the intention this gets our anomalyThreshold showing up without a
separate logic branch here. Removes the unnecessary branch that was
redundant with the 'else' clause.

* Add field for selecting an ML job

This is not the same as the mockups and lacks some functionality, but
it'll allow us to select a job for now.

* Format our new ML Fields when sending them to the server

So that we don't get rejected due to snake case vs camelcase.

* Put back code that respects a rule's schedule

It was previously hardcoded to a time period I knew had anomalies.

* ML fields are optional in our creation step

In that we don't initialize them like we do the query (default) fields.

* Only send along type-specific Rule fields from form

This makes any query- or ML-specific fields optional on a Rule, and
performs some logic on the frontend to group and include these fieldsets
conditionally based on the user's selection. The one place we don't
handle this well is on the readonly view of a completed step in the
rules creation, but we'll address that.

* Rename anomalies query

It's no longer tabular data. If we need that, we can use the ML client.

* Remove spike page with simple form

* Remove unneeded ES option

This response isn't going to HTTP, which is where this option would
matter.

* Fix bulk create logic

I made a happy accident and flipped the logic here, which meant we
weren't capping the signals we created.

* Rename argument

Value is a little more ambiguous than data, here: this is our step data.

* Create Rule form stores all values, but filters by type for use

When sending off to the backend, or displaying on the readonly view, we
inspect which rule type we've currently selected, and filter our form
values appropriately.

* Fix editing of ML fields on Rule Create

We need to inherit the field value from our form on initial render, and
everything works as expected.

* Clear form errors when switching between rule types

Validation errors prevent us from moving to the next step, so it was
previously possible to get an error for Query fields, switch to an ML
rule, and be unable to continue because the form had Query errors.

This also adds a helper for checking whether a ruleType is ML, to
prevent having to change all these references if the type string
changes.

* Validate the selection of an ML Job

* Fix type errors on frontend

According to the types, this is essentially the opposite of formatRule,
so we need to reinflate all potential form values from the rule.

* Don't set defaults for query-specific rules

For ML rules these types should not be included.

* Return ML Fields in Rule responses

This adds these fields to our rule serialization, and then adds
conditional validation around those fields if the rule type is ML.
Conversely, we moved the 'language' and 'query' fields to be
conditionally validated if the rule is a query/saved_query rule.

* Fix editing of ML rules by changing who controls the field values

The source of truth for their state is the parent form object; these
inputs should not have local state.

* Fix type errors related to new ML fields

In adding the new ML fields, some other fields (e.g. `query` and
`index`) that were previously required but implicitly part of Query
Rules are now marked as optional.

Consequently, any downstream code that actually required these fields
started to complain. In general, the fix was to verify that those fields
exist, and throw an error otherwise as to appease the linter.

Runtime-wise, the new ML rules/signals follow a separate code path and
both branches should be unaffected by these changes; the issue is simply
that our conditional types don't work well with Typescript.

* Fix failing route tests

Error message changed.

* Fix integration tests

We were not sending required properties when creating a rule(index and
language).

* Fix non-ML Rule creation

I was accidentally dropping this parameter for our POST payload. Whoops.

* More informative logging during ML signal generation

The messaging diverged from the normal path here because we don't have
index patterns to display. However, we have the rest of the rule
context, and should report it appropriately.

* Prefer keyof for string union types

* Tidy up our new form components

* Type them as React.FCs
* Remove unnecessary use of styled-components

* Prefer destructuring to lodash's omit

* Fix mock params for helper functions

These were updated to take simpler parameters.

* Remove any type

This could have been a boolean all along, whoops

* Fix mock types

* Update outdated tests

These were added on master, but behavior has been changed on my branch.

* Add some tests around our helper function

I need to refactor it, so this is as good a time as any to pin down the
behavior.

* Remove uses of any in favor of actual types

Mainly leverages ML typings instead of our placeholder types. This
required handling a null case in our formatting of anomalies.

* Annotate our anomalies with @timestamp field

We were notably lacking this ECS field in our post-conversion anomalies,
and typescript was rightly complaining about it.

* ml_job_id -> machine_learning_job_id

* PR Feedback

* Stricter threshold type
* More robust date parsing
* More informative log/error messages
* Remove redundant runtime checks

* Cleaning up our new ML types

* Fix types on our Rest types
* Use less ambiguous machineLearningJobId over mlJobId
* Declare our ML params as required keys, and ensure we pass them around
everywhere we might need them (creating, importing, updating rules).

* Use implicit type to avoid the need for a ts-ignore

FormSchema has a very generic index signature such that our
filterRuleFieldsForType helper cannot infer that it has our necessary
rule fields (when in fact it does). By removing the FormSchema hint we
get the actual keys of our schema, and things work as expected.

All other uses of schema continue to work because they're expecting
FormSchema, which is effectively { [key: string]: any }.

* New ML params are not nullable

Rather than setting a null and then never using it, let's just make it
truly optional in terms of default values.

* Query and language are conditional based on rule type

For ML Rules, we don't use them.

* Remove defaulted parameter in API test

We don't need to specify this, and we should continue not to for
backwards compatibility.

* Use explicit types over implicit ones

The concern is that not typing our schemae as FormSchema could break our
form if there are upstream changes. For now, we simply use the
intersection of FormSchema and our generic parameter to satisfy our use
within the function.

* Add integration test for creation of ML Rule

* Add ML fields to route schemae

* threshold and job id are conditional on type
* makes query and language mutually exclusive with above

* Fix router test for creating an ML rule

We were sending invalid parameters.

* Remove null check against index for query rules

We support not having an index here, as getInputIndex will return the
current UI setting if none is specified.

* Add regression test for API compatibility

We were previously able to create a rule without an input index; we
should continue to support that, as verified by this test!

* Respect the index pattern determined at runtime when performing search_after

If a rule does not specify an input index pattern on creation, we use
the current UI default when the rule is evaluated. This ensures that any
subsequent searches use that same index.

We're not currently persisting that runtime index to the generated
signal, but we should.

* Fix type errors in our bulk create tests

We added a new argument, but didn't update the tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants