[ML] Implement new rules design (#31110) #31294

Merged

dimitris-athanasiou
Contributor

Rules allow users to supply a detector with domain
knowledge that can improve the quality of the results.
The model detects statistically anomalous results but it
has no knowledge of the meaning of the values being modelled.

For example, a detector that performs a population analysis
over IP addresses could benefit from a list of IP addresses
that the user knows to be safe. Then anomalous results for
those IP addresses will not be created and will not affect
the quantiles either.

Another example would be a detector looking for anomalies
in the median value of CPU utilization. A user might want
to inform the detector that results where the actual
value is less than 5 are not interesting.

This commit introduces a `custom_rules` field to the `Detector`.
A detector may have multiple rules which are combined with `or`.

A rule has 3 fields: `actions`, `scope` and `conditions`.

Actions is a list of what should happen when the rule applies.
The current options include `skip_result` and `skip_model_update`.
The default value for `actions` is the `skip_result` action.
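
To make the default concrete, here is a minimal Python sketch (not the Java code in this PR). It assumes a rule is a plain dict shaped like the JSON config and that the default only applies when `actions` is absent; `effective_actions` is a hypothetical helper name.

```python
# Illustrative only: rules are modelled as plain dicts shaped like the JSON config.
DEFAULT_ACTIONS = ["skip_result"]


def effective_actions(rule):
    """Return the rule's actions, falling back to skip_result when none are given.

    Assumption: the default applies when the "actions" key is absent.
    """
    return list(rule.get("actions", DEFAULT_ACTIONS))


# The CPU example from the description: results whose actual value is below 5.
cpu_rule = {
    "conditions": [{"applies_to": "actual", "operator": "lt", "value": 5.0}],
}

# The same condition, but with both of the listed actions requested explicitly.
strict_rule = {
    "actions": ["skip_result", "skip_model_update"],
    "conditions": [{"applies_to": "actual", "operator": "lt", "value": 5.0}],
}
```

With these definitions, `effective_actions(cpu_rule)` yields only `skip_result`, while `effective_actions(strict_rule)` yields both actions.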

Scope is optional and allows filters to be applied to any of the
partition/over/by fields. When no scope is defined, the rule applies to
all series. The `filter_id` must be specified and must match the id
of the filter to be used. Optionally, the `filter_type` can be specified
as either `include` (default) or `exclude`. When set to `include`,
the rule applies only to entities that are in the filter. When set to
`exclude`, the rule applies only to entities that are not in the filter.
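
The include/exclude semantics can be sketched in Python as follows. This is an illustration under assumptions, not the implementation: the scope is modelled as a dict keyed by field name, filters as a dict of sets, and a scope naming several fields is assumed to require all of them to match. `scope_applies`, `clientip` and `safe_ips` are hypothetical names.

```python
def scope_applies(scope, series_fields, filters):
    """Decide whether a rule's scope covers a series.

    scope:         {field_name: {"filter_id": ..., "filter_type": ...}} or None
    series_fields: {field_name: entity value} for the series being checked
    filters:       {filter_id: set of items}
    """
    # No scope defined: the rule applies to all series.
    for field_name, ref in (scope or {}).items():
        in_filter = series_fields.get(field_name) in filters[ref["filter_id"]]
        # filter_type defaults to "include".
        wants_member = ref.get("filter_type", "include") == "include"
        if in_filter != wants_member:
            return False
    return True


filters = {"safe_ips": {"10.0.0.1", "10.0.0.2"}}
include_scope = {"clientip": {"filter_id": "safe_ips"}}
exclude_scope = {"clientip": {"filter_id": "safe_ips", "filter_type": "exclude"}}
```

For the safe IP list from the description, `include_scope` covers `10.0.0.1` and nothing outside the list, and `exclude_scope` does the opposite.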

There may be zero or more conditions. A condition requires `applies_to`,
`operator` and `value` to be specified. The `applies_to` value can be
`actual`, `typical` or `diff_from_typical`, and it specifies
the numerical value to which the condition applies. The `operator`
(`lt`, `lte`, `gt`, `gte`) and `value` complete the definition.
Conditions are combined with `and`; they let users specify numerical
conditions for when a rule applies.
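
A small Python sketch of how such conditions could be evaluated, for illustration only. It assumes `diff_from_typical` means the absolute difference between the actual and typical values; the helper names are hypothetical.

```python
import operator

# The four operators named in the description, mapped to Python comparisons.
OPERATORS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def condition_holds(condition, actual, typical):
    """Evaluate a single condition against one result."""
    candidates = {
        "actual": actual,
        "typical": typical,
        # Assumption: absolute difference between actual and typical.
        "diff_from_typical": abs(actual - typical),
    }
    compare = OPERATORS[condition["operator"]]
    return compare(candidates[condition["applies_to"]], condition["value"])


def conditions_hold(conditions, actual, typical):
    """Conditions are combined with `and`; an empty list holds vacuously."""
    return all(condition_holds(c, actual, typical) for c in conditions)


below_five = [{"applies_to": "actual", "operator": "lt", "value": 5}]
below_five_and_far = below_five + [
    {"applies_to": "diff_from_typical", "operator": "gte", "value": 10},
]
```

Here `below_five` holds for an actual value of 3 but not for 5, and `below_five_and_far` additionally requires the actual value to be at least 10 away from the typical value.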

A rule must have a scope, one or more conditions, or both. A rule
with both a scope and conditions applies only when all of them apply.
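
The validation constraint and the `or` combination across rules might look like this in Python. This is a sketch, not the PR's code: it reads "combined with `or`" as "each rule that applies contributes its actions", which is an interpretation, and `rule_applies` is a hypothetical caller-supplied predicate standing in for the scope and condition checks.

```python
def validate_rule(rule):
    """Reject a rule that has neither a scope nor any condition."""
    if not rule.get("scope") and not rule.get("conditions"):
        raise ValueError("a rule needs a scope, at least one condition, or both")


def triggered_actions(rules, rule_applies):
    """Collect the actions of every rule that applies to a result.

    rule_applies: callable taking a rule and returning True when its scope
    and all of its conditions hold for the result being considered.
    """
    actions = set()
    for rule in rules:
        validate_rule(rule)
        if rule_applies(rule):
            # Fall back to the documented default action.
            actions.update(rule.get("actions", ["skip_result"]))
    return actions


rules = [
    {"conditions": [{"applies_to": "actual", "operator": "lt", "value": 5}]},
    {"actions": ["skip_model_update"],
     "scope": {"clientip": {"filter_id": "safe_ips"}}},
]
```

If only the scoped rule applies, the result is `{"skip_model_update"}`; if both apply, both actions are taken.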

Backport of #31110

@dimitris-athanasiou dimitris-athanasiou added :ml Machine learning v6.4.0 labels Jun 13, 2018
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@dimitris-athanasiou dimitris-athanasiou changed the title from "[ML] Implement new rules design" to "[ML] Implement new rules design (#31110)" Jun 13, 2018
@dimitris-athanasiou
Contributor Author

retest this please

@dimitris-athanasiou dimitris-athanasiou force-pushed the backport-new-rules-api-6x branch 2 times, most recently from b2caefb to 0464a84 Compare June 14, 2018 09:22
@dimitris-athanasiou
Contributor Author

retest this please

@dimitris-athanasiou
Contributor Author

retest this please

@dimitris-athanasiou dimitris-athanasiou force-pushed the backport-new-rules-api-6x branch from 0464a84 to 2de5d71 Compare June 14, 2018 23:48
@dimitris-athanasiou
Contributor Author

@droberts195 It turns out there was a BWC issue. I had to add a method that reads the old format: if an older node has rules, we need to read them off the stream so that the fields that follow can be read correctly. We do not have that problem on master because rolling upgrades are only supported from 6.latest, which will have the new rules implementation. Could you take a look at the additional commit please?
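
For readers unfamiliar with this kind of BWC problem, here is a toy Python sketch of why the old rules have to be consumed. Everything in it is hypothetical: the wire layout (length-prefixed strings), the field order, the `(6, 4, 0)` cut-over tuple and the helper names are assumptions for illustration, not the actual serialization code, and discarding the old rules after reading them is also an assumption.

```python
import io
import struct

NEW_RULES_VERSION = (6, 4, 0)  # assumed cut-over point for the new format


def write_int(stream, value):
    stream.write(struct.pack(">i", value))


def write_string(stream, text):
    data = text.encode("utf-8")
    write_int(stream, len(data))
    stream.write(data)


def read_int(stream):
    return struct.unpack(">i", stream.read(4))[0]


def read_string(stream):
    return stream.read(read_int(stream)).decode("utf-8")


def read_detector(stream, sender_version):
    """Read a toy detector: function, list of rules, then a trailing field."""
    function = read_string(stream)
    entries = [read_string(stream) for _ in range(read_int(stream))]
    # Old-format rules must still be consumed so the stream position is right
    # for the next field; in this sketch they are then dropped.
    rules = entries if sender_version >= NEW_RULES_VERSION else []
    field_name = read_string(stream)
    return {"function": function, "rules": rules, "field_name": field_name}
```

If the old entries were skipped without being read, `field_name` would be decoded from the bytes of the first old rule instead.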

Contributor

@droberts195 droberts195 left a comment

Originally we said that we'd rely on nobody having used the rules functionality prior to 6.4, as it was undocumented. I think with this assumption the new method wouldn't be necessary, as the list of old rules would always be empty.

Anyway, since you've created a method to read the old rule formats we might as well stick with it for robustness.


- skip:
version: "6.4.0 - "
reason: "Rules were replaced by custom_rules on 6.4.0"
version: "6.2.0 - "
Contributor

Should it be 6.2.0 - 6.3.99? Otherwise this will go wrong when the 6.x branch is for 6.5 and the "old" cluster is running 6.4.
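
To illustrate the concern, here is a small Python sketch of an open-ended versus a bounded skip range. It is not the REST test runner's actual parser; `in_skip_range` is a hypothetical helper that assumes plain `major.minor.patch` versions and a `"lower - upper"` spec where either bound may be empty.

```python
def parse_version(text):
    """Turn "6.3.99" into (6, 3, 99) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def in_skip_range(version, spec):
    """Return True when version falls inside a "lower - upper" range.

    An empty bound means that side of the range is open.
    """
    lower, _, upper = spec.partition("-")
    lower, upper = lower.strip(), upper.strip()
    current = parse_version(version)
    if lower and current < parse_version(lower):
        return False
    if upper and current > parse_version(upper):
        return False
    return True
```

Under these assumptions, an old cluster on 6.4.0 falls inside `"6.2.0 - "` but outside `"6.2.0 - 6.3.99"`.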

@@ -171,10 +171,10 @@
}

---
- "Test job with pre 6.4 rules - dummy job 6.4 onwards":
+ "Test job with pre 6.4 rules - dummy job 6.2 onwards":
Contributor

The job that this test PUTs doesn't have any explicit rules in it. Should it?

Contributor Author

I added a comment explaining why.

@dimitris-athanasiou dimitris-athanasiou force-pushed the backport-new-rules-api-6x branch from e6a8f6b to 7cb4891 Compare June 15, 2018 09:50
@dimitris-athanasiou dimitris-athanasiou merged commit b1c1977 into elastic:6.x Jun 15, 2018
@dimitris-athanasiou dimitris-athanasiou deleted the backport-new-rules-api-6x branch June 15, 2018 15:05
dnhatn added a commit that referenced this pull request Jun 15, 2018
* 6.x:
  Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360)
  [ML] Implement new rules design (#31110) (#31294)
  Remove RestGetAllAliasesAction (#31308)
  CCS: don't proxy requests for already connected node (#31273)
  Rankeval: Fold template test project into main module (#31203)
  [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359)
  More detailed tracing when writing metadata (#31319)
  Add details section for dcg ranking metric (#31177)
@lcawl lcawl added the >feature label Aug 20, 2018