[ML] Implement new rules design (#31110) #31294
Pinging @elastic/ml-core
retest this please
b2caefb to 0464a84
retest this please
0464a84 to 2de5d71
@droberts195 It turns out there was a BWC issue. I had to add a method that reads the old format because, if an older node has rules, we need to read them off the stream to pave the way for the rest. We do not have that problem on
Originally we said that we'd rely on nobody having used the rules functionality prior to 6.4, as it was undocumented. I think with this assumption the new method wouldn't be necessary, as the list of old rules would always be empty.
Anyway, since you've created a method to read the old rule formats we might as well stick with it for robustness.
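To illustrate why the old rules have to be read off the stream even when they are discarded, here is a minimal sketch. The wire format below (a count, then length-prefixed strings, then a trailing field) is entirely hypothetical and is not the Elasticsearch stream format; the function names are made up for the example. It only shows that skipping the read would leave the reader misaligned for everything that follows.

```python
import io
import struct

# Hypothetical wire format, NOT the real Elasticsearch one:
# [rule count][rule 1]...[rule n][next field], strings are length-prefixed.

def write_string(out, text):
    data = text.encode("utf-8")
    out.write(struct.pack(">I", len(data)))
    out.write(data)

def read_string(inp):
    (length,) = struct.unpack(">I", inp.read(4))
    return inp.read(length).decode("utf-8")

def write_from_old_node(out, legacy_rules, next_field):
    """What a (hypothetical) older node would put on the stream."""
    out.write(struct.pack(">I", len(legacy_rules)))
    for rule in legacy_rules:
        write_string(out, rule)
    write_string(out, next_field)

def read_on_new_node(inp):
    """Consume the old-format rules, discard them, then read what follows."""
    (count,) = struct.unpack(">I", inp.read(4))
    for _ in range(count):
        read_string(inp)  # must be read so the stream position stays aligned
    return read_string(inp)

buffer = io.BytesIO()
write_from_old_node(buffer, ["old rule A", "old rule B"], "detector_description")
buffer.seek(0)
print(read_on_new_node(buffer))
```

If the list of old rules is always empty, as the original assumption went, the loop body never runs and the read is harmless, which is why keeping the method costs little.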
- skip:
version: "6.4.0 - "
reason: "Rules were replaced by custom_rules on 6.4.0"
version: "6.2.0 - "
Should it be `6.2.0 - 6.3.99`? Otherwise this will go wrong when the 6.x branch is for 6.5 and the "old" cluster is running 6.4.
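The difference between the open-ended and the bounded range can be sketched as follows. This is only an illustration of range matching, not the YAML test runner's implementation; `parse_version` and `in_skip_range` are hypothetical helpers, and the sketch assumes plain dotted numeric versions.

```python
def parse_version(text):
    """Turn '6.3.99' into (6, 3, 99) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def in_skip_range(version, skip_range):
    """Return True if version falls inside 'lower - upper'.

    An empty bound on either side means that side is unbounded.
    """
    lower, _, upper = (part.strip() for part in skip_range.partition("-"))
    parsed = parse_version(version)
    if lower and parsed < parse_version(lower):
        return False
    if upper and parsed > parse_version(upper):
        return False
    return True

# An "old" cluster on 6.4.0 still matches the open-ended range...
print(in_skip_range("6.4.0", "6.2.0 - "))
# ...but not the bounded one.
print(in_skip_range("6.4.0", "6.2.0 - 6.3.99"))
```

With the upper bound in place, only versions from 6.2.0 up to the last 6.3 release match, which is the behaviour the comment asks for.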
@@ -171,10 +171,10 @@
}

---
"Test job with pre 6.4 rules - dummy job 6.4 onwards":
"Test job with pre 6.4 rules - dummy job 6.2 onwards":
The job that this test PUTs doesn't have any explicit rules in it. Should it?
I added a comment explaining why.
Rules allow users to supply a detector with domain knowledge that can improve the quality of the results. The model detects statistically anomalous results but it has no knowledge of the meaning of the values being modelled.

For example, a detector that performs a population analysis over IP addresses could benefit from a list of IP addresses that the user knows to be safe. Then anomalous results for those IP addresses will not be created and will not affect the quantiles either.

Another example would be a detector looking for anomalies in the median value of CPU utilization. A user might want to inform the detector that any results where the actual value is less than 5 are not interesting.

This commit introduces a `custom_rules` field to the `Detector`. A detector may have multiple rules which are combined with `or`.

A rule has 3 fields: `actions`, `scope` and `conditions`.

Actions is a list of what should happen when the rule applies. The current options include `skip_result` and `skip_model_update`. The default value for `actions` is the `skip_result` action.

Scope is optional and allows for applying filters on any of the partition/over/by fields. When not defined the rule applies to all series. The `filter_id` needs to be specified to match the id of the filter to be used. Optionally, the `filter_type` can be specified as either `include` (default) or `exclude`. When set to `include` the rule applies to entities that are in the filter. When set to `exclude` the rule only applies to entities not in the filter.

There may be zero or more conditions. A condition requires `applies_to`, `operator` and `value` to be specified. The `applies_to` value can be either `actual`, `typical` or `diff_from_typical` and it specifies the numerical value to which the condition applies. The `operator` (`lt`, `lte`, `gt`, `gte`) and `value` complete the definition. Conditions are combined with `and` and allow specifying numerical conditions for when a rule applies.

A rule must either have a scope or one or more conditions. Finally, a rule with scope and conditions applies when all of them apply.
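The semantics described above can be sketched in a few lines of Python. This is a minimal model for illustration only, not the implementation in this PR: the class names, `rule_applies` and `triggered_actions` are hypothetical, the caller is assumed to supply `actual`, `typical` and `diff_from_typical` as ready-made numbers, and "combined with `or`" is read here as "each rule is evaluated independently and any rule that applies contributes its actions".

```python
import operator as op
from dataclasses import dataclass, field
from typing import Dict, List

# Operator names come from the description; mapping them to Python functions is illustrative.
OPERATORS = {"lt": op.lt, "lte": op.le, "gt": op.gt, "gte": op.ge}

@dataclass
class Condition:
    applies_to: str  # "actual", "typical" or "diff_from_typical"
    operator: str    # "lt", "lte", "gt" or "gte"
    value: float

@dataclass
class FilterRef:
    filter_id: str
    filter_type: str = "include"  # "include" (default) or "exclude"

@dataclass
class Rule:
    actions: List[str] = field(default_factory=lambda: ["skip_result"])
    scope: Dict[str, FilterRef] = field(default_factory=dict)  # field name -> filter
    conditions: List[Condition] = field(default_factory=list)

    def __post_init__(self):
        if not self.scope and not self.conditions:
            raise ValueError("a rule must have a scope or at least one condition")

def rule_applies(rule, entity, values, filters):
    """A rule applies when every scoped field and every condition matches."""
    for field_name, ref in rule.scope.items():
        in_filter = entity.get(field_name) in filters[ref.filter_id]
        if in_filter != (ref.filter_type == "include"):
            return False
    return all(OPERATORS[c.operator](values[c.applies_to], c.value)
               for c in rule.conditions)

def triggered_actions(rules, entity, values, filters):
    """Union of the actions of every rule that applies."""
    actions = set()
    for rule in rules:
        if rule_applies(rule, entity, values, filters):
            actions.update(rule.actions)
    return actions

# The two examples from the description: safe IPs, and low CPU values.
filters = {"safe_ips": {"10.0.0.1"}}
rules = [
    Rule(scope={"clientip": FilterRef("safe_ips")}),
    Rule(conditions=[Condition("actual", "lt", 5.0)]),
]
print(triggered_actions(rules, {"clientip": "10.0.0.1"}, {"actual": 50.0}, filters))
```

The field name `clientip` and the filter id `safe_ips` are made up for the example; a real job would reference whatever partition/over/by field and filter it actually defines.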
e6a8f6b to 7cb4891
* 6.x:
  Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360)
  [ML] Implement new rules design (#31110) (#31294)
  Remove RestGetAllAliasesAction (#31308)
  CCS: don't proxy requests for already connected node (#31273)
  Rankeval: Fold template test project into main module (#31203)
  [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359)
  More detailed tracing when writing metadata (#31319)
  Add details section for dcg ranking metric (#31177)
Backport of #31110