Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Alerts] adds support for multi fields in new terms rule #143943

Merged
merged 53 commits into from
Nov 14, 2022

Conversation

vitaliidm
Copy link
Contributor

@vitaliidm vitaliidm commented Oct 25, 2022

Summary

  • addresses [Security Solution][Alerts] Allow using multiple fields in New Terms rule #142862
    • allows up to 3 fields in New terms
    • displays new terms fields in alerts details
  • For multiple new terms fields(['source.host', 'source.ip']), in terms aggregation uses a runtime field. Which is created by joining values from new terms fields into one single keyword. Field values encoded in base64 and joined with a configured delimiter symbol, which is not part of base64 symbols(a–Z, 0–9, +, /, =) to avoid a situation when delimiter can be part of field value. Include parameter consists of encoded in base64 results from Phase 1.
    For single field, implementation remains the same to avoid performance penalties
  • Performance measurements:

UI

Alert details

Before

Screenshot 2022-10-26 at 18 29 00

After

Screenshot 2022-11-09 at 19 05 10

Delete any items that are not applicable to this PR.

For maintainers

@vitaliidm vitaliidm self-assigned this Oct 25, 2022
@vitaliidm vitaliidm added release_note:feature Makes this part of the condensed release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Feature:Detection Alerts Security Solution Detection Alerts Feature Team:Detection Alerts Security Detection Alerts Area Team v8.6.0 release_note:enhancement and removed release_note:feature Makes this part of the condensed release notes labels Oct 25, 2022
Copy link
Contributor

@marshallmain marshallmain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Just a couple questions about how the runtime field handles edge cases, then I think it's good to go.


return buckets.map((bucket) =>
Object.values(bucket.key)
.filter((value): value is string | number => value != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we filter out individual null values from a particular bucket, is there any chance of the encoded value from the bucket matching the runtime field-calculated value since the runtime field will still use all of the fields, including the nulls? I wonder if we should exclude the entire bucket if any values are null, or include the nulls and base64 encode them as well.

Copy link
Contributor Author

@vitaliidm vitaliidm Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like, we don't need this check at all.
Our query is following

POST /null-*/_search?allow_no_indices=true&ignore_unavailable=true
{
   "query":{
      "bool":{
         "filter":[
            {
               "bool":{
                  "must":[
                     
                  ],
                  "filter":[
                     {
                        "query_string":{
                           "query":"*"
                        }
                     }
                  ],
                  "should":[
                     
                  ],
                  "must_not":[
                     
                  ]
               }
            },
            {
               "range":{
                  "@timestamp":{
                     "lte":"2022-11-10T14:46:11.680Z",
                     "gte":"2022-11-08T14:45:11.680Z",
                     "format":"strict_date_optional_time"
                  }
               }
            },
            {
               "match_all":{
                  
               }
            }
         ]
      }
   },
   "fields":[
      {
         "field":"*",
         "include_unmapped":true
      },
      {
         "field":"@timestamp",
         "format":"strict_date_optional_time"
      }
   ],
   "aggregations":{
      "new_terms":{
         "composite":{
            "sources":[
               {
                  "host.name":{
                     "terms":{
                        "field":"host.name"
                     }
                  }
               }
            ],
            "size":10000
         }
      }
   },
   "runtime_mappings":{
      
   },
   "sort":[
      {
         "@timestamp":{
            "order":"asc",
            "unmapped_type":"date"
         }
      }
   ],
   "size":0
}

And it doesn't return null values in buckets. Per documentation, we need to introduce a parameter in "missing_bucket": true, on sources terms item.
I think we can dismiss this check entirely then. Downside of this, we would need to cast typing here, as bucket type here is following:

export interface AggregationsCompositeBucketKeys extends AggregationsMultiBucketBase {
    key: Record<string, any>;
}

On the other note: should we treat null value in term as a new one? In current implementation we dismiss nulls. But could it be a valid case: some field has become nullable for the first time in comparison to historical, do we want to generate alert about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the current phase 1 query excludes null values, but I think it's good to be defensive here so the rule is robust in case we do end up getting null values back by accident - either a subtle change in ES, or the query changes in the future. The include param on the terms agg for phase 2 rejects null values, so if a null did slip in with a single field rule it would break the rule.

null as a new value seems like it could be a valid use case, and seems like it would be possible with the new runtime field implementation for multiple fields since the null could be base64 encoded before inclusion in the terms include param. We'd just need to add some kind of special handling for it in the single field case - maybe a single extra query checking for null in that field over the history window when null is detected in the recent window?

I think adding full support for nulls is something we can leave for the future though, don't want to rush it in here. My immediate focus is just ensuring the implementation is robust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added filtering for buckets with multiple fields and functional tests for it

Comment on lines 95 to 108
void traverseDocFields(def doc, def fields, def index, def line) {
if (index === fields.length) {
emit(line);
} else {
for (field in doc[fields[index]]) {
def delimiter = index === 0 ? '' : '${DELIMITER}';
def nextLine = line + delimiter + String.valueOf(field).encodeBase64();

traverseDocFields(doc, fields, index + 1, nextLine);
}
}
}

traverseDocFields(doc, params['fields'], 0, '');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Definitely the most complicated runtime field I've seen so far, but well worth it for this feature IMO.
A couple questions:

  1. Does anything break if we select multiple fields that each have many values? Certain fields such as process.args can be very large arrays of values, so even though we limit it to 3 fields, if each one has 100 values we'd be emitting 1,000,000 values here. Adding a hard cap on the number of emitted values could be useful, if ES doesn't have a circuit breaker built in already.
  2. Some integration test coverage to execute this runtime field specifically against test data and verify that it creates the expected values, particularly in edge cases like null values and large arrays, would be beneficial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything break if we select multiple fields that each have many values? Certain fields such as process.args can be very large arrays of values, so even though we limit it to 3 fields, if each one has 100 values we'd be emitting 1,000,000 values here. Adding a hard cap on the number of emitted values could be useful, if ES doesn't have a circuit breaker built in already.

I've tried to perform test on large array: there is a limit on 100 emitted fields

    {
        "shard": 0,
        "index": "array-high-cardinality-data-fake_hosts-2022-11-04",
        "node": "yZmoJFdNRHe4uovryh5mDw",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.script.AbstractFieldScript.checkMaxSize(AbstractFieldScript.java:125)",
            "org.elasticsearch.server@8.6.0-SNAPSHOT/org.elasticsearch.script.StringFieldScript$Emit.emit(StringFieldScript.java:145)",
            """emit(line);
            } else {
              """,
            "     ^---- HERE"
          ],
          "script": " ...",
          "lang": "painless",
          "position": {
            "offset": 140,
            "start": 135,
            "end": 182
          },
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Runtime field [new_terms_values] is emitting [101] values while the maximum number of values allowed is [100]"
          }
        }
      },

Which is, doesn't look to many indeed, but would prevent overuse of resources on ES side

Copy link
Contributor

@marshallmain marshallmain Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the exception prevent results from being returned at all for that query, even for other documents that do not have too many values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marshallmain
Added following functionality

  • ensured that runtime field script does not emit more than 100 fields
  • added tests for preview that cover arrays with large number of items

TODO:

  • add tests for runtime script

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests for runtime script bc50e91

@vitaliidm vitaliidm enabled auto-merge (squash) November 14, 2022 15:23
Copy link
Contributor

@marshallmain marshallmain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work!

@kibana-ci
Copy link
Collaborator

kibana-ci commented Nov 14, 2022

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Security Solution Tests #1 / Alert Flyout Opens a new timeline investigation with alert ids from the process ancestry

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 3285 3286 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 9.6MB 9.6MB +1.3KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
securitySolution 51.0KB 51.0KB +41.0B
Unknown metric groups

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 59 65 +6
osquery 108 113 +5
securitySolution 441 447 +6
total +19

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 67 73 +6
osquery 109 115 +6
securitySolution 518 524 +6
total +20

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @vitaliidm

@vitaliidm vitaliidm merged commit f1117c8 into elastic:main Nov 14, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Nov 14, 2022
vitaliidm added a commit that referenced this pull request Nov 18, 2022
…ields (#145167)

## Summary

This PR improves performance of new terms multiple fields implementation
released in #143943
In comparison with single value fields it's faster in 2-2.5 times
In comparison with array field the win even more significant: 3-4 times

### Technical implementation:
Value for runtime field is emitted only if any of new terms fields
present in `include` clause of terms aggregation. It's achieved by
passing a values map of new terms fields to the runtime script and
checking new terms values in it. So, there is a significant performance
wins if runtime field is not matched with includes values:
   - new terms fields are not encoded in base64 within runtime script
   - runtime script doesn't emit any values
- runtime field doesn't have any value to be compared against during
aggregation, as its empty
- it eliminates possible unyielding execution branches early: if one of
items in array of first new terms field is not present in values map, no
need to run through the rest of combinations

As a result, this implementation overcomes the issue with non exhaustive
results due to the large number of emitted fields.
ES doesn't allow emitting more than 100 values in the runtime script, so
if the number of all combinations in new terms fields is greater than
100, only the first 100 combinations will be used for terms aggregation.
With this new implementation only matched fields will be emitting. Even
if its number is greater than 100, we will hit circuit breaker in
Security Solution: as rule run can't generate more than 100 alerts

### Performance measurements


Implementation | Shards | Docs per shard | Simultaneous Rule Executions
| Fields cardinality | Rule Execution Time (improved) | Rule Execution
Time (old)
-- | -- | -- | -- | -- | -- | --
Terms 1 field | 10 | 900,000 | 1 | 100,000 | 7s |  
Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 17s | 33s
Terms 2 fields | 10 | 900,000 | 2 | 100,000 | 19s |  
Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 18s | 46s
Terms 3 fields | 10 | 900,000 | 2 | 100,000 | 20s |  
  |   |   |   |   |   |  
Terms 1 field | 20 | 900,000 | 1 | 100,000 | 10.5s |  
Terms 2 fields | 20 | 900,000 | 1 | 100,000 | 28s | 55s
Terms 2 fields | 20 | 900,000 | 2 | 100,000 | 28.5s | 56s
Terms 3 fields | 20 | 900,000 | 1 | 100,000 | 30s | 75s
Terms 3 fields | 20 | 900,000 | 2 | 100,000 | 31s | 75s
  |   |   |   |   |   |  
Terms 1 field | 10 | 1,800,000 | 1 | 100,000 | 7s |  
Terms 2 fields | 10 | 1,800,000 | 1 | 100,000 | 24s | 50s
Terms 3 fields | 10 | 1,800,000 | 1 | 100,000 | 26s | 68s
  |   |   |   |   |   |  
array of unique values length 10 |   |   |   |   |   |  
Terms 1 field | 10 | 900,000 | 1 | 100,000 | 9.5s |  
Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 75s | 3.5m
Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 83s | 6m


### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Marshall Main <55718608+marshallmain@users.noreply.github.com>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 18, 2022
…ields (elastic#145167)

## Summary

This PR improves performance of new terms multiple fields implementation
released in elastic#143943
In comparison with single value fields it's faster in 2-2.5 times
In comparison with array field the win even more significant: 3-4 times

### Technical implementation:
Value for runtime field is emitted only if any of new terms fields
present in `include` clause of terms aggregation. It's achieved by
passing a values map of new terms fields to the runtime script and
checking new terms values in it. So, there is a significant performance
wins if runtime field is not matched with includes values:
   - new terms fields are not encoded in base64 within runtime script
   - runtime script doesn't emit any values
- runtime field doesn't have any value to be compared against during
aggregation, as its empty
- it eliminates possible unyielding execution branches early: if one of
items in array of first new terms field is not present in values map, no
need to run through the rest of combinations

As a result, this implementation overcomes the issue with non exhaustive
results due to the large number of emitted fields.
ES doesn't allow emitting more than 100 values in the runtime script, so
if the number of all combinations in new terms fields is greater than
100, only the first 100 combinations will be used for terms aggregation.
With this new implementation only matched fields will be emitting. Even
if its number is greater than 100, we will hit circuit breaker in
Security Solution: as rule run can't generate more than 100 alerts

### Performance measurements

Implementation | Shards | Docs per shard | Simultaneous Rule Executions
| Fields cardinality | Rule Execution Time (improved) | Rule Execution
Time (old)
-- | -- | -- | -- | -- | -- | --
Terms 1 field | 10 | 900,000 | 1 | 100,000 | 7s |  
Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 17s | 33s
Terms 2 fields | 10 | 900,000 | 2 | 100,000 | 19s |  
Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 18s | 46s
Terms 3 fields | 10 | 900,000 | 2 | 100,000 | 20s |  
  |   |   |   |   |   |  
Terms 1 field | 20 | 900,000 | 1 | 100,000 | 10.5s |  
Terms 2 fields | 20 | 900,000 | 1 | 100,000 | 28s | 55s
Terms 2 fields | 20 | 900,000 | 2 | 100,000 | 28.5s | 56s
Terms 3 fields | 20 | 900,000 | 1 | 100,000 | 30s | 75s
Terms 3 fields | 20 | 900,000 | 2 | 100,000 | 31s | 75s
  |   |   |   |   |   |  
Terms 1 field | 10 | 1,800,000 | 1 | 100,000 | 7s |  
Terms 2 fields | 10 | 1,800,000 | 1 | 100,000 | 24s | 50s
Terms 3 fields | 10 | 1,800,000 | 1 | 100,000 | 26s | 68s
  |   |   |   |   |   |  
array of unique values length 10 |   |   |   |   |   |  
Terms 1 field | 10 | 900,000 | 1 | 100,000 | 9.5s |  
Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 75s | 3.5m
Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 83s | 6m

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Marshall Main <55718608+marshallmain@users.noreply.github.com>
(cherry picked from commit b985ec1)
kibanamachine referenced this pull request Nov 18, 2022
…ulti fields (#145167) (#145697)

# Backport

This will backport the following commits from `main` to `8.6`:
- [[Security Solution][Alerts] improves performance of new terms multi
fields (#145167)](#145167)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Vitalii
Dmyterko","email":"92328789+vitaliidm@users.noreply.github.com"},"sourceCommit":{"committedDate":"2022-11-18T09:02:15Z","message":"[Security
Solution][Alerts] improves performance of new terms multi fields
(#145167)\n\n## Summary\r\n\r\nThis PR improves performance of new terms
multiple fields implementation\r\nreleased in
https://github.com/elastic/kibana/pull/143943\r\nIn comparison with
single value fields it's faster in 2-2.5 times\r\nIn comparison with
array field the win even more significant: 3-4 times\r\n\r\n###
Technical implementation:\r\nValue for runtime field is emitted only if
any of new terms fields\r\npresent in `include` clause of terms
aggregation. It's achieved by\r\npassing a values map of new terms
fields to the runtime script and\r\nchecking new terms values in it. So,
there is a significant performance\r\nwins if runtime field is not
matched with includes values:\r\n - new terms fields are not encoded in
base64 within runtime script\r\n - runtime script doesn't emit any
values\r\n- runtime field doesn't have any value to be compared against
during\r\naggregation, as its empty\r\n- it eliminates possible
unyielding execution branches early: if one of\r\nitems in array of
first new terms field is not present in values map, no\r\nneed to run
through the rest of combinations\r\n\r\nAs a result, this implementation
overcomes the issue with non exhaustive\r\nresults due to the large
number of emitted fields.\r\nES doesn't allow emitting more than 100
values in the runtime script, so\r\nif the number of all combinations in
new terms fields is greater than\r\n100, only the first 100 combinations
will be used for terms aggregation.\r\nWith this new implementation only
matched fields will be emitting. Even\r\nif its number is greater than
100, we will hit circuit breaker in\r\nSecurity Solution: as rule run
can't generate more than 100 alerts\r\n\r\n### Performance
measurements\r\n\r\n\r\nImplementation | Shards | Docs per shard |
Simultaneous Rule Executions\r\n| Fields cardinality | Rule Execution
Time (improved) | Rule Execution\r\nTime (old)\r\n-- | -- | -- | -- | --
| -- | --\r\nTerms 1 field | 10 | 900,000 | 1 | 100,000 | 7s |
 \r\nTerms 2 fields | 10 | 900,000 | 1 | 100,000 | 17s | 33s\r\nTerms 2
fields | 10 | 900,000 | 2 | 100,000 | 19s |  \r\nTerms 3 fields | 10 |
900,000 | 1 | 100,000 | 18s | 46s\r\nTerms 3 fields | 10 | 900,000 | 2 |
100,000 | 20s |  \r\n  |   |   |   |   |   |  \r\nTerms 1 field | 20 |
900,000 | 1 | 100,000 | 10.5s |  \r\nTerms 2 fields | 20 | 900,000 | 1 |
100,000 | 28s | 55s\r\nTerms 2 fields | 20 | 900,000 | 2 | 100,000 |
28.5s | 56s\r\nTerms 3 fields | 20 | 900,000 | 1 | 100,000 | 30s |
75s\r\nTerms 3 fields | 20 | 900,000 | 2 | 100,000 | 31s | 75s\r\n  |  
|   |   |   |   |  \r\nTerms 1 field | 10 | 1,800,000 | 1 | 100,000 | 7s
|  \r\nTerms 2 fields | 10 | 1,800,000 | 1 | 100,000 | 24s |
50s\r\nTerms 3 fields | 10 | 1,800,000 | 1 | 100,000 | 26s | 68s\r\n  |
  |   |   |   |   |  \r\narray of unique values length 10 |   |   |   |
  |   |  \r\nTerms 1 field | 10 | 900,000 | 1 | 100,000 | 9.5s |
 \r\nTerms 2 fields | 10 | 900,000 | 1 | 100,000 | 75s | 3.5m\r\nTerms 3
fields | 10 | 900,000 | 1 | 100,000 | 83s | 6m\r\n\r\n\r\n###
Checklist\r\n\r\nDelete any items that are not applicable to this
PR.\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by:
Marshall Main
<55718608+marshallmain@users.noreply.github.com>","sha":"b985ec1735d432bf4cf13fb4a4610dabb5c987c6","branchLabelMapping":{"^v8.7.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["performance","release_note:skip","Team:
SecuritySolution","Team:Detection
Alerts","backport:prev-minor","v8.6.0","v8.7.0"],"number":145167,"url":"https://github.com/elastic/kibana/pull/145167","mergeCommit":{"message":"[Security
Solution][Alerts] improves performance of new terms multi fields
(#145167)\n\n## Summary\r\n\r\nThis PR improves performance of new terms
multiple fields implementation\r\nreleased in
https://github.com/elastic/kibana/pull/143943\r\nIn comparison with
single value fields it's faster in 2-2.5 times\r\nIn comparison with
array field the win even more significant: 3-4 times\r\n\r\n###
Technical implementation:\r\nValue for runtime field is emitted only if
any of new terms fields\r\npresent in `include` clause of terms
aggregation. It's achieved by\r\npassing a values map of new terms
fields to the runtime script and\r\nchecking new terms values in it. So,
there is a significant performance\r\nwins if runtime field is not
matched with includes values:\r\n - new terms fields are not encoded in
base64 within runtime script\r\n - runtime script doesn't emit any
values\r\n- runtime field doesn't have any value to be compared against
during\r\naggregation, as its empty\r\n- it eliminates possible
unyielding execution branches early: if one of\r\nitems in array of
first new terms field is not present in values map, no\r\nneed to run
through the rest of combinations\r\n\r\nAs a result, this implementation
overcomes the issue with non exhaustive\r\nresults due to the large
number of emitted fields.\r\nES doesn't allow emitting more than 100
values in the runtime script, so\r\nif the number of all combinations in
new terms fields is greater than\r\n100, only the first 100 combinations
will be used for terms aggregation.\r\nWith this new implementation only
matched fields will be emitting. Even\r\nif its number is greater than
100, we will hit circuit breaker in\r\nSecurity Solution: as rule run
can't generate more than 100 alerts\r\n\r\n### Performance
measurements\r\n\r\n\r\nImplementation | Shards | Docs per shard |
Simultaneous Rule Executions\r\n| Fields cardinality | Rule Execution
Time (improved) | Rule Execution\r\nTime (old)\r\n-- | -- | -- | -- | --
| -- | --\r\nTerms 1 field | 10 | 900,000 | 1 | 100,000 | 7s |
 \r\nTerms 2 fields | 10 | 900,000 | 1 | 100,000 | 17s | 33s\r\nTerms 2
fields | 10 | 900,000 | 2 | 100,000 | 19s |  \r\nTerms 3 fields | 10 |
900,000 | 1 | 100,000 | 18s | 46s\r\nTerms 3 fields | 10 | 900,000 | 2 |
100,000 | 20s |  \r\n  |   |   |   |   |   |  \r\nTerms 1 field | 20 |
900,000 | 1 | 100,000 | 10.5s |  \r\nTerms 2 fields | 20 | 900,000 | 1 |
100,000 | 28s | 55s\r\nTerms 2 fields | 20 | 900,000 | 2 | 100,000 |
28.5s | 56s\r\nTerms 3 fields | 20 | 900,000 | 1 | 100,000 | 30s |
75s\r\nTerms 3 fields | 20 | 900,000 | 2 | 100,000 | 31s | 75s\r\n  |  
|   |   |   |   |  \r\nTerms 1 field | 10 | 1,800,000 | 1 | 100,000 | 7s
|  \r\nTerms 2 fields | 10 | 1,800,000 | 1 | 100,000 | 24s |
50s\r\nTerms 3 fields | 10 | 1,800,000 | 1 | 100,000 | 26s | 68s\r\n  |
  |   |   |   |   |  \r\narray of unique values length 10 |   |   |   |
  |   |  \r\nTerms 1 field | 10 | 900,000 | 1 | 100,000 | 9.5s |
 \r\nTerms 2 fields | 10 | 900,000 | 1 | 100,000 | 75s | 3.5m\r\nTerms 3
fields | 10 | 900,000 | 1 | 100,000 | 83s | 6m\r\n\r\n\r\n###
Checklist\r\n\r\nDelete any items that are not applicable to this
PR.\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by:
Marshall Main
<55718608+marshallmain@users.noreply.github.com>","sha":"b985ec1735d432bf4cf13fb4a4610dabb5c987c6"}},"sourceBranch":"main","suggestedTargetBranches":["8.6"],"targetPullRequestStates":[{"branch":"8.6","label":"v8.6.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.7.0","labelRegex":"^v8.7.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/145167","number":145167,"mergeCommit":{"message":"[Security
Solution][Alerts] improves performance of new terms multi fields
(#145167)\n\n## Summary\r\n\r\nThis PR improves performance of new terms
multiple fields implementation\r\nreleased in
https://github.com/elastic/kibana/pull/143943\r\nIn comparison with
single value fields it's faster in 2-2.5 times\r\nIn comparison with
array field the win even more significant: 3-4 times\r\n\r\n###
Technical implementation:\r\nValue for runtime field is emitted only if
any of new terms fields\r\npresent in `include` clause of terms
aggregation. It's achieved by\r\npassing a values map of new terms
fields to the runtime script and\r\nchecking new terms values in it. So,
there is a significant performance\r\nwins if runtime field is not
matched with includes values:\r\n - new terms fields are not encoded in
base64 within runtime script\r\n - runtime script doesn't emit any
values\r\n- runtime field doesn't have any value to be compared against
during\r\naggregation, as its empty\r\n- it eliminates possible
unyielding execution branches early: if one of\r\nitems in array of
first new terms field is not present in values map, no\r\nneed to run
through the rest of combinations\r\n\r\nAs a result, this implementation
overcomes the issue with non exhaustive\r\nresults due to the large
number of emitted fields.\r\nES doesn't allow emitting more than 100
values in the runtime script, so\r\nif the number of all combinations in
new terms fields is greater than\r\n100, only the first 100 combinations
will be used for terms aggregation.\r\nWith this new implementation only
matched fields will be emitting. Even\r\nif its number is greater than
100, we will hit circuit breaker in\r\nSecurity Solution: as rule run
can't generate more than 100 alerts\r\n\r\n### Performance
measurements\r\n\r\n\r\nImplementation | Shards | Docs per shard |
Simultaneous Rule Executions\r\n| Fields cardinality | Rule Execution
Time (improved) | Rule Execution\r\nTime (old)\r\n-- | -- | -- | -- | --
| -- | --\r\nTerms 1 field | 10 | 900,000 | 1 | 100,000 | 7s |
 \r\nTerms 2 fields | 10 | 900,000 | 1 | 100,000 | 17s | 33s\r\nTerms 2
fields | 10 | 900,000 | 2 | 100,000 | 19s |  \r\nTerms 3 fields | 10 |
900,000 | 1 | 100,000 | 18s | 46s\r\nTerms 3 fields | 10 | 900,000 | 2 |
100,000 | 20s |  \r\n  |   |   |   |   |   |  \r\nTerms 1 field | 20 |
900,000 | 1 | 100,000 | 10.5s |  \r\nTerms 2 fields | 20 | 900,000 | 1 |
100,000 | 28s | 55s\r\nTerms 2 fields | 20 | 900,000 | 2 | 100,000 |
28.5s | 56s\r\nTerms 3 fields | 20 | 900,000 | 1 | 100,000 | 30s |
75s\r\nTerms 3 fields | 20 | 900,000 | 2 | 100,000 | 31s | 75s\r\n  |  
|   |   |   |   |  \r\nTerms 1 field | 10 | 1,800,000 | 1 | 100,000 | 7s
|  \r\nTerms 2 fields | 10 | 1,800,000 | 1 | 100,000 | 24s |
50s\r\nTerms 3 fields | 10 | 1,800,000 | 1 | 100,000 | 26s | 68s\r\n  |
  |   |   |   |   |  \r\narray of unique values length 10 |   |   |   |
  |   |  \r\nTerms 1 field | 10 | 900,000 | 1 | 100,000 | 9.5s |
 \r\nTerms 2 fields | 10 | 900,000 | 1 | 100,000 | 75s | 3.5m\r\nTerms 3
fields | 10 | 900,000 | 1 | 100,000 | 83s | 6m\r\n\r\n\r\n###
Checklist\r\n\r\nDelete any items that are not applicable to this
PR.\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by:
Marshall Main
<55718608+marshallmain@users.noreply.github.com>","sha":"b985ec1735d432bf4cf13fb4a4610dabb5c987c6"}}]}]
BACKPORT-->

Co-authored-by: Vitalii Dmyterko <92328789+vitaliidm@users.noreply.github.com>
marshallmain added a commit that referenced this pull request Nov 28, 2022
…ression and new terms multi fields (#145775)

## Summary

Adds new tour highlighting new rule capabilities in 8.6 - new terms
multi fields (#143943) and alert
suppression (#142686).

I tried using the generic `RulesFeatureTour` again
(main...marshallmain:kibana:failed-tour)
but it still crashes the page.

I also looked at integrating this tour with the Guided onboarding tour
for rules management (#145223),
but concluded that they should be separate since guided onboarding is
experimental and this tour should be displayed to users even if they are
not new users.

This PR is essentially a copy of the new terms tour in 8.4
(#138469).
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 28, 2022
…ression and new terms multi fields (elastic#145775)

## Summary

Adds new tour highlighting new rule capabilities in 8.6 - new terms
multi fields (elastic#143943) and alert
suppression (elastic#142686).

I tried using the generic `RulesFeatureTour` again
(elastic/kibana@main...marshallmain:kibana:failed-tour)
but it still crashes the page.

I also looked at integrating this tour with the Guided onboarding tour
for rules management (elastic#145223),
but concluded that they should be separate since guided onboarding is
experimental and this tour should be displayed to users even if they are
not new users.

This PR is essentially a copy of the new terms tour in 8.4
(elastic#138469).

(cherry picked from commit 13c1b0b)
kibanamachine added a commit that referenced this pull request Nov 29, 2022
…r suppression and new terms multi fields (#145775) (#146479)

# Backport

This will backport the following commits from `main` to `8.6`:
- [[Security Solution][Alerts] Add tour to rule management page for
suppression and new terms multi fields
(#145775)](#145775)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Marshall
Main","email":"55718608+marshallmain@users.noreply.github.com"},"sourceCommit":{"committedDate":"2022-11-28T21:35:02Z","message":"[Security
Solution][Alerts] Add tour to rule management page for suppression and
new terms multi fields (#145775)\n\n## Summary\r\n\r\nAdds new tour
highlighting new rule capabilities in 8.6 - new terms\r\nmulti fields
(#143943) and alert\r\nsuppression
(https://github.com/elastic/kibana/pull/142686).\r\n\r\nI tried using
the generic `RulesFeatureTour`
again\r\n(https://github.com/elastic/kibana/compare/main...marshallmain:kibana:failed-tour)\r\nbut
it still crashes the page.\r\n\r\nI also looked at integrating this tour
with the Guided onboarding tour\r\nfor rules management
(https://github.com/elastic/kibana/pull/145223),\r\nbut concluded that
they should be separate since guided onboarding is\r\nexperimental and
this tour should be displayed to users even if they are\r\nnot new
users.\r\n\r\nThis PR is essentially a copy of the new terms tour in
8.4\r\n(https://github.com/elastic/kibana/pull/138469).","sha":"13c1b0b863b7d8b324d33f2aaf45d90d5c8c108e","branchLabelMapping":{"^v8.7.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:
SecuritySolution","Team:Detection
Alerts","v8.6.0","v8.7.0"],"number":145775,"url":"https://github.com/elastic/kibana/pull/145775","mergeCommit":{"message":"[Security
Solution][Alerts] Add tour to rule management page for suppression and
new terms multi fields (#145775)\n\n## Summary\r\n\r\nAdds new tour
highlighting new rule capabilities in 8.6 - new terms\r\nmulti fields
(#143943) and alert\r\nsuppression
(https://github.com/elastic/kibana/pull/142686).\r\n\r\nI tried using
the generic `RulesFeatureTour`
again\r\n(https://github.com/elastic/kibana/compare/main...marshallmain:kibana:failed-tour)\r\nbut
it still crashes the page.\r\n\r\nI also looked at integrating this tour
with the Guided onboarding tour\r\nfor rules management
(https://github.com/elastic/kibana/pull/145223),\r\nbut concluded that
they should be separate since guided onboarding is\r\nexperimental and
this tour should be displayed to users even if they are\r\nnot new
users.\r\n\r\nThis PR is essentially a copy of the new terms tour in
8.4\r\n(https://github.com/elastic/kibana/pull/138469).","sha":"13c1b0b863b7d8b324d33f2aaf45d90d5c8c108e"}},"sourceBranch":"main","suggestedTargetBranches":["8.6"],"targetPullRequestStates":[{"branch":"8.6","label":"v8.6.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.7.0","labelRegex":"^v8.7.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/145775","number":145775,"mergeCommit":{"message":"[Security
Solution][Alerts] Add tour to rule management page for suppression and
new terms multi fields (#145775)\n\n## Summary\r\n\r\nAdds new tour
highlighting new rule capabilities in 8.6 - new terms\r\nmulti fields
(#143943) and alert\r\nsuppression
(https://github.com/elastic/kibana/pull/142686).\r\n\r\nI tried using
the generic `RulesFeatureTour`
again\r\n(https://github.com/elastic/kibana/compare/main...marshallmain:kibana:failed-tour)\r\nbut
it still crashes the page.\r\n\r\nI also looked at integrating this tour
with the Guided onboarding tour\r\nfor rules management
(https://github.com/elastic/kibana/pull/145223),\r\nbut concluded that
they should be separate since guided onboarding is\r\nexperimental and
this tour should be displayed to users even if they are\r\nnot new
users.\r\n\r\nThis PR is essentially a copy of the new terms tour in
8.4\r\n(https://github.com/elastic/kibana/pull/138469).","sha":"13c1b0b863b7d8b324d33f2aaf45d90d5c8c108e"}}]}]
BACKPORT-->

Co-authored-by: Marshall Main <55718608+marshallmain@users.noreply.github.com>
@marshallmain marshallmain mentioned this pull request Jan 11, 2023
3 tasks
@vitaliidm vitaliidm deleted the alerts/multi-fields-new-terms branch March 4, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting ci:cloud-deploy Create or update a Cloud deployment Feature:Detection Alerts Security Solution Detection Alerts Feature release_note:enhancement Team:Detection Alerts Security Detection Alerts Area Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants