[Security Solution][Alerts] improves performance of new terms multi fields (#145167)

## Summary

This PR improves the performance of the new terms multiple fields implementation
released in #143943.
For single-value fields, the new implementation is 2 to 2.5 times faster.
For array fields, the gain is even more significant: 3 to 4 times faster.

### Technical implementation:
The runtime field emits a value only if the new terms field values are
present in the `include` clause of the terms aggregation. This is achieved by
passing a map of new terms field values to the runtime script and checking
each field value against it. As a result, there are significant performance
wins when a document doesn't match the included values:

- new terms field values are not base64-encoded within the runtime script
- the runtime script doesn't emit any values
- the runtime field is empty, so there is no value to compare against during
aggregation
- unproductive execution branches are eliminated early: if an item in the
array of the first new terms field is not present in the values map, there is
no need to go through the remaining combinations

This implementation also overcomes the issue of non-exhaustive results
caused by a large number of emitted values.
ES doesn't allow a runtime script to emit more than 100 values, so if the
number of combinations of new terms field values is greater than 100, only
the first 100 combinations are used in the terms aggregation.
With the new implementation, only matched values are emitted. Even if their
number is greater than 100, the Security Solution circuit breaker is hit
anyway, since a single rule run can't generate more than 100 alerts.
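The arithmetic behind the 100-value limit is a product. Each combination picks one value from each field, so three fields holding arrays of 10 values yield 10 × 10 × 10 = 1,000 combinations, which mirrors the array scenario in the performance measurements. The sketch below uses hypothetical helper names and sample data to compare that total with the number of combinations that survive the values-map check.

```typescript
type Doc = Record<string, unknown[]>;
type ValuesMap = Record<string, Record<string, boolean>>;

// Every combination picks one value per field, so the count is a product of array lengths.
function countCombinations(doc: Doc, fields: string[]): number {
  return fields.reduce((acc, f) => acc * (doc[f] ?? []).length, 1);
}

// Same product, counting only values that appear in the values map.
function countMatchedCombinations(doc: Doc, fields: string[], values: ValuesMap): number {
  return fields.reduce(
    (acc, f) => acc * (doc[f] ?? []).filter((v) => values[f]?.[String(v)] === true).length,
    1
  );
}

const EMIT_LIMIT = 100;

// Hypothetical document: three fields, each holding an array of 10 unique values.
const fields = ['a', 'b', 'c'];
const doc: Doc = {};
for (const f of fields) {
  doc[f] = Array.from({ length: 10 }, (_, i) => `${f}${i}`);
}

// Hypothetical values map: Phase 1 returned one value per field.
const values: ValuesMap = { a: { a0: true }, b: { b3: true }, c: { c7: true } };

const total = countCombinations(doc, fields); // 10 * 10 * 10 = 1000
const matched = countMatchedCombinations(doc, fields, values); // 1 * 1 * 1 = 1
console.log(total > EMIT_LIMIT, matched <= EMIT_LIMIT); // true true
```

Unfiltered, such a document would exceed the limit tenfold. Filtered, it contributes a single value.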

### Performance measurements

Implementation | Shards | Docs per shard | Simultaneous Rule Executions | Fields cardinality | Rule Execution Time (improved) | Rule Execution Time (old)
-- | -- | -- | -- | -- | -- | --
Terms 1 field | 10 | 900,000 | 1 | 100,000 | 7s |
Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 17s | 33s
Terms 2 fields | 10 | 900,000 | 2 | 100,000 | 19s |
Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 18s | 46s
Terms 3 fields | 10 | 900,000 | 2 | 100,000 | 20s |
  |   |   |   |   |   |  
Terms 1 field | 20 | 900,000 | 1 | 100,000 | 10.5s |
Terms 2 fields | 20 | 900,000 | 1 | 100,000 | 28s | 55s
Terms 2 fields | 20 | 900,000 | 2 | 100,000 | 28.5s | 56s
Terms 3 fields | 20 | 900,000 | 1 | 100,000 | 30s | 75s
Terms 3 fields | 20 | 900,000 | 2 | 100,000 | 31s | 75s
  |   |   |   |   |   |  
Terms 1 field | 10 | 1,800,000 | 1 | 100,000 | 7s |
Terms 2 fields | 10 | 1,800,000 | 1 | 100,000 | 24s | 50s
Terms 3 fields | 10 | 1,800,000 | 1 | 100,000 | 26s | 68s
  |   |   |   |   |   |  
Array fields (10 unique values per array) |   |   |   |   |   |
Terms 1 field | 10 | 900,000 | 1 | 100,000 | 9.5s |
Terms 2 fields | 10 | 900,000 | 1 | 100,000 | 75s | 3.5m
Terms 3 fields | 10 | 900,000 | 1 | 100,000 | 83s | 6m
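The speedups quoted in the summary can be recomputed from this table as old time divided by improved time. The snippet below does that for the single-execution rows that report both times, converting 3.5m to 210s and 6m to 360s. The row labels and the `Row` type are illustrative. It prints ratios from about 1.94x to 2.62x for single-value fields and 2.80x and 4.34x for the array rows. These results are in line with, though slightly wider than, the rounded ranges in the summary.

```typescript
// Rule execution times copied from the table above, in seconds.
type Row = { label: string; improved: number; old: number };

const singleValueRows: Row[] = [
  { label: '10 shards, 900k docs, 2 fields', improved: 17, old: 33 },
  { label: '10 shards, 900k docs, 3 fields', improved: 18, old: 46 },
  { label: '20 shards, 900k docs, 2 fields', improved: 28, old: 55 },
  { label: '20 shards, 900k docs, 3 fields', improved: 30, old: 75 },
  { label: '10 shards, 1.8M docs, 2 fields', improved: 24, old: 50 },
  { label: '10 shards, 1.8M docs, 3 fields', improved: 26, old: 68 },
];

const arrayRows: Row[] = [
  { label: 'arrays, 2 fields', improved: 75, old: 210 }, // 3.5m = 210s
  { label: 'arrays, 3 fields', improved: 83, old: 360 }, // 6m = 360s
];

// Speedup = old execution time / improved execution time.
const speedup = (r: Row): number => r.old / r.improved;

for (const r of [...singleValueRows, ...arrayRows]) {
  console.log(`${r.label}: ${speedup(r).toFixed(2)}x`);
}
```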

### Checklist


- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Marshall Main <55718608+marshallmain@users.noreply.github.com>
(cherry picked from commit b985ec1)
vitaliidm committed Nov 18, 2022
1 parent df1c4ed commit 6b66276
Showing 6 changed files with 1,800 additions and 21 deletions.
```diff
@@ -27,4 +27,7 @@ The new terms rule type reuses the singleSearchAfter function which implements t
 ## Limitations and future enhancements
 
 - Value list exceptions are not supported at the moment. Commit ead04ce removes an experimental method I tried for evaluating value list exceptions.
-- Runtime field supports only 100 emitted values. So for large arrays or combination of values greater than 100, results may not be exhaustive. This applies only to new terms with multiple fields
+- Runtime field supports only 100 emitted values. So for large arrays or combination of values greater than 100, results may not be exhaustive. This applies only to new terms with multiple fields.
+Following edge cases possible:
+- false negatives (alert is not generated) if too many fields were emitted and actual new values are not getting evaluated if it happened in document in rule run window.
+- false positives (wrong alert generated) if too many fields were emitted in historical document and some old terms are not getting evaluated against values in new documents.
```
```diff
@@ -193,6 +193,11 @@ export const createNewTermsAlertType = (
 }
 const bucketsForField = searchResultWithAggs.aggregations.new_terms.buckets;
 const includeValues = transformBucketsToValues(params.newTermsFields, bucketsForField);
+const newTermsRuntimeMappings = getNewTermsRuntimeMappings(
+params.newTermsFields,
+bucketsForField
+);
+
 // PHASE 2: Take the page of results from Phase 1 and determine if each term exists in the history window.
 // The aggregation filters out buckets for terms that exist prior to `tuple.from`, so the buckets in the
 // response correspond to each new term.
@@ -209,7 +214,7 @@ export const createNewTermsAlertType = (
 }),
 runtimeMappings: {
 ...runtimeMappings,
-...getNewTermsRuntimeMappings(params.newTermsFields),
+...newTermsRuntimeMappings,
 },
 searchAfterSortIds: undefined,
 index: inputIndex,
@@ -255,7 +260,7 @@ export const createNewTermsAlertType = (
 }),
 runtimeMappings: {
 ...runtimeMappings,
-...getNewTermsRuntimeMappings(params.newTermsFields),
+...newTermsRuntimeMappings,
 },
 searchAfterSortIds: undefined,
 index: inputIndex,
```
```diff
@@ -12,6 +12,7 @@ import {
 getAggregationField,
 decodeMatchedValues,
 getNewTermsRuntimeMappings,
+createFieldValuesMap,
 AGG_FIELD_NAME,
 } from './utils';
 
@@ -190,22 +191,185 @@ describe('new terms utils', () => {
 
 describe('getNewTermsRuntimeMappings', () => {
 it('should not return runtime field if new terms fields is empty', () => {
-expect(getNewTermsRuntimeMappings([])).toBeUndefined();
+expect(getNewTermsRuntimeMappings([], [])).toBeUndefined();
 });
 it('should not return runtime field if new terms fields has only one field', () => {
-expect(getNewTermsRuntimeMappings(['host.name'])).toBeUndefined();
+expect(getNewTermsRuntimeMappings(['host.name'], [])).toBeUndefined();
 });
 
 it('should return runtime field if new terms fields has more than one field', () => {
-const runtimeMappings = getNewTermsRuntimeMappings(['host.name', 'host.ip']);
+const runtimeMappings = getNewTermsRuntimeMappings(
+['source.host', 'source.ip'],
+[
+{
+key: {
+'source.host': 'host-0',
+'source.ip': '127.0.0.1',
+},
+doc_count: 1,
+},
+{
+key: {
+'source.host': 'host-1',
+'source.ip': '127.0.0.1',
+},
+doc_count: 1,
+},
+]
+);
 
 expect(runtimeMappings?.[AGG_FIELD_NAME]).toMatchObject({
 type: 'keyword',
 script: {
-params: { fields: ['host.name', 'host.ip'] },
+params: {
+fields: ['source.host', 'source.ip'],
+values: {
+'source.host': {
+'host-0': true,
+'host-1': true,
+},
+'source.ip': {
+'127.0.0.1': true,
+},
+},
+},
 source: expect.any(String),
 },
 });
 });
 });
 });
+
+describe('createFieldValuesMap', () => {
+it('should return undefined if new terms fields has only one field', () => {
+expect(
+createFieldValuesMap(
+['host.name'],
+[
+{
+key: {
+'source.host': 'host-0',
+},
+doc_count: 1,
+},
+{
+key: {
+'source.host': 'host-1',
+},
+doc_count: 3,
+},
+]
+)
+).toBeUndefined();
+});
+
+it('should return values map if new terms fields has more than one field', () => {
+expect(
+createFieldValuesMap(
+['source.host', 'source.ip'],
+[
+{
+key: {
+'source.host': 'host-0',
+'source.ip': '127.0.0.1',
+},
+doc_count: 1,
+},
+{
+key: {
+'source.host': 'host-1',
+'source.ip': '127.0.0.1',
+},
+doc_count: 1,
+},
+]
+)
+).toEqual({
+'source.host': {
+'host-0': true,
+'host-1': true,
+},
+'source.ip': {
+'127.0.0.1': true,
+},
+});
+});
+
+it('should not put value in map if it is null', () => {
+expect(
+createFieldValuesMap(
+['source.host', 'source.ip'],
+[
+{
+key: {
+'source.host': 'host-1',
+'source.ip': null,
+},
+doc_count: 1,
+},
+]
+)
+).toEqual({
+'source.host': {
+'host-1': true,
+},
+'source.ip': {},
+});
+});
+
+it('should put value in map if it is a number', () => {
+expect(
+createFieldValuesMap(
+['source.host', 'source.id'],
+[
+{
+key: {
+'source.host': 'host-1',
+'source.id': 100,
+},
+doc_count: 1,
+},
+]
+)
+).toEqual({
+'source.host': {
+'host-1': true,
+},
+'source.id': {
+'100': true,
+},
+});
+});
+
+it('should put value in map if it is a boolean', () => {
+expect(
+createFieldValuesMap(
+['source.host', 'user.enabled'],
+[
+{
+key: {
+'source.host': 'host-1',
+'user.enabled': true,
+},
+doc_count: 1,
+},
+{
+key: {
+'source.host': 'host-1',
+'user.enabled': false,
+},
+doc_count: 1,
+},
+]
+)
+).toEqual({
+'source.host': {
+'host-1': true,
+},
+'user.enabled': {
+true: true,
+false: true,
+},
+});
+});
+});
```
```diff
@@ -80,19 +80,55 @@ export const transformBucketsToValues = (
 );
 };
 
+/**
+* transforms arrays of new terms fields and its values in object
+* [new_terms_field]: { [value1]: true, [value1]: true }
+* It's needed to have constant time complexity of accessing whether value is present in new terms
+* It will be passed to Painless script used in runtime field
+*/
+export const createFieldValuesMap = (
+newTermsFields: string[],
+buckets: estypes.AggregationsCompositeBucket[]
+) => {
+if (newTermsFields.length === 1) {
+return undefined;
+}
+
+const valuesMap = newTermsFields.reduce<Record<string, Record<string, boolean>>>(
+(acc, field) => ({ ...acc, [field]: {} }),
+{}
+);
+
+buckets
+.map((bucket) => bucket.key)
+.forEach((bucket) => {
+Object.entries(bucket).forEach(([key, value]) => {
+if (value == null) {
+return;
+}
+const strValue = typeof value !== 'string' ? value.toString() : value;
+valuesMap[key][strValue] = true;
+});
+});
+
+return valuesMap;
+};
+
 export const getNewTermsRuntimeMappings = (
-newTermsFields: string[]
+newTermsFields: string[],
+buckets: estypes.AggregationsCompositeBucket[]
 ): undefined | { [AGG_FIELD_NAME]: estypes.MappingRuntimeField } => {
 // if new terms include only one field we don't use runtime mappings and don't stich fields buckets together
 if (newTermsFields.length <= 1) {
 return undefined;
 }
 
+const values = createFieldValuesMap(newTermsFields, buckets);
 return {
 [AGG_FIELD_NAME]: {
 type: 'keyword',
 script: {
-params: { fields: newTermsFields },
+params: { fields: newTermsFields, values },
 source: `
 def stack = new Stack();
 // ES has limit in 100 values for runtime field, after this query will fail
@@ -110,9 +146,14 @@ export const getNewTermsRuntimeMappings = (
 emit(line);
 emitLimit = emitLimit - 1;
 } else {
-for (field in doc[params['fields'][index]]) {
+def fieldName = params['fields'][index];
+for (field in doc[fieldName]) {
+def fieldStr = String.valueOf(field);
+if (!params['values'][fieldName].containsKey(fieldStr)) {
+continue;
+}
 def delimiter = index === 0 ? '' : '${DELIMITER}';
-def nextLine = line + delimiter + String.valueOf(field).encodeBase64();
+def nextLine = line + delimiter + fieldStr.encodeBase64();
 stack.add([index + 1, nextLine])
 }
```