[Discuss] Elastic Security Indicator Match Rule tuning and optimizations #64746

Closed
spong opened this issue Nov 7, 2020 · 9 comments
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@spong
Member

spong commented Nov 7, 2020

In Elastic Security 7.10 we're introducing a new Detection Engine Rule type called Indicator/Threat Matching (elastic/kibana#77395, elastic/kibana#78955), which will allow users to use the results from querying one index (threat index) to filter/query data in a second index (source index). This feature is being released as beta, and can be quite resource intensive, so the hope here is to get a better understanding of what we can do to optimize and tune our current algorithm/search strategy for optimal performance on the Elasticsearch side.

I'll try to keep the detections/security language to a minimum, but the gist here is that every 5 minutes a search is performed against the previous 5 minutes of data to see whether a combination of fields from one index (threat index) exists in a second index (source index).

Input

The user configuration is essentially as follows: users can specify any number of Source Indices, a Source Query, any number of Threat Indices, a Threat Query, and a Threat Mapping object that defines the field mapping between the source and threat indices.

A sample configuration would look like the following:

{
  "concurrent_searches": 10,
  "items_per_search": 10,
  "index": ["auditbeat-*", "endgame-*", "filebeat-*", "logs-*", "packetbeat-*", "winlogbeat-*"],
  "name": "Indicator Match Concurrent Searches",
  "description": "Does 10 concurrent searches with 10 items per search",
  "rule_id": "indicator_concurrent_search",
  "risk_score": 1,
  "severity": "high",
  "type": "threat_match",
  "query": "*:*",
  "tags": ["concurrent_searches_test", "from_script"],
  "threat_index": ["mock-threat-list-1"],
  "threat_language": "kuery",
  "threat_query": "*:*",
  "threat_mapping": [
    {
      "entries": [
        {
          "field": "source.port",
          "type": "mapping",
          "value": "source.port"
        },
        {
          "field": "source.ip",
          "type": "mapping",
          "value": "source.ip"
        },
        {
          "field": "host.name",
          "type": "mapping",
          "value": "host.name"
        }
      ]
    }
  ]
}

Search Strategy

We begin by querying the threat index and storing the results in memory within Kibana as we work our way through the list. Lists are batched into memory in buckets of 9000 documents at a time (a large threat list could be ~400k-600k documents).

Once the list is in memory, we use the above items_per_search and concurrent_searches settings to chunk the processing. For the above configuration, we'll create 10 queries, each with 10 threat items as filters, and then execute them all at once. Once all requests have returned, we check for results (which will be 0 most of the time), and continue searching through the next block of 100 items (10x10), pulling more into memory as needed, until we've searched for all items.
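The chunking described above can be sketched roughly as follows. This is a minimal illustration, not the Kibana implementation; `chunk` and `plan_rounds` are hypothetical helper names, and the threat items are stand-in integers.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive slices of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_rounds(threat_items, items_per_search, concurrent_searches):
    """Group threat items into searches, then group searches into rounds.

    Each round is a list of searches that would be sent together;
    each search is the slice of threat items used as its filter.
    """
    searches = chunk(threat_items, items_per_search)
    return chunk(searches, concurrent_searches)


# 250 stand-in threat items, 10 items per search, 10 searches per round.
rounds = plan_rounds(list(range(250)), items_per_search=10, concurrent_searches=10)
print([len(r) for r in rounds])  # [10, 10, 5]
```

With 250 items this gives two full rounds of 10 searches and a final round of 5, so the last round is simply smaller rather than padded.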

Sample query with `items_per_search:10`

{
  "bool": {
    "must": [],
    "filter": [
      {
        "match_all": {}
      },
      {
        "bool": {
          "should": [
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "host.name": "siem-windows"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "443"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "host.name": "siem-kibana"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.ip": "127.0.0.1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "2"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.ip": "127.0.0.1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "3"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.ip": "127.0.0.1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "4"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.ip": "127.0.0.1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "5"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.ip": "127.0.0.1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "6"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.ip": "127.0.0.1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.port": "7"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "source.ip": "127.0.0.1"
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            }
          ]
        }
      }
    ],
    "should": [],
    "must_not": []
  }
}
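For readers who want the overall shape without the nesting, here is a simplified sketch of turning threat items plus a field mapping into a filter. It assumes the mapped fields of one threat item are combined with `bool.filter`, that items are OR'ed with `bool.should`, and that fields missing from a threat item are skipped; the generated query above is more deeply nested, and `get_path` and `build_threat_filter` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Read a nested value such as 'source.ip'; return None when it is missing."""
    cur: Any = doc
    for key in dotted.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


def build_threat_filter(
    threat_items: List[Dict[str, Any]],
    mapping: List[Tuple[str, str]],
) -> Dict[str, Any]:
    """Build one bool query: OR across threat items, AND across mapped fields.

    `mapping` is a list of (source_field, threat_field) pairs.
    """
    item_clauses = []
    for item in threat_items:
        matches = []
        for source_field, threat_field in mapping:
            value = get_path(item, threat_field)
            if value is not None:  # assumption: skip fields the item lacks
                matches.append({"match": {source_field: value}})
        if matches:
            item_clauses.append({"bool": {"filter": matches}})
    return {"bool": {"should": item_clauses, "minimum_should_match": 1}}


threat_items = [
    {"source": {"ip": "127.0.0.1", "port": 1}, "host": {"name": "computer-1"}},
    {"host": {"name": "siem-windows"}},
]
mapping = [("source.port", "source.port"), ("source.ip", "source.ip"), ("host.name", "host.name")]
threat_filter = build_threat_filter(threat_items, mapping)
```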

Tuning/Optimizations

With the above algorithm, our open questions mostly concern the usefulness of batching like this, and whether smaller batches of filters or one large batch per query would prove optimal (or whether it really just depends on the data set, cluster configuration, etc.). This feature is intended to be used with cross-cluster search (CCS).

Another open question is whether there is anything we can do to better leverage caching with regard to the time windows we're querying. As it stands, a daterange filter is constructed (below) with to/from calculated at query time, so the values are not static between queries (the result of some other upstream logic we'll need to address).

Daterange filter:

{
  "bool": {
    "filter": [
      {
        "bool": {
          "should": [
            {
              "range": {
                "@timestamp": {
                  "gte": from,
                  "format": "strict_date_optional_time"
                }
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      {
        "bool": {
          "should": [
            {
              "range": {
                "@timestamp": {
                  "lte": to,
                  "format": "strict_date_optional_time"
                }
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    ]
  }
}
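To make the "calculated at query time" point concrete, here is a small sketch that fills in `from` and `to` for a 5-minute lookback. It flattens the per-range `bool`/`should` wrappers shown above for brevity, and `to_es_timestamp` and `date_range_filter` are hypothetical helper names. Because both bounds derive from the current time, every execution yields a different filter.

```python
from datetime import datetime, timedelta, timezone


def to_es_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as UTC with millisecond precision."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def date_range_filter(now: datetime, lookback: timedelta = timedelta(minutes=5)) -> dict:
    """Build the from/to range filter for the window ending at `now`."""
    return {
        "bool": {
            "filter": [
                {"range": {"@timestamp": {
                    "gte": to_es_timestamp(now - lookback),
                    "format": "strict_date_optional_time",
                }}},
                {"range": {"@timestamp": {
                    "lte": to_es_timestamp(now),
                    "format": "strict_date_optional_time",
                }}},
            ]
        }
    }


now = datetime(2020, 11, 7, 15, 47, 55, 204000, tzinfo=timezone.utc)
window = date_range_filter(now)
print(window["bool"]["filter"][0]["range"]["@timestamp"]["gte"])  # 2020-11-07T15:42:55.204Z
```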

Hopefully this is enough information to provide an idea of what we're doing here, and please do let me know if I can clarify any aspect of the above.

@spong spong added discuss needs:triage Requires assignment of a team area label labels Nov 7, 2020
@FrankHassanabad
Contributor

FrankHassanabad commented Nov 7, 2020

we'll create 10 queries, each with 10 threat items as filters, and then execute them all at once.

That is the configurable part that we are wondering whether we should expose to users to "fiddle with", but we are also wondering whether there are conditions under which one setting of these two "knobs" is better than another, and how we can tune this efficiently. Maybe there is even a way to query Elasticsearch node information to help auto-tune it? We are looking for any helpful advice on all of this.

At very large scale, we expect the following to hold every 5 minutes when our rule/alert runs:

  • The indicator/threat index has documents in the hundreds of thousands.
  • The source indices we are looking for a match against have documents in the thousands, millions, or even billions.
  • The user has created the rule to use at most around 5 AND/OR combinations.
  • The user can either scale their ES nodes as much as they want for more performance, or cannot scale and has limited resources but still wants some efficiency and eventual answers.

For example, if you have 900k indicator/threat items which are IPs/ports/host names like the above screenshot example from @spong, they would look like this below in the indicator/threat index. These represent bad things/needles in a haystack we are looking for, which are rare finds:

indicator/threat-index-example

{
  "@timestamp": "2020-11-07T15:47:55.204Z",
  "source": { "ip": "127.0.0.1", "port": 1 },
  "host": { "name": "computer-1" }
},
{
  "@timestamp": "2020-11-07T15:48:55.204Z",
  "source": { "ip": "127.0.0.1", "port": 2 },
  "host": { "name": "computer-2" }
}

...etc... up to 100, 1k, 10k, 100k, ~500k for the threats/indicators they are looking for. We are looking to see if we get a match against any of these threat/indicator records using the "OR" against our large volume of source documents.

If we set the "knobs" like this (our current default):

"concurrent_searches": 1,
"items_per_search": 9000,

We will execute exactly 1 search from Kibana to Elasticsearch, using 9k of the indicator/threat items against the source documents from the previous 5 minutes, which could number in the thousands/millions/billions. If we get 100 matches/signals, we stop, as that is our "circuit breaker". In reality we expect a well-tuned rule to find 0 or near 0. Once the search completes or times out, we return to our indicator/threat list, grab the next 9k indicator/threat items, and continue until we have worked through every item in the list.

If we want to, we can change these "knobs" to something else such as this:

"concurrent_searches": 10,
"items_per_search": 100,

And now we will execute 10 searches at once from Kibana to Elasticsearch, each using 100 of the indicator/threat items against the source documents from the previous 5 minutes, which could number in the thousands/millions/billions. Each search now has a limit of 100 items, meaning we could find up to 100 matches/signals per concurrent search, but again we expect that in reality a well-tuned rule will find near 0. Once each concurrent search completes or times out, we return to our indicator/threat list, grab the "next" 100 items for each search, construct another 10 concurrent searches with 100 search items each, and continue until we have worked through every indicator/threat item in the list.

To limit things, you can be as lightweight with the searches and items as this:

"concurrent_searches": 1,
"items_per_search": 1,

And now you are sending only 1 list item at a time, waiting to see whether it gets a positive match before sending the next indicator/threat list item. Obviously you are now searching the thousands/millions/billions of records with a filter constructed from a single indicator/threat item at a time, which would be 9k round trips from Kibana to Elasticsearch if you have 9k items in your indicator/threat list.
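As a rough comparison of the three "knob" settings above for a 9k-item list, the number of searches and sequential round trips can be computed as follows. This is a back-of-the-envelope sketch; `search_counts` is a hypothetical helper, and it ignores the 100-signal circuit breaker and timeouts.

```python
import math


def search_counts(total_items: int, concurrent_searches: int, items_per_search: int):
    """Return (searches, rounds) needed to cover every threat item once."""
    searches = math.ceil(total_items / items_per_search)
    rounds = math.ceil(searches / concurrent_searches)
    return searches, rounds


for concurrent, per_search in [(1, 9000), (10, 100), (1, 1)]:
    searches, rounds = search_counts(9000, concurrent, per_search)
    print(f"concurrent={concurrent} items={per_search}: {searches} searches, {rounds} rounds")
# concurrent=1 items=9000: 1 searches, 1 rounds
# concurrent=10 items=100: 90 searches, 9 rounds
# concurrent=1 items=1: 9000 searches, 9000 rounds
```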

@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories team-discuss and removed discuss needs:triage Requires assignment of a team area label labels Nov 16, 2020
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Nov 16, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@spong
Member Author

spong commented Dec 7, 2020

@jimczi @giladgal Thanks for taking the time to brainstorm with us and discuss options going forward. To summarize, our efforts here are going to be broken up into two:

  1. Smaller optimizations to the existing implementation (track_total_hits=false, sort criteria timestamp: desc, resolve gap detection remediation time drift, use named query to determine match)
  2. Explore POC for pushing implementation to ingest-time by leveraging ingest enrich processor and percolator, with the understanding that the ingest enrich processor currently only supports exact match, but could be expanded to fit the needs of this application.
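The smaller optimizations in item 1 might translate into a search body along these lines. This is a sketch under assumptions: `build_search_body` is a hypothetical helper, the `threat-N` names and the `size` of 100 (borrowed from the circuit breaker mentioned earlier) are illustrative, and the actual Kibana change may look different. It relies on Elasticsearch's `track_total_hits`, `sort`, and per-query `_name` options; each hit then lists the names that matched under `matched_queries`.

```python
from typing import Any, Dict


def build_search_body(named_filters: Dict[str, Dict[str, Any]], size: int = 100) -> Dict[str, Any]:
    """Build a search body that names each threat clause and skips hit counting."""
    should = [
        {"bool": {"filter": [clause], "_name": name}}
        for name, clause in named_filters.items()
    ]
    return {
        "track_total_hits": False,  # do not spend time counting all matches
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest events first
        "query": {
            "bool": {
                "filter": [
                    {"bool": {"should": should, "minimum_should_match": 1}}
                ]
            }
        },
    }


body = build_search_body({
    "threat-0": {"match": {"host.name": "siem-windows"}},
    "threat-1": {"match": {"source.port": "443"}},
})
```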

We'll sync again once we complete the POC and can provide feedback on whether this would be a suitable mid/long-term solution for our needs (hopefully midway through the 7.12 feature development cycle). Thanks again! 🙂

@ayedem

ayedem commented Jun 11, 2021

What is the limitation of these indicator rules? I have 6 million indicators in total and some of the indicator rules need to go through 1-2 million indicators.

Just some quick math [I hope the math is correct]:
At 9000 IOCs/page it would take about 222 pages to get through all 2 million indicators (as an example).
To get through the 9000 IOCs at the 10 concurrent queries rate (with 10 items per query) we would need to run those 10 concurrent queries 90 times.

Now, if it takes 90x[10 concurrent queries] to get through a single page, then it's going to take 19980x[10 concurrent queries] to get through all 2 million IOCs, which is about 199800 queries in total.

If my rules run every 5 minutes, Kibana needs to be able to finish these queries before the next run, otherwise it will just cascade out of control.
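To put a number on that constraint, the arithmetic above can be turned into a per-round time budget. The sketch assumes 10 items per query (as in the sample configuration), that rounds run strictly one after another, and that the whole list must be covered within one 5-minute interval; it counts only the 222 full pages used in the estimate.

```python
total_iocs = 2_000_000
page_size = 9000
items_per_search = 10
concurrent_searches = 10
interval_ms = 5 * 60 * 1000  # rule runs every 5 minutes

pages = total_iocs // page_size  # 222 full pages (the remaining 2000 IOCs are ignored here)
rounds_per_page = page_size // (items_per_search * concurrent_searches)  # 90
rounds = pages * rounds_per_page  # 19980 sequential rounds
queries = rounds * concurrent_searches  # 199800 queries in total

budget_ms = interval_ms / rounds  # time available per round of 10 queries
print(rounds, queries, round(budget_ms, 1))  # 19980 199800 15.0
```

Under these assumptions each round of 10 concurrent queries would have to complete in roughly 15 ms for the rule to finish before its next scheduled run.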

@hilt86

hilt86 commented Jun 11, 2021

Yeah I've had Kibana struggle when importing small datasets (30k indicators), albeit with a low powered cluster

@ayedem

ayedem commented Jun 11, 2021

I actually have a decent cluster with 6 dedicated data nodes with 64GB / 8 CPU and 2 x Kibana nodes with 16/4.

With the way I currently think this works, I feel that it's actually an inefficient way of doing it. In this solution, we are querying the same set of SIEM data hundreds or thousands of times and filtering it with different items each time. Wouldn't we be better off getting the IOCs (with the ability to adjust how many docs are stored per page), querying the SIEM data once and storing that in memory, and then working through the IOCs against the initial index results? This may not be technically feasible with the way Elasticsearch currently works, but there must be a better way of doing this.

@hilt86

hilt86 commented Jun 11, 2021

I don't know the internals well enough to comment but there must be a better way to do it.

@rylnd

rylnd commented Jun 23, 2021

@hilt86 @ayedem thank you! Indicator match rules are a relatively new rule type, and we're always looking at ways to iterate and improve upon these features, so your feedback is incredibly helpful.

I believe it's been implied above, but to state it explicitly: indicator match rules are currently optimized for large event datasets. Because of this, a situation with a large number of indicators but a relatively small set of events is not going to produce an optimal query (which I believe was your assertion above, @ayedem?). For now, the biggest performance improvements will be seen by limiting the number of indicators your rule uses. A few examples:

  1. Only querying indicators that pertain to your threat mapping
    • e.g. if your mapping is source.ip -> indicator.ip, then include indicator.ip: * in your indicator query
  2. Only querying recent/active indicators
    • i.e. is a network event indicating traffic to a now-inert IP worth alerting on? The TTL of your indicators is something to consider.
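Both suggestions could be combined into a narrower indicator (threat) query. The sketch below builds a KQL string; it assumes the indicator documents carry an `@timestamp` usable as an age cutoff, that KQL date math such as `now-30d` is accepted for that field, and that 30 days is a reasonable TTL. All three are illustrative assumptions, and `narrowed_threat_query` is a hypothetical helper.

```python
from typing import List


def narrowed_threat_query(mapped_indicator_fields: List[str], max_age: str = "30d") -> str:
    """Require at least one mapped indicator field and a recent timestamp."""
    exists = " or ".join(f"{field}: *" for field in mapped_indicator_fields)
    return f'({exists}) and @timestamp >= "now-{max_age}"'


print(narrowed_threat_query(["indicator.ip"]))
# (indicator.ip: *) and @timestamp >= "now-30d"
```

The resulting string would go in the rule's `threat_query` in place of `*:*`, so that indicators lacking the mapped field or past their TTL are never paged into memory.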

We are looking at ways to optimize your use case as well. To that end, I have a few questions for you both (and for anyone else who happens upon this!):

  1. What is the problem that your indicator match rule is trying to solve?
  2. What is the approximate number of events, their content, and their source?
  3. What is the approximate number of indicators, their content, and their source?
  4. What behavior do you currently experience when trying to run these rules? Rule failures, elasticsearch timeouts, kibana slowdown, etc.

@javanna
Member

javanna commented May 3, 2023

Closing, as this issue was used for discussion and there is currently nothing left to do on the ES side.
