
Simple Grok pattern crashes the entire elastic cluster #28731

Closed
gellweiler opened this issue Feb 19, 2018 · 6 comments
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP

Comments


gellweiler commented Feb 19, 2018

This not-so-complicated Grok pattern

Bonsuche mit folgender Anfrage: Belegart->\[%{WORD:param2},(?<param5>(\s*%{NOTSPACE})*)\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}

matched against

Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018

which I entered into the Grok Debugger in Kibana, crashed my whole cluster by pushing the CPU load on my proxy nodes to 100%. I had to restart them! Any idea what's going on here?

I was able to reproduce this, and every time the CPU usage stayed near 90% on the proxy nodes for hours until I restarted them.

Elasticsearch version: 6.1.3 (now updated to 6.2.1; the problem still persists)

We were able to reproduce this on different clusters.

martijnvg self-assigned this Feb 21, 2018
martijnvg (Member) commented Feb 21, 2018

@gellweiler Thanks for reporting this; threads can indeed get stuck evaluating the grok expression.

This reproduces for me with the following ingest simulate API call on master:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "Bonsuche mit folgender Anfrage: Belegart->\\[%{WORD:param2},(?<param5>(\\s*%{NOTSPACE})*)\\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018"
      }
    }
  ]
}

This API call never returns, and the hot threads API shows the following stack traces:

 org.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:271)
       org.joni.Matcher.matchCheck(Matcher.java:304)
       org.joni.Matcher.searchInterruptible(Matcher.java:457)
       org.joni.Matcher.search(Matcher.java:318)
       org.elasticsearch.grok.Grok.captures(Grok.java:173)
       org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:65)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58)
       org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
       org.elasticsearch.action.ingest.SimulateExecutionService$1.doRun(SimulateExecutionService.java:70)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

and this one:

org.joni.Matcher.matchCheck(Matcher.java:304)
       org.joni.Matcher.searchInterruptible(Matcher.java:457)
       org.joni.Matcher.search(Matcher.java:318)
       org.elasticsearch.grok.Grok.captures(Grok.java:173)
       org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:65)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58)
       org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
       org.elasticsearch.action.ingest.SimulateExecutionService$1.doRun(SimulateExecutionService.java:70)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
   

@talevy What would be the best way to prevent joni from evaluating these poisonous expressions?

martijnvg added >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Feb 21, 2018

talevy commented Feb 21, 2018

So, I am still investigating what is causing the regex processor to go insane here, but I recommend trying this pattern for now:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "Bonsuche mit folgender Anfrage: Belegart->\\[%{WORD:param2}(?<param5>(,\\s*%{WORD})*)\\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018"
      }
    }
  ]
}

which results in the same expected output (I believe) and executes much faster

{
        "_index": "_index",
        "_type": "_type",
        "_id": "_id",
        "_source": {
          "param5": ", VERKAUF, NACHERFASSUNG",
          "param9": "2",
          "param10": "6362",
          "param11": "Mon Jan 08 00:00:00 UTC 2018",
          "message": "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018",
          "param2": "EINGESCHRAENKTER_VERKAUF"
}
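
For readers wondering why the original pattern misbehaves while this rewrite does not: with the standard grok aliases expanded (NOTSPACE is roughly \S+, WORD roughly \w+), the original group (?<param5>(\s*%{NOTSPACE})*) reduces to (\s*\S+)*. Because \s* can match nothing, the inner \S+ and the outer * can carve the same run of characters up in exponentially many ways, and any later mismatch forces a backtracking engine to try them all. In the rewritten pattern, every repetition must start with a comma, so there is only one way to consume the input. Below is a minimal, hypothetical java.util.regex sketch of the rewritten shape; the reduced pattern and class name are mine, not from this thread:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a plain-Java reduction of the rewritten pattern, with the grok
// aliases expanded by hand. Each repetition of the param5 group is anchored on
// a comma, so the engine never has to re-split already-matched characters.
public class SafePatternDemo {
    public static void main(String[] args) {
        String msg = "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF,"
                + " VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN";
        Pattern safe = Pattern.compile(
                "Belegart->\\[(\\w+)(?<param5>(,\\s*\\w+)*)\\] Zustand->ABGESCHLOSSEN");
        Matcher m = safe.matcher(msg);
        if (m.find()) {
            System.out.println("param2 = " + m.group(1));        // EINGESCHRAENKTER_VERKAUF
            System.out.println("param5 = " + m.group("param5")); // , VERKAUF, NACHERFASSUNG
        }
    }
}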


talevy commented Feb 21, 2018

Since I do not believe there is a generic way to prevent the regex engine from entering a potentially exponential execution time, I am investigating safe ways to interrupt the regex so that it does not take up too much CPU for too long.

gellweiler (Author) commented

Thanks for investigating this issue. The grok pattern was auto-generated by a tool I wrote. I've changed the tool's grok output to produce GREEDYDATA instead of Token*, which also seems to be safe. Still, I would like to know the root cause of this issue, to prevent generating dangerous expressions in the future.


ldormoy commented May 14, 2018

Any update on this? I seem to have the same issue with a filebeat module.

My CPU went to 100%, and hot_threads shows org.joni.Matcher using all the CPU. I will try to find out which grok pattern caused this, but it would be nice to have more information about which patterns should not be used within ingest pipelines.


martijnvg commented May 31, 2018

We have been discussing how best to fix this bug with what joni currently offers. Joni has checks in its code base: every 30,000 loops it checks whether the thread's interrupted flag has been set and, if so, terminates the loop. I initially thought we could not use that, since it would seem to require forking a watchdog thread for each thread that uses ingest with a grok processor, which is not feasible in ES. However, it was brought to my attention that we can instead add a single background thread that checks whether threads using the grok processor have been running for too long and, if so, sets their interrupt flag. Each time the grok processor is used, threads would need to register themselves. This approach sounds good to me and I will try to implement it.
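
To make the described approach concrete, here is a minimal sketch of such a watchdog in plain Java. It is illustrative only and makes no claim to match the eventual Elasticsearch implementation; the class and method names are invented and the timings are placeholders.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the approach described above -- not the actual Elasticsearch code.
// One shared background thread watches all registered matcher threads and interrupts
// any that have been running longer than a timeout; joni notices the interrupt flag
// on its periodic check and aborts the match.
public class MatcherWatchdog {
    private final long maxExecutionMillis;
    private final Map<Thread, Long> registry = new ConcurrentHashMap<>(); // thread -> start time
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "grok-watchdog");
                t.setDaemon(true);
                return t;
            });

    public MatcherWatchdog(long checkIntervalMillis, long maxExecutionMillis) {
        this.maxExecutionMillis = maxExecutionMillis;
        scheduler.scheduleAtFixedRate(this::interruptLongRunningThreads,
                checkIntervalMillis, checkIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Called by a thread right before it starts a potentially unbounded match. */
    public void register() {
        registry.put(Thread.currentThread(), System.currentTimeMillis());
    }

    /** Called by the same thread once the match returns (or is interrupted). */
    public void unregister() {
        registry.remove(Thread.currentThread());
    }

    private void interruptLongRunningThreads() {
        long now = System.currentTimeMillis();
        for (Map.Entry<Thread, Long> entry : registry.entrySet()) {
            if (now - entry.getValue() > maxExecutionMillis) {
                entry.getKey().interrupt(); // joni samples this flag periodically
            }
        }
    }
}

A caller would then bracket the potentially unbounded call, along the lines of: watchdog.register(); try { /* run the match */ } finally { watchdog.unregister(); }.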

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jun 1, 2018
…take too long

This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search().
This method can hang forever if the regex expression is too complex.

The thread interrupter in the background checks every 3 seconds whether there are threads
executing the org.joni.Matcher#search() method for longer than 5 seconds and,
if so, interrupts these threads.

Joni has checks such that every 30k iterations it checks whether the current thread is interrupted and,
if so, returns org.joni.Matcher#INTERRUPTED

Closes elastic#28731
martijnvg added a commit that referenced this issue Jun 12, 2018
…take too long (#31024)

This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search().
This method can hang forever if the regex expression is too complex.

The thread interrupter in the background checks every 3 seconds whether there are threads
executing the org.joni.Matcher#search() method for longer than 5 seconds and,
if so, interrupts these threads.

Joni has checks such that every 30k iterations it checks whether the current thread is interrupted and,
if so, returns org.joni.Matcher#INTERRUPTED

Closes #28731
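
The interruption only works because joni cooperates: its matching loop periodically samples the thread's interrupt flag, as described in the commit message above. The snippet below sketches that cooperative-cancellation pattern in isolation; it is not joni's code, and the names and sentinel value are invented.

// Minimal sketch of the cooperative-cancellation pattern the fix relies on --
// illustrative only. A long-running loop periodically samples the thread's
// interrupt status and bails out instead of running forever.
public final class InterruptibleLoop {
    public static final int INTERRUPTED = -2;   // sentinel, mirroring the idea of Matcher#INTERRUPTED
    private static final int CHECK_EVERY = 30_000;

    static int runBacktrackingLoop(long totalIterations) {
        for (long i = 0; i < totalIterations; i++) {
            if (i % CHECK_EVERY == 0 && Thread.currentThread().isInterrupted()) {
                return INTERRUPTED;             // give up promptly once the watchdog flags us
            }
            // ... one step of the (potentially exponential) matching work ...
        }
        return 0;                               // normal completion
    }
}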