Simple Grok pattern crashes the entire elastic cluster #28731
@gellweiler Thanks for reporting this; threads can indeed get stuck evaluating the grok expression. This reproduces for me with the following ingest simulate API call on master:
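(The reproduction call itself did not survive in this copy of the thread; the following is a hedged reconstruction, assuming the standard simulate-pipeline request shape and reusing the pattern and sample message quoted later in this issue.)

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "Bonsuche mit folgender Anfrage: Belegart->\\[%{WORD:param2},(?<param5>(\\s*%{NOTSPACE})*)\\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018"
      }
    }
  ]
}
```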
This API call never returns, and in the hot threads API I see the following stack trace being returned:
and this one:
@talevy What would be the best way to keep joni from evaluating these poisonous expressions?
So, I am still investigating what is causing the regex processor to go insane here, but I recommend trying this pattern for now:
which results in the same expected output (I believe) and executes much faster.
Since I do not believe this is a generic way to keep the regex engine from entering potentially exponential execution time, I am investigating ways to safely interrupt the regex so that it cannot consume too much CPU for too long.
Thanks for investigating this issue. The grok pattern was auto-generated by a tool I wrote. I've changed the generator to produce GREEDYDATA instead of Token*, which also seems to be safe. Still, I would like to know the root cause of this issue, so I can avoid generating dangerous expressions in the future.
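(For illustration only; the exact generated pattern is not shown in the thread. NOTSPACE expands to \S+, so the original (?<param5>(\s*%{NOTSPACE})*) group becomes the classic catastrophic-backtracking shape (\s*\S+)*, whereas a single GREEDYDATA capture along these lines avoids the nested repetition:)

```
Bonsuche mit folgender Anfrage: Belegart->\[%{WORD:param2},%{GREEDYDATA:param5}\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}
```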
Any update on this? I seem to have the same issue with a Filebeat module. My CPU went to 100%, and hot_threads shows org.joni.Matcher using all the CPU. I will try to find out which grok pattern caused this, but it would be nice to have more information about which patterns should not be used within ingest pipelines.
We have been discussing how to best fix this bug with what joni currently offers. Joni has a check in its code base: every 30000 loops it tests whether the thread's interrupted flag has been set and, if so, terminates the loop. I initially thought we could not use that, as it would require forking a thread for each thread that uses ingest with a grok processor, which is not feasible in ES. However, it was brought to my attention that we could instead add a single background thread that checks whether threads using the grok processor have been running for too long and, if so, sets their interrupt flag. Each time the grok processor is used, the calling thread would need to register itself. This approach sounds good to me and I will try to implement it.
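A minimal sketch of that watchdog idea, with hypothetical class and method names (not the actual Elasticsearch implementation): a single scheduled thread scans registered grok executions and interrupts any that have exceeded a time limit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the single-background-thread approach described above.
final class GrokWatchdog {
    private final long maxExecutionMillis;
    private final Map<Thread, Long> registry = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    GrokWatchdog(long checkIntervalMillis, long maxExecutionMillis) {
        this.maxExecutionMillis = maxExecutionMillis;
        // One background thread periodically scans all registered grok executions.
        scheduler.scheduleAtFixedRate(this::interruptLongRunners,
            checkIntervalMillis, checkIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called by a thread right before it starts evaluating a grok expression.
    void register() {
        registry.put(Thread.currentThread(), System.currentTimeMillis());
    }

    // Called by the same thread once the evaluation has finished (or was interrupted).
    void unregister() {
        registry.remove(Thread.currentThread());
    }

    private void interruptLongRunners() {
        long now = System.currentTimeMillis();
        for (Map.Entry<Thread, Long> entry : registry.entrySet()) {
            if (now - entry.getValue() > maxExecutionMillis) {
                // Joni checks the interrupted flag every ~30000 loop iterations
                // and bails out of the match when it is set.
                entry.getKey().interrupt();
            }
        }
    }
}
```

The commit referenced below uses a 3-second check interval and a 5-second limit before a thread is interrupted.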
…take too long (#31024) This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search(). This method can hang forever if the regex expression is too complex. The thread interrupter in the background checks every 3 seconds whether there are threads executing the org.joni.Matcher#search() method for longer than 5 seconds and, if so, interrupts these threads. Joni has a check such that every 30k iterations it tests whether the current thread is interrupted and, if so, returns org.joni.Matcher#INTERRUPTED. Closes #28731
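Roughly, the call site described in that commit message could look like the following sketch. It reuses the hypothetical GrokWatchdog from the earlier sketch; the joni names (Matcher#search, Matcher#INTERRUPTED) are those referenced in the commit message, and everything else is an assumption rather than the actual Elasticsearch code.

```java
import org.joni.Matcher;
import org.joni.Option;
import org.joni.Regex;

import java.nio.charset.StandardCharsets;

// Hypothetical call-site sketch: register with the watchdog around the joni search,
// and treat an interrupted match as a failure instead of spinning forever.
final class SafeGrokMatch {
    // Assumed settings matching the commit message: check every 3s, interrupt after 5s.
    private static final GrokWatchdog WATCHDOG = new GrokWatchdog(3_000, 5_000);

    static boolean matches(Regex regex, String input) {
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        Matcher matcher = regex.matcher(bytes);
        WATCHDOG.register();
        try {
            int result = matcher.search(0, bytes.length, Option.DEFAULT);
            if (result == Matcher.INTERRUPTED) {
                // The watchdog set the interrupt flag; joni noticed it during one of
                // its periodic (~30k iteration) checks and aborted the search.
                throw new RuntimeException("grok pattern matching was interrupted after taking too long");
            }
            return result != Matcher.FAILED;
        } finally {
            WATCHDOG.unregister();
            Thread.interrupted(); // clear the flag so later code is not affected
        }
    }
}
```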
This not-so-complicated Grok pattern:
Bonsuche mit folgender Anfrage: Belegart->\[%{WORD:param2},(?<param5>(\s*%{NOTSPACE})*)\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}
matched against
Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018
which I entered into the Grok Debugger in Kibana, crashed my whole cluster by driving the CPU load on my proxy nodes to 100%. I had to restart them! Any idea what's going on here? I was able to repeat this, and every time the CPU usage would remain near 90% for hours on the proxy nodes until I restarted them.
Elasticsearch version: 6.1.3 (now updated to 6.2.1; the problem still persists)
We were able to reproduce this on different clusters.