Painless / module failing with: "Too many dynamic script compilations within, max" #9600
Pinging @elastic/infrastructure
At least in the documentation this limit hasn't changed; maybe we have reached it. I can take a look at increasing this limit.
@jsoriano The best solution is actually not to increase the limit. If a test suite exceeds the number of compilations allowed, it will absolutely blow up in any serious environment. The best solution is to figure out which Painless script(s) are always recompiling, and parameterize them instead. I've had that happen on a few occasions, and I just needed to move some literal values out of the script and into params. Check out this PR: https://github.com/elastic/beats/pull/9308/files#diff-759f580883147ab049f76cd3501ec965R32
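As a sketch of the parameterization being described (hypothetical pipeline, field, and parameter names — not the actual script from the PR):

```shell
# Hypothetical example: move a literal multiplier out of the Painless source
# and into params, so the script source stays stable and the compiled script
# can be cached and reused.
curl -X PUT "localhost:9200/_ingest/pipeline/example-duration" \
  -H 'Content-Type: application/json' -d'
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.event.duration = ctx.event.duration * params.scale",
        "params": { "scale": 1000000 }
      }
    }
  ]
}'
```

With the constant in `params`, variants that differ only in the multiplier share one compiled script instead of each compiling separately.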
Duplicate of #9587.
I assume we hit the limit as we added more Filebeat modules. +1 on fixing the actual scripts instead of increasing the limit.
Agreed on fixing the scripts, if this helps, before increasing the limit. Should we close this issue as a duplicate of #9587?
@webmat, on the PR you mention, what was the script before?
I opened a PR to quickly increase the limit for now to get CI to green: #9613
@jsoriano Now that we have the whole conversation here, should we close the other one, label this one correctly with flaky_test, and remove bug?
I actually wonder why this error is happening on
@jsoriano It's a script I was introducing, to adjust a nanosecond duration (ECS) vs the previous format (in ms, iirc). I tried to do this with a simple multiplication, with the literal value right in the script. That would reliably trigger the "too many compilations" error. Moving this to a param made the error go away, just as reliably. I can't say I understand why having this literal in the script would cause that, though. I talked with Jake from Ingest Node about it, and he wasn't 100% sure why either. Note, however, that the error is being raised by an ES instance that's being used for all the tests. It's very likely that it's actually a few scripts together that cause too many compilations to happen on that instance; it's not necessarily just one script. So it's probably worth finding all the places where we have Painless scripts, and reviewing those as a whole.
This is an old thread, but I am now seeing this on Filebeat 7.9.2.
I did try doubling the limit to 150/5m (below) to test, but the setting does not seem to be taking effect. The logs are still showing the max as 75/5m (the log messages above are from after making the change). Does the cluster need to be restarted for the setting to take effect?
Response:
UPDATE: I did a rolling restart on the cluster, and it appears the new setting has taken effect. I am no longer seeing any errors relating to the Suricata pipeline, but the errors unfortunately remain for the Cisco pipeline.
UPDATE 2: I quadrupled the setting to 300/5m, and the errors have stopped. I do understand from the above conversation, and others, that increasing the limit is not the solution, but it has at least temporarily resolved the errors. Any assistance with properly fixing these errors would be greatly appreciated!
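For reference, the rate limit discussed above can be changed dynamically through the cluster settings API (a sketch; the value is illustrative). A rolling restart should only be needed when the value is set statically in `elasticsearch.yml` instead:

```shell
# Raise the dynamic script compilation rate limit cluster-wide
# (pre-context-based settings; takes effect without a node restart).
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "script.max_compilations_rate": "300/5m"
  }
}'
```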
I have this same issue with the ASA module. It is simply non-functional for me, with the same "too many dynamic scripts" errors. Increasing the setting has no effect in my experience. The issue is so bad that I have reverted back to an old Logstash filter that is parsing the files fine, but it has broken ILM, so I am having to manually delete large indices. My guess is that this module has not been tested on any actual ASA that is pushing a large amount of log data. Would love to be proven wrong so I can recommend trying it again.
If you look at the node stats, there are script compilation and cache eviction metrics.
I recommend checking those metrics to see if you have cache evictions. Ideally these would only be compiled at startup or when the pipeline is first loaded, and data volume wouldn't have an impact. If you have evictions, then that limit comes into play. There were changes to these settings (see breaking changes to ES). I believe that if you want to move over to the new context-based settings, you need to set `script.max_compilations_rate: use-context`. There are some good details in elastic/elasticsearch#53756.
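A minimal way to pull those metrics, assuming the standard node stats API (the `script` section reports compilations, cache evictions, and limit triggers):

```shell
# Show only the per-node script stats; watch cache_evictions and
# compilation_limit_triggered over time to see if the cache is churning.
curl -s "localhost:9200/_nodes/stats/script?filter_path=nodes.*.script&pretty"
```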
I have written a blog on this issue at: https://alexmarquardt.com/2020/10/21/elasticsearch-too-many-script-compilations/
Can you confirm that, @andrewkroh? We have zero scripts that we created ourselves. Any scripts running at all would have been included with Elasticsearch, Kibana, Beats, etc.
That is not the case for us. I had to manually increase the size, as I mentioned above. Nice blog, @alexander-marquardt! Unfortunately, since we have no scripts of our own, I feel this is an issue that needs to be addressed by the teams at Elastic.
I certainly appreciate the blog post, as this is very helpful. My only concern is that it seems to be written in the context of someone dealing with the issue in a custom, in-house script. That is not my case. I am experiencing this with the ASA section of the Cisco module for Filebeat. I assume I would attempt the same tunings even though this is official code? Thank you for the information.
@hueyg - If you are running 7.8 or earlier, see https://www.elastic.co/guide/en/beats/filebeat/7.8/filebeat-module-cisco.html#dynamic-script-compilations. I believe the issue you are referring to was fixed with enhancements made in 7.9.
@alexander-marquardt We just upgraded to 7.9.3 a couple days ago, but when I came to this thread, we were still experiencing the issue with 7.9.2.
@MakoWish - Do you know if your cluster is using the newer settings? You may be able to see this by checking the cluster settings.
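One way to check, assuming the cluster settings API with defaults included:

```shell
# List every script-related setting, including defaults. If
# "script.max_compilations_rate" : "use-context" appears, the newer
# per-context settings are in effect.
curl -s "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&filter_path=*.script*&pretty"
```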
I have opened elastic/elasticsearch#64595 to request an unlimited compilation rate limit for the template context.
Here is the result of that query:
@alexander-marquardt, are you suggesting I manually change that value?
That looks like you are already using the new settings. The blog I posted earlier describes how to see if the script cache is churning and how to increase its size.
I fear my response may have been a bit confusing, as I had first posted the actual setting from
Any update on this? I am still seeing the issue on 7.10.0. To note, it only appears to have gotten worse with 7.10.0. I had previously put this setting at 300/5m, and that stopped the errors, but now even that setting is throwing errors. I just had to increase it to 500/5m. Again, I understand this is just a band-aid, so I am still waiting on feedback/suggestions from someone at Elastic.
I realize this is being worked on, but I am looking for a temporary solution in the meantime. I thought I understood, but with this applied:
I am still getting this error. Why does it still reflect 30/1m in the error? I am using the 7.10 Zeek module; is that a different context somehow?
In order to use different contexts (such as the "ingest" context you are attempting to use above), contexts need to be enabled. If you run the following command and get an empty response, then contexts are not enabled:
To enable contexts you can do the following:
Also, be careful with the
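Putting the pieces above together, a sketch of enabling context-based limits and raising the ingest context's rate and cache size (the values are illustrative, and the setting names assume the 7.9+ context-based scripting settings):

```shell
# Switch to per-context rate limiting, then raise the ingest context's
# compilation rate and cache size. Applied dynamically via the cluster
# settings API; verify with GET _nodes/stats/script afterwards.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "script.max_compilations_rate": "use-context",
    "script.context.ingest.max_compilations_rate": "500/1m",
    "script.context.ingest.cache_max_size": 400
  }
}'
```

A larger context cache reduces evictions, which is usually the root cause of recompilation storms; raising the rate alone only masks the churn.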
Yes, I have that setting applied, but it's still giving the same error as if it wasn't applied. Contexts are enabled; I can see various contexts in the stats query (and they all read zero).
What am I missing? Where is the 30/1m coming from if I have use-context set and the ingest context rate set at 500/1m?
I'm also getting this in 7.10.1 Filebeat while trying to pull in Zeek via the Zeek module. Is there something at the root of this that I can address?
We're also seeing this for the Zeek module, but we're running Filebeat 7.9.0. Logs keep queuing up in Logstash, and the only way to resolve it is to delete and recreate the filebeat-7.9.0-zeek-ssl-pipeline pipeline. We're reluctant to increase the max_compilations_rate. We're using the Elastic Cloud service running Elasticsearch 7.10.0.
Confirmed with custom ingest pipelines, too, that an ingest pipeline needs to be deleted and re-loaded to get Logstash events flowing again. |
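As a sketch of the delete-and-reload workaround described above (pipeline name taken from the comment; adjust for your Filebeat version and module):

```shell
# On the cluster: remove the stuck ingest pipeline.
curl -X DELETE "localhost:9200/_ingest/pipeline/filebeat-7.9.0-zeek-ssl-pipeline"

# On the Filebeat host: re-load the module's pipelines into Elasticsearch.
filebeat setup --pipelines --modules zeek
```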
Discovered in #9599
Does this look like a change in 7.0?