[BUG] CompoundProcessor limits ingest pipeline length #6338
Comments
@MaximeWewer thanks for reaching out. We cannot look at code which is not compatible with ALv2.
Yes, of course. I understand the need for license compliance.
Looks like a bug. Looking for someone to write a unit test that reproduces this problem and a fix, without looking at any non-APLv2 code, please.
From the description in this issue, I was able to reproduce with the following unit test in
The problem is that I think we might be able to reimplement it as a for-loop. I need to wrap my head around it some more, but I was already planning to refactor |
I poked at the I'll give it some more thought. It would be so much easier if Java had tail-call optimization. :)
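To illustrate the for-loop idea discussed above, here is a minimal sketch. All names (`ChainSketch`, `CallbackProcessor`, `runRecursive`, `runIterative`) are hypothetical and written from the description in this thread, not taken from the OpenSearch `CompoundProcessor` code. A `long` counter stands in for the ingest document. When callbacks complete synchronously, the recursive form adds stack frames for every processor in the chain, while the loop keeps the stack depth constant.

```java
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.LongUnaryOperator;

// Hypothetical names throughout; this is not the OpenSearch CompoundProcessor API.
final class ChainSketch {

    /** Callback-style processor; a long counter stands in for the ingest document. */
    interface CallbackProcessor {
        void execute(long doc, LongConsumer handler);
    }

    /**
     * Recursive chaining: each processor's callback starts the next processor.
     * If the callbacks run synchronously, stack depth grows with the chain length.
     */
    static void runRecursive(List<CallbackProcessor> chain, int index, long doc, LongConsumer done) {
        if (index == chain.size()) {
            done.accept(doc);
            return;
        }
        chain.get(index).execute(doc, result -> runRecursive(chain, index + 1, result, done));
    }

    /** Iterative chaining: a plain loop over synchronous processors, constant stack depth. */
    static long runIterative(List<LongUnaryOperator> chain, long doc) {
        long current = doc;
        for (LongUnaryOperator processor : chain) {
            current = processor.applyAsLong(current);
        }
        return current;
    }
}
```

With a sufficiently long chain, `runRecursive` would be expected to hit a `StackOverflowError` on a default-sized thread stack, whereas `runIterative` is limited only by the size of the list.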
We should probably address this by adding a method to
For search pipelines, we have "handy" async subinterfaces that flip the sync-vs-async abstractness, so we can override For ingest processors, we could similarly add a subinterface that flips things. Right now, we don't have any async ingest processors AFAIK, but e.g. the ML inference ingest processor (and probably the GeoIP ingest processor) should be async, to avoid holding the transport_worker thread. Then, for both cases, we can process a whole chain of synchronous processors in a |
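A rough sketch of the subinterface idea follows, under stated assumptions: `Processor`, `AsyncProcessor`, `isAsync`, and `run` are hypothetical names for illustration and are not the existing OpenSearch ingest or search pipeline interfaces. The base interface makes the synchronous method abstract and gives the callback method a default; the subinterface flips that. The runner then walks consecutive synchronous processors in a loop and only hands off to a callback when it reaches an asynchronous one.

```java
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical sketch; a long counter stands in for the ingest document.
final class MixedChainSketch {

    /** Synchronous by default: execute is abstract, executeAsync wraps it. */
    interface Processor {
        default boolean isAsync() {
            return false;
        }

        long execute(long doc);

        default void executeAsync(long doc, LongConsumer callback) {
            callback.accept(execute(doc));
        }
    }

    /** Flips the abstractness: executeAsync is abstract, execute is unsupported. */
    interface AsyncProcessor extends Processor {
        @Override
        default boolean isAsync() {
            return true;
        }

        @Override
        default long execute(long doc) {
            throw new UnsupportedOperationException("async-only processor");
        }

        @Override
        void executeAsync(long doc, LongConsumer callback);
    }

    /** Runs sync processors in a loop; resumes from the callback after each async one. */
    static void run(List<Processor> chain, int start, long doc, LongConsumer done) {
        long current = doc;
        for (int i = start; i < chain.size(); i++) {
            Processor processor = chain.get(i);
            if (processor.isAsync()) {
                final int next = i + 1;
                processor.executeAsync(current, result -> run(chain, next, result, done));
                return; // the callback continues the chain
            }
            current = processor.execute(current);
        }
        done.accept(current);
    }
}
```

In this sketch the stack only deepens at asynchronous processors whose callbacks happen to complete on the calling thread, so a pipeline made up mostly of synchronous processors stays shallow regardless of its length.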
For Simple snippet to explain idea :
@msfroh Thoughts? Let me know if I have missed something over here. cc: @shwetathareja / @ankitkala
The difficulty with that approach is that the If you have an async ingest processor that makes a long network call (e.g., calling out to a remote ML inference service to compute embeddings), it could exhaust the available indexing threads. Normally the remote call would not hold a thread; it would only need to execute the callback on a threadpool once the call completes.
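The point about not holding a thread can be sketched with `CompletableFuture`. This is an illustration only: `remoteInference` is a hypothetical stand-in that completes on a timer rather than making a real network call, and none of the names come from OpenSearch. The caller returns immediately after registering the continuation, and a pool thread is used only when the result arrives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from OpenSearch.
final class NonBlockingCallSketch {

    /** Stand-in for a remote inference call: completes later on a timer thread. */
    static CompletableFuture<String> remoteInference(String text, ScheduledExecutorService timer) {
        CompletableFuture<String> future = new CompletableFuture<>();
        timer.schedule(() -> {
            future.complete("embedding(" + text + ")");
        }, 50, TimeUnit.MILLISECONDS);
        return future;
    }

    /**
     * Registers a continuation instead of waiting: no thread is parked while the
     * call is in flight, and callbackPool is used only once the result is ready.
     */
    static CompletableFuture<String> process(String text, ScheduledExecutorService timer, Executor callbackPool) {
        return remoteInference(text, timer)
            .thenApplyAsync(embedding -> text + " -> " + embedding, callbackPool);
    }
}
```

By contrast, calling `join()` or `get()` on the future inside a processor would park the calling thread for the duration of the remote call, which is the thread-exhaustion risk described above.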
Describe the bug
I used an ingest pipeline to rename a large number of fields and found that running a pipeline with many processors can fail with a stack overflow exception.
I found a similar issue on the Elasticsearch GitHub: elastic/elasticsearch#84274
Expected behavior
Can you fix the problem in the same way as the PR below?
elastic/elasticsearch#84250
Host/Environment (please complete the following information):