Use of recursion rather than iteration in CompoundProcessor limits ingest pipeline length #84274

Closed
droberts195 opened this issue Feb 23, 2022 · 1 comment · Fixed by #84250
Labels: :Data Management/Ingest Node, Team:Data Management

@droberts195 (Contributor):

An attempt to import a CSV file with 3000 numeric fields revealed that executing an ingest pipeline with many processors can fail with a java.lang.StackOverflowError.

The ingest pipeline, ingest_pipeline.json, consists of a CSV processor that parses the CSV, followed by 3000 convert processors that convert the parsed strings to numbers.
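For readers without the attachment, a pipeline of this shape can be generated with a few lines of Java. This is only a sketch, not the attached ingest_pipeline.json: the source field name "message", the column names f1, f2, ... and the target type "double" are assumptions chosen for illustration.

```java
class PipelineJsonSketch {

    // Builds a pipeline body with one csv processor followed by
    // fieldCount convert processors. Field names and the "double"
    // type are hypothetical; the real file may differ.
    static String buildPipeline(int fieldCount) {
        StringBuilder targets = new StringBuilder();
        StringBuilder converts = new StringBuilder();
        for (int i = 1; i <= fieldCount; i++) {
            String name = "f" + i;
            if (i > 1) {
                targets.append(", ");
            }
            targets.append('"').append(name).append('"');
            // One convert processor per parsed column.
            converts.append(",\n    { \"convert\": { \"field\": \"")
                    .append(name)
                    .append("\", \"type\": \"double\" } }");
        }
        return "{\n  \"processors\": [\n"
                + "    { \"csv\": { \"field\": \"message\", \"target_fields\": [" + targets + "] } }"
                + converts
                + "\n  ]\n}\n";
    }

    public static void main(String[] args) {
        // Print a small three-column version for inspection.
        System.out.println(buildPipeline(3));
    }
}
```

Calling buildPipeline(3000) should give a body comparable in size to the one described here.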

Executing this pipeline fails with a stack overflow error:

[2022-02-23T10:16:31,678][INFO ][o.e.c.m.MetadataCreateIndexService] [runTask-0] [test] creating index, cause [api], templates [], shards [1]/[1]
[2022-02-23T10:16:32,541][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [runTask-0] fatal error in thread [elasticsearch[runTask-0][write][T#12]], exiting
java.lang.StackOverflowError: null
        at java.lang.String.startsWith(String.java:2297) ~[?:?]
        at org.elasticsearch.ingest.IngestDocument$FieldPath.<init>(IngestDocument.java:877) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.IngestDocument.getFieldValue(IngestDocument.java:102) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.IngestDocument.getFieldValue(IngestDocument.java:122) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.common.ConvertProcessor.execute(ConvertProcessor.java:185) ~[?:?]
        at org.elasticsearch.ingest.Processor.execute(Processor.java:41) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.CompoundProcessor.lambda$innerExecute$1(CompoundProcessor.java:154) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
        at org.elasticsearch.ingest.Processor.execute(Processor.java:46) ~[elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
.
.
.

(The stack trace continues with the same 3 calls over and over again.)

Could CompoundProcessor.innerExecute be changed to use iteration rather than recursion to avoid this?
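To illustrate the question, here is a simplified model of the two approaches. It is not the Elasticsearch code: the Step interface, the long[] "document" and both run methods are hypothetical stand-ins. In the recursive version each step's callback starts the next step, which adds three frames per step, much like the repeating innerExecute / lambda / Processor.execute pattern in the trace. The iterative version assumes every step can run synchronously; a genuinely asynchronous step would still need a callback hand-off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

class CompoundSketch {

    /** Hypothetical stand-in for a processor; the "document" is a long[] counter. */
    interface Step {
        // Synchronous form: modify the document in place.
        void run(long[] doc) throws Exception;

        // Callback form: run synchronously, then notify the handler.
        default void run(long[] doc, BiConsumer<long[], Exception> handler) {
            try {
                run(doc);
            } catch (Exception e) {
                handler.accept(null, e);
                return;
            }
            handler.accept(doc, null);
        }
    }

    // Recursive: the callback of step i invokes step i + 1, so the
    // stack depth grows linearly with the number of steps.
    static void runRecursive(List<Step> steps, int index, long[] doc,
                             BiConsumer<long[], Exception> done) {
        if (index == steps.size()) {
            done.accept(doc, null);
            return;
        }
        steps.get(index).run(doc, (result, e) -> {
            if (e != null) {
                done.accept(null, e);
            } else {
                runRecursive(steps, index + 1, result, done);
            }
        });
    }

    // Iterative: a plain loop over synchronous steps, so the stack
    // depth stays constant regardless of how many steps there are.
    static void runIterative(List<Step> steps, long[] doc,
                             BiConsumer<long[], Exception> done) {
        for (Step step : steps) {
            try {
                step.run(doc);
            } catch (Exception e) {
                done.accept(null, e);
                return;
            }
        }
        done.accept(doc, null);
    }

    public static void main(String[] args) {
        List<Step> steps = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            steps.add(d -> d[0]++);
        }
        runIterative(steps, new long[] {0},
                (d, e) -> System.out.println("iterative result: " + d[0]));
    }
}
```

With a step list this long, runRecursive would be expected to exhaust a default-sized thread stack, while runIterative does not depend on the list length.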

The sample CSV file that goes with the ingest pipeline is test.csv.

droberts195 added the :Data Management/Ingest Node label on Feb 23, 2022
elasticmachine added the Team:Data Management label on Feb 23, 2022
@elasticmachine (Collaborator):

Pinging @elastic/es-data-management (Team:Data Management)
