Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application crash caused by infinite concurrencyLimit during extended collector unavailability #4204

Closed
vijayaggarwal opened this issue Oct 13, 2023 · 2 comments · Fixed by #4211
Labels
bug Something isn't working priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies, etc

Comments

@vijayaggarwal
Copy link

What happened?

Steps to Reproduce

Run the NodeJS application using auto-instrumentation as follows, but without starting the collector (to simulate extended collector unavailability).

env OTEL_TRACES_EXPORTER=otlp \
node --require @opentelemetry/auto-instrumentations-node/register app.js

Expected Result

OTel agent should drop the traces after a (configurable) queue size is reached.

Actual Result

OTel agent keep queuing traces infinitely, ultimately causing the application process to crash due to OOM.

Additional Details

#1708 has implemented concurrencyLimit option to limit the queue size. However, the default value is Infinity and there is no environment variable to set concurrencyLimit. It can be set via constructor argument, but exporter construction happens deep inside the auto-instrumentation flow.

OpenTelemetry specification says that all components should have bounded memory usage. Most components that use some sort of queue also specify reasonable default values for max queue size. In my view:

  1. The default value for concurrencyLimit should be changed from Infinity to a reasonable finite number. [critical]
  2. A mechanism should be provided for setting concurrencyLimit (e.g. environment variable) for auto-instrumentation use-cases. [nice-to-have]

OpenTelemetry Setup Code

env OTEL_TRACES_EXPORTER=otlp \
node --require @opentelemetry/auto-instrumentations-node/register app.js

package.json

@opentelemetry/auto-instrumentations-node@^0.38.0

Relevant log output

No response

@vijayaggarwal vijayaggarwal added bug Something isn't working triage labels Oct 13, 2023
@pichlermarc pichlermarc added priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies, etc and removed triage labels Oct 17, 2023
@pichlermarc
Copy link
Member

Yep, I've been working on the exporters recently (#4116) and I've wondered about the same thing when I refactored that part. Making it configurable would be nice to have, in this issue, I think we should go with

The default value for concurrencyLimit should be changed from Infinity to a reasonable finite number. [critical]

then open a follow-up for feature:

A mechanism should be provided for setting concurrencyLimit (e.g. environment variable) for auto-instrumentation use-cases. [nice-to-have]

@vijayaggarwal
Copy link
Author

Agree with you @pichlermarc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants