Is your feature request related to a problem? Please describe.
Our current architecture has a load balancer that accepts all data as OTLP and distributes the load across several OpenTelemetry Collector deployments, which in turn send the data to the required exporter endpoints (Loki, Mimir, Tempo).
Under high load, autoscaling kicks in and increases the number of collector pods.
The issue arises when the load decreases and the number of pods has to be scaled back down as well.
When a pod goes down, it deletes all the data it was still holding, which leads to obvious data loss.
This happens regardless of whether you use persistent storage, as each pod is unaware of the others.
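For reference, this is roughly what the per-pod queue persistence looks like today (the endpoint, path, and pipeline are illustrative, not our exact values). Because the `file_storage` directory sits on a per-pod volume, whatever is still queued there is orphaned when the pod is removed:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/queue    # per-pod volume; path is illustrative

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp/tempo:
    endpoint: http://tempo:4318          # illustrative endpoint
    sending_queue:
      enabled: true
      storage: file_storage              # queue survives restarts, but only on this pod's own volume

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/tempo]
```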
Describe the solution you'd like
The solution, if possible, is that when the number of pods decreases, each terminating pod would first stop receiving data, drain all of its queues, and only then shut down. A rough sketch of the shutdown ordering we have in mind is below.
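As an illustration only (this is a possible workaround, not an existing collector feature, and the durations are assumptions), a Kubernetes preStop hook plus a longer termination grace period approximates the ordering: the pod is removed from the load balancer first, then given time to flush before it is killed:

```yaml
# Pod template fragment, sketch only: buys the collector time to flush
# in-flight data on scale-down. It does not guarantee the persistent
# queue is fully drained.
spec:
  terminationGracePeriodSeconds: 120       # assumed value; tune to expected drain time
  containers:
    - name: otel-collector
      lifecycle:
        preStop:
          exec:
            # Keep the process alive briefly after the pod is removed from the
            # Service endpoints, so no new OTLP data arrives before shutdown begins.
            command: ["sh", "-c", "sleep 20"]
```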
Describe alternatives you've considered
Using an EFS volume where the queue directories are shared among the pods, so the remaining pods know to pick up where the terminated pod left off (see the sketch below).
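Roughly what we mean, assuming an EFS-backed ReadWriteMany claim (the storage class name and sizes are made up):

```yaml
# Illustrative only: a shared, EFS-backed ReadWriteMany claim that every
# collector pod would mount at the file_storage directory.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: otel-shared-queue
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: efs-sc                 # assumed EFS storage class name
  resources:
    requests:
      storage: 10Gi
```

To be clear, this is only the direction we considered, not something we have working; as far as we know the file_storage extension is not built for concurrent access from multiple pods.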
Additional context
This issue also happens when using StatefulSet mode, not just Deployment mode.
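For clarity, the two modes referred to are the OpenTelemetry Operator's collector modes, roughly as below (apiVersion and names are illustrative and may differ by operator version); the scale-down loss looks the same in both:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: statefulset    # same behaviour observed with mode: deployment
  config: |
    # collector config with persistent sending queues, as sketched above
```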
I'm not sure the operator provides this functionality. From my understanding, you may be better served by filing this issue in the Helm chart repository, since that is what handles the autoscaling.
Feel free to reopen if I've misunderstood, though. Happy to help if there's anything specific to the collector itself that can be done!
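For context, the scale-down behaviour in question is presumably driven by the chart's autoscaling values, along these lines (field names from memory, so please double-check against the chart's documented values.yaml):

```yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```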
Component(s)
cmd/otelcontribcol