Is your feature request related to a problem? Please describe.
Our current architecture has a load balancer that accepts all data as OTLP and distributes the load across several OpenTelemetry Collector deployments, which in turn send the data to the required exporter endpoints (Loki, Mimir, Tempo).
Under high load, autoscaling kicks in and increases the number of collector pods.
The issue arises when the load decreases and the number of pods has to be scaled back down as well.
When a pod goes down, it deletes all the data it was still holding, which leads to obvious data loss.
This happens regardless of whether you use persistent storage, as each pod is unaware of the others.
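For reference, this is roughly what the per-pod queue persistence looks like today (the endpoint, path, and pipeline are illustrative, not our exact values). Because the `file_storage` directory sits on a per-pod volume, whatever is still queued there is orphaned when the pod is removed:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/queue    # per-pod volume; path is illustrative

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp/tempo:
    endpoint: http://tempo:4318          # illustrative endpoint
    sending_queue:
      enabled: true
      storage: file_storage              # queue survives restarts, but only on this pod's own volume

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/tempo]
```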
Describe the solution you'd like
The solution, if possible, is that when the number of pods decreases, each terminating pod would first stop receiving data, drain all of its queues, and only then shut down. A rough sketch of the shutdown ordering we have in mind is below.
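As an illustration only (this is a possible workaround, not an existing collector feature, and the durations are assumptions), a Kubernetes preStop hook plus a longer termination grace period approximates the ordering: the pod is removed from the load balancer first, then given time to flush before it is killed:

```yaml
# Pod template fragment, sketch only: buys the collector time to flush
# in-flight data on scale-down. It does not guarantee the persistent
# queue is fully drained.
spec:
  terminationGracePeriodSeconds: 120       # assumed value; tune to expected drain time
  containers:
    - name: otel-collector
      lifecycle:
        preStop:
          exec:
            # Keep the process alive briefly after the pod is removed from the
            # Service endpoints, so no new OTLP data arrives before shutdown begins.
            command: ["sh", "-c", "sleep 20"]
```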
Describe alternatives you've considered
Using an EFS volume where the queue directories are shared among the pods, so the remaining pods know to pick up where the terminated pod left off (see the sketch below).
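Roughly what we mean, assuming an EFS-backed ReadWriteMany claim (the storage class name and sizes are made up):

```yaml
# Illustrative only: a shared, EFS-backed ReadWriteMany claim that every
# collector pod would mount at the file_storage directory.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: otel-shared-queue
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: efs-sc                 # assumed EFS storage class name
  resources:
    requests:
      storage: 10Gi
```

To be clear, this is only the direction we considered, not something we have working; as far as we know the file_storage extension is not built for concurrent access from multiple pods.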
Additional context
This issue also happens when using StatefulSet mode, not just Deployment mode.
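For clarity, the two modes referred to are the OpenTelemetry Operator's collector modes, roughly as below (apiVersion and names are illustrative and may differ by operator version); the scale-down loss looks the same in both:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: statefulset    # same behaviour observed with mode: deployment
  config: |
    # collector config with persistent sending queues, as sketched above
```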
I'm not sure the operator provides this functionality. From my understanding, you may be better served by filing this issue in the Helm chart repository, since that is what handles the autoscaling.
Feel free to reopen if I've misunderstood, though. Happy to help if there's anything specific to the collector itself that can be done!
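For context, the scale-down behaviour in question is presumably driven by the chart's autoscaling values, along these lines (field names from memory, so please double-check against the chart's documented values.yaml):

```yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```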
Component(s)
cmd/otelcontribcol