spec.autoscaler.targetMemoryUtilization does not add memory scaling to autoscaler #1439

Closed
jsirianni opened this issue Feb 3, 2023 · 2 comments · Fixed by #1462
Labels: area:collector, bug


@jsirianni
Member

When deploying a collector with autoscaling enabled, the HPA is created with CPU metric scaling only, despite targetMemoryUtilization being set in spec.autoscaler.

I have the following collector config:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: collector
spec:
  autoscaler:
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 60
    targetMemoryUtilization: 80
    ...

The collector deploys and works great, but the HPA does not have a memory metric configured:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
...
spec:
  maxReplicas: 10
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 60
        type: Utilization
    type: Resource
  minReplicas: 2
  scaleTargetRef:
    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    name: collector
...

Kubernetes: 1.25.5 (GKE: 1.25.5-gke.2000)
Operator: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:0.68.0

I have other working HPAs defined outside of the operator which do have memory scaling, so I am not sure it is an issue with my cluster.

...
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
...
  metrics:
  - resource:
      name: memory
      target:
        averageUtilization: 60
        type: Utilization
    type: Resource
  - resource:
      name: cpu
      target:
        averageUtilization: 60
        type: Utilization
    type: Resource
...
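
For a quick check, the HPA's metrics list can be pulled on its own with a jsonpath query. This is plain kubectl, nothing operator-specific; substitute whatever HPA name kubectl get hpa lists for your collector:

kubectl get hpa                                            # find the HPA's name
kubectl get hpa <hpa-name> -o jsonpath='{.spec.metrics}'   # in the broken case this prints only the cpu entry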
@moh-osman3
Contributor

Hmm, this is strange. I wonder if there's an issue with updating the HPA when reapplying your new config. Have you tried deleting your namespace entirely and installing the operator and collector again? I am using the same operator version and autoscaler config, and my HPA seems to run with no issues:

% k get hpa -o yaml             
...
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
...
spec:
  maxReplicas: 10
  metrics:
  - resource:
      name: memory
      target:
        averageUtilization: 80
        type: Utilization
    type: Resource
  - resource:
      name: cpu
      target:
        averageUtilization: 60
        type: Utilization
    type: Resource
  minReplicas: 2
  scaleTargetRef:
    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    name: lightstep-collector

And I've triggered a rescale on memory successfully as well:

  Normal   SuccessfulRescale        13m                horizontal-pod-autoscaler  New size: 2; reason: memory resource utilization (percentage of request) above target
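
(For anyone reproducing: events like the one above come from kubectl describe on the HPA, which again is plain kubectl rather than anything operator-specific.)

kubectl describe hpa <hpa-name>   # the Events section lists SuccessfulRescale entries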

Some additional info that would be helpful to know:

  1. What mode are you running the collector in? statefulset or deployment?
  2. What image of the collector are you running?
  3. Does a fresh install of operator and collector with your listed autoscaler config fix the issue (this would let me know that the issue is with updating the hpa when the config changes)?
  4. Any weirdness in the logs for the operator pod?

@jsirianni
Member Author

Answering your questions in order:
  1. I think I can replicate this with both deployment and statefulset. I have both, but I would like to scale only the deployment.
  2. The observIQ collector (I work at observIQ). I can try contrib, but our image shouldn't be doing anything funny.
  3. I don't think so, but I did not try a fresh install in the cluster in question.
  4. Nothing new in the logs. I seem to get the same messages for each iteration of the collector's configuration.

I was able to replicate this with minikube. I did the following:

  1. minikube start --cpus 4 --memory 8g --nodes 1 --kubernetes-version 1.25.5
  2. Install cert-manager (following the operator readme)
  3. Install the operator (following the operator readme)
  4. Deploy the collector with replicas: 2 set
  5. Update the collector to turn on autoscaling, but do not specify CPU or memory utilization options
  6. Add the targetCPUUtilization option (this works)
  7. Add the targetMemoryUtilization option (this does not work)

If I deploy the collector with both utilization options set in the beginning, it works. Seems like updating is the problem.
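
For reference, here is the same repro as a rough shell transcript. The manifest URLs follow the operator readme, but the cert-manager version and the collector.yaml file name are assumptions, and the collector edits between applies are described only in comments:

# 1. Local cluster matching the Kubernetes version above
minikube start --cpus 4 --memory 8g --nodes 1 --kubernetes-version 1.25.5

# 2-3. Install cert-manager and the operator, per the operator readme
#      (the cert-manager version here is an assumption)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.10.1/cert-manager.yaml
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# 4. Deploy the collector with replicas: 2 and no autoscaler block
kubectl apply -f collector.yaml

# 5. Edit collector.yaml: add spec.autoscaler with only min/max replicas, then re-apply
kubectl apply -f collector.yaml

# 6. Edit collector.yaml: add targetCPUUtilization: 60, then re-apply
kubectl apply -f collector.yaml   # the HPA gains the cpu metric

# 7. Edit collector.yaml: add targetMemoryUtilization: 80, then re-apply
kubectl apply -f collector.yaml   # the memory metric never appears
kubectl get hpa -o yaml           # confirm: only the cpu metric is present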
