
Keda HPA scales up stable replicaset to max when doing canary deployments #6058

Closed
diogofilipe098 opened this issue Aug 9, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@diogofilipe098

Report

We are facing a similar issue to the one reported in argoproj/argo-rollouts#2857.
As stated in that issue, we use Argo Rollouts to deploy our services via canary deployments.
For services that use the horizontal pod autoscaler with scaling configured specifically on memory, we see the stable ReplicaSet scale up to the maximum number of replicas during the deployment and then scale back down once the deployment is complete.

We were closely watching the metrics reported by our HPA, and at no point during the deployment did we see memory utilization cross the threshold. Below is an example of the metrics displayed by our HPA (these were copied from the argo-rollouts ticket, but ours looked similar):

Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  43% (486896088615m) / 70%
  resource cpu on pods  (as a percentage of request):     3% (43m) / 70% 

Even though nothing is above the target, we can see these events on the HPA (also copied from the argo-rollouts ticket):

Normal  SuccessfulRescale  2m58s (x8 over 2d20h)  horizontal-pod-autoscaler  New size: 13; reason: memory resource utilization (percentage of request) above target
Normal  SuccessfulRescale  11s (x16 over 2d21h)   horizontal-pod-autoscaler  New size: 15; reason: memory resource utilization (percentage of request) above target

So far, the HPA works as expected during normal operations and even during canary deployments, but when the HPA is based on memory percentage it scales our fleet of pods up to the maximum value during the rollout. Is there any way to debug this further?

argo-rollouts version: v1.7.1

Expected Behavior

We expect the HPA not to scale our deployment replicas up to the maximum number of pods during a canary deployment when there is no need to.

Actual Behavior

During a canary deployment, with a memory percentage target defined in our HPA, the number of pods is scaled up to our configured maximum.

Steps to Reproduce the Problem

Our way to reproduce this issue is to configure our KEDA-managed HPA with hpa_memory_utilization between 70 and 80. A minimal sketch of that kind of configuration is shown below.
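
For context, here is a rough sketch of the kind of ScaledObject we use (illustrative only; the names, replica counts, and trigger values below are placeholders, not our real manifest):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-service
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout            # the Argo Rollouts canary workload
    name: example-service
  minReplicaCount: 3
  maxReplicaCount: 15
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
    - type: memory           # the trigger that maxes out during canary deployments
      metricType: Utilization
      metadata:
        value: "75"          # hpa_memory_utilization between 70 and 80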

Logs from KEDA operator

No response

KEDA Version

2.13.1

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

No response

Anything else?

No response

diogofilipe098 added the bug label Aug 9, 2024
@JorTurFer
Member

Hello
Thanks for reporting it. KEDA exposes the metric value to the HPA controller, and it's the HPA controller that scales the workload, so at this point KEDA doesn't decide which ReplicaSet gets scaled.

As Argo Rollouts is a CRD that implements the /scale subresource, I'd say the Argo Rollouts controller is the one responsible for handling the number of instances based on the replicas desired by the HPA, so that repo is probably the best place to solve the issue. Going further, the original issue in the Argo Rollouts repo doesn't use KEDA at all.
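
To illustrate the chain: for a ScaledObject targeting a Rollout, KEDA creates an HPA (typically named keda-hpa-<scaledobject-name>) and the HPA controller writes the desired replica count to the Rollout's /scale subresource, which the Argo Rollouts controller then distributes between the stable and canary ReplicaSets. A rough sketch of such a generated HPA (illustrative names and values, not taken from this report):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-example-service   # created and owned by the KEDA operator
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout                  # the HPA controller writes replicas to this resource's /scale subresource
    name: example-service
  minReplicas: 3
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75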

I'm closing this issue as it's not KEDA related.

JorTurFer closed this as not planned Aug 12, 2024
@atmcarmo

atmcarmo commented Aug 12, 2024

Hi @JorTurFer, thank you for your answer, I really appreciate it. In the logs we can see the message New size: 22; reason: memory resource utilization (percentage of request) above target. Is it possible to get more info on this, maybe by increasing the log verbosity to understand the exact value the scaler is considering?

Thank you

EDIT: I just saw that this log is not from KEDA itself, please disregard my comment.
