
Keda HPA scales up stable replicaset to max when doing canary deployments #6058

Closed
diogofilipe098 opened this issue Aug 9, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@diogofilipe098

Report

We are facing a similar issue to the one reported in argoproj/argo-rollouts#2857.
As stated in that issue, we use Argo Rollouts to deploy our services via canary deployments.
For services that use the horizontal pod autoscaler with scaling configured specifically on memory, we see the stable ReplicaSet scale up to the maximum number of replicas during the deployment and then scale back down once the deployment is complete.

We were closely watching the metrics reported by our HPA, and at no point during the deployment did we see memory utilization cross the threshold. Below is an example of the metrics displayed by our HPA (these were copied from the argo-rollouts ticket, but ours looked similar):

Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  43% (486896088615m) / 70%
  resource cpu on pods  (as a percentage of request):     3% (43m) / 70% 

Even though nothing is above the target, we can see these events on the HPA (also copied from the argo-rollouts ticket):

Normal  SuccessfulRescale  2m58s (x8 over 2d20h)  horizontal-pod-autoscaler  New size: 13; reason: memory resource utilization (percentage of request) above target
Normal  SuccessfulRescale  11s (x16 over 2d21h)   horizontal-pod-autoscaler  New size: 15; reason: memory resource utilization (percentage of request) above target

So far, the HPA works as expected during normal operations and even during canary deployments, but when the HPA is based on memory percentage it scales our fleet of pods up to the maximum value during the rollout. Is there any way to debug this further?

argo-rollouts version: v1.7.1

Expected Behavior

We expect the HPA not to scale our deployment replicas up to the maximum number of pods during a canary deployment when there is no need to.

Actual Behavior

During a canary deployment, with a memory percentage target defined in our HPA, the number of pods is scaled up to our configured maximum.

Steps to Reproduce the Problem

Our way to reproduce this issue is to configure our KEDA-managed HPA with hpa_memory_utilization between 70 and 80. A minimal sketch of that kind of configuration is shown below.
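
For context, here is a rough sketch of the kind of ScaledObject we use (illustrative only; the names, replica counts, and trigger values below are placeholders, not our real manifest):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-service
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout            # the Argo Rollouts canary workload
    name: example-service
  minReplicaCount: 3
  maxReplicaCount: 15
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
    - type: memory           # the trigger that maxes out during canary deployments
      metricType: Utilization
      metadata:
        value: "75"          # hpa_memory_utilization between 70 and 80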

Logs from KEDA operator

No response

KEDA Version

2.13.1

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

No response

Anything else?

No response

diogofilipe098 added the bug label Aug 9, 2024
@JorTurFer
Member

Hello
Thanks for reporting it. KEDA exposes the metric value to the HPA controller, and it's the HPA controller that scales the workload, so at this point KEDA doesn't decide which ReplicaSet gets scaled.

As Argo Rollouts is a CRD that implements the /scale subresource, I'd say the Argo Rollouts controller is the one responsible for handling the number of instances based on the replicas desired by the HPA, so that repo is probably the best place to solve the issue. Going further, the original issue in the Argo Rollouts repo doesn't use KEDA at all.
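
To illustrate the chain: for a ScaledObject targeting a Rollout, KEDA creates an HPA (typically named keda-hpa-<scaledobject-name>) and the HPA controller writes the desired replica count to the Rollout's /scale subresource, which the Argo Rollouts controller then distributes between the stable and canary ReplicaSets. A rough sketch of such a generated HPA (illustrative names and values, not taken from this report):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-example-service   # created and owned by the KEDA operator
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout                  # the HPA controller writes replicas to this resource's /scale subresource
    name: example-service
  minReplicas: 3
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75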

I'm closing this issue as it's not KEDA related.

JorTurFer closed this as not planned Aug 12, 2024
@atmcarmo

atmcarmo commented Aug 12, 2024

Hi @JorTurFer, thank you for your answer, I really appreciate it. In the logs we can see the message New size: 22; reason: memory resource utilization (percentage of request) above target. Is it possible to get more info on this, maybe by increasing the log verbosity to understand the exact value the scaler is considering?

Thank you

EDIT: I just saw that this log is not from KEDA itself, please disregard my comment.
