UnPause does not happen immediately #4642

Closed
alt-dima opened this issue Jun 2, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@alt-dima

alt-dima commented Jun 2, 2023

Report

We have about 500 ScaledObjects.
Some use the CPU scaler, some RabbitMQ and some Kafka (only one scaler per ScaledObject).

Pausing a ScaledObject happens immediately after the autoscaling.keda.sh/paused-replicas annotation is added to it:

  1. A Reconcile message immediately appears in the keda-operator logs.
  2. An "HPA updated" message immediately appears in the keda-operator logs.
  3. The HPA is actually paused.

But unpausing does not happen the same way.
For a ScaledObject with a CPU scaler it may take a couple of minutes.
For a ScaledObject with a Kafka/RabbitMQ scaler it may never happen, or happen with a big delay (tens of minutes).

Expected Behavior

  1. The ScaledObject should be reconciled immediately after the autoscaling.keda.sh/paused-replicas annotation is removed from it.
  2. The HPA should be "unpaused" (scaling restored).

Actual Behavior

  1. The autoscaling.keda.sh/paused-replicas annotation is removed from the ScaledObject.
  2. The ScaledObject is not reconciled.
  3. The HPA remains "paused" (it still has the autoscaling.keda.sh/paused-replicas annotation and min = max = paused-replicas).

Steps to Reproduce the Problem

  1. Create a ScaledObject with a Kafka scaler.
  2. Pause it: kubectl annotate scaledObject test-workload --overwrite autoscaling.keda.sh/paused-replicas=2
  3. Check the HPA for test-workload: it is paused (annotated and min=max=2).
  4. Unpause it: kubectl annotate scaledObject test-workload --overwrite autoscaling.keda.sh/paused-replicas-
  5. Check the HPA for test-workload: it is still paused (annotated and min=max=2).
  6. Check the ScaledObject for test-workload: the annotation has been removed!
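The mismatch observed in steps 5 and 6 can be expressed as a simple check on the two objects' annotations. This is a minimal sketch; `stalePause` is a hypothetical helper written for illustration, not part of KEDA, and it assumes the annotation maps have already been read from the ScaledObject and its HPA.

```go
package main

import "fmt"

// pausedReplicasKey is the annotation used to pause a ScaledObject.
const pausedReplicasKey = "autoscaling.keda.sh/paused-replicas"

// stalePause is a hypothetical helper (not a KEDA API). It reports whether
// the HPA still carries the paused-replicas annotation even though the
// annotation has already been removed from the ScaledObject.
func stalePause(scaledObjectAnn, hpaAnn map[string]string) bool {
	_, soPaused := scaledObjectAnn[pausedReplicasKey]
	_, hpaPaused := hpaAnn[pausedReplicasKey]
	return !soPaused && hpaPaused
}

func main() {
	so := map[string]string{}                        // step 6: annotation removed from the ScaledObject
	hpa := map[string]string{pausedReplicasKey: "2"} // step 5: HPA still annotated
	fmt.Println(stalePause(so, hpa))                 // true
}
```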

Logs from KEDA operator

example

KEDA Version

2.10.1

Kubernetes Version

1.24

Platform

Amazon Web Services

Scaler Details

Kafka, RabbitMQ, CPU

Anything else?

Also interesting: why, based on the keda-operator logs, does Reconcile happen so often for CPU-based ScaledObjects?
For Kafka/RabbitMQ, Reconcile happens very rarely.

Maybe this behavior is the actual problem? KEDA does not unpause immediately, but instead periodically scans all ScaledObjects and "refreshes/realigns" the HPAs?

@alt-dima alt-dima added the bug Something isn't working label Jun 2, 2023
@alt-dima
Author

alt-dima commented Jun 2, 2023

[screenshots]

@alt-dima
Author

alt-dima commented Jun 2, 2023

I tried restarting keda-operator, and 1-2 minutes after the restart it updated/unpaused the HPA.

So it looks like there are 2 problems:

  1. Some loop over all ScaledObjects (to refresh all of them?) does not reach some workloads. It seems only Kafka and RabbitMQ are affected, because for CPU-based ones I do see Reconcile messages very often.
  2. The logic to run Reconcile after the annotation is removed from a ScaledObject is missing.
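Problem 2 amounts to an update check that treats removing the annotation as a change, just like adding it. The sketch below is hypothetical and is not KEDA's actual code: `pausedAnnotationChanged` is an illustrative name, and it assumes a controller compares the old and new annotation maps on every update event. (controller-runtime ships a general `predicate.AnnotationChangedPredicate` for the same idea; whether KEDA uses it is not stated in this thread.)

```go
package main

import "fmt"

const pausedReplicasKey = "autoscaling.keda.sh/paused-replicas"

// pausedAnnotationChanged is a hypothetical helper. It returns true when an
// update adds, removes, or changes the paused-replicas annotation, so that a
// controller would enqueue a reconcile for unpausing as well as for pausing.
func pausedAnnotationChanged(oldAnn, newAnn map[string]string) bool {
	oldVal, oldOK := oldAnn[pausedReplicasKey]
	newVal, newOK := newAnn[pausedReplicasKey]
	return oldOK != newOK || oldVal != newVal
}

func main() {
	paused := map[string]string{pausedReplicasKey: "2"}
	unpaused := map[string]string{}
	fmt.Println(pausedAnnotationChanged(paused, unpaused)) // true: annotation removed (unpause)
	fmt.Println(pausedAnnotationChanged(unpaused, paused)) // true: annotation added (pause)
	fmt.Println(pausedAnnotationChanged(paused, paused))   // false: nothing changed
}
```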

@tomkerkhove tomkerkhove moved this from To Triage to To Do in Roadmap - KEDA Core Jun 5, 2023
@zroubalik
Member

zroubalik commented Jun 21, 2023

There's a new implementation of pausing ScaledObjects coming in 2.11: #4550

This should resolve this issue.

@zroubalik
Member

There was a typo in the original post (the 0 was missing): #4550

@alt-dima
Author

Updated to 2.11.
But it remains the same.
Paused (to 0) => the HPA is removed almost immediately.
Unpaused (annotation removed) => after 10 minutes it is still paused (the HPA is still not created and the number of pods = 0).

[screenshot]

@zroubalik
Member

zroubalik commented Jun 26, 2023

@alt-dima you are hitting a new regression that was introduced in this version. We will release 2.11.1 very soon, which fixes it: #4734

@tomkerkhove
Member

This will be shipped as part of #4743.

Closing this issue since it has already been fixed.

@tomkerkhove tomkerkhove closed this as not planned Jun 28, 2023
@github-project-automation github-project-automation bot moved this from To Do to Ready To Ship in Roadmap - KEDA Core Jun 28, 2023
@alt-dima
Author

For the record:
with version 2.11.1 it works fine! Unpause happens immediately!
Thank you.

Projects
Archived in project

3 participants