UnPause does not happen immediately #4642

alt-dima · 2023-06-02T14:59:40Z

Report

We have about 500 ScaledObjects.
Some with CPU, some with RabbitMQ and some with Kafka scalers (only one scaler per ScaledObject)

Pausing a ScaledObject takes effect immediately after the autoscaling.keda.sh/paused-replicas annotation is added to it:

A Reconcile message appears immediately in the keda-operator logs
An HPA updated message appears immediately in the keda-operator logs
The HPA is actually paused

But unpausing does not happen the same way.
For a ScaledObject with the CPU scaler, it may take a couple of minutes
For a ScaledObject with the Kafka/RabbitMQ scaler, it may never happen, or happen only after a long delay (tens of minutes)

Expected Behavior

Reconcile for the unpaused object should happen immediately after the autoscaling.keda.sh/paused-replicas annotation is removed from the ScaledObject.
The HPA should be "unpaused" (scaling restored)

Actual Behavior

The autoscaling.keda.sh/paused-replicas annotation is removed from the ScaledObject
Reconciling of the ScaledObject does not happen
The HPA remains "paused" (it still has the autoscaling.keda.sh/paused-replicas annotation and min=max=paused-replicas)
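The inconsistent state described above can be checked mechanically. The following is a minimal sketch, assuming both objects have been fetched as JSON (for example with kubectl get ... -o json) and parsed into dictionaries. The function names are hypothetical and are not part of KEDA; the "paused" test simply mirrors the two symptoms reported here (annotation present on the HPA and min=max).

```python
PAUSED_REPLICAS = "autoscaling.keda.sh/paused-replicas"


def get_annotations(obj: dict) -> dict:
    """Return the metadata.annotations mapping, or an empty dict if missing."""
    return (obj.get("metadata") or {}).get("annotations") or {}


def hpa_looks_paused(hpa: dict) -> bool:
    """Heuristic from this report: HPA is annotated and min == max."""
    spec = hpa.get("spec") or {}
    min_r, max_r = spec.get("minReplicas"), spec.get("maxReplicas")
    pinned = min_r is not None and min_r == max_r
    return pinned and PAUSED_REPLICAS in get_annotations(hpa)


def is_stale_pause(scaled_object: dict, hpa: dict) -> bool:
    """True when the ScaledObject is no longer annotated but the HPA still looks paused."""
    return (
        PAUSED_REPLICAS not in get_annotations(scaled_object)
        and hpa_looks_paused(hpa)
    )
```

Note that an HPA deliberately configured with min=max but without the annotation is not reported as paused by this check.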

Steps to Reproduce the Problem

Create a ScaledObject with a Kafka scaler
Pause it: kubectl annotate scaledObject test-workload --overwrite autoscaling.keda.sh/paused-replicas=2
Check the HPA for test-workload: it is paused (annotated and min=max=2)
UnPause it: kubectl annotate scaledObject test-workload --overwrite autoscaling.keda.sh/paused-replicas-
Check the HPA for test-workload: it is still paused (annotated and min=max=2)
Check the ScaledObject for test-workload: the annotation is removed!
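To put a number on the delay after the UnPause step, the HPA can be polled until min and max diverge again. This is only a sketch under stated assumptions: the helper names are hypothetical, the HPA name keda-hpa-test-workload is an assumed name (check kubectl get hpa for the real one), and "min != max" is used as the unpaused signal because that is how the paused state shows up in the steps above.

```python
import json
import subprocess
import time
from typing import Callable, Optional, Tuple


def read_hpa_bounds(hpa_name: str) -> Tuple[int, int]:
    """Fetch (minReplicas, maxReplicas) of an HPA via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "hpa", hpa_name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    spec = json.loads(out)["spec"]
    # minReplicas is optional in the HPA spec and defaults to 1.
    return spec.get("minReplicas", 1), spec["maxReplicas"]


def seconds_until_unpaused(
    read_bounds: Callable[[], Tuple[int, int]],
    timeout: float = 600.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until min != max; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        low, high = read_bounds()
        elapsed = clock() - start
        if low != high:
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)


# Usage (requires cluster access; the HPA name is an assumption):
# delay = seconds_until_unpaused(lambda: read_hpa_bounds("keda-hpa-test-workload"))
```

The clock and sleep parameters are injectable so the polling logic can be exercised without a cluster. This targets the 2.10.1 behavior above, where the HPA stays in place while paused.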

Logs from KEDA operator

example

KEDA Version

2.10.1

Kubernetes Version

1.24

Platform

Amazon Web Services

Scaler Details

Kafka, RabbitMQ, CPU

Anything else?

Also interesting: why does Reconcile for CPU-based ScaledObjects happen very often (based on the keda-operator logs),
while for Kafka/RabbitMQ Reconcile happens very rarely?

Maybe the actual problem is in this behavior? KEDA does not UnPause immediately, but periodically scans all the ScaledObjects and "refreshes/realigns" the HPAs?


alt-dima · 2023-06-02T15:12:51Z

alt-dima · 2023-06-02T15:45:12Z

I tried restarting keda-operator, and 1-2 minutes after the restart it updated/unpaused the HPA.

So it looks like there are 2 problems:

some loop over all ScaledObjects (to refresh all of them?) does not reach some workloads (it seems only Kafka and RabbitMQ are affected, because for CPU-based ones I do see Reconcile messages very often).
the logic to execute Reconcile after the annotation is removed from the ScaledObject is missing
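To make the second point concrete, here is a rough illustration of the kind of check it has in mind: compare the old and new versions of a ScaledObject on an update event and request a Reconcile whenever the paused-replicas annotation differs. This is not KEDA's code (KEDA is written in Go, and the actual fix lives in the pull requests referenced later in this thread); the function names are hypothetical.

```python
from typing import Optional

PAUSED_REPLICAS = "autoscaling.keda.sh/paused-replicas"


def paused_replicas(obj: dict) -> Optional[str]:
    """Return the paused-replicas annotation value, or None if absent."""
    annotations = (obj.get("metadata") or {}).get("annotations") or {}
    return annotations.get(PAUSED_REPLICAS)


def needs_reconcile(old: dict, new: dict) -> bool:
    """True when the annotation was added, removed, or changed in an update."""
    return paused_replicas(old) != paused_replicas(new)
```

With this check, removing the annotation (value "2" becomes absent) counts as a change just like adding it does, which is the symmetry the Expected Behavior section asks for.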

zroubalik · 2023-06-21T11:29:13Z

There's a new implementation of pausing ScaledObjects coming in 2.11: #4550

this should resolve this issue

zroubalik · 2023-06-21T13:43:26Z

There was a typo in the original post, missing the 0 - #4550

alt-dima · 2023-06-25T12:54:06Z

Updated to 2.11.
But it remains the same.
Paused (to 0) => HPA removed almost immediately.
Un-Paused (removed annotation) => after 10 minutes it is still paused (HPA still not created and number of pods = 0)

zroubalik · 2023-06-26T06:34:46Z

@alt-dima you are hitting a new regression, that has been introduced in this version. We will release 2.11.1 very soon, which fixes this. #4734

tomkerkhove · 2023-06-28T07:25:49Z

This will be shipped as part of #4743.

Closing this issue since it was fixed already

alt-dima · 2023-06-29T15:03:52Z

For history.
With version 2.11.1 it works fine! UnPause happens immediately!
Thank you.

alt-dima added the bug label Jun 2, 2023

keda-automation added this to Roadmap - KEDA Core Jun 2, 2023

github-project-automation bot moved this to To Triage in Roadmap - KEDA Core Jun 2, 2023

tomkerkhove moved this from To Triage to To Do in Roadmap - KEDA Core Jun 5, 2023

tomkerkhove closed this as not planned Jun 28, 2023

github-project-automation bot moved this from To Do to Ready To Ship in Roadmap - KEDA Core Jun 28, 2023
