STAN scaler does not scale from 0 #519
Comments
Thanks @cwoolum for spotting this one. It's strange that I didn't catch it; I'm sure I tested it with min replicas 0. Anyway, thanks!
I think I know how I missed this one. My initial deployment had a replica count of 1, so there was at least one registered subscriber. Once you get past that, it can scale from 0 afterwards.
If it scales to zero though, the subscription will go away if you are draining and closing the connection. Even if you don't drain and close, the subscription will drop off after a short period (5 minutes?), and that prevents it from scaling back up.
I was testing it with this program: https://github.com/balchua/gonuts. The subscriber seems to stick around.
https://github.com/balchua/gonuts/blob/c0ce3d6448fc1dafeaf5695136d008a04a696873/sub/main.go#L150 You are calling If your service scales up to 20 instances, you'll have 20 subscribers in STAN. Now say you scale back down to 1: each of those instances needs to unsubscribe so that STAN doesn't keep trying to deliver messages to them, right? I could be thinking about this the wrong way though.
Thanks for the explanation. 👍 I remember setting the
I do use I saw that one of the newer versions of STAN/NATS does have a keepalive that it sends periodically to make sure a subscriber is still listening, but I'm not sure of the relationship between a connection being open and a subscription existing for it. I'm going to open an issue in the STAN repo and see if the team can provide some best practices on it.
I was able to find the docs for this. It looks like I was doing it wrong. For durable subscriptions, you don't want to Unsubscribe but only Close the connection.
I'll try removing the
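The Close-versus-Unsubscribe point above can be sketched roughly as follows. This is only an illustration: `durableSub` is a hypothetical local interface that names the two operations discussed in this thread, and `recordingSub` is a stand-in used to show which call is made. Neither is the real STAN client, and a real shutdown path would also close the STAN and NATS connections.

```go
package main

import "fmt"

// durableSub is a hypothetical interface listing only the two operations
// discussed in the thread. It is a local stand-in, not the STAN client type.
type durableSub interface {
	Close() error       // stop receiving, keep the durable subscription registered
	Unsubscribe() error // remove the durable subscription from the server
}

// recordingSub is a stand-in that records which methods were called.
type recordingSub struct{ calls []string }

func (r *recordingSub) Close() error {
	r.calls = append(r.calls, "Close")
	return nil
}

func (r *recordingSub) Unsubscribe() error {
	r.calls = append(r.calls, "Unsubscribe")
	return nil
}

// shutdownDurable stops a durable subscriber on scale-down. It deliberately
// calls Close and never Unsubscribe, so the durable subscription survives.
func shutdownDurable(sub durableSub) error {
	if err := sub.Close(); err != nil {
		return fmt.Errorf("closing subscription: %w", err)
	}
	return nil
}

func main() {
	sub := &recordingSub{}
	if err := shutdownDurable(sub); err != nil {
		fmt.Println("shutdown failed:", err)
		return
	}
	fmt.Println(sub.calls)
}
```

Keeping the shutdown path behind a small interface like this also makes it easy to check in a unit test that Unsubscribe is never reached during scale-down.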
@cwoolum thanks for checking the docs. Did it work for you when you removed the
I see the subscriptions are still there but I'm not sure if the messages are getting delivered successfully now. I'm seeing a large number of
After the scale-down occurred, the number of subscribers dropped to 1. I'm not sure how long it took for that to happen though. I opened an issue to ask about best practices for this.
As far as I know, STAN will direct the pending messages to available subscribers. But how long does it take before it makes that decision?
I'm going to keep testing to make sure none of the messages are getting stuck. I keep running into the error
When this error shows up, does it stop scaling? Care to show what
@cwoolum could you please describe the steps leading to this error, plus more details on the ScaledObject and HPA? Thanks.
Okay, the modified issue was due to an old CPU-based HPA conflicting with the KEDA scaler, but I was able to recreate the issue with the pod not scaling from 0. Here are the requested items.
Here is the response from STAN:
And here are the logs from KEDA
There are messages pending for this subscription, but KEDA doesn't seem to be picking them up. The pods do call Close on the NATS and STAN connections but do not Unsubscribe.
@cwoolum I think I know what the issue is: the STAN scaler actually uses the channels'
I think that makes sense. I created PR #533 to make the change. It also adds additional logging to help diagnose issues.
Yes, I just did a full deploy using the master image tag, and scale up/down is working for me.
Thanks @cwoolum
A clear and concise description of what the bug is.
Expected Behavior
When there are messages in the queue and the HPA is scaled down to 0, the scaler should create at least one pod instance so that the subscription with STAN can be created.
Actual Behavior
When the loop runs, the logic checks whether the pending message count is greater than 0 and whether the queue names match. It never takes into account the case where the subscription does not exist.
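The gap described above can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: `channelInfo`, `subscriptionInfo`, and both functions are hypothetical and simplified, not KEDA's actual types or the change that was eventually merged. `isActiveNaive` mirrors the behavior described here, and `isActiveWithFallback` shows one possible way to handle a channel that holds messages but has no matching subscription.

```go
package main

import "fmt"

// subscriptionInfo and channelInfo are hypothetical, trimmed-down stand-ins
// for what a STAN monitoring response might contain. Not KEDA's real types.
type subscriptionInfo struct {
	QueueName    string
	PendingCount int64
}

type channelInfo struct {
	Msgs          int64 // number of messages stored in the channel
	Subscriptions []subscriptionInfo
}

// isActiveNaive mirrors the described behavior: only an existing subscription
// with a matching queue name and pending messages counts as activity.
func isActiveNaive(ch channelInfo, queue string) bool {
	for _, s := range ch.Subscriptions {
		if s.QueueName == queue && s.PendingCount > 0 {
			return true
		}
	}
	return false
}

// isActiveWithFallback also treats "messages in the channel but no matching
// subscription" as activity, so one pod can start and create the subscription.
func isActiveWithFallback(ch channelInfo, queue string) bool {
	found := false
	for _, s := range ch.Subscriptions {
		if s.QueueName != queue {
			continue
		}
		found = true
		if s.PendingCount > 0 {
			return true
		}
	}
	return !found && ch.Msgs > 0
}

func main() {
	// A channel with stored messages and no subscriptions at all,
	// as happens when the deployment starts at 0 replicas.
	ch := channelInfo{Msgs: 5}
	fmt.Println(isActiveNaive(ch, "workers"))        // false
	fmt.Println(isActiveWithFallback(ch, "workers")) // true
}
```

The fallback only fires when no subscription with the given queue name exists, so an idle subscription with nothing pending still allows scaling down to 0.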
Steps to Reproduce the Problem
ScaledObject that can scale to 0
msgs field because there are no active subscriptions.
minReplicaCount is set to 1, scaling works normally.
Specifications