How to add initial delay for Activator Probing time #15316

Closed
shmubara opened this issue Jun 10, 2024 · 6 comments
Labels
kind/question Further information is requested

Comments

@shmubara

Hi Team,

I have a Knative Service that takes at least 20 minutes to become ready. After deploying the ksvc, we observed that after 10 minutes the activator sets the capacity to zero and the pod is terminated while it is still in the initializing phase. We have the following activator logs:

{"severity":"ERROR","timestamp":"2024-06-05T09:18:38.662189274Z","logger":"activator","caller":"net/revision_backends.go:398","message":"Failed to probe clusterIP 172.20.42.240:80","commit":"e82287d","knative.dev/controller":"activator","knative.dev/pod":"activator-695cfc8b75-2tmvv","knative.dev/key":"test-ns/common-ssql-ss-knative-00001","error":"unexpected status code: want [200], got 503","stacktrace":"knative.dev/serving/pkg/activator/net.(*revisionWatcher).checkDests\n\tknative.dev/serving/pkg/activator/net/revision_backends.go:398\nknative.dev/serving/pkg/activator/net.(*revisionWatcher).run\n\tknative.dev/serving/pkg/activator/net/revision_backends.go:443"}
{"severity":"INFO","timestamp":"2024-06-05T09:18:38.732356393Z","logger":"activator","caller":"net/throttler.go:326","message":"Updating Revision Throttler with: clusterIP = <nil>, trackers = 0, backends = 0","commit":"e82287d","knative.dev/controller":"activator","knative.dev/pod":"activator-695cfc8b75-2tmvv","knative.dev/key":"test-ns/common-ssql-ss-knative-00001"}
{"severity":"INFO","timestamp":"2024-06-05T09:18:38.732421036Z","logger":"activator","caller":"net/throttler.go:318","message":"Set capacity to 0 (backends: 0, index: 0/1)","commit":"e82287d","knative.dev/con

We have tried the options below, but with no luck:
scale-to-zero-pod-retention-period: "20m5s"
stable-window: "20m"

Nothing stops the pod from being terminated during startup.
I can't send requests during startup because the certificate for my route URL is not available yet, so a curl command to the route URL fails immediately.
We also observed that the certificate is provisioned only after the pod is in the ready state.

We tried adding readiness probes with a delay, but that did not help.

How can we avoid the pod termination after 10 minutes? Is there a way to delay the activator probing?

@shmubara shmubara added the kind/question Further information is requested label Jun 10, 2024
@ReToCode
Member

You should be able to set https://knative.dev/docs/serving/configuration/deployment/#configuring-progress-deadlines, which causes Knative to wait longer until the service is initially ready. But keep in mind that a 20-minute startup time is not really what Knative was designed for. We are also working on supporting K8s startup probes, which would probably also help here.
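
As a rough sketch of the setting linked above: the progress deadline can be raised for a single revision with the `serving.knative.dev/progress-deadline` annotation on the revision template. The cluster-wide default is the `progress-deadline` key in the `config-deployment` ConfigMap; it defaults to 600 seconds, which would be consistent with the 10-minute cutoff reported above. The Service name, image, and the "25m" value are hypothetical placeholders, not values from this thread.

```yaml
# Sketch only: name, image and the 25m value are assumptions for illustration.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: slow-start-model            # hypothetical Service name
  namespace: test-ns
spec:
  template:
    metadata:
      annotations:
        # How long Knative waits for this revision's deployment to become
        # ready before marking it as failed. Choose a value above the
        # expected startup time.
        serving.knative.dev/progress-deadline: "25m"
    spec:
      containers:
        - image: registry.example.com/model-server:latest   # hypothetical image
```

The annotation goes on `spec.template.metadata`, not on the Service's own metadata, so that it applies to the revision being created.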

@dprotaso
Member

Sorta related: #13611

@shmubara
Author

@ReToCode it worked, but we are getting an activator timeout when running a curl command like the one below:


curl -ikv https://XXXX.com \
  -H "Content-Type: application/json" \
  -d '{"inputs": "how to install secure agent?", "parameters": {"max_new_tokens": 400}}'

We get an activator request timeout on the initial curl command after 5 minutes, because our pod takes approximately 8 minutes to reach the ready state. How can we increase the activator request timeout to a higher value?

@skonto
Contributor

skonto commented Jun 11, 2024

I suspect you are doing LLM inference, given the curl command. You could use an init container or some other mechanism, e.g. the image itself, to get the model ready. You could also check KServe.
Btw, the default revision timeout is 5 minutes, and you can set it per service.
Just to clarify: the activator does not make scaling decisions, it just reports capacity to the autoscaler. The timeout settings are set for each revision, and the activator and queue-proxy then pick them up and try to respect them.
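
A minimal sketch of the per-service timeout mentioned above, assuming the same hypothetical Service name and image as placeholders: `timeoutSeconds` on the revision spec sets the request timeout for that revision. The `revision-timeout-seconds` key in the `config-defaults` ConfigMap only supplies the default for revisions that do not set the field, and `max-revision-timeout-seconds` in the same ConfigMap caps the value a revision may request.

```yaml
# Sketch only: name and image are assumptions for illustration.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: slow-start-model            # hypothetical Service name
  namespace: test-ns
spec:
  template:
    spec:
      # Request timeout for this revision, in seconds (10 minutes here).
      # Must not exceed max-revision-timeout-seconds in config-defaults.
      timeoutSeconds: 600
      containers:
        - image: registry.example.com/model-server:latest   # hypothetical image
```

Changing the template creates a new revision, so the new timeout applies to requests routed to that revision rather than to revisions that already exist.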

@shmubara
Author

shmubara commented Jun 11, 2024

@skonto we have already tried revision-timeout-seconds: "600" but it is still timing out after 5 minutes only. It is not working as mentioned in docs

@skonto
Contributor

skonto commented Jun 11, 2024

> but it is still timing out after 5 minutes only.

Could you paste your Knative Service and the configuration you have in the related ConfigMap? Also, please turn on request logging and provide the activator, queue-proxy, and app logs. If you raise the timeout to 10 minutes, it should be respected afaik. I am not sure what the model serving runtime is doing though; any timeouts there?
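
For reference, a hedged sketch of how request logging might be switched on: to the best of my knowledge it is controlled by the `logging.enable-request-log` key in the `config-observability` ConfigMap in the `knative-serving` namespace. Whether the built-in log template is sufficient depends on the Knative version, so treat this as an assumption to check against the `_example` block of the ConfigMap on your cluster.

```yaml
# Sketch only: merge this key into the existing ConfigMap rather than
# replacing the ConfigMap, so other settings are preserved.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: knative-serving
data:
  # Assumed key name; enables per-request logs in the queue-proxy.
  logging.enable-request-log: "true"
```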

> It is not working as mentioned in docs

In any case, could you please provide clear steps on how to reproduce it?
