vizier-core stuck in CrashLoopBackoff due to failed pod checks #322
@pdmack My deployment looks OK with 0.4.0. Did you change anything in your environment? Also, are the other Katib pods up?
Yes, the other vizier pods are running. Is this a local check within the pod that the …
@pdmack
This is OpenShift 3.11 with generous permissions, but I suspect I'm missing something subtle. A standalone 3.11 environment (all-in-one, AIO) doesn't exhibit this problem.
@johnugeorge Yeah, it turned out that vizier-db was the culprit, even though it reported as Running. I worked around this by modifying permissions on the storage provisioner's backing store. I'll try to remember to file an issue about the health/readiness checks on vizier-db.
#270 implemented gRPC health checking, but I'm trying to understand why vizier-core (0.4.0) falls into CrashLoopBackOff (CLB) due to failed readiness/liveness checks in my deployment with one master and three compute nodes.