Ironic health checks do not check against useful requests #1528
Comments
Actually, it looks like
To repeat what I mentioned on IRC: any meaningful endpoint will require authentication. So we need to make sure the health-check script can use it.
This change is the first step towards metal3-io/baremetal-operator#1528. Through these scripts, we can decouple the validation logic from the pod definition and provide more sophisticated tests in the future. Right now, the same curl command is used (modulo supporting all variants of deploying Ironic). Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
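A standalone script of this kind could look roughly like the following. This is a hypothetical sketch: the variable names (`IRONIC_URL`, `IRONIC_CACERT_FILE`, `IRONIC_HTPASSWD_USER`/`IRONIC_HTPASSWD_PASS`) are illustrative assumptions, not the actual ironic-image interface.

```shell
#!/bin/sh
# Hypothetical health-check helper; the environment variable names below
# are illustrative assumptions, not the real ironic-image interface.

# Build the curl command for the probe from the environment. Keeping this
# logic in a script rather than in the pod definition is what makes it
# possible to add more sophisticated checks later.
health_check_cmd() {
    url="${IRONIC_URL:-http://127.0.0.1:6385}"
    args="curl --silent --fail --max-time 10"
    # Trust the CA bundle mounted into the pod, if any (TLS deployments).
    if [ -n "${IRONIC_CACERT_FILE:-}" ]; then
        args="$args --cacert ${IRONIC_CACERT_FILE}"
    fi
    # Attach basic-auth credentials when available, so the probe can later
    # target authenticated endpoints instead of just the bare base URL.
    if [ -n "${IRONIC_HTPASSWD_USER:-}" ]; then
        args="$args --user ${IRONIC_HTPASSWD_USER}:${IRONIC_HTPASSWD_PASS:-}"
    fi
    echo "$args $url/"
}

# With no variables set, this mirrors today's probe: a plain GET of the
# base URL.
health_check_cmd
```

A real script would `exec` the resulting command rather than printing it; echoing here just keeps the variant-selection logic easy to inspect.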
Using them allows us not to care about all possible ways Ironic can be installed. In the future, we can use the mounted secrets to exercise less trivial API resources such as conductors or drivers (see metal3-io#1528). Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
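As a sketch of what such a less trivial check might do with the conductors endpoint: given the JSON body of GET /v1/conductors (whose entries carry a hostname field in the Ironic bare metal API), the probe could fail when no conductor is registered. The helper name below is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the JSON body of GET /v1/conductors
# lists at least one conductor. An empty conductor list means the API is
# up but Ironic cannot actually do any work.
conductors_present() {
    # Every conductor entry carries a "hostname" field, so its absence
    # means the list is empty. (Crude string check; jq would be more
    # robust if available in the image.)
    printf '%s' "$1" | grep -q '"hostname"'
}
```

Usage would be along the lines of `conductors_present "$(curl --silent --fail --user "$user:$pass" "$url/v1/conductors")"`, with the credentials taken from the mounted secrets.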
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale
/remove-lifecycle stale With metal3-io-bot/ironic-image@e44c4f7, we now have a path forward. We also need to finish the discussion around #1685 since it affects how we get the credentials.
/kind feature
@dtantsur: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale
/remove-lifecycle stale
What steps did you take and what happened:
By any method, cause the connection to the database to fail. Even though any requests to do actual work will fail, the deployment will still be seen as live and ready, because the health checks only verify that the base URL responds, which it can do even when the internal connections are down. For instance, if one were to attempt to connect to
http://127.0.0.1:6385/v1/nodes/
or other such endpoints, there may be an error, but Kubernetes does not know about it.

What did you expect to happen:
Ideally, the liveness probe should check at least one endpoint that relies on the database connection and, when such an endpoint fails, report that the Ironic instance is not healthy.
Anything else you would like to add:
Unfortunately, I have noticed that there are various occasions when Ironic, for various reasons, may fail to connect to the database. In the past I have seen this caused by the database itself having issues, as well as by other issues related directly to the running instance of the Ironic API. In most cases, simply restarting Ironic has resolved the issue. Regardless, if the backend is unavailable, Ironic serves little utility. Therefore, I recommend changing the livenessProbe to check /v1/nodes/ rather than just /. The same may be true of the Inspector as well, by adding /v1/rules.

Environment:
main, currently.

/kind bug
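The recommendation above can be sketched as a probe that treats only 2xx responses from /v1/nodes/ as healthy. This is a minimal sketch, assuming curl is available in the container; the environment variable names in the comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical liveness-probe sketch for checking /v1/nodes/ instead of /.

# Only a 2xx status means the API served a real, database-backed request;
# a broken DB connection typically surfaces as a 5xx on /v1/nodes/, while
# the bare base URL would still happily return 200.
node_list_healthy() {
    case "$1" in
        2??) return 0 ;;  # healthy: DB-backed request succeeded
        *)   return 1 ;;  # unhealthy: let kubelet restart Ironic
    esac
}

# In a real probe the status would come from something like:
#   status="$(curl --silent --output /dev/null --write-out '%{http_code}' \
#       --user "$IRONIC_HTPASSWD_USER:$IRONIC_HTPASSWD_PASS" \
#       "${IRONIC_URL:-http://127.0.0.1:6385}/v1/nodes/")"
#   node_list_healthy "$status"
```

An exec probe is used for the sketch because an httpGet probe cannot easily attach the credentials that an authenticated endpoint requires.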