CP /healthy should wait for all components to be ready and healthy #1001

jakubdyszkiewicz · 2020-08-31T15:08:36Z

Summary

Currently, we expose the :5680/healthy endpoint for health checks. The problem is that we spin up many components (API, XDS, SDS, KDS, DNS, etc.) concurrently. This health check should only return ready once all components have started and are healthy (!)

We see this problem in the test app/kuma-cp/cmd/run_test.go:81: when we spin up the CP and shut it down too soon, the DNS Server complains that the Server was not started.
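To illustrate what a test could do once the endpoint is trustworthy, here is a minimal sketch that polls a URL until it returns 200 before moving on. `waitUntilReady` and `demo` are hypothetical helpers, not part of Kuma, and the fake server that turns ready on the third request only stands in for a CP whose components are still starting.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitUntilReady (hypothetical helper) polls url until it answers 200 OK
// or until ctx is cancelled or times out.
func waitUntilReady(ctx context.Context, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("not ready before deadline: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// demo runs waitUntilReady against a fake endpoint that reports 503 for
// the first two requests and 200 afterwards.
func demo() error {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if atomic.AddInt32(&calls, 1) < 3 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return waitUntilReady(ctx, srv.URL, 10*time.Millisecond)
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("control plane never became ready:", err)
		return
	}
	fmt.Println("ready; safe to continue the test")
}
```

This only helps if the endpoint really reflects every component, which is the point of this issue.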

lahabana · 2021-08-30T13:10:08Z

This also causes issues when adding tests that rely on restarting the CP. We have no way to ensure that we can create a deployment and that the webhook will work.

Also, we should distinguish between "healthy" and "ready". I think what we're talking about here is a readiness check, not a health check.
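A minimal sketch of that distinction, assuming two separate handlers: a liveness handler that answers 200 as long as the process is serving, and a readiness handler that answers 503 until startup has finished. The `probes` type, its method names, and `probeStatus` are made up for illustration and are not Kuma's API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probes (hypothetical) holds the state behind the two checks.
type probes struct {
	ready int32 // 1 once startup has completed, 0 otherwise
}

func (p *probes) setReady(v bool) {
	var n int32
	if v {
		n = 1
	}
	atomic.StoreInt32(&p.ready, n)
}

// healthy is the liveness check: the process is up and can answer.
func (p *probes) healthy(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readiness is the readiness check: 503 until setReady(true) is called.
func (p *probes) readiness(w http.ResponseWriter, _ *http.Request) {
	if atomic.LoadInt32(&p.ready) == 1 {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

// probeStatus calls a handler in-process and returns its status code.
func probeStatus(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	p := &probes{}
	fmt.Println("healthy:", probeStatus(p.healthy), "ready:", probeStatus(p.readiness))
	p.setReady(true)
	fmt.Println("healthy:", probeStatus(p.healthy), "ready:", probeStatus(p.readiness))
}
```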

Disable the tests for the moment because of kumahq#1001 Signed-off-by: Charly Molter <charly.molter@konghq.com>

lahabana · 2021-08-30T17:05:59Z

Looking at this quickly, it feels like the right approach is to add an OnReady callback on the component API and have the component manager link it to the diagnosticsServer, which would return JSON with the state of each component.

The diagnostics server will then be able to set the status code of the Ready endpoints accurately, and it will also provide some useful info for troubleshooting.

The ready endpoint will also need to go to unhealthy as soon as the stop channel is closed.
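A rough sketch of this proposal, under the assumption that each component is handed a callback to invoke once it has started. `Tracker`, `OnReady`, `MarkStopping`, and `Status` are hypothetical names, not the actual component manager or diagnostics server API. The handler returns per-component state as JSON, and readiness drops to 503 as soon as the stop channel is closed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// Tracker (hypothetical) records which components have reported ready.
type Tracker struct {
	mu       sync.RWMutex
	ready    map[string]bool
	stopping bool
}

func NewTracker(names ...string) *Tracker {
	t := &Tracker{ready: make(map[string]bool, len(names))}
	for _, n := range names {
		t.ready[n] = false
	}
	return t
}

// OnReady returns the callback a component calls once it has started.
func (t *Tracker) OnReady(name string) func() {
	return func() {
		t.mu.Lock()
		defer t.mu.Unlock()
		t.ready[name] = true
	}
}

// MarkStopping is meant to be called when the stop channel is closed.
func (t *Tracker) MarkStopping() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.stopping = true
}

// Status returns the HTTP status code and a copy of the per-component state.
func (t *Tracker) Status() (int, map[string]bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	all := !t.stopping
	snapshot := make(map[string]bool, len(t.ready))
	for name, ok := range t.ready {
		snapshot[name] = ok
		all = all && ok
	}
	if all {
		return http.StatusOK, snapshot
	}
	return http.StatusServiceUnavailable, snapshot
}

// ServeHTTP exposes the state as JSON for troubleshooting.
func (t *Tracker) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	code, state := t.Status()
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(state)
}

func main() {
	tr := NewTracker("api", "dns")
	stop, done := make(chan struct{}), make(chan struct{})
	go func() { <-stop; tr.MarkStopping(); close(done) }()

	tr.OnReady("api")()
	tr.OnReady("dns")()
	code, state := tr.Status()
	fmt.Println(code, state)

	close(stop) // shutdown begins: readiness must drop immediately
	<-done
	code, _ = tr.Status()
	fmt.Println(code)
}
```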

jpeach · 2021-09-01T00:38:54Z

Looking at this quickly, it feels like the right approach is to add an OnReady callback on the component API and have the component manager link it to the diagnosticsServer, which would return JSON with the state of each component.

Would the OnReady only fire once? I would expect that each request to /health would actually poll all the components.
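For contrast, a sketch of the polling model described here, where nothing is cached and every request asks each component for its current state. `ReadyChecker`, `fakeComponent`, and `pollAll` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ReadyChecker (hypothetical) is what each component would implement.
type ReadyChecker interface {
	Name() string
	Ready() error // nil when the component can serve right now
}

// fakeComponent stands in for a real component such as the DNS server.
type fakeComponent struct {
	name string
	up   bool
}

func (c *fakeComponent) Name() string { return c.name }

func (c *fakeComponent) Ready() error {
	if !c.up {
		return errors.New("not started")
	}
	return nil
}

// pollAll queries every component on each call, so a component that
// stops being ready later is reflected in the very next response.
func pollAll(components []ReadyChecker) (int, map[string]string) {
	code := http.StatusOK
	report := make(map[string]string, len(components))
	for _, c := range components {
		if err := c.Ready(); err != nil {
			code = http.StatusServiceUnavailable
			report[c.Name()] = err.Error()
			continue
		}
		report[c.Name()] = "ready"
	}
	return code, report
}

func main() {
	api := &fakeComponent{name: "api", up: true}
	dns := &fakeComponent{name: "dns"}
	components := []ReadyChecker{api, dns}

	code, report := pollAll(components)
	fmt.Println(code, report)

	dns.up = true
	code, report = pollAll(components)
	fmt.Println(code, report)
}
```

Compared with a one-shot callback, polling costs a call per component per request, but it also catches a component that becomes unready after startup.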

* test(kds): Add test for KDS when restarting CP These tests might be unstable because of: #1001 Signed-off-by: Charly Molter <charly.molter@konghq.com>

* test(kds): Add test for KDS when restarting CP These tests might be unstable because of: #1001 Signed-off-by: Charly Molter <charly.molter@konghq.com> (cherry picked from commit 80c63ad)

github-actions · 2021-11-24T08:01:53Z

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

github-actions · 2022-10-17T08:08:37Z