CP /healthy should wait for all components to be ready and healthy #1001

Open
jakubdyszkiewicz opened this issue Aug 31, 2020 · 16 comments
Labels
area/kuma-cp triage/accepted The issue was reviewed and is complete enough to start working on it

Comments

@jakubdyszkiewicz
Contributor

Summary

Currently, we expose the :5680/healthy endpoint for health checks. The problem is that we spin up many components (API, XDS, SDS, KDS, DNS, etc.) concurrently. This health check should only return ready if all components have started and are healthy (!)

We see this problem in the test at app/kuma-cp/cmd/run_test.go:81: when we spin up the CP and shut it down too soon, the DNS server complains that the server was not started.
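
On the test side, one way to avoid shutting the CP down too early is to poll the endpoint until it answers 200 before moving on. The following is a minimal Go sketch, assuming the endpoint reflected real readiness (which, per this issue, it does not today). `waitUntilReady` and `httpOK` are hypothetical helpers, not existing Kuma code, and the `httptest` server only stands in for the CP:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitUntilReady calls check up to attempts times, sleeping interval between
// failed tries, and reports whether check ever returned true.
// Hypothetical helper, not part of Kuma.
func waitUntilReady(check func() bool, attempts int, interval time.Duration) bool {
	for i := 0; i < attempts; i++ {
		if check() {
			return true
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return false
}

// httpOK returns a check that succeeds when a GET on url answers 200.
func httpOK(url string) func() bool {
	return func() bool {
		resp, err := http.Get(url)
		if err != nil {
			return false
		}
		defer resp.Body.Close()
		return resp.StatusCode == http.StatusOK
	}
}

func main() {
	// Stand-in for the control plane: answers 503 to the first two probes.
	var calls atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if calls.Add(1) < 3 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ready := waitUntilReady(httpOK(srv.URL), 10, 10*time.Millisecond)
	fmt.Println("ready:", ready, "after", calls.Load(), "probes")
}
```

A bounded number of attempts keeps a test from hanging forever if the CP never comes up.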

@lahabana
Contributor

This also causes issues when adding tests that rely on restarting the CP. We have no way to ensure that we can create a deployment and that the webhook will work.

We should also distinguish between "healthy" and "ready". I think what we're talking about here is a readiness check, not a health check.
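
To make the distinction concrete, here is a minimal Go sketch with two separate endpoints: liveness answers 200 as soon as the process can serve HTTP, while readiness answers 503 until startup has finished. The `probes` type and the `probe` helper are hypothetical and only illustrate the idea; this is not Kuma's diagnostics server:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probes separates the two questions a probe can ask. Hypothetical type,
// not Kuma's diagnostics server.
type probes struct {
	ready atomic.Bool // flipped once every component has started
}

// mux wires both endpoints onto one ServeMux.
func (p *probes) mux() *http.ServeMux {
	m := http.NewServeMux()
	// Liveness: the process is running and can serve HTTP at all.
	m.HandleFunc("/healthy", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: the process can do useful work right now.
	m.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		if !p.ready.Load() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return m
}

// probe sends a GET for path through h and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	p := &probes{}
	m := p.mux()
	fmt.Println(probe(m, "/healthy"), probe(m, "/ready")) // still starting up
	p.ready.Store(true)
	fmt.Println(probe(m, "/healthy"), probe(m, "/ready")) // startup finished
}
```

Keeping the two apart means an orchestrator can hold traffic back from a CP that is still starting without restarting it.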

lahabana added a commit to lahabana/kuma that referenced this issue Aug 30, 2021
Disable the tests for the moment because of
kumahq#1001

Signed-off-by: Charly Molter <charly.molter@konghq.com>
@lahabana
Contributor

Looking at this quickly, it feels like the right approach is to add an OnReady callback to the component API and have the component manager link it to the diagnostics server, which would return a JSON document with the state of each component.

The diagnostics server would then be able to set the status code of the ready endpoint accurately and also provide some useful info for troubleshooting.

The ready endpoint will also need to go to unhealthy as soon as the stop channel is closed.
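
A rough Go sketch of that idea, under stated assumptions: `readiness`, `newReadiness`, `OnReady`, and `Report` are hypothetical names rather than Kuma's actual component API. Each component gets a callback to invoke once it is up, the report lists the state of every component as JSON, and a closed stop channel forces the result to not ready:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// readiness records which components have reported ready. Hypothetical type.
type readiness struct {
	mu    sync.Mutex
	ready map[string]bool
	stop  <-chan struct{}
}

func newReadiness(stop <-chan struct{}, names ...string) *readiness {
	r := &readiness{ready: make(map[string]bool, len(names)), stop: stop}
	for _, n := range names {
		r.ready[n] = false
	}
	return r
}

// OnReady returns the callback handed to the named component.
func (r *readiness) OnReady(name string) func() {
	return func() {
		r.mu.Lock()
		defer r.mu.Unlock()
		r.ready[name] = true
	}
}

// report is the JSON document served by the ready endpoint.
type report struct {
	Ready      bool            `json:"ready"`
	Stopping   bool            `json:"stopping"`
	Components map[string]bool `json:"components"`
}

// Report returns the status code and JSON body for the ready endpoint.
func (r *readiness) Report() (int, []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	rep := report{Ready: true, Components: r.ready}
	select {
	case <-r.stop: // a closed stop channel means shutdown has begun
		rep.Stopping = true
		rep.Ready = false
	default:
	}
	for _, ok := range r.ready {
		rep.Ready = rep.Ready && ok
	}
	body, _ := json.Marshal(rep) // marshalling this plain struct cannot fail
	if rep.Ready {
		return http.StatusOK, body
	}
	return http.StatusServiceUnavailable, body
}

func main() {
	stop := make(chan struct{})
	r := newReadiness(stop, "api", "dns")
	apiReady, dnsReady := r.OnReady("api"), r.OnReady("dns")

	apiReady()
	code, body := r.Report()
	fmt.Println(code, string(body)) // dns has not reported yet
	dnsReady()
	code, body = r.Report()
	fmt.Println(code, string(body)) // every component reported
	close(stop)
	code, body = r.Report()
	fmt.Println(code, string(body)) // shutting down
}
```

Checking the stop channel inside `Report` rather than in a background goroutine means the first probe after shutdown begins already sees the not-ready state.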

@jpeach
Contributor

jpeach commented Sep 1, 2021

> Looking at this quickly it feels like the right approach is to add a OnReady cb on the component api and have the component manager do the link between this and the diagnosticsServer would return a json with the state of each component.

Would the OnReady only fire once? I would expect that each request to /health would actually poll all the components.
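
For comparison, the polling model could be sketched in Go as below, where every request asks each component for its current state. The `readyChecker` interface, `funcComponent`, `pollAll`, and `readyHandler` are hypothetical names used for illustration, and the `/ready` path in `main` is only a placeholder:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readyChecker is a hypothetical interface a component could implement.
type readyChecker interface {
	Name() string
	Ready() bool
}

// funcComponent adapts a plain function to readyChecker for the demo.
type funcComponent struct {
	name  string
	check func() bool
}

func (c funcComponent) Name() string { return c.name }
func (c funcComponent) Ready() bool  { return c.check() }

// pollAll asks every component for its current state.
func pollAll(components []readyChecker) (bool, map[string]bool) {
	states := make(map[string]bool, len(components))
	all := true
	for _, c := range components {
		ok := c.Ready()
		states[c.Name()] = ok
		all = all && ok
	}
	return all, states
}

// readyHandler polls on each request, so a component that degrades after
// startup is reflected in the next probe.
func readyHandler(components []readyChecker) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		all, states := pollAll(components)
		w.Header().Set("Content-Type", "application/json")
		if !all {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_ = json.NewEncoder(w).Encode(states)
	}
}

func main() {
	dnsUp := false
	h := readyHandler([]readyChecker{
		funcComponent{name: "api", check: func() bool { return true }},
		funcComponent{name: "dns", check: func() bool { return dnsUp }},
	})
	for i := 0; i < 2; i++ {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
		fmt.Print(rec.Code, " ", rec.Body.String())
		dnsUp = true // the component comes up between the two probes
	}
}
```

Unlike a callback that fires once, polling also covers a component that becomes unhealthy after startup, at the cost of doing the checks on every request.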

lahabana added a commit that referenced this issue Sep 3, 2021
* test(kds): Add test for KDS when restarting CP

These tests might be unstable because of: #1001

Signed-off-by: Charly Molter <charly.molter@konghq.com>
mergify bot pushed a commit that referenced this issue Sep 3, 2021
* test(kds): Add test for KDS when restarting CP

These tests might be unstable because of: #1001

Signed-off-by: Charly Molter <charly.molter@konghq.com>
(cherry picked from commit 80c63ad)
lahabana added a commit that referenced this issue Sep 6, 2021
These tests might be unstable because of: #1001

Signed-off-by: Charly Molter <charly.molter@konghq.com>
(cherry picked from commit 80c63ad)

Co-authored-by: Charly Molter <charly.molter@konghq.com>
nikita15p pushed a commit to nikita15p/kuma that referenced this issue Sep 28, 2021
* test(kds): Add test for KDS when restarting CP

These tests might be unstable because of: kumahq#1001

Signed-off-by: Charly Molter <charly.molter@konghq.com>
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Nov 24, 2021
@github-actions
Contributor

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

@lahabana lahabana added triage/accepted The issue was reviewed and is complete enough to start working on it area/kuma-cp and removed triage/stale Inactive for some time. It will be triaged again labels Apr 14, 2022
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label May 15, 2022
@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label May 16, 2022
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jun 16, 2022
@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label Jun 16, 2022
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jul 17, 2022
@github-actions
Contributor

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it or attend the next triage meeting.

@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Oct 17, 2022
@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label Oct 17, 2022
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jan 17, 2023
@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label Jan 17, 2023
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Apr 18, 2023
@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label Apr 18, 2023
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jul 18, 2023
@michaelbeaumont michaelbeaumont removed the triage/stale Inactive for some time. It will be triaged again label Jul 18, 2023
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Oct 17, 2023
@lahabana lahabana removed the triage/stale Inactive for some time. It will be triaged again label Oct 17, 2023
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jan 16, 2024
@jakubdyszkiewicz jakubdyszkiewicz removed the triage/stale Inactive for some time. It will be triaged again label Jan 22, 2024
lahabana added a commit to lahabana/kuma that referenced this issue Mar 15, 2024
- Abstract some common http startup code.
- Add a ReadyComponent interface and implement it for
  api-server,dp-server and mads
- Use this readyComponent to return CP readiness in `/ready` probe

part of kumahq#1001

Signed-off-by: Charly Molter <charly.molter@konghq.com>
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Apr 22, 2024
@lobkovilya lobkovilya removed the triage/stale Inactive for some time. It will be triaged again label Apr 22, 2024
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Jul 22, 2024
@michaelbeaumont michaelbeaumont removed the triage/stale Inactive for some time. It will be triaged again label Jul 23, 2024
@github-actions github-actions bot added the triage/stale Inactive for some time. It will be triaged again label Oct 22, 2024
@michaelbeaumont michaelbeaumont removed the triage/stale Inactive for some time. It will be triaged again label Oct 28, 2024
Projects
None yet
Development

No branches or pull requests

5 participants