Cortex components should have readiness check endpoints #784

Closed
csmarchbanks opened this issue Apr 4, 2018 · 7 comments
Labels
keepalive: Skipped by stale bot
type/production: Issues related to the production use of Cortex, inc. configuration, alerting and operating.

Comments

@csmarchbanks
Contributor

To facilitate Kubernetes self-healing, it would be great to have health checks in the Cortex components.

It would be useful to collect ideas here about what a good health check looks like for each service.

I am not working on this currently, but may in the future.

@tomwilkie
Contributor

tomwilkie commented Apr 5, 2018 via email

@csmarchbanks
Copy link
Contributor Author

I agree that liveness checks are a bit heavy-handed much of the time, and it would be great to be able to diagnose the failures, or to improve Cortex so that it handles the failure mode more gracefully. I think readiness checks would be ideal: that way the failing component won't be served any requests.

The one exception to this is the ruler, since there is no HA yet. A liveness check might be nice to reduce customer impact.

@tomwilkie
Contributor

Just had an instance of the distributor HTTP server stop accepting requests; a liveness check would have detected this and restarted it. Also, weaveworks/common#92 would hopefully show us the error and exit gracefully.

@tomwilkie tomwilkie added the type/production Issues related to the production use of Cortex, inc. configuration, alerting and operating. label Aug 24, 2018
@gouthamve gouthamve changed the title from "Cortex components should have health check endpoints" to "Cortex components should have readiness check endpoints" Nov 11, 2019
@gouthamve
Contributor

Liveness is bad, as referenced here: https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html

But we should do readiness in distributors.

@csmarchbanks
Contributor Author

Liveness is not always bad, though it may or may not be the right fit for Cortex components. As mentioned above, there are cases where a liveness check could help as well.

@stale

stale bot commented Feb 3, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 3, 2020
@pracucci pracucci added keepalive Skipped by stale bot and removed stale labels Feb 3, 2020
@gouthamve
Contributor

Closed in #2166

4 participants