liveness and readiness checks for kubernetes #390
Comments
@nwest1 indeed. Regarding MQTT - it is the only NodeJS microservice, as we could not find an adequate Go candidate for an MQTT broker, and it is indeed missing this endpoint. We are open to all propositions regarding this healthcheck endpoint. Also, if you have some code ready for the JS microservice please send a PR; otherwise someone from the Mainflux team will take a look at this early next week.
The liveness and readiness probes are simple HTTP GET or TCP endpoints: any status code greater than or equal to 200 and less than 400 indicates success, and every other status code is a failure. The body of the response is optional; it is normally used to give the user more context about the service and describe what went wrong. More detail here.

Liveness means that the service is running properly (it should report failure in case of a fatal error, for example), and readiness means that it can accept traffic (it is ready to process requests). A simple example in our case would be the users service: when it is started it can return liveness 200, but it won't return readiness 200 until a connection to the DB is established. If the service ends up in an unrecoverable state, the liveness probe should report failure.

Another example is that most of our services depend on NATS. If NATS is not there, the services end up in an unrecoverable state and cannot work. In such cases the service can try to reconnect a few times and, if it fails to reconnect, set the liveness probe to failure so that Kubernetes restarts the service. At the moment the services try to connect to NATS at start and exit if they cannot. I think we don't cover the case where NATS is running when the service starts and shuts down while the service is running.

I suggest, as a first step, that we define in detail the dependencies (infrastructure and other Mainflux services) of each service and the ways we can check whether those dependencies are healthy. Most of this is already known (docker-compose, k8s service configs); we only need to think about how to check the health of the dependencies and combine that with the service's internal state to set the proper liveness and readiness status.

@drasko I'll pick up this issue. It would be nice if someone else joins me, at least in defining and structuring the solution.
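To make the split between the two probes concrete, here is a minimal Go sketch (not taken from the Mainflux code base; the handler names, port, and startup flow are illustrative assumptions): readiness is reported only after the DB connection is established, and liveness flips to failure if the service loses NATS and cannot reconnect.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// health holds the two probe states as atomic flags so the HTTP handlers
// and background connect/reconnect loops can update them concurrently.
type health struct {
	alive atomic.Bool // liveness: the process is not in an unrecoverable state
	ready atomic.Bool // readiness: dependencies (e.g. the DB) are reachable
}

func (h *health) liveness(w http.ResponseWriter, _ *http.Request) {
	if h.alive.Load() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

func (h *health) readiness(w http.ResponseWriter, _ *http.Request) {
	if h.ready.Load() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

func main() {
	h := &health{}
	h.alive.Store(true) // the process is alive as soon as it starts

	// Hypothetical startup: mark ready only once the DB connection exists,
	// and mark not-alive if NATS reconnection ultimately fails, so that
	// Kubernetes restarts the pod.
	go func() {
		// db := connectDB(); h.ready.Store(true)
		// if reconnecting to NATS fails repeatedly { h.alive.Store(false) }
	}()

	http.HandleFunc("/liveness", h.liveness)
	http.HandleFunc("/readiness", h.readiness)
	http.ListenAndServe(":8180", nil)
}
```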
Also related to #378. Closing this one.
Resolved with #378.
FEATURE REQUEST
Hello!
I think this is a should-have - we can use /version endpoints for liveness probes, but I want to discuss what makes sense for readiness (if anything). The only adapter that lacks this is mqtt at this point.
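As a rough illustration of how a probe would treat a /version endpoint, the sketch below applies the success rule from the discussion above (any status code from 200 up to, but not including, 400 counts as success). The URL and port are placeholders, not actual Mainflux defaults.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probe performs a single HTTP GET and reports success for any status
// code in the [200, 400) range, mirroring what Kubernetes HTTP probes accept.
func probe(url string) (bool, error) {
	client := http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400, nil
}

func main() {
	// Placeholder URL: the real adapter address and port depend on the deployment.
	ok, err := probe("http://localhost:8180/version")
	fmt.Println(ok, err)
}
```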