Add the ability to wait for a period of time after SIGTERM #3298
Changes in logic look good and may be useful. I think this PR suffers from too many comments. There is no need to write comments that describe the exact logic of code that is quite trivial (e.g. the comment for the shutdownSignalReceiver type in signals.go).
Looks like a nice feature, I'm up for it! I left a few minor comments.
Extend the gRPC HealthCheck struct to allow it to make use of multiple types of health checks such as a services.Manager or atomic.Bool shutdown request. See grafana/mimir#3298 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Thank you for the feedback! I've opened grafana/dskit#227 to move the gRPC health checks into dskit and I'll work through the rest of the changes.
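To illustrate the idea behind the health check change mentioned above (one health check that consults several conditions, such as a service manager's state and a shutdown-requested flag), here is a minimal sketch. It does not use the real dskit or gRPC APIs: `Check`, `HealthCheck`, `NewHealthCheck`, `WithShutdownRequested` and `WithServicesRunning` are hypothetical names chosen for illustration, and the gRPC wiring is left out entirely.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Check reports whether one condition currently allows the component to be
// considered healthy. Hypothetical type, not part of dskit.
type Check func() bool

// HealthCheck combines several checks; it is healthy only if all of them pass.
type HealthCheck struct {
	checks []Check
}

// NewHealthCheck builds a HealthCheck from any number of checks.
func NewHealthCheck(checks ...Check) *HealthCheck {
	return &HealthCheck{checks: checks}
}

// Healthy returns false as soon as any check fails.
func (h *HealthCheck) Healthy() bool {
	for _, check := range h.checks {
		if !check() {
			return false
		}
	}
	return true
}

// WithShutdownRequested returns a check that fails once the flag is set,
// for example by a signal handler that received SIGTERM.
func WithShutdownRequested(requested *atomic.Bool) Check {
	return func() bool { return !requested.Load() }
}

// WithServicesRunning stands in for a check backed by a service manager.
// The caller supplies the function that reports the manager's state.
func WithServicesRunning(isHealthy func() bool) Check {
	return isHealthy
}

func main() {
	var shutdown atomic.Bool
	hc := NewHealthCheck(
		WithServicesRunning(func() bool { return true }),
		WithShutdownRequested(&shutdown),
	)

	fmt.Println("healthy before shutdown request:", hc.Healthy())
	shutdown.Store(true)
	fmt.Println("healthy after shutdown request:", hc.Healthy())
}
```

In a real server, the result of `Healthy()` would presumably be mapped to the serving status returned by the gRPC health service, so that clients stop routing to a component that has been asked to shut down.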
Adds the ability to supply a "shutdown delay" to Mimir components such that they will disable HTTP keep-alives and mark themselves as not ready when receiving SIGTERM or SIGINT (via HTTP /ready or gRPC) but wait a configurable amount of time before actually stopping.
Fixes an issue during rollouts on Kubernetes where the Grafana Cloud Gateway holds on to connections to query-frontends even when they shutdown resulting in user-facing read errors. By closing connections during shutdown, marking the component as "not ready", and still continuing to serve requests we ensure that:
* Users don't see any disruption
* Connections to the stopping component are not pooled
* Kubernetes service endpoints are removed before the pod
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Looks good, thank you!
LGTM! Also, the code is now even cleaner 👏
* Add the ability to wait for a period of time after SIGTERM
Adds the ability to supply a "shutdown delay" to Mimir components such that they will disable HTTP keep-alives and mark themselves as not ready when receiving SIGTERM or SIGINT (via HTTP /ready or gRPC) but wait a configurable amount of time before actually stopping.
Fixes an issue during rollouts on Kubernetes where the Grafana Cloud Gateway holds on to connections to query-frontends even when they shutdown resulting in user-facing read errors. By closing connections during shutdown, marking the component as "not ready", and still continuing to serve requests we ensure that:
* Users don't see any disruption
* Connections to the stopping component are not pooled
* Kubernetes service endpoints are removed before the pod
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Code review changes.
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Handle shutdown inline in `Run` goroutine.
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Phrasing.
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Deprecate ingester ring specific option for delaying shutdown after a SIGTERM. Instead use the `shutdown-delay` option which can be added to any component (it is not ingester or ring specific). See #3298 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
What this PR does
Adds the ability to supply a "shutdown delay" to Mimir components: on receiving SIGTERM or SIGINT they disable HTTP keep-alives and mark themselves as not ready (via HTTP /ready or gRPC), but wait a configurable amount of time before actually stopping.
Fixes an issue during rollouts on Kubernetes where the Grafana Cloud Gateway holds on to connections to query-frontends even when they shut down, resulting in user-facing read errors. By closing connections during shutdown, marking the component as "not ready", and still continuing to serve requests, we ensure that:
* Users don't see any disruption
* Connections to the stopping component are not pooled
* Kubernetes service endpoints are removed before the pod
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
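The sequence described above (signal received, keep-alives disabled, readiness flipped to "not ready", requests still served during the delay, then stop) can be sketched with the Go standard library alone. This is an illustrative sketch, not the code from this PR: the `lifecycle` type, its methods, the listen address and the default values are assumptions, and the exact spelling of the flag in Mimir may differ from the `shutdown-delay` name used here.

```go
package main

import (
	"context"
	"errors"
	"flag"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// lifecycle records whether a shutdown has been requested. Hypothetical type.
type lifecycle struct {
	shutdownRequested atomic.Bool
}

// readyStatus returns the status code /ready should report:
// 200 normally, 503 once a shutdown has been requested.
func (l *lifecycle) readyStatus() int {
	if l.shutdownRequested.Load() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// readyHandler serves the readiness endpoint.
func (l *lifecycle) readyHandler(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(l.readyStatus())
}

// beginShutdown marks the component as not ready, disables HTTP keep-alives
// so clients stop reusing connections, and then waits for the delay (or for
// ctx to be cancelled). The server keeps serving requests during the wait.
func (l *lifecycle) beginShutdown(ctx context.Context, srv *http.Server, delay time.Duration) {
	l.shutdownRequested.Store(true)
	srv.SetKeepAlivesEnabled(false)
	select {
	case <-time.After(delay):
	case <-ctx.Done():
	}
}

func main() {
	// Assumed flag name and default, for illustration only.
	delay := flag.Duration("shutdown-delay", 5*time.Second, "time to wait after SIGTERM/SIGINT before stopping")
	flag.Parse()

	l := &lifecycle{}
	mux := http.NewServeMux()
	mux.HandleFunc("/ready", l.readyHandler)
	srv := &http.Server{Addr: ":8080", Handler: mux}

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	<-sigs

	l.beginShutdown(context.Background(), srv, *delay)

	// After the delay, stop accepting new connections and drain in-flight requests.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```

With a sketch like this, the delay would typically be chosen to be at least as long as it takes for the failing readiness probe to remove the pod from the Kubernetes service endpoints, so that no new traffic arrives by the time the process actually stops.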
Which issue(s) this PR fixes or relates to
N/A
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]