-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prepare-shutdown endpoint to ingesters for down scaling #4718
Conversation
397201e
to
1909070
Compare
This change adds a new HTTP endpoint to ingesters that changes their in-memory configuration (with an on-disk backup) such that they will: * Unregister from the ring when they stop * Flush all in-memory data to object storage when they stop This differs from the shutdown endpoint because it does not actually stop the ingesters, just modifies their configuration in preparation for being permanently stopped. This is a requirement of using the rollout-operator for gracefully scaling down ingesters in Kubernetes. See grafana/rollout-operator#47 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
1909070
to
67337cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ShutdownMarker should export a metric, so we can add an alert if for some reason there's an ingester continuously running with a shutdown marker (because as I understand, the marker is only removed when we remove the PVC, so if the sts isn't shut down after all (because we upscaled quickly?) it will continue shutting down between restarts.
Which makes me think: do we need a revert mechanism?
I'll add a metric. We could build a "revert" mechanism but we'd have to add that to the rollout operator as well. Let's avoid that until we need it. |
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Done while adding support for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I think the direction here is fine, but I'd like to see the PR simplified before merging.
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
### Prepare for Shutdown | ||
|
||
``` | ||
GET,POST /ingester/prepare-shutdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's mention DELETE too and explain what different methods do on this endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my feedback!
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Forgot to update the list of verbs supported in #4718 after review feedback. Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Forgot to update the list of verbs supported in #4718 after review feedback. Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 👏
**What this PR does / why we need it**: This updates the PrepareShutdown method so it supports GET and DELETE methods as well. This makes it similar to Mimir: grafana/mimir#4718. The status is now stored in a local file. A new config setting had to be added for this file as there is no obvious place to store it. **Checklist** - [X] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [X] Documentation added - [X] Tests updated - [x] `CHANGELOG.md` updated - [x] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` --------- Signed-off-by: Michel Hollands <michel.hollands@grafana.com> Co-authored-by: Dylan Guedes <djmgguedes@gmail.com>
What this PR does
This change adds a new HTTP endpoint to ingesters that changes their in-memory configuration (with an on-disk backup) such that they will:
This differs from the shutdown endpoint because it does not actually stop the ingesters, just modifies their configuration in preparation for being permanently stopped. This is a requirement of using the rollout-operator for gracefully scaling down ingesters in Kubernetes.
Which issue(s) this PR fixes or relates to
See grafana/rollout-operator#47
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]