Call _nodes/shutdown from pre-stop hook #6544
Conversation
I have not tested all the combinations but LGTM.
Note that this PR will trigger a cluster restart; we should also update https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-upgrading-eck.html#k8s-beta-to-ga-rolling-restart accordingly.
docs/orchestrating-elastic-stack-applications/elasticsearch/prestop.asciidoc
…estop.asciidoc Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>
LGTM, I did a bit of manual testing and it seems to work as expected.
As you mentioned in a comment, the only thing I'm afraid of is unexpected behaviour if the Pod's metadata is not up to date in the client cache. This adds another "best-effort" layer 😄
👍
Fixes #6478
Adds a more complicated shell script to the pre-stop lifecycle hook. This covers the use case described in the referenced issue, which mostly affects larger clusters with lots of data.
This is obviously a best-effort attempt at orchestrating node shutdown more gracefully. Many things can go wrong here. I anticipate that the most common way this logic fails is on slow or unresponsive ES clusters, where we might run into the overall lifecycle hook timeout.
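For illustration only, here is a minimal sketch of what such a hook could look like. It is not the script added in this PR; `ES_URL`, `ES_USER`, `ES_PASSWORD` and `NODE_ID` are placeholder variables assumed to be set (the node id resolution is sketched further down).

```sh
#!/usr/bin/env bash
# Sketch only: register a "restart" type node shutdown and wait, best effort,
# for Elasticsearch to report it as COMPLETE before the kubelet proceeds with
# Pod termination.
set -euo pipefail

ES_URL="${ES_URL:-https://localhost:9200}"           # placeholder endpoint
NODE_ID="${NODE_ID:?node id must be resolved first}" # placeholder node id

# Register the shutdown with the node shutdown API; tolerate failures so we
# never turn a best-effort optimisation into a hard error.
curl -sk -u "${ES_USER}:${ES_PASSWORD}" \
  -X PUT "${ES_URL}/_nodes/${NODE_ID}/shutdown" \
  -H 'Content-Type: application/json' \
  -d '{"type": "restart", "reason": "pre-stop hook"}' || true

# Poll the shutdown status; give up after roughly 60s so we stay well below
# the overall lifecycle hook timeout mentioned above.
for _ in $(seq 1 30); do
  if curl -sk -u "${ES_USER}:${ES_PASSWORD}" \
      "${ES_URL}/_nodes/${NODE_ID}/shutdown" | grep -q '"status":"COMPLETE"'; then
    exit 0
  fi
  sleep 2
done
exit 0 # best effort: never block termination indefinitely
```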
There is also some fragile grep'ing over JSON responses here because we don't have `jq` or similar tools available in the Elasticsearch image.
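As an illustration of that kind of parsing (again a sketch, not the actual script), the node id could be pulled out of the `_nodes/_local` response with `grep` and `sed` alone:

```sh
# Sketch: resolve the local node's id without jq. The response is compact JSON
# of the form {"_nodes":{...},"cluster_name":"...","nodes":{"<id>":{...}}},
# so grab the first key under "nodes". Fragile by design, as noted above.
resp="$(curl -sk -u "${ES_USER}:${ES_PASSWORD}" "${ES_URL}/_nodes/_local")"
NODE_ID="$(echo "${resp}" | grep -oE '"nodes":\{"[^"]*"' | sed 's/.*{"\([^"]*\)".*/\1/')"
echo "resolved node id: ${NODE_ID}"
```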