Call _nodes/shutdown from pre-stop hook #6544

Merged: 18 commits, Mar 22, 2023

@pebrc (Collaborator) commented Mar 17, 2023

Fixes #6478

Adds a more complicated shell script to the pre-stop lifecycle hook. This covers the use case described in the referenced issue, which mostly affects larger clusters with lots of data.

This is obviously a best-effort attempt at orchestrating node shutdown more gracefully, and many things can go wrong here. I anticipate that the most common failure will be on slow or unresponsive ES clusters, where the script might run into the overall lifecycle hook timeout.
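
To illustrate the timeout concern, here is a minimal sketch of a bounded retry loop. It is not the script from this PR; the function name `retry_with_deadline` and its parameters are hypothetical. The idea is that the hook gives up on its own before Kubernetes terminates it at the end of the grace period.

```shell
# Sketch only: retry a command until it succeeds or a deadline passes.
# Usage: retry_with_deadline <max_seconds> <sleep_seconds> <command> [args...]
# The function name and arguments are illustrative assumptions.
retry_with_deadline() {
  max_seconds=$1
  pause=$2
  shift 2
  start=$(date +%s)
  until "$@"; do
    now=$(date +%s)
    # Stop retrying once the time budget is used up.
    if [ $((now - start)) -ge "$max_seconds" ]; then
      return 1
    fi
    sleep "$pause"
  done
}

# Example: a command that succeeds on the first attempt.
retry_with_deadline 2 1 true && echo "ready"
```

In a real hook, the command passed in would be the request against the cluster, and the time budget would have to stay below the Pod's termination grace period.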

There is some fragile grep'ing over JSON responses here as well because we don't have jq or similar tools available in the Elasticsearch image.
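
For readers unfamiliar with this kind of workaround, the following is a minimal sketch of pulling one field out of a JSON response with `grep` and `sed` alone. It is not the code from this PR: the helper name `first_status` and the sample payload are assumptions made for illustration, and the approach is fragile in exactly the way described above, because it depends on field order and cannot handle escaped quotes.

```shell
# Sketch only: print the value of the first "status" field found in the
# JSON document read from stdin, without jq.
# first_status is a hypothetical helper, not part of the PR.
first_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"/\1/'
}

# Hypothetical sample payload, loosely shaped like a node shutdown status reply.
sample='{"nodes":[{"node_id":"abc","type":"RESTART","status":"IN_PROGRESS","plugins":{"status":"COMPLETE"}}]}'

printf '%s' "$sample" | first_status   # prints IN_PROGRESS
```

Because only the first match is kept, nested `status` fields that appear later in the document are ignored; if the response layout changed, the wrong value would be picked up silently.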

@botelastic botelastic bot added the triage label Mar 17, 2023
@pebrc pebrc added the >enhancement Enhancement of existing functionality label Mar 18, 2023
@botelastic botelastic bot removed the triage label Mar 18, 2023
@barkbay barkbay self-assigned this Mar 20, 2023
@barkbay (Contributor) left a comment

I have not tested all the combinations but LGTM.

Note that this PR will trigger a cluster restart, so we should also update https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-upgrading-eck.html#k8s-beta-to-ga-rolling-restart accordingly.

pkg/controller/elasticsearch/nodespec/lifecycle_hook.go (2 outdated review threads, resolved)
@barkbay (Contributor) left a comment

LGTM, I did a bit of manual testing and it seems to work as expected.

As you mentioned in a comment, the only thing I'm afraid of is unexpected behaviour caused by the Pod's metadata not being up to date in the client cache. This adds another "best-effort" layer 😄

pkg/controller/elasticsearch/nodespec/lifecycle_hook.go (1 outdated review thread, resolved)
@pebrc pebrc requested a review from thbkrkr March 21, 2023 16:52
@thbkrkr (Contributor) left a comment

👍

pkg/controller/elasticsearch/nodespec/lifecycle_hook.go (2 outdated review threads, resolved)
Labels: >enhancement (Enhancement of existing functionality), v2.8.0
Linked issue: Consider calling _node/shutdown from a pre-stop hook
3 participants