Halt ML jobs before node restart #2005

Closed
anyasabo opened this issue Oct 16, 2019 · 4 comments
Labels
>enhancement Enhancement of existing functionality

Comments

@anyasabo
Contributor

Opening an issue for the outstanding TODO to halt ML jobs before we restart a node. This is described in the official Elasticsearch rolling upgrade docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html

@anyasabo anyasabo added the >enhancement Enhancement of existing functionality label Oct 16, 2019
@racevedoo
Contributor

racevedoo commented Oct 30, 2019

Hey @anyasabo, I can try to take this one on. So basically we should call POST _ml/set_upgrade_mode?enabled=true before the restart and then, at the end, call POST _ml/set_upgrade_mode?enabled=false.

Did I miss anything?
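
For illustration, the sequence proposed above can be sketched as follows. This is a minimal sketch, not the operator's actual code: `send` and `restart_nodes` are hypothetical callables standing in for an HTTP client and for the restart logic, and only the two `_ml/set_upgrade_mode` requests come from the comment itself.

```python
from typing import Callable

# Hypothetical transport: takes (method, path) and performs the HTTP call.
Send = Callable[[str, str], None]


def upgrade_mode_path(enabled: bool) -> str:
    """Build the request path for the ML set-upgrade-mode call."""
    flag = "true" if enabled else "false"
    return f"_ml/set_upgrade_mode?enabled={flag}"


def restart_with_ml_halted(send: Send, restart_nodes: Callable[[], None]) -> None:
    """Enable ML upgrade mode, restart the nodes, then disable upgrade mode."""
    send("POST", upgrade_mode_path(True))
    restart_nodes()  # hypothetical: whatever performs the rolling restart
    send("POST", upgrade_mode_path(False))
```

The sketch assumes a single process sees both the start and the end of the restart. If `restart_nodes` raises, upgrade mode stays enabled.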

@anyasabo
Contributor Author

Yep that's my understanding @racevedoo

@sebgl
Contributor

sebgl commented Oct 31, 2019

This is not an easy one, we may want to discuss it further first. Some thoughts:

  • Disabling ML jobs is optional. If not done, jobs will simply be spawned on another node. This might be the behaviour we want in most/some cases? I don't know enough about how ML works to decide what's best.
  • We may want the user to configure this (should we stop ML jobs during upgrade? yes/no).
  • Stopping the ML jobs is easy since we know when we have a rolling upgrade to perform. Re-enabling them is a lot harder: we don't know when a rolling upgrade is over, or even that one was performed before. One approach would be to call POST _ml/set_upgrade_mode?enabled=false at every single reconciliation, which feels wrong. See "Don't clear shard allocation excludes at every reconciliation" (#1522), where we discuss a similar problem.
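
To make the re-enabling problem concrete, here is a minimal sketch of a stateless reconciliation step. It assumes the operator can read the current upgrade mode from the cluster (for example from the `upgrade_mode` field that GET _ml/info is expected to report, which should be verified) and can tell whether a rolling upgrade is still pending. The function and parameter names are hypothetical.

```python
from typing import Optional


def ml_upgrade_mode_action(upgrade_pending: bool, upgrade_mode_on: bool) -> Optional[str]:
    """Return the request path to POST in this reconciliation, or None for no call.

    upgrade_pending: hypothetical flag, True while nodes still need a restart.
    upgrade_mode_on: current ML upgrade mode as observed on the cluster.
    """
    if upgrade_pending and not upgrade_mode_on:
        return "_ml/set_upgrade_mode?enabled=true"
    if not upgrade_pending and upgrade_mode_on:
        return "_ml/set_upgrade_mode?enabled=false"
    return None  # observed state already matches the desired state
```

This avoids an unconditional POST at every reconciliation, but it still costs one read per reconciliation. It also cannot tell whether upgrade mode was enabled by the operator or manually by a user, so it would turn off a mode the user set on purpose.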

@pebrc
Collaborator

pebrc commented Sep 14, 2021

Closing as this should be covered by the Node Shutdown API #4597

@pebrc pebrc closed this as completed Sep 14, 2021