Halt ML jobs before node restart #2005

anyasabo · 2019-10-16T21:31:56Z

Opening an issue for the outstanding TODO to halt ML jobs before we restart a node here. This is described in the official rolling upgrade ES docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html

racevedoo · 2019-10-30T21:53:34Z

Hey @anyasabo, I can try to take this one on. So basically we should call POST _ml/set_upgrade_mode?enabled=true before the restart and then, at the end, call POST _ml/set_upgrade_mode?enabled=false.

Did I miss anything?
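
To make the proposed ordering concrete, here is a minimal sketch of the enable/restart/disable sequence. It is an illustration only, not the operator's actual code: `send`, `upgrade_mode_path`, and `ml_upgrade_mode` are hypothetical names, and `send` stands in for whatever HTTP client is used to talk to Elasticsearch. Only the two `_ml/set_upgrade_mode` calls come from the discussion above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical transport: any callable that sends (method, path) to Elasticsearch.
Send = Callable[[str, str], None]


def upgrade_mode_path(enabled: bool) -> str:
    """Build the request path for the ML set_upgrade_mode call."""
    return f"/_ml/set_upgrade_mode?enabled={'true' if enabled else 'false'}"


@contextmanager
def ml_upgrade_mode(send: Send) -> Iterator[None]:
    """Enable ML upgrade mode for the duration of the block, then disable it."""
    send("POST", upgrade_mode_path(True))
    try:
        yield
    finally:
        # Assumption: upgrade mode is switched off again even if the restart
        # raised; a real operator may prefer to keep it on and retry.
        send("POST", upgrade_mode_path(False))
```

Usage would look like `with ml_upgrade_mode(send): restart_node()`, where `restart_node` is also a placeholder. This only works if the whole upgrade happens inside one call, which is not how a reconciliation loop behaves.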

anyasabo · 2019-10-30T22:07:29Z

Yep that's my understanding @racevedoo

sebgl · 2019-10-31T09:43:39Z

This is not an easy one; we may want to discuss it further first. Some thoughts:

- Disabling ML jobs is optional. If it is not done, jobs will simply be spawned on another node. This might be the behaviour we want in most or some cases? I don't know enough about how ML works to decide what's best.
- We may want the user to configure this (should we stop ML jobs during upgrade? yes/no).
- Stopping the ML jobs is easy, since we know when we have a rolling upgrade to perform. Re-enabling them is a lot harder: we basically don't know when a rolling upgrade is over (that is, that one was performed earlier). One approach would be to call POST _ml/set_upgrade_mode?enabled=false at every single reconciliation, which feels wrong. See "Don't clear shard allocation excludes at every reconciliation" (#1522), where we discuss a similar problem.
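
One possible way around the "call it at every reconciliation" problem, sketched here purely as an assumption and not as an agreed design: decide per reconciliation from two observed facts, whether a rolling upgrade is still pending and whether upgrade mode is currently on, and only issue a request when the two disagree. How `upgrade_mode_on` is obtained is left open; it could be a marker the operator stored when it enabled the mode, or the cluster's reported ML state. The function name and parameters are hypothetical.

```python
from typing import Optional


def upgrade_mode_request(upgrade_pending: bool, upgrade_mode_on: bool) -> Optional[str]:
    """Return the request to send in this reconciliation, or None if nothing to do.

    upgrade_pending: the operator still has nodes to restart (hypothetical input).
    upgrade_mode_on: ML upgrade mode is currently enabled (hypothetical input).
    """
    if upgrade_pending and not upgrade_mode_on:
        # An upgrade is about to start: halt ML jobs first.
        return "POST /_ml/set_upgrade_mode?enabled=true"
    if not upgrade_pending and upgrade_mode_on:
        # The upgrade is over but ML is still halted: re-enable once.
        return "POST /_ml/set_upgrade_mode?enabled=false"
    # State already matches what we want: no call needed.
    return None
```

With this shape a steady-state reconciliation returns `None` and sends nothing. The catch is that it cannot tell an operator-initiated upgrade mode from one a user enabled by hand, so the stored-marker variant may be the safer source for `upgrade_mode_on`.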

pebrc · 2021-09-14T11:48:40Z

Closing as this should be covered by the Node Shutdown API #4597

anyasabo added the >enhancement (Enhancement of existing functionality) label Oct 16, 2019

pebrc closed this as completed Sep 14, 2021
