Don't allow downscales if some shards are unassigned #3883

Merged
2 commits merged into elastic:master on Oct 29, 2020

Conversation

@sebgl (Contributor) commented Oct 28, 2020

In some conditions, for example when a Pod gets killed/restarted right
before a downscale happens, the shards on that Pod are reported as UNASSIGNED.
At this point it is dangerous to permanently remove the Pod (downscale the cluster),
since we can't know for sure that the Pod to remove isn't supposed to hold any of the
unassigned shards.

To avoid that situation, this commit disallows any downscale from happening if
some of the shards don't have a node assigned to them (regardless of their status).
The logic to allow a node to be downscaled is rather simple:

  • all shards must have a node assigned to them
  • the Pod to remove must not have a shard assigned to it

This is a rather conservative/safe approach that could be optimized in the future.
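
For illustration, here is a minimal Go sketch of that check. The `Shard` type, `canDownscaleNode` function, and `podName` parameter are hypothetical names for this example, not the actual ECK implementation:

```go
package downscale

// Shard is a simplified, hypothetical view of a shard: only the name of the
// node it is allocated to matters for this check. An empty NodeName means the
// shard is currently unassigned.
type Shard struct {
	Index    string
	Name     string
	NodeName string
}

// canDownscaleNode reports whether the Elasticsearch node backed by podName
// can safely be removed: every shard must be assigned to some node, and none
// of them may still live on the node we want to remove.
func canDownscaleNode(shards []Shard, podName string) bool {
	for _, s := range shards {
		if s.NodeName == "" {
			// At least one shard has no node assigned: we cannot tell whether
			// it belongs on the node to remove, so refuse any downscale.
			return false
		}
		if s.NodeName == podName {
			// The node still holds data: wait for shards to migrate away first.
			return false
		}
	}
	return true
}
```

Blocking on any unassigned shard, rather than only those that could belong to the node being removed, trades some downscale responsiveness for safety, which matches the conservative approach described above.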

I tried implementing an e2e test for this, but it's a bit tricky to set it up in a way that
consistently fails without this commit, so I consider the unit test good enough.

Fixes #3867.

sebgl added the >bug (Something isn't working) and v1.3.0 labels on Oct 28, 2020
@sebgl (Contributor, Author) commented Oct 29, 2020

run full pr build

sebgl merged commit 82b4762 into elastic:master on Oct 29, 2020
sebgl added a commit to sebgl/cloud-on-k8s that referenced this pull request Oct 30, 2020
* Don't allow downscales if some shards are unassigned

* Improve comments

Co-authored-by: Anya Sabo <anya@sabolicio.us>
sebgl added a commit that referenced this pull request Oct 30, 2020
* Don't allow downscales if some shards are unassigned

* Improve comments

Co-authored-by: Anya Sabo <anya@sabolicio.us>
thbkrkr added the v1.4.0 label on Nov 2, 2020
Labels: >bug (Something isn't working), v1.3.0, v1.4.0
Development

Successfully merging this pull request may close these issues.

Cluster can be downscaled while shards are not migrated