Don't allow downscales if some shards are unassigned #3883

Merged
2 commits merged into elastic:master on Oct 29, 2020

Conversation

@sebgl (Contributor) commented Oct 28, 2020

In some conditions, for example when a Pod gets killed/restarted right
before a downscale happens, the shards on that Pod are reported as UNASSIGNED.
At this point it is dangerous to permanently remove the Pod (downscale the cluster),
since we can't know for sure that the Pod to remove isn't supposed to hold any of the
unassigned shards.

To avoid that situation, this commit disallows any downscale from happening if
some of the shards don't have a node assigned to them (regardless of their status).
The logic to allow a node to be downscaled is rather simple:

  • all shards must have a node assigned to them
  • the Pod to remove must not have a shard assigned to it

This is a rather conservative/safe approach that could be optimized in the future.
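
For illustration, here is a minimal Go sketch of that check. The `Shard` type, `canDownscaleNode` function, and `podName` parameter are hypothetical names for this example, not the actual ECK implementation:

```go
package downscale

// Shard is a simplified, hypothetical view of a shard: only the name of the
// node it is allocated to matters for this check. An empty NodeName means the
// shard is currently unassigned.
type Shard struct {
	Index    string
	Name     string
	NodeName string
}

// canDownscaleNode reports whether the Elasticsearch node backed by podName
// can safely be removed: every shard must be assigned to some node, and none
// of them may still live on the node we want to remove.
func canDownscaleNode(shards []Shard, podName string) bool {
	for _, s := range shards {
		if s.NodeName == "" {
			// At least one shard has no node assigned: we cannot tell whether
			// it belongs on the node to remove, so refuse any downscale.
			return false
		}
		if s.NodeName == podName {
			// The node still holds data: wait for shards to migrate away first.
			return false
		}
	}
	return true
}
```

Blocking on any unassigned shard, rather than only those that could belong to the node being removed, trades some downscale responsiveness for safety, which matches the conservative approach described above.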

I tried implementing an e2e test for this, but it's a bit tricky to set it up in a way that
consistently fails without this commit, so I consider the unit test good enough.

Fixes #3867.

sebgl added the >bug (Something isn't working) and v1.3.0 labels on Oct 28, 2020
@sebgl (Contributor, Author) commented Oct 29, 2020

run full pr build

sebgl merged commit 82b4762 into elastic:master on Oct 29, 2020
sebgl added a commit to sebgl/cloud-on-k8s that referenced this pull request Oct 30, 2020
* Don't allow downscales if some shards are unassigned

* Improve comments

Co-authored-by: Anya Sabo <anya@sabolicio.us>
sebgl added a commit that referenced this pull request Oct 30, 2020
* Don't allow downscales if some shards are unassigned

* Improve comments

Co-authored-by: Anya Sabo <anya@sabolicio.us>
thbkrkr added the v1.4.0 label on Nov 2, 2020
Labels: >bug (Something isn't working), v1.3.0, v1.4.0
Development

Successfully merging this pull request may close these issues.

Cluster can be downscaled while shards are not migrated