Cluster can be downscaled while shards are not migrated #3867
Comments
Is the right fix for this to add another predicate that blocks the scale-down if there are unassigned shards? It seems like it might be too conservative, but I'm not sure. Maybe it just makes sense to filter for `unassigned.reason == NODE_LEFT`? But I don't know if there are other states where a shard is unassigned and removing a node might cause data loss.
I think I'm +1 with your suggestion @anyasabo: don't downscale if there is any To be less conservative we could filter on
I am not sure about other unassigned reasons. I suspect for some of them (eg. I think it's best for us to stay on the safe side and not downscale if there is any unassigned shard. Basically: fix your cluster first (through an upscale, an upgrade, or index API calls, e.g. changing the number of replicas), then we can safely remove some nodes. This is similar to how we only move on through a rolling upgrade if the cluster health is green. We can still optimize later if we find valid reasons to have unassigned shards.
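To make the two options discussed above concrete, here is a minimal Go sketch of such a predicate. It is an illustration only, not ECK's actual code: the `Shard` type and the `CountUnassigned` and `CanDownscale` helpers are hypothetical names, and the sketch assumes shard state and unassigned reason are available as plain strings.

```go
package main

import "fmt"

// Shard is a hypothetical, simplified view of one shard's routing info.
// It is NOT ECK's real type; field names are assumptions for this sketch.
type Shard struct {
	Index            string
	State            string // e.g. "STARTED", "RELOCATING", "UNASSIGNED"
	NodeName         string // empty when the shard is unassigned
	UnassignedReason string // e.g. "NODE_LEFT"; empty when assigned
}

// CountUnassigned counts shards in the UNASSIGNED state.
// An empty reason matches every unassigned shard (conservative option);
// a non-empty reason only matches shards with that reason (filtered option).
func CountUnassigned(shards []Shard, reason string) int {
	n := 0
	for _, s := range shards {
		if s.State != "UNASSIGNED" {
			continue
		}
		if reason == "" || s.UnassignedReason == reason {
			n++
		}
	}
	return n
}

// CanDownscale implements the conservative option: refuse to remove
// nodes while any shard is unassigned, whatever the reason.
func CanDownscale(shards []Shard) bool {
	return CountUnassigned(shards, "") == 0
}

func main() {
	shards := []Shard{
		{Index: "logs", State: "STARTED", NodeName: "es-data-0"},
		{Index: "logs", State: "UNASSIGNED", UnassignedReason: "NODE_LEFT"},
	}
	fmt.Println("can downscale:", CanDownscale(shards))
	fmt.Println("NODE_LEFT shards:", CountUnassigned(shards, "NODE_LEFT"))
}
```

Passing a reason such as `"NODE_LEFT"` gives the less conservative filter, while the empty reason gives the "fix your cluster first" behaviour.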
#3867 takes a slightly simpler approach to detect the above:
I think this works regardless of the
When a Pod is deleted/restarting:

- its shards move to the `UNASSIGNED` state
- the `node` field is set to an empty string

The empty string prevents ECK from detecting that shards might actually still be assigned to a node that is being downscaled (because `shard.NodeName` is empty).

If Pods are deleted (by the controller itself while upgrading, as described in #3861) or evicted (K8s node maintenance), ECK might then downscale and delete a set of nodes while they are still hosting some shards.
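The failure mode can be reproduced with a small Go sketch. This is a simplified illustration under stated assumptions, not the controller's real migration check: `Shard` and `NaiveIsMigrated` are hypothetical names, and only the `NodeName` field mirrors the `shard.NodeName` mentioned above.

```go
package main

import "fmt"

// Shard is a hypothetical, simplified shard record for this sketch.
type Shard struct {
	Index    string
	State    string // e.g. "STARTED", "UNASSIGNED"
	NodeName string // empty string once the shard is unassigned
}

// NaiveIsMigrated reports whether no shard claims to live on the leaving
// node. It only compares node names, so an unassigned shard (empty
// NodeName) never matches and is silently treated as "migrated".
func NaiveIsMigrated(shards []Shard, leavingNode string) bool {
	for _, s := range shards {
		if s.NodeName == leavingNode {
			return false
		}
	}
	return true
}

func main() {
	// The Pod for es-data-1 was just deleted: its shard is UNASSIGNED and
	// the node name is blank, although the data still lives on that node.
	shards := []Shard{
		{Index: "logs", State: "STARTED", NodeName: "es-data-0"},
		{Index: "logs", State: "UNASSIGNED", NodeName: ""},
	}
	// The naive check wrongly concludes es-data-1 holds no shards.
	fmt.Println("es-data-1 migrated:", NaiveIsMigrated(shards, "es-data-1"))
}
```

Because the name-based comparison cannot see unassigned shards, a check of this shape needs an additional guard on the `UNASSIGNED` state before nodes are removed.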
Note that it is not possible to know which node name a shard was assigned to; only its `uid` is stored in the shard metadata.