
Cluster can be downscaled while shards are not migrated #3867

Closed
barkbay opened this issue Oct 22, 2020 · 3 comments · Fixed by #3883
Labels
>bug Something isn't working v1.3.0

Comments

@barkbay (Contributor) commented Oct 22, 2020

When a Pod is deleted/restarting:

  • the shards on the node enter the UNASSIGNED state
  • the node field is set to an empty string
{
		"index": "logs-191998",
		"shard": "4",
		"state": "UNASSIGNED",
		"node": "",
		"prirep": "p"
}

The empty string prevents ECK from detecting that shards might actually still be assigned to a node that is being downscaled (because shard.NodeName is empty):

// NodeHasShard returns true if the given ES Pod is holding at least one shard (primary or replica).
func NodeHasShard(ctx context.Context, shardLister esclient.ShardLister, podName string) (bool, error) {
	[...]
	// filter shards affected by node removal
	for _, shard := range shards {
		if shard.NodeName == podName { // <--- Here
			return true, nil
		}
	}
	return false, nil
}
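
To make the failure mode concrete, here is a minimal, self-contained sketch (not ECK code: the Shard type and the Pod name below are illustrative stand-ins) showing how the name-based comparison loses track of an UNASSIGNED shard:

package main

import "fmt"

// Shard mirrors the fields used in the snippet above; the real type lives in
// ECK's esclient package, so this is only an illustrative stand-in.
type Shard struct {
	Index    string
	Shard    string
	State    string
	NodeName string
}

// nodeHasShard reproduces the name-based comparison from NodeHasShard above.
func nodeHasShard(shards []Shard, podName string) bool {
	for _, shard := range shards {
		if shard.NodeName == podName {
			return true
		}
	}
	return false
}

func main() {
	// Shard data taken from the first example above: the Pod restarted, so the
	// shard is UNASSIGNED and its node field is empty.
	shards := []Shard{
		{Index: "logs-191998", Shard: "4", State: "UNASSIGNED", NodeName: ""},
	}

	// "elasticsearch-sample-es-data-1" is a made-up Pod name. The check prints
	// false, so ECK would conclude the node holds no data and may delete it.
	fmt.Println(nodeHasShard(shards, "elasticsearch-sample-es-data-1"))
}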

If Pods are deleted (by the controller itself while upgrading as described in #3861) or evicted (K8S node maintenance), ECK might then downscale and delete a set of nodes while they are still hosting some shards.

Note that it is not possible to know which node name a shard was assigned to; only the node's UID is stored in the shard metadata:

{
	"shard": "0",
	"prirep": "p",
	"state": "UNASSIGNED",
	"unassigned.reason": "NODE_LEFT",
	"node": null,
	"id": null,
	"ud": "node_left [fMeKGb_PSi-DIGR_qPTgQQ]"
}
@barkbay barkbay added the >bug Something isn't working label Oct 22, 2020
@barkbay barkbay changed the title Cluster might be downscaled while shards are not migrated Cluster can be downscaled while shards are not migrated Oct 22, 2020
@anyasabo (Contributor) commented:

Is the right fix for this to add another predicate to not scale down if there are unassigned shards? It seems like it might be too conservative, but I'm not sure. Maybe it just makes sense to filter for unassigned.reason == NODE_LEFT? But I don't know if there are other states where a shard is unassigned and removing a node might cause data loss.

@sebgl (Contributor) commented Oct 26, 2020

I think I'm +1 with your suggestion @anyasabo: don't downscale if there is any UNASSIGNED shard.

To be less conservative we could filter on NODE_LEFT, keep a node name -> node ID mapping in memory, and parse the node ID from the unassigned details, but I feel like that's not needed. It also feels a bit unreliable since there's no schema guarantee on that API.
I cannot see a valid case where users would want to keep some shards unassigned with NODE_LEFT. I think it could basically happen for two reasons:

  • the node really did leave the cluster, on purpose. In that case it has gone through our downscale mechanism, so it should not have any shards anymore. In other words: this case should normally never happen?
  • the node is temporarily down for some reason, for example because it has been restarted/upgraded or because it failed. In that case I think the way we move forward in other parts of the reconciliation has always been: the user needs to fix the manifest first so the node comes back into the cluster, then we can move on with the regular node reconciliation.

I am not sure about other unassigned reasons. I suspect for some of them (eg. REINITIALIZED) our predicate on the .node field already ensures we don't remove the wrong node, but I'm not sure.

I think it's best for us to stay on the safe side and not downscale if there's any unassigned shard. Basically: fix your cluster first (through an upscale, an upgrade, or index API calls, e.g. changing the replica count), then we can safely remove some nodes. Similar to how we only move on with a rolling upgrade if the cluster health is green.

We can still optimize later if we find valid reasons to have unassigned shards?
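
A rough sketch of that conservative predicate, reusing the illustrative Shard type from the sketch above (the function name is hypothetical, not ECK's actual API):

// hasUnassignedShards reports whether any shard is UNASSIGNED, whatever the
// reason. If it returns true, the downscale would simply be postponed.
func hasUnassignedShards(shards []Shard) bool {
	for _, shard := range shards {
		if shard.State == "UNASSIGNED" {
			return true
		}
	}
	return false
}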

@sebgl (Contributor) commented Oct 28, 2020

#3883 takes a slightly simpler approach to detect the above:

  • don't allow any downscale if some shards have no node assigned (.node == "")
  • don't allow downscaling a node if some shards are assigned to it (.node == <node>)

I think this works regardless of the state or unassigned.reason.
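
Roughly, those two checks could look like this (again using the illustrative Shard type from the first sketch; the function names are hypothetical and not the ones used in the PR):

// anyShardWithoutNode blocks every downscale while at least one shard reports
// no node at all (.node == "").
func anyShardWithoutNode(shards []Shard) bool {
	for _, shard := range shards {
		if shard.NodeName == "" {
			return true
		}
	}
	return false
}

// nodeStillHoldsShards blocks the downscale of a specific node while any shard
// is still explicitly assigned to it (.node == <node>).
func nodeStillHoldsShards(shards []Shard, nodeName string) bool {
	for _, shard := range shards {
		if shard.NodeName == nodeName {
			return true
		}
	}
	return false
}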
