
Don't clear shard allocation excludes at every reconciliation #1522

Closed
sebgl opened this issue Aug 8, 2019 · 3 comments · Fixed by #2610
sebgl (Contributor) commented Aug 8, 2019

We clear shard allocation excludes at every reconciliation attempt, to make sure we correctly reset it after nodes are downscaled.
We could optimize by only clearing it if not already cleared, similar to what we do with zen1 minimum_master_nodes:

```go
// Check if we really need to update minimum_master_nodes with an API call
if minimumMasterNodes == reconcileState.GetZen1MinimumMasterNodes() {
	return false, nil
}
```
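
A rough sketch of what the same pattern could look like for allocation excludes. The `EsClient` interface, `State` struct, and method names below are illustrative stand-ins, not the actual ECK client or reconcile-state APIs:

```go
// Illustrative types only: EsClient and State stand in for the operator's
// Elasticsearch client and per-cluster reconciliation state.
type EsClient interface {
	// ExcludeFromShardAllocation sets cluster.routing.allocation.exclude._name.
	ExcludeFromShardAllocation(nodes string) error
}

type State struct {
	allocationExcludesCleared bool
}

// clearAllocationExcludes resets the excludes only if that has not already
// been done, and reports whether an API call was made.
func clearAllocationExcludes(c EsClient, state *State) (bool, error) {
	// Check if we really need to clear allocation excludes with an API call
	if state.allocationExcludesCleared {
		return false, nil
	}
	// Reset the exclusion list to an empty value.
	if err := c.ExcludeFromShardAllocation(""); err != nil {
		return false, err
	}
	state.allocationExcludesCleared = true
	return true, nil
}
```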

Related: #1161.

sebgl added the >enhancement Enhancement of existing functionality label on Aug 8, 2019
@ximenzaoshi

It's very strange to clear shard allocation excludes at every reconciliation attempt. I tried to decommission a specific node by setting the shard allocation exclude setting, but it was always reset automatically, which really confused me. I think the operator should not change settings that already exist.
By the way, is there any way to remove a specific node? It seems we cannot do this through the StatefulSet.

sebgl (Contributor, Author) commented Oct 23, 2019

@ximenzaoshi a workaround to stop the operator from concurrently resetting your cluster settings is to temporarily disable reconciliations. This can be done by setting an annotation on your Elasticsearch resource:
`"common.k8s.elastic.co/pause": "true"`
When set, ECK will ignore the Elasticsearch cluster. Don't forget to remove the annotation once you're done with any temporary manual operation.
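
For example, assuming an Elasticsearch resource named `quickstart` (the name is just a placeholder), the annotation can be added and removed with kubectl:

```sh
# Pause reconciliations for the "quickstart" Elasticsearch resource (placeholder name).
kubectl annotate elasticsearch quickstart common.k8s.elastic.co/pause=true

# Resume reconciliations once the manual operation is done
# (the trailing "-" removes the annotation).
kubectl annotate elasticsearch quickstart common.k8s.elastic.co/pause-
```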

> By the way, is there any way to remove a specific node? It seems we cannot do this through the StatefulSet.

Indeed, it's not straightforward. Can you explain your use case a bit more?
Depending on what you are trying to achieve, you could (see the sketch after this list for examples):

  • downscale the NodeSet (e.g. 2 nodes instead of 3)
  • drain the corresponding Kubernetes node, which will trigger a rescheduling of the Pods running on it. Provided your Elasticsearch cluster is in green health, the Elasticsearch Pods on that Kubernetes node should be safely removed one by one.
  • cordon the corresponding node to make it unschedulable, then apply a modification to the corresponding NodeSet (e.g. an annotation or label) so that all Pods of this NodeSet get rotated. Since the k8s node is unschedulable, the Pod you wanted to move should be scheduled on another k8s node, unless affinity or PV constraints prevent it.
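
Roughly, and with placeholder names, the second and third options could look like this (the first option is simply lowering `count` for the NodeSet in the Elasticsearch spec):

```sh
# Option 2: drain the Kubernetes node hosting the Pod you want to move
# (its Pods are evicted and rescheduled elsewhere).
kubectl drain <k8s-node-name> --ignore-daemonsets

# Option 3: cordon the node instead, so nothing new gets scheduled on it,
# then modify the NodeSet's pod template (e.g. add an annotation) to rotate its Pods.
kubectl cordon <k8s-node-name>
```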

ximenzaoshi commented Oct 23, 2019

Thanks for your reply! We have two ES nodes on one host and we want to move one of them away, as the host load is high. I'll give your suggestions a try, thanks.

> unless affinity or PV constraints prevent it

Yes, we use local storage, which makes the operation more difficult.
