
Prevent allocating shards to broken nodes #18417

Closed
ywelsch opened this issue May 17, 2016 · 5 comments
Labels
  • >bug
  • :Distributed Coordination/Allocation: all issues relating to the decision making around placing a shard (both master logic & on the nodes)
  • Team:Distributed (Obsolete): meta label for the distributed team (obsolete); replaced by Distributed Indexing/Coordination

Comments

@ywelsch
Contributor

ywelsch commented May 17, 2016

Allocating shards to a node can fail for various reasons. When an allocation fails, we currently ignore the node for that shard during the next allocation round. However, this means that:

  • subsequent rounds consider the node for allocating the shard again.
  • other shards are still allocated to the node (in particular, the balancer tries to put shards on that node, as its weight becomes smaller without the failed shard).

This is particularly bad if the node is permanently broken, leading to a never-ending series of failed allocations. Ultimately this affects the stability of the cluster.
@s1monw
Contributor

s1monw commented May 17, 2016

@ywelsch I think we can approach this from multiple directions.

  • we can start bottom-up and check that a data path is writeable before we allocate a shard, and skip it if possible (that would help if someone loses a disk and has multiple)
  • we can also add a simple allocation_failed counter on UnassignedInfo to prevent endless allocation of a potentially broken index (metadata / settings / whatever is broken)
  • we might also be able to use a simple counter of failed allocations per node that we can reset once we have had a successful one on that node. We could then also have a simple allocation decider that throttles that node or takes it out of the loop entirely once the counter goes beyond a threshold? (A sketch of this idea follows below.)

I think in all of these cases simplicity wins over complex state... my $0.05
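
A minimal sketch of the third idea, as a standalone per-node counter rather than real Elasticsearch types (the class and method names here, such as `NodeFailureTracker` and `canAllocateOn`, are hypothetical):

```java
// Illustrative sketch only, not Elasticsearch code: a per-node failure counter
// that an allocation decider could consult before placing a shard on a node.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class NodeFailureTracker {
    private final Map<String, AtomicInteger> failuresPerNode = new ConcurrentHashMap<>();
    private final int maxFailures; // threshold beyond which the node is skipped

    public NodeFailureTracker(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Record a failed shard allocation on the given node. */
    public void onAllocationFailed(String nodeId) {
        failuresPerNode.computeIfAbsent(nodeId, id -> new AtomicInteger()).incrementAndGet();
    }

    /** A successful allocation resets the counter, matching the reset suggested above. */
    public void onAllocationSucceeded(String nodeId) {
        failuresPerNode.remove(nodeId);
    }

    /** Decider-style check: only allocate if the node is below the failure threshold. */
    public boolean canAllocateOn(String nodeId) {
        AtomicInteger failures = failuresPerNode.get(nodeId);
        return failures == null || failures.get() < maxFailures;
    }
}
```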

s1monw added a commit to s1monw/elasticsearch that referenced this issue May 19, 2016
Today if a shard fails during the initialization phase due to misconfiguration, broken disks,
missing analyzers, not installed plugins etc., Elasticsearch keeps on trying to initialize
or rather allocate that shard. Yet, in the worst-case scenario this ends in an endless
allocation loop. To prevent this loop and all its side effects, like spamming log files over
and over again, this commit adds an allocation decider that stops allocating a shard that
failed to allocate more than N times in a row. The number of retries can be configured via
`index.allocation.max_retry` and its default is set to `5`. Once the setting is updated,
shards with fewer failures than the number set per index will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard
has been started.

Relates to elastic#18417
s1monw added a commit that referenced this issue May 20, 2016
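
A rough sketch of the per-shard counter described in the commit message above; the class name `UnassignedShardInfo` and its methods are placeholders rather than the real UnassignedInfo API, and only the reset-on-start behaviour and the comparison against the per-index retry setting are taken from the message:

```java
// Sketch of the per-shard retry cap described in the commit message above.
// Field and class names here are illustrative, not the real Elasticsearch ones.
public class UnassignedShardInfo {
    private int failedAllocations; // reset to 0 once the shard has started

    public void onAllocationFailed() {
        failedAllocations++;
    }

    public void onShardStarted() {
        failedAllocations = 0;
    }

    /** Deny further allocation attempts once the per-index retry limit is reached. */
    public boolean canRetry(int maxRetries) { // maxRetries ~ index.allocation.max_retry (default 5)
        return failedAllocations < maxRetries;
    }
}
```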
@gmoskovicz
Contributor

@ywelsch @s1monw is there any news on this?

Some OSes remount a disk read-only when it hits errors, and when that happens the entire cluster has issues with RED shards and shards that won't move. Perhaps this could help on that front?
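
One way the writability check mentioned earlier could look, as an illustrative probe only (the `DataPathProbe` class is hypothetical and not part of Elasticsearch): try to create and delete a small file under the data path before allocating a shard there.

```java
// Illustrative probe for a read-only data path: attempt a real write under the
// path, since a read-only remount makes any file creation fail.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class DataPathProbe {
    public static boolean isDataPathWritable(Path dataPath) {
        try {
            Path probe = Files.createTempFile(dataPath, "probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException e) {
            // A read-only filesystem (or any other write failure) lands here.
            return false;
        }
    }
}
```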

@lcawl added the :Distributed Indexing/Distributed label (a catch-all label for anything in the Distributed Area; please avoid if you can) and removed the :Allocation label on Feb 13, 2018
@DaveCTurner added the :Distributed Coordination/Allocation label (all issues relating to the decision making around placing a shard, both master logic & on the nodes) on Mar 15, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner removed the :Distributed Indexing/Distributed label on Mar 15, 2018
@bleskes
Contributor

bleskes commented Mar 21, 2018

We have another, non-trivial, instance of this in shard fetching. When it hard-fails on a node (rather than succeeding by finding a broken copy) we currently redo the fetching. This is an easy way around networking issues but can be poisonous on disk failures (for example).
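
A sketch of the distinction being drawn here, with an assumed (hypothetical) classification of fetch failures: retry on likely-transient network errors, stop on likely-permanent disk errors.

```java
// Illustrative only: classify a shard-store fetch failure and decide whether
// redoing the fetch is worthwhile. Exception mapping is an assumption, not
// taken from the Elasticsearch codebase.
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.nio.file.FileSystemException;

public final class FetchRetryPolicy {
    public static boolean shouldRetryFetch(Exception failure) {
        if (failure instanceof ConnectException || failure instanceof SocketTimeoutException) {
            return true;  // likely a transient networking issue: retrying is cheap
        }
        if (failure instanceof FileSystemException) {
            return false; // likely a broken disk: retrying just repeats the failure
        }
        // Unknown causes: a bounded number of retries is safer than an endless loop.
        return true;
    }
}
```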

@idegtiarenko
Contributor

We would rather remove the broken node from the cluster than repeatedly fail allocation(s) on it.

10 participants