
Make sure there is no ongoing Pod deletion before downscaling #1534

Merged
sebgl merged 6 commits into elastic:master on Aug 12, 2019

Conversation

Contributor

@sebgl sebgl commented Aug 9, 2019

We had a race condition where we would run downscales based on wrong
assumptions, if the StatefulSet replicas we work with do not match the Pods
that are still alive from a previous downscale.

When we update the Replicas of a StatefulSet, it may take a while before
the corresponding Pod is actually deleted. Since most of the downscale
logic relies on manipulating StatefulSets, this gives us wrong
assumptions to work with.

For example, we may end up:

  • clearing shard allocation excludes while a node from which data
    was migrated away is not removed from the cluster yet, effectively
    allowing it to allocate shards again
  • updating zen1 minimum_master_nodes with a wrong value if a previous
    master node deletion is not over yet
  • considering we are in the 2->1 master nodes zen1 situation, when we are
    actually in 3->1 since one master isn't removed yet
  • removing 2 master nodes at once

We could work with Pods instead of StatefulSets, but the logic would be a bit more complex.
Instead, let's just requeue the downscale until the Pods we expect from the StatefulSets spec are there.

To do that, we simply compare the list of Pods we have with the list of
Pods we would expect according to the StatefulSets we work with. If there is
a mismatch, we requeue. This seems simpler than any event expectations mechanism.

This also accounts for Pod creation mismatches (a Pod that was requested
for creation is not there yet), which, I believe, are not much of a
problem for downscale purposes; but I think it's better to err on the
safe side here (do nothing if creations are not complete).
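
Below is a minimal sketch of that comparison, in Go, using the standard apps/v1 and core/v1 types. The function names and package layout are illustrative only, not the actual implementation in this PR:

```go
package downscale

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// expectedPodNames returns the Pod names the given StatefulSet should own,
// that is <name>-0 through <name>-(replicas-1).
func expectedPodNames(sset appsv1.StatefulSet) []string {
	replicas := int32(1) // spec.replicas defaults to 1 when unset
	if sset.Spec.Replicas != nil {
		replicas = *sset.Spec.Replicas
	}
	names := make([]string, 0, replicas)
	for i := int32(0); i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%d", sset.Name, i))
	}
	return names
}

// podsMatchStatefulSets returns true when the actual Pods exactly match the
// Pods expected from the StatefulSets specs. False means a Pod creation or
// deletion is still in flight, so the downscale should not proceed yet.
func podsMatchStatefulSets(ssets []appsv1.StatefulSet, actualPods []corev1.Pod) bool {
	expected := make(map[string]struct{})
	for _, sset := range ssets {
		for _, name := range expectedPodNames(sset) {
			expected[name] = struct{}{}
		}
	}
	if len(actualPods) != len(expected) {
		return false
	}
	for _, pod := range actualPods {
		if _, ok := expected[pod.Name]; !ok {
			return false
		}
	}
	return true
}
```

In the reconciliation loop, a false result would typically translate into requeuing (for example, returning a controller-runtime reconcile.Result with Requeue set), so the downscale is retried once the cache reflects the actual Pods.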

Fixes #1523.

Contributor

@thbkrkr thbkrkr left a comment


Looks pretty good.
Left a question about the small functions and the naming.

@sebgl sebgl merged commit dbc12c8 into elastic:master Aug 12, 2019
@pebrc pebrc added >bug Something isn't working v1.0.0-beta1 labels Aug 13, 2019
@pebrc pebrc changed the title Make sure there is no ongoing pod deletion before downscaling Make sure there is no ongoing Pod deletion before downscaling Oct 9, 2019
Labels
>bug Something isn't working v1.0.0-beta1
Development

Successfully merging this pull request may close these issues.

Don't clear shard allocation excludes for pods that are not terminated yet