
Make sure there is no ongoing Pod deletion before downscaling #1534

Merged
sebgl merged 6 commits into elastic:master on Aug 12, 2019

Conversation

Contributor

@sebgl sebgl commented Aug 9, 2019

We had a race condition where we would run downscales based on wrong
assumptions, if the StatefulSet replicas we work with do not match the Pods
that are still alive from a previous downscale.

When we update the Replicas of a StatefulSet, it may take a while before
the corresponding Pod is actually deleted. Since most of the downscale
logic relies on manipulating StatefulSets, this gives us wrong
assumptions to work with.

For example, we may end up:

  • clearing shard allocation excludes while a node from which data
    was migrated away is not removed from the cluster yet, effectively
    allowing it to allocate shards again
  • updating zen1 minimum_master_nodes with a wrong value if a previous
    master node deletion is not over yet
  • considering we are in the 2->1 master nodes zen1 situation, when we are
    actually in 3->1 since one master isn't removed yet
  • removing 2 master nodes at once

We could work with Pods instead of StatefulSets, but the logic would be a bit more complex.
Instead, let's just requeue the downscale until the Pods we expect from the StatefulSets spec are there.

To do that, we simply compare the list of Pods we have with the list of
Pods we would expect according to the StatefulSets we work with. If there is
a mismatch, we requeue. This seems simpler than any event expectations mechanism.

This also accounts for Pod creation mismatches (a Pod that was requested
for creation is not there yet), which, I believe, are not much of a
problem for downscale purposes; but I think it's better to err on the
safe side here (do nothing if creations are not complete).
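
Below is a minimal sketch of that comparison, in Go, using the standard apps/v1 and core/v1 types. The function names and package layout are illustrative only, not the actual implementation in this PR:

```go
package downscale

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// expectedPodNames returns the Pod names the given StatefulSet should own,
// that is <name>-0 through <name>-(replicas-1).
func expectedPodNames(sset appsv1.StatefulSet) []string {
	replicas := int32(1) // spec.replicas defaults to 1 when unset
	if sset.Spec.Replicas != nil {
		replicas = *sset.Spec.Replicas
	}
	names := make([]string, 0, replicas)
	for i := int32(0); i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%d", sset.Name, i))
	}
	return names
}

// podsMatchStatefulSets returns true when the actual Pods exactly match the
// Pods expected from the StatefulSets specs. False means a Pod creation or
// deletion is still in flight, so the downscale should not proceed yet.
func podsMatchStatefulSets(ssets []appsv1.StatefulSet, actualPods []corev1.Pod) bool {
	expected := make(map[string]struct{})
	for _, sset := range ssets {
		for _, name := range expectedPodNames(sset) {
			expected[name] = struct{}{}
		}
	}
	if len(actualPods) != len(expected) {
		return false
	}
	for _, pod := range actualPods {
		if _, ok := expected[pod.Name]; !ok {
			return false
		}
	}
	return true
}
```

In the reconciliation loop, a false result would typically translate into requeuing (for example, returning a controller-runtime reconcile.Result with Requeue set), so the downscale is retried once the cache reflects the actual Pods.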

Fixes #1523.

Contributor

@thbkrkr thbkrkr left a comment


Looks pretty good.
Left a question about the small functions and the naming.

@sebgl sebgl merged commit dbc12c8 into elastic:master Aug 12, 2019
@pebrc pebrc added >bug Something isn't working v1.0.0-beta1 labels Aug 13, 2019
@pebrc pebrc changed the title Make sure there is no ongoing pod deletion before downscaling Make sure there is no ongoing Pod deletion before downscaling Oct 9, 2019
Labels
>bug Something isn't working v1.0.0-beta1
Development

Successfully merging this pull request may close these issues.

Don't clear shard allocation excludes for pods that are not terminated yet