Perform forced rolling upgrade even if ES is reachable #2022

sebgl · 2019-10-18T13:12:41Z

There are cases where Elasticsearch is reachable
(some Pods are Ready), but cannot respond to any requests,
for example when only 1 of 2 master nodes is available (see
#1847). In such a case,
the bootlooping/pending 2nd master node stays stuck forever since we
never reach the force upgrade part of the reconciliation.

This commit fixes it by running force upgrades (if required) right after
the upscale/spec change phase. This force upgrade phase becomes the new
"Step 2". The following steps (downscale and regular upgrade) require the
Elasticsearch cluster to be reachable.

Because this forced rolling upgrade deletes some pods and sets some
expectations, I chose to requeue immediately if it was attempted. This
way we don't continue the reconciliation based on a transient state
that would require us re-checking expectations. The next reconciliation
can be a "regular" one.

I think this also simplifies the general logic a bit: we first do
everything that does not require the ES API (steps 1 and 2), then move
on with downscales and standard rolling upgrades if ES is reachable
(steps 3 and 4), instead of passing an esReachable bool around.
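
The step ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the ClusterState and Results types, the Reconcile function, and the step labels are all hypothetical names, and requeueing when ES is unreachable is an assumption made for the sketch.

```go
package main

import "fmt"

// ClusterState is a hypothetical, simplified view of what the
// reconciliation needs to know. It is not a type from the operator.
type ClusterState struct {
	ForceUpgradeNeeded bool // some pods qualify for a forced rolling upgrade
	ESReachable        bool // the ES API can be used for steps 3 and 4
}

// Results is a hypothetical stand-in for a reconciliation result.
type Results struct {
	Requeue bool     // ask for another reconciliation
	Steps   []string // steps that ran, in order
}

// Reconcile runs the steps that do not need the ES API first (1 and 2),
// then the steps that do (3 and 4), without threading a reachability
// flag through every step.
func Reconcile(s ClusterState) Results {
	res := Results{}

	// Step 1: upscale / spec changes. Does not require the ES API.
	res.Steps = append(res.Steps, "upscale/spec change")

	// Step 2: forced rolling upgrade, if required. Does not require the ES API.
	if s.ForceUpgradeNeeded {
		res.Steps = append(res.Steps, "forced rolling upgrade")
		// Pods were deleted and expectations were set: stop here and
		// requeue so the next reconciliation starts from a settled state.
		res.Requeue = true
		return res
	}

	// Steps 3 and 4 require ES to be reachable.
	// Assumption for this sketch: requeue and retry later otherwise.
	if !s.ESReachable {
		res.Requeue = true
		return res
	}

	// Step 3: downscale. Step 4: regular rolling upgrade.
	res.Steps = append(res.Steps, "downscale", "rolling upgrade")
	return res
}

func main() {
	// Reachable but stuck cluster: the forced upgrade still runs.
	fmt.Printf("%+v\n", Reconcile(ClusterState{ForceUpgradeNeeded: true, ESReachable: true}))
	// Healthy cluster: all four steps run.
	fmt.Printf("%+v\n", Reconcile(ClusterState{ForceUpgradeNeeded: false, ESReachable: true}))
}
```

Returning early after step 2 is what keeps the later steps from acting on a transient state.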

The PR also modifies the existing force upgrade E2E test so that it covers
the case where Elasticsearch can be reached, but responds with 503.
The new version of the test does not pass with the master branch, but
does pass with this branch.

Fixes #1847.

pkg/controller/elasticsearch/driver/upgrade_forced.go

pebrc

LGTM!

pkg/controller/elasticsearch/driver/nodes.go

sebgl added 2 commits October 18, 2019 10:05

Modify e2e test to cover the es reachable case

0e63149

sebgl added the >enhancement and v1.0.0 labels Oct 18, 2019

thbkrkr reviewed Oct 22, 2019

pkg/controller/elasticsearch/driver/upgrade_forced.go

pebrc self-assigned this Oct 24, 2019

pebrc approved these changes Oct 24, 2019

pkg/controller/elasticsearch/driver/nodes.go

sebgl added 2 commits October 24, 2019 10:42

Improve comment

91c09a4

Merge branch 'master' into force-upgrade-reachable-but-down

200fc11

sebgl merged commit b6f86c6 into elastic:master Oct 24, 2019

thbkrkr mentioned this pull request Oct 24, 2019

Flaky unit test TestExpectedPodDeletions_DeletionsSatisfied #2052

Closed
