
Fix readiness script in case of operator upgrade #2208

Merged — 1 commit into elastic:master from fix-readiness-script on Dec 4, 2019

Conversation

@barkbay (Contributor) commented on Dec 3, 2019

If some Elasticsearch clusters have been deployed with 1.0.0-beta1 and the operator is then upgraded, all Pods suddenly become not ready:

NAME                             READY   STATUS    RESTARTS   AGE   IP           NODE                                          NOMINATED NODE   READINESS GATES
pod/es-apm-sample-es-default-0   0/1     Running   0          19m   10.32.1.11   gke-michael-dev3-default-pool-4eb026fe-dt6p   <none>           <none>
pod/es-apm-sample-es-default-1   0/1     Running   0          19m   10.32.0.12   gke-michael-dev3-default-pool-84228e30-qgpl   <none>           <none>
pod/es-apm-sample-es-default-2   0/1     Running   0          19m   10.32.2.12   gke-michael-dev3-default-pool-de22c45a-wsx9  

The reason is that the readiness script was updated in #2180 and is propagated dynamically to all the Pods through a ConfigMap, while PROBE_PASSWORD_PATH is not set on most of them because they were created before the upgrade.
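
For context, the defensive pattern that avoids this looks roughly like the sketch below. This is not the literal patch from this PR: the probe user, endpoint, and fallback behavior are illustrative assumptions.

#!/usr/bin/env bash
# Sketch: tolerate PROBE_PASSWORD_PATH being unset or pointing at a missing
# file, as on Pods created before the env var was added to the pod template.
AUTH_ARGS=()
if [[ -n "${PROBE_PASSWORD_PATH:-}" && -f "${PROBE_PASSWORD_PATH}" ]]; then
  # New Pods: authenticate with the probe password read from the file.
  # "probe-user" is a placeholder name, not taken from the PR.
  AUTH_ARGS=(-u "probe-user:$(<"${PROBE_PASSWORD_PATH}")")
fi
# Old Pods fall through with no auth args, i.e. the pre-#2180 behavior.
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "${AUTH_ARGS[@]}" "http://127.0.0.1:9200/")
if [[ "${HTTP_CODE}" -lt 200 || "${HTTP_CODE}" -ge 300 ]]; then
  exit 1
fi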

@barkbay added the >bug (Something isn't working) and v1.0.0 labels on Dec 3, 2019
@anyasabo (Contributor) commented on Dec 3, 2019

This change LGTM. Just to make sure I understand the current behavior:

  • the referenced PR updates the pod template to include the PROBE_PASSWORD_PATH env var
  • the readiness script is also updated to use the new env var

The update to the pod template begins the rolling upgrade process. But as soon as the readiness script ConfigMap is updated, all of the pods get the updated script, even the ones that have not been restarted yet and therefore do not have the new env var. As such they fail the readiness check until they are restarted, but the rolling upgrade probably also does not proceed because too many pods are not ready. Because we only define a readiness probe and not a liveness probe, k8s never restarts the pods either. Is that correct?
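
For illustration, this asymmetry can be observed directly on a pre-upgrade Pod. The mount path below is an assumption for the sketch, not taken from this PR:

# The readiness script is mounted from a ConfigMap, so the file changes
# in place inside running containers shortly after the ConfigMap update:
kubectl exec es-apm-sample-es-default-0 -- \
  cat /mnt/elastic-internal/scripts/readiness-probe-script.sh
# Environment variables are only set at container start, so a Pod created
# before the upgrade still has no PROBE_PASSWORD_PATH:
kubectl exec es-apm-sample-es-default-0 -- \
  sh -c 'echo "PROBE_PASSWORD_PATH=${PROBE_PASSWORD_PATH:-<unset>}"'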

@barkbay (Author) commented on Dec 4, 2019

Is that correct?

👍 on your analysis. The Pods are eventually restarted as part of the upgrade, so the situation is fixed in the end. Nevertheless, there is a disruption from the user's point of view.

@barkbay self-assigned this on Dec 4, 2019
@sebgl (Contributor) left a comment


Good catch!

@pebrc (Collaborator) left a comment


Thanks for cleaning up after me :-)

@barkbay merged commit 774c58b into elastic:master on Dec 4, 2019
@barkbay deleted the fix-readiness-script branch on December 4, 2019 at 08:16