This repository has been archived by the owner on May 16, 2023. It is now read-only.

[elasticsearch] Revisit readinessProbe #553

Closed
jmlrt opened this issue Apr 2, 2020 · 2 comments · Fixed by #586
Labels
enhancement New feature or request

Comments


jmlrt commented Apr 2, 2020

Originally posted by @pugnascotia in elastic/elasticsearch#53426 (comment)

jmlrt added the enhancement label on Apr 2, 2020

jmlrt commented Apr 17, 2020

Helm Chart readiness probe

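We use a script that queries two variants of the cluster health endpoint. Until the node has passed its first check, it calls /_cluster/health?wait_for_status=green&timeout=1s (the value of clusterHealthCheckParams) to verify that the cluster is OK. Once that check has succeeded and the start file exists, it calls /_cluster/health?timeout=0s to verify that Elasticsearch is still running and that a master node is available: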

if [ -f "${START_FILE}" ]; then
  echo 'Elasticsearch is already running, lets check the node is healthy and there are master nodes available'
  http "/_cluster/health?timeout=0s"
else
  echo 'Waiting for elasticsearch cluster to become ready (request params: "{{ .Values.clusterHealthCheckParams }}" )'
  if http "/_cluster/health?{{ .Values.clusterHealthCheckParams }}" ; then
    touch ${START_FILE}
    exit 0
  else
    echo 'Cluster is not yet ready (request params: "{{ .Values.clusterHealthCheckParams }}" )'
    exit 1
  fi
fi

Note that the information for these checks is retrieved from the master node, because the local query parameter is not set (https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html#request-params).

The desired behaviour here is that if the data nodes are unable to talk to their master nodes for whatever reason, then the data nodes will become Unready and therefore be removed from the Service load-balancer until the master nodes are available again (quoting @fatmcgav from #380 (comment)).
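To make the two phases of the Helm chart probe easier to follow, here is a minimal sketch that only computes which path would be queried, without contacting Elasticsearch. The probe_path helper and the temporary start file location are hypothetical names used for illustration; they are not part of the chart.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): prints the health path
# the probe would query, depending on whether the start file exists.
probe_path() {
  start_file="$1"
  health_params="$2"
  if [ -f "${start_file}" ]; then
    # The node already passed its first check: only confirm that it
    # still answers and that a master is reachable.
    echo "/_cluster/health?timeout=0s"
  else
    # First check: use the stricter, configurable parameters.
    echo "/_cluster/health?${health_params}"
  fi
}

# Demo with a throwaway start file (the location is an assumption).
demo_dir="$(mktemp -d)"
demo_start_file="${demo_dir}/start_file"

probe_path "${demo_start_file}" "wait_for_status=green&timeout=1s"
touch "${demo_start_file}"
probe_path "${demo_start_file}" "wait_for_status=green&timeout=1s"

rm -rf "${demo_dir}"
```

The first call prints the path built from the configured parameters, the second prints the timeout=0s path. Because neither request sets local, both are answered with the master's view of the cluster, which is what makes data nodes become Unready when they cannot reach a master.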

ECK readiness probe

ECK uses a different approach: it checks the / endpoint, only to ensure that the Elasticsearch node has started:

https://github.com/elastic/cloud-on-k8s/blob/71a3725335596acdb6b7a13c917f9018d9953a6f/pkg/controller/elasticsearch/nodespec/readiness_probe.go#L67-L77

The intention is to check only the single node, independently of overall cluster health and cluster membership, to know whether it is in principle ready to enter into operation (elastic/cloud-on-k8s#2248 (comment)). This avoids an issue during rolling upgrades where all nodes lose their ready state and are deleted while master nodes are rolled (more detail in elastic/cloud-on-k8s#1748 (comment)).
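For comparison, a node-local check in this spirit could look roughly like the sketch below. This is not the ECK script. ES_URL, PROBE_TIMEOUT, and the three function names are assumptions made for illustration, and authentication and TLS handling are left out.

```shell
#!/usr/bin/env bash
# Illustrative sketch only, not the ECK readiness script.
# ES_URL and PROBE_TIMEOUT are assumed names with assumed defaults.
ES_URL="${ES_URL:-http://127.0.0.1:9200/}"
PROBE_TIMEOUT="${PROBE_TIMEOUT:-3}"

# Returns 0 (ready) only when the HTTP status code is exactly 200.
status_is_ready() {
  [ "$1" = "200" ]
}

# Prints the HTTP status code returned by the node's root endpoint.
# curl reports 000 when no HTTP response was received.
node_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time "${PROBE_TIMEOUT}" "${ES_URL}" || true
}

# Entry point a probe could call: succeeds only if the local node answers 200.
readiness_check() {
  status_is_ready "$(node_status)"
}
```

A probe would call readiness_check and use its exit code. Since only the local root endpoint is queried, the result does not depend on cluster health or on whether a master is currently elected.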

Note that, as far as I know, a failed readiness probe should only remove the pod from the Service, so no traffic is sent to it until the readiness probe succeeds again. It shouldn't kill the pod (unless the ECK operator force-kills pods that are not ready).


jmlrt commented Apr 21, 2020

Closed by #586

jmlrt closed this as completed on Apr 21, 2020