Should GET / return 503 in case of discovery.zen.no_master_block: write ? #8902

Mpdreamz · 2014-12-11T14:49:55Z

Given we have two nodes, one (A) with:

node.master: false
discovery.zen.no_master_block: write

and (B) being a vanilla master node.

When we stop node (B), (A) is still allowed to service read requests.
However when calling

GET http://(A):9200/ HTTP/1.1

It currently returns:

HTTP/1.1 503 Service Unavailable

But is the service really unavailable in this case? Since we now explicitly allow you to configure for this state, IMO it should return 200 OK, possibly with a boolean in the response signalling that it's in read-only mode.

A call to _search in this state also results in a 200 and not 503.


martijnvg · 2015-01-16T11:11:58Z

@Mpdreamz Agreed, the main rest endpoint should return 200 in case there is no elected master. It doesn't report on cluster state related things, just configured cluster_name and a couple of node related stats.

javanna · 2015-01-16T11:22:53Z

> Agreed, the main rest endpoint should return 200 in case there is no elected master.

Should it always return 200, or only if we block writes? If we block all operations, maybe we should still return 503?

martijnvg · 2015-01-16T11:44:38Z

True, we should return 503 when all operations are blocked, but when we
only block writes we should return 200, because the node is still
partially operational.
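
The rule agreed on above can be written down as a small decision function. This is only a sketch of the proposal as stated in this thread, not the Elasticsearch implementation; `main_endpoint_status` and its parameters are hypothetical names, and the only inputs assumed are whether a master is elected and the value of discovery.zen.no_master_block ("write" or "all").

```python
def main_endpoint_status(has_elected_master: bool, no_master_block: str) -> int:
    """Hypothetical helper: HTTP status for GET / under the rule proposed here."""
    if has_elected_master:
        # Normal operation: nothing is blocked.
        return 200
    if no_master_block == "write":
        # Reads are still served, so the node is partially operational.
        return 200
    if no_master_block == "all":
        # Every operation is blocked, so the node is effectively unavailable.
        return 503
    raise ValueError(f"unexpected no_master_block value: {no_master_block!r}")


if __name__ == "__main__":
    for block in ("write", "all"):
        print(block, main_endpoint_status(False, block))
```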


bleskes · 2015-01-16T14:00:53Z

+1. During master loss we go into a new master election, which takes 3s (by default). During those 3s the node has a configured block; if it allows reads we should indeed return 200. This is likely a transient state which will be resolved before we start rejecting indexing requests (remember they wait up to 1m for the situation to be resolved).

clintongormley · 2016-11-26T16:43:35Z

I'm not sure that this is the right thing to do. Imagine you're using sniffing. You try to perform a write and get back a 503 so you sniff and get back a 200, then you try the write again, get back a 503, etc etc.

That said, the above would work for reads. I know the python client aborts after 3 attempts, while the Perl client keeps going until it gets back a 503 on sniffing. Perhaps we should only sniff once before giving up.
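
To make the loop concrete, here is a minimal sketch of the "sniff once before giving up" idea. It is not taken from any of the official clients; `write_with_bounded_sniff`, `do_write` and `sniff` are hypothetical stand-ins for whatever a client uses to send the write and to sniff the cluster, each returning an HTTP status code.

```python
from typing import Callable


def write_with_bounded_sniff(
    do_write: Callable[[], int],
    sniff: Callable[[], int],
    max_sniffs: int = 1,
) -> int:
    """Retry a write after sniffing, but sniff at most max_sniffs times.

    Returns the status code of the last write attempt.
    """
    status = do_write()
    sniffs = 0
    while status == 503 and sniffs < max_sniffs:
        sniffs += 1
        if sniff() != 200:
            # The node does not look healthy either; stop retrying.
            break
        status = do_write()
    return status


if __name__ == "__main__":
    # Writes are blocked (503) while GET / reports 200: without the bound
    # this would alternate between the two forever.
    print(write_with_bounded_sniff(lambda: 503, lambda: 200))
```

With the default bound, a write that keeps failing is attempted twice and sniffing happens once, after which the 503 is handed back to the caller.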

bleskes · 2016-11-26T19:31:40Z

@clintongormley how are other transient errors that are not reflected by / handled? For example, if the queues are full we return a 429 code. Does that have special handling? Another example: the circuit breaker throws a 503 too. That one is not reflected by /. Should it?

Another aspect to consider here: on master loss (since 1.4), all data nodes will have a master block for 3s. If you hit / at that moment, no node will be happy, and that's what I believe this ticket is about. I'm not saying that this is what we should do, but I think this should be taken into account in the solution.

clintongormley · 2016-11-28T09:46:27Z

We have said that a 503 response code should mean "retry on another node".

> if the queues are full we return a 429 code. Does that have special handling?

Not in the Perl client, but I'm not sure about the others. I think 429 should probably not retry but instead back off.

> the circuit breaker throws a 503 too

That means retry on another node.... This one is debatable. If you've sent the request that has triggered the circuit breaker, you could then replicate that bad behaviour across all nodes in the cluster by retrying.
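
The status code conventions described in this comment could be summarised on the client side roughly as follows. This is a sketch of the convention as stated here, not the behaviour of any particular client; `Action` and `action_for_status` are hypothetical names, and the circuit breaker caveat (retrying a bad request on every node) is deliberately not modelled.

```python
from enum import Enum


class Action(Enum):
    RETURN_TO_CALLER = "return_to_caller"
    RETRY_ON_ANOTHER_NODE = "retry_on_another_node"
    BACK_OFF = "back_off"


def action_for_status(status: int) -> Action:
    """Hypothetical mapping from HTTP status to client behaviour."""
    if status == 503:
        # Convention from this thread: 503 means "retry on another node".
        return Action.RETRY_ON_ANOTHER_NODE
    if status == 429:
        # Queues are full: wait instead of hitting another node immediately.
        return Action.BACK_OFF
    return Action.RETURN_TO_CALLER


if __name__ == "__main__":
    for code in (200, 429, 503):
        print(code, action_for_status(code).value)
```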

jasontedor · 2018-03-14T03:51:03Z

A 503 is completely broken behavior here. A REST status is a response for the given request. A 503 means "I am overloaded right now, I can not handle your request." That is completely out of alignment with a discovery.zen.no_master_block block. If the server can respond to the / request, it is not overloaded.

jasontedor · 2018-03-14T04:15:57Z

I opened #29045.

This PR updates the readiness probe endpoint to check only the `/` endpoint instead of `/_cluster/health?timeout=0s` when Elasticsearch is already running. This reverts to the initial config, which was changed in elastic/helm-charts#380, with the exception that the 503 HTTP code is accepted for 6.x (see elastic/elasticsearch#8902 for more details about why 503 is OK on Elasticsearch 6.x).
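
The acceptance rule described in that PR text can be sketched as below. This is an illustration only and not the probe script from the chart; `root_probe_ready` and its parameters are hypothetical, and the rule assumed is exactly what the description says: 200 is always ready, and 503 is additionally accepted on 6.x.

```python
def root_probe_ready(status_code: int, es_major_version: int) -> bool:
    """Hypothetical readiness check for a GET / response."""
    if status_code == 200:
        return True
    # On 6.x, GET / can answer 503 while the node is otherwise running
    # (the behaviour discussed in this issue), so it is accepted there.
    return status_code == 503 and es_major_version == 6


if __name__ == "__main__":
    print(root_probe_ready(503, 6))
    print(root_probe_ready(503, 7))
```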

gmarz mentioned this issue Dec 11, 2014

Should GET / return 503 in case of discovery.zen.no_master_block: write ? elastic/elasticsearch-net#1131

Closed

clintongormley added the discuss label Dec 11, 2014

martijnvg added help wanted adoptme and removed discuss labels Jan 16, 2015

clintongormley added >bug :Core/Infra/Core Core issues without another label v2.0.0-beta1 labels Jan 16, 2015

clintongormley added v2.1.0 and removed v2.0.0-beta1 v2.0.0 labels Aug 13, 2015

clintongormley added v2.2.0 and removed v2.1.0 labels Nov 20, 2015

spinscale added v2.3.0 and removed v2.2.0 labels Dec 23, 2015

clintongormley added v2.4.0 and removed v2.3.0 labels Mar 16, 2016

clintongormley added v2.4.1 and removed v2.4.0 labels Aug 24, 2016

clintongormley added v2.4.2 and removed v2.4.1 labels Sep 23, 2016

clintongormley removed the v2.4.2 label Nov 6, 2016

clintongormley added discuss and removed help wanted adoptme labels Nov 26, 2016

jasontedor mentioned this issue Mar 14, 2018

Main response should not have status 503 when okay #29045

Merged

jasontedor removed >bug discuss labels Mar 14, 2018

jasontedor closed this as completed in #29045 Mar 14, 2018

jmlrt mentioned this issue Apr 17, 2020

[elasticsearch] update readiness probe endpoint elastic/helm-charts#586

Merged

4 tasks
