Green health is reported while it should be unknown #2939

Closed
barkbay opened this issue Apr 23, 2020 · 2 comments · Fixed by #4938
Labels
>bug Something isn't working

Comments


barkbay commented Apr 23, 2020

I got into a situation where the reported status was always green:

NAME                                              HEALTH   NODES   VERSION   PHASE             AGE
elasticsearch.elasticsearch.k8s.elastic.co/es-0   green    5       7.6.2     ApplyingChanges   7m42s

Meanwhile my cluster was broken, with only 1 master available out of 3, which I guess means the status should have been unknown:

# curl -k -v -u elastic-internal-probe:redacted https://127.0.0.1:9200/_cluster/health
< HTTP/1.1 503 Service Unavailable
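
For reference, the same comparison can be made from outside the pod. This is only a sketch, not part of the original report: it assumes the default ECK naming conventions for a cluster named es-0 (the es-0-es-http service and the es-0-es-elastic-user secret) and uses the regular elastic user instead of the internal probe user.

# What the operator reports in the resource status:
kubectl get elasticsearch es-0

# What the cluster itself answers (adjust the names for your resource):
PASSWORD=$(kubectl get secret es-0-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
kubectl port-forward service/es-0-es-http 9200 &
curl -k -u "elastic:$PASSWORD" "https://localhost:9200/_cluster/health?pretty"

With no elected master the last call returns HTTP 503, while the resource status above still shows green.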
barkbay added the >bug label on Apr 23, 2020
anyasabo self-assigned this on May 15, 2020

anyasabo commented Jun 9, 2020

I can reproduce this. In a 3-node GKE cluster with an ES cluster of 3 masters, I set hard anti-affinity rules on the masters and let it build. I then tainted two of the nodes and killed two of the pods. The ES resource stayed green/Ready afterwards, even though the operator was still reconciling it.
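
Roughly, the reproduction boils down to something like this (node, pod, and node set names below are placeholders, not the exact ones from my cluster):

# Taint two of the three Kubernetes nodes so the evicted masters cannot be rescheduled
kubectl taint nodes gke-node-2 repro=true:NoSchedule
kubectl taint nodes gke-node-3 repro=true:NoSchedule

# Kill two of the three master pods; hard anti-affinity plus the taints keep them pending
kubectl delete pod elasticsearch-aff-es-default-1 elasticsearch-aff-es-default-2

# Watch the Elasticsearch resource: health stays green although the single remaining master has no quorum
kubectl get elasticsearch elasticsearch-aff -w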

{"log.level":"info","@timestamp":"2020-06-09T19:36:16.712Z","log.logger":"generic-reconciler","message":"Recoverable error during step, continuing","service.version":"1.2.0-2bc08ef2","service.type":"eck","ecs.version":"1.4.0","step":"reconcile-cluster-license","error":"failed to revert to basic: 503 Service Unavailable: unknown","errorVerbose":"503 Service Unavailable

{"log.level":"error","@timestamp":"2020-06-09T19:35:16.311Z","log.logger":"driver","message":"Could not update remote clusters in Elasticsearch settings","service.version":"1.2.0-2bc08ef2","service.type":"eck","ecs.version":"1.4.0","namespace":"default","es_name":"elasticsearch-aff","error":"503 Service Unavailable: ","error.stack_trace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\ngh.neting.cc/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.(*defaultDriver).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/driver.go:228

{"log.level":"info","@timestamp":"2020-06-09T19:34:46.294Z","log.logger":"generic-reconciler","message":"Recoverable error during step, continuing","service.version":"1.2.0-2bc08ef2","service.type":"eck","ecs.version":"1.4.0","step":"reconcile-cluster-license","error":"failed to revert to basic: 503 Service Unavailable: unknown",

Taking a closer look to see how we should approach it.


anyasabo commented Nov 3, 2020

Unassigning myself. I think this is worth double-checking once the refactoring for #3496 is complete, as that will give much more reward for the effort.
