Delay Kibana version upgrade until Elasticsearch is fully upgraded #2353

Closed
sebgl opened this issue Jan 6, 2020 · 10 comments
Labels
>enhancement Enhancement of existing functionality

Comments

@sebgl
Contributor

sebgl commented Jan 6, 2020

If we upgrade both Elasticsearch and Kibana, e.g. from v7.1.1 to v7.2.1, Kibana gets stuck in its startup phase and logs:

{"type":"log","@timestamp":"2020-01-06T13:42:47Z","tags":["status","plugin:elasticsearch@7.2.1","error"],"pid":1,"state":"red","message":"Status changed from red to red - This version of Kibana requires Elasticsearch v7.2.1 on all nodes. I found the following incompatible nodes in your cluster: v7.1.1 @ 10.16.1.26:9200 (10.16.1.26), v7.1.1 @ 10.16.0.29:9200 (10.16.0.29), v7.1.1 @ 10.16.2.30:9200 (10.16.2.30), v7.1.1 @ 10.16.2.32:9200 (10.16.2.32)","prevState":"red","prevMsg":"This version of Kibana requires Elasticsearch v7.2.1 on all nodes. I found the following incompatible nodes in your cluster: v7.1.1 @ 10.16.2.30:9200 (10.16.2.30), v7.1.1 @ 10.16.2.32:9200 (10.16.2.32), v7.1.1 @ 10.16.0.29:9200 (10.16.0.29), v7.1.1 @ 10.16.1.26:9200 (10.16.1.26)"}

Once all Elasticsearch nodes are upgraded, Kibana is available again.

If the Kibana resource points to an Elasticsearch resource managed by ECK, could we wait to upgrade the Kibana version until all Elasticsearch nodes run a version greater than or equal to the Kibana version?

@sebgl sebgl added the >enhancement Enhancement of existing functionality label Jan 6, 2020
@anyasabo
Contributor

anyasabo commented Jan 6, 2020

I'm a +1 for this. Note that for the versions we support, ES and Kibana are compatible iff the major+minor are the same:
https://www.elastic.co/support/matrix#matrix_compatibility

We've had other issues with out of order upgrades:
#2247
And in this scenario we can guard against it pretty simply. We'd need to check both that the ES version spec is updated and the upgrade is complete.
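
A minimal sketch of that two-part guard, assuming the "same major+minor" rule stated above. The function names and the plain-string inputs are hypothetical; this is not ECK code, and the real operator would read these values from the Elasticsearch resource and its Pods.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Reduce a version such as '7.2.1' to its (major, minor) pair."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def kibana_upgrade_allowed(
    kibana_target: str,
    es_spec_version: str,
    es_node_versions: list[str],
) -> bool:
    """Allow the Kibana upgrade only once Elasticsearch has caught up."""
    target = major_minor(kibana_target)
    # 1. The Elasticsearch spec must already request a matching major.minor.
    if major_minor(es_spec_version) != target:
        return False
    # 2. The upgrade must be complete: every running node matches as well.
    #    An empty node list is treated as "not ready" (an assumption).
    return bool(es_node_versions) and all(
        major_minor(v) == target for v in es_node_versions
    )


# Spec updated to 7.2.1 but one node still on 7.1.1: keep waiting.
print(kibana_upgrade_allowed("7.2.1", "7.2.1", ["7.2.1", "7.1.1"]))
```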

We should also implement this for APM IMO (though we need to block on Kibana and ES). Upgrade order here:
https://www.elastic.co/guide/en/elastic-stack/current/upgrading-elastic-stack.html#upgrade-order-elastic-stack

@anyasabo
Contributor

anyasabo commented Jan 14, 2020

One implementation detail I think might be worth discussing: should this be a webhook validation?

My initial thinking was that it should be, to make it clear what the problem is. But that may make things challenging for, say, new deployments where you just apply kibana+es at the same time at the same version. To alleviate that, we could allow a Kibana to pass validation if it is created with an ES ref that does not exist yet. But then validation becomes challenging if the Elasticsearch resource is created later with an incompatible version. That might still be okay though -- the webhook would still catch the vast majority of issues.

The alternative I see would be to not validate at admission, but to check the version of the referenced Elasticsearch during reconciliation before upgrading Kibana, and emit a warning if the versions are incompatible. That seems harder to implement though, and it makes it harder to see that there's a problem.

A third option would be to do both -- the webhook catches the vast majority of issues, and the reconciliation check catches the remainder.
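
To make the admission-time option concrete, here is a rough sketch of the validation logic only, assuming the strict "same major+minor" rule mentioned earlier in the thread. The function name, the `existing_es` mapping (Elasticsearch resource name to spec version), and the error text are all hypothetical; a real webhook would look the resource up through the Kubernetes API.

```python
from typing import Optional


def _major_minor(version: str) -> tuple[int, int]:
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def validate_kibana_admission(
    kibana_version: str,
    es_ref: str,
    existing_es: dict[str, str],
) -> Optional[str]:
    """Return an error message to reject the request, or None to admit it."""
    es_version = existing_es.get(es_ref)
    if es_version is None:
        # The referenced Elasticsearch does not exist yet (e.g. kibana+es
        # applied together): admit, and leave the rest to reconciliation.
        return None
    if _major_minor(es_version) != _major_minor(kibana_version):
        return (
            f"Kibana {kibana_version} is incompatible with "
            f"Elasticsearch {es_version} (ref: {es_ref})"
        )
    return None


cluster = {"quickstart": "7.1.1"}
print(validate_kibana_admission("7.2.1", "quickstart", cluster))   # rejected
print(validate_kibana_admission("7.2.1", "not-created-yet", cluster))  # admitted
```

The gap discussed above is visible in the first branch: a reference that does not resolve is admitted, so an Elasticsearch created later with an incompatible version would only be caught by a reconciliation-time check.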

@DaveCTurner

> Note that for the versions we support, ES and Kibana are compatible iff the major+minor are the same

Note that there is another compatibility matrix which offers a bit more flexibility to support minor-version upgrades as long as you upgrade Elasticsearch first.

@anyasabo
Contributor

As discussed in zoom, we were curious about situations where Kibana does not automatically recover once Elasticsearch is upgraded. Going through this web of issues, we ran into it here:
#2352
going from 7.2.1 -> 7.5.0

There is an enhancement issue open in Kibana (it mentions that the problem can be triggered by disabling allocation in ES before upgrading Kibana, which I couldn't reproduce):
elastic/kibana#52569

And the actual original Kibana issue with a bunch of people chiming in:
elastic/kibana#25464

Kibana has some docs here on how to resolve manually:
https://www.elastic.co/guide/en/kibana/6.5/release-notes-6.5.0.html#known-issues-6.5.0

So I think we need to nail down exactly what situations lead to issues in ECK to decide if this is something we want to guard against. If Kibana comes up on its own without intervention once ES is upgraded, I think I'm okay with that as is and do not think we need to change anything in ECK.

@sebgl
Contributor Author

sebgl commented Jan 24, 2020

I left some notes in #2352 (comment).
tl;dr: I managed to reproduce the issue when upgrading both ES and Kibana versions concurrently from 7.1.0 to 7.5.0, which is an unsupported way of upgrading Kibana, so a fatal error is expected.
Unfortunately, Kibana does not recover from this and needs manual intervention.

It does not occur when upgrading from 6.8.0 to 7.1.0 to 7.2.0 to 7.3.0 to 7.4.0 to 7.5.0. In that situation, Kibana is unavailable during the Elasticsearch upgrade since it detects the Elasticsearch version differs, but becomes available as soon as the Elasticsearch version upgrade is over.

@sebgl
Contributor Author

sebgl commented Jan 24, 2020

The way Elasticsearch and Kibana controllers are decoupled in ECK does not make it easy to guarantee a version upgrade order ("don't upgrade Kibana until ES is fully upgraded").
By design, the Kibana controller is not supposed to inspect or access the referenced Elasticsearch resource (or its Pods). It just runs the desired Kibana version as defined in the spec.

If we want to keep that behaviour (Kibana controller has nothing to do with Elasticsearch resources), a safeguard in the version upgrade would require some sort of communication between the Kibana controller and the Elasticsearch-Kibana association controller. The association controller could indicate (through an annotation?) in the Kibana resource the highest allowed Kibana version, according to the current lowest Elasticsearch version running. The Kibana controller would not upgrade the existing Kibana deployment if the desired Kibana version does not match (yet) that annotation.
It sounds a bit convoluted.
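
For illustration, the annotation hand-off could look roughly like the sketch below. Everything here is hypothetical: the annotation key, both function names, and the choice to deploy the desired version when no annotation is present are assumptions, and plain dicts stand in for the Kubernetes resources.

```python
# Hypothetical annotation key, not one that ECK defines.
ANNOTATION = "example.invalid/max-kibana-version"


def parse(version: str) -> tuple[int, ...]:
    """Turn '7.2.1' into (7, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def association_reconcile(kibana_annotations: dict, es_node_versions: list[str]) -> None:
    """Association controller side: publish the lowest running ES version
    as the highest Kibana version currently allowed."""
    if es_node_versions:
        kibana_annotations[ANNOTATION] = min(es_node_versions, key=parse)


def version_to_deploy(desired: str, current: str, kibana_annotations: dict) -> str:
    """Kibana controller side: stay on `current` until `desired` is allowed."""
    ceiling = kibana_annotations.get(ANNOTATION)
    if ceiling is None or parse(desired) <= parse(ceiling):
        return desired
    return current


annotations: dict = {}
association_reconcile(annotations, ["7.2.1", "7.1.1", "7.2.1"])
print(version_to_deploy("7.2.1", "7.1.1", annotations))  # still held back

association_reconcile(annotations, ["7.2.1", "7.2.1", "7.2.1"])
print(version_to_deploy("7.2.1", "7.1.1", annotations))  # upgrade proceeds
```

The Kibana controller only reads its own resource's annotations here, which is the point of the design: it never looks at Elasticsearch directly.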

@anyasabo anyasabo removed their assignment Jan 30, 2020
@rudolf

rudolf commented Feb 5, 2020

There was a regression in Kibana 7.5 where we no longer waited for all ES nodes to be of a compatible version before starting migrations. 7.6 includes a fix for this:
elastic/kibana#51311

@david-kow
Contributor

david-kow commented Feb 10, 2020

@rudolf Do I understand correctly that this means we don't need to handle this at all (i.e. there is no risk if Kibana is upgraded first)? Do you happen to know if apm/beats have this kind of check too?

@rudolf

rudolf commented Feb 10, 2020

Yes, upgrading all nodes at the same time is supported (and the error behaviour noted in this thread was a bug).

Just to be clear, the old Kibana node should be taken down first, and only then can the ES and Kibana nodes be upgraded. Kibana polls Elasticsearch every 2.5s by default, so if an outdated Kibana node is left running while ES is being upgraded there is 2.5s of potentially undefined behaviour.

However, once Kibana is restarted, it won't fully start up until the version check is complete. So the behaviour of Kibana being "stuck at startup" is expected.
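
As a rough model of that startup gate (not Kibana's actual implementation), the sketch below polls until every node reports the expected version. The function name, the injected `fetch_node_versions` callable, the `max_polls` limit, and the exact-version comparison are assumptions; the comparison mirrors the "requires Elasticsearch v7.2.1 on all nodes" message from the log at the top of this issue, and the 2.5s default comes from the comment above.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_compatible_nodes(
    kibana_version: str,
    fetch_node_versions: Callable[[], Iterable[str]],
    interval_s: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> bool:
    """Block until all nodes report `kibana_version`; False if polls run out."""
    polls = 0
    while max_polls is None or polls < max_polls:
        versions = list(fetch_node_versions())
        if versions and all(v == kibana_version for v in versions):
            return True  # version check passed, startup can continue
        polls += 1
        sleep(interval_s)  # "stuck at startup" while nodes are still upgrading
    return False


# Simulated rolling upgrade: the second poll sees a fully upgraded cluster.
responses = iter([["7.1.1", "7.2.1"], ["7.2.1", "7.2.1"]])
ready = wait_for_compatible_nodes(
    "7.2.1", lambda: next(responses), sleep=lambda _s: None
)
print(ready)
```

Injecting `sleep` and `fetch_node_versions` keeps the example runnable without a cluster; in practice the fetch would query the Elasticsearch nodes.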

@pebrc
Collaborator

pebrc commented Feb 24, 2020

Closing in favour of #2600

@pebrc pebrc closed this as completed Feb 24, 2020

6 participants