
Elasticsearch must be upgraded before the APM Server #2426

Closed
thbkrkr opened this issue Jan 14, 2020 · 4 comments
Labels
>enhancement Enhancement of existing functionality


@thbkrkr
Contributor

thbkrkr commented Jan 14, 2020

What did you do?

  • Deploy Elasticsearch and APM Server in version 7.4.0
Manifest:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-apm-sample
spec:
  version: 7.4.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-apm-sample
spec:
  version: 7.4.0
  count: 1
  elasticsearchRef:
    name: "es-apm-sample"
```
  • Upgrade Elasticsearch and APM Server to version 7.5.0
Manifest:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-apm-sample
spec:
  version: 7.5.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-apm-sample
spec:
  version: 7.5.0
  count: 1
  elasticsearchRef:
    name: "es-apm-sample"
```

What did you expect to see?

A green Elasticsearch cluster during the whole process.

What did you see instead? Under which circumstances?

The Elasticsearch cluster goes red for several seconds during the upgrade, as reported by ECK (`kubectl get es`).

This is more noticeable when the Kubernetes cluster is slow (e.g. kind on my laptop is slower than GKE).

What is going on?

When the manifest with the new stack version is applied, the APM Server container and one Elasticsearch container are recreated with the new version.
The APM Server container becomes ready long before the Elasticsearch one.
The APM Server then tries to create new indices for the new version.
But shard allocation has been disabled for the rolling upgrade of Elasticsearch.
We are therefore left with unassigned primary shards for the entire time the Elasticsearch instance takes to start.
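Which shards can be assigned while an upgrade is in progress depends on the value of the Elasticsearch `cluster.routing.allocation.enable` setting. The sketch below models its four documented values (`all`, `primaries`, `new_primaries`, `none`) for a single shard. It is a simplified illustration, not ECK's or Elasticsearch's code: the helper name `allocation_allowed` is hypothetical, and it ignores every other allocation decider (disk watermarks, allocation filtering, node availability).

```python
# Simplified model of Elasticsearch's cluster.routing.allocation.enable setting.
# Hypothetical helper: it only captures the "enable" decider, not disk
# watermarks, allocation filtering, or node availability.

VALID_SETTINGS = {"all", "primaries", "new_primaries", "none"}


def allocation_allowed(setting: str, is_primary: bool, is_new_index: bool) -> bool:
    """Return True if the enable setting alone permits assigning this shard."""
    if setting not in VALID_SETTINGS:
        raise ValueError(f"unknown cluster.routing.allocation.enable value: {setting!r}")
    if setting == "all":
        return True
    if setting == "primaries":
        # Any primary may be assigned; replicas wait until allocation is re-enabled.
        return is_primary
    if setting == "new_primaries":
        # Only primaries of newly created indices may be assigned.
        return is_primary and is_new_index
    # "none": no shard allocation of any kind.
    return False


if __name__ == "__main__":
    # A new apm-7.5.0-* index created by the upgraded APM Server mid-upgrade.
    for setting in ("all", "primaries", "new_primaries", "none"):
        primary = allocation_allowed(setting, is_primary=True, is_new_index=True)
        replica = allocation_allowed(setting, is_primary=False, is_new_index=True)
        print(f"{setting:14} primary={primary} replica={replica}")
```

Under this model, a new index's primary stays unassigned only with `none`; with `primaries` or `new_primaries`, only its replicas have to wait. If allocation is restricted to primaries during the upgrade, an unassigned new primary points to another cause.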

Output of `/_cat/shards` while the cluster health is reported as red:

```
apm-7.4.0-error-000001          0 p STARTED    0  283b 10.244.0.6 es-apm-sample-es-default-0
apm-7.4.0-error-000001          0 r STARTED    0  283b 10.244.0.5 es-apm-sample-es-default-1
apm-7.5.0-metric-000001         0 p STARTED    0  230b 10.244.0.6 es-apm-sample-es-default-0
apm-7.5.0-metric-000001         0 r UNASSIGNED
apm-7.4.0-transaction-000001    0 p STARTED    0  283b 10.244.0.5 es-apm-sample-es-default-1
apm-7.4.0-transaction-000001    0 r UNASSIGNED
apm-7.5.0-transaction-000001    0 p UNASSIGNED
apm-7.5.0-transaction-000001    0 r UNASSIGNED
apm-7.4.0-span-000001           0 p STARTED    0  283b 10.244.0.6 es-apm-sample-es-default-0
apm-7.4.0-span-000001           0 r UNASSIGNED
apm-7.5.0-error-000001          0 p STARTED    0  230b 10.244.0.6 es-apm-sample-es-default-0
apm-7.5.0-error-000001          0 r UNASSIGNED
apm-7.4.0-onboarding-2020.01.14 0 p STARTED    1 6.3kb 10.244.0.5 es-apm-sample-es-default-1
apm-7.4.0-onboarding-2020.01.14 0 r UNASSIGNED
apm-7.5.0-span-000001           0 p STARTED    0  230b 10.244.0.5 es-apm-sample-es-default-1
apm-7.5.0-span-000001           0 r UNASSIGNED
apm-7.5.0-onboarding-2020.01.14 0 p STARTED    1 6.2kb 10.244.0.5 es-apm-sample-es-default-1
apm-7.5.0-onboarding-2020.01.14 0 r UNASSIGNED
apm-7.4.0-metric-000001         0 p STARTED    0  283b 10.244.0.6 es-apm-sample-es-default-0
apm-7.4.0-metric-000001         0 r UNASSIGNED
```
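The red status can be read off the shard list above using the rule from the Elasticsearch cluster health documentation: an index is red if at least one primary shard is not active, yellow if all primaries are active but some replica is not, and the cluster reports the worst index status. The sketch below applies that rule to `_cat/shards` lines. The function names are hypothetical, the whitespace-based parsing assumes the default column order shown above, and Elasticsearch itself applies some extra nuances (for example for freshly created primaries) that this simplified version ignores.

```python
# Derive a simplified cluster health from `_cat/shards` output lines.
# Assumes the default column order: index shard prirep state [docs store ip node].
from collections import defaultdict

# Shards in these states count as active; INITIALIZING and UNASSIGNED do not.
ACTIVE_STATES = {"STARTED", "RELOCATING"}
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

# A few lines copied from the output above.
SAMPLE = """\
apm-7.5.0-metric-000001         0 p STARTED    0  230b 10.244.0.6 es-apm-sample-es-default-0
apm-7.5.0-metric-000001         0 r UNASSIGNED
apm-7.5.0-transaction-000001    0 p UNASSIGNED
apm-7.5.0-transaction-000001    0 r UNASSIGNED
""".splitlines()


def index_health(shard_lines):
    """Map each index name to 'green', 'yellow' or 'red'."""
    health = defaultdict(lambda: "green")
    for line in shard_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        index, _shard, prirep, state = fields[:4]
        if state in ACTIVE_STATES:
            status = "green"
        elif prirep == "p":
            status = "red"  # an inactive primary makes the index red
        else:
            status = "yellow"  # an inactive replica makes the index yellow
        # Keep the worst status seen for this index (this also registers the index).
        if SEVERITY[status] > SEVERITY[health[index]]:
            health[index] = status
    return dict(health)


def cluster_health(shard_lines):
    """The cluster reports the worst status of any index (green if no shards)."""
    return max(index_health(shard_lines).values(), key=SEVERITY.get, default="green")


if __name__ == "__main__":
    print(index_health(SAMPLE))
    print(cluster_health(SAMPLE))  # red
```

Only `apm-7.5.0-transaction-000001` has an unassigned primary, and that single index is enough to turn the whole cluster red; every other index in the list is at worst yellow.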

Solution:

It's important not to upgrade an Elasticsearch cluster and an APM Server at the same time.

The components of an Elastic Stack must be upgraded in a specific order (Elasticsearch before APM Server), see https://www.elastic.co/guide/en/elastic-stack/7.5/upgrading-elastic-stack.html#upgrade-order-elastic-stack.
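When upgrades are driven by automation (for example a CI job that applies manifests), the order can be enforced with a simple gate: only bump the APM Server version once every Elasticsearch node runs at least the target version and the cluster is green. The sketch below is hypothetical: `can_upgrade_apm` and `parse_version` are not part of ECK, versions are assumed to be plain `MAJOR.MINOR.PATCH` strings, and in practice the node versions and health would come from the Elasticsearch `_cat/nodes` and `_cluster/health` APIs.

```python
# Hypothetical pre-upgrade gate: the APM Server may move to `target` only after
# Elasticsearch has been fully upgraded and is healthy.


def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def can_upgrade_apm(target: str, es_node_versions: list, es_health: str) -> bool:
    """Return True if every ES node is at or above `target` and health is green."""
    if not es_node_versions or es_health != "green":
        return False
    wanted = parse_version(target)
    return all(parse_version(v) >= wanted for v in es_node_versions)


if __name__ == "__main__":
    # Mid rolling upgrade: one node already on 7.5.0, two still on 7.4.0.
    print(can_upgrade_apm("7.5.0", ["7.5.0", "7.4.0", "7.4.0"], "yellow"))  # False
    # Rolling upgrade finished and the cluster is green again.
    print(can_upgrade_apm("7.5.0", ["7.5.0", "7.5.0", "7.5.0"], "green"))  # True
```

With ECK this amounts to applying the manifests in two steps: first with only the Elasticsearch `spec.version` changed, then, once the gate passes, with the ApmServer `spec.version` changed.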
 

@thbkrkr
Contributor Author

thbkrkr commented Jan 14, 2020

Since it is very easy with ECK to upgrade multiple Elastic Stack components at once, perhaps we should at least document this.

@anyasabo
Contributor

Related #2353

@david-kow
Contributor

> The APM Server tried to create new indices for the new version.
> But the shards allocation has been disabled during the rolling upgrade of Elasticsearch.

I'm not sure this is exactly correct, as we do allow primaries to be allocated during upgrades. As @barkbay pointed out to me, this might be caused by a primary being allocated to a node that is about to be deleted.

It seems to me we would have the same issue in the general case of creating an index during an upgrade. Should we exclude a Pod from shard allocation before any deletion? Right now we seem to do it only during downscales. I don't think this would prevent the issue on its own (we would need to check health again too), but it would shorten the window.
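Excluding a node from allocation is done through the Elasticsearch `cluster.routing.allocation.exclude._name` cluster setting, which takes a comma-separated list of node names. The sketch below builds such a `PUT _cluster/settings` body and checks whether a node still holds shards before its Pod is deleted. It is an illustration, not ECK's implementation: the helper names are hypothetical, the choice between `transient` and `persistent` settings is left as a parameter, and the shard lines are assumed to be requested with `_cat/shards?h=index,shard,prirep,state,node` so the node column is always the fifth field.

```python
# Hypothetical helpers for draining an Elasticsearch node before deleting its Pod.
import json


def exclude_nodes_body(node_names, scope="transient"):
    """Build a PUT _cluster/settings body excluding the given nodes from allocation."""
    if scope not in ("transient", "persistent"):
        raise ValueError(f"unsupported settings scope: {scope!r}")
    value = ",".join(sorted(node_names))  # sorted for a deterministic body
    return json.dumps({scope: {"cluster.routing.allocation.exclude._name": value}})


def node_holds_shards(shard_lines, node_name):
    """Return True if any shard line (h=index,shard,prirep,state,node) is on node_name."""
    for line in shard_lines:
        fields = line.split()
        # Unassigned shards have no node column; a relocating shard lists the
        # source node, "->", and the target address, id and name, roughly.
        if node_name in fields[4:]:
            return True
    return False


if __name__ == "__main__":
    print(exclude_nodes_body(["es-apm-sample-es-default-2"]))
    lines = ["apm-7.5.0-span-000001 0 p STARTED es-apm-sample-es-default-1"]
    print(node_holds_shards(lines, "es-apm-sample-es-default-2"))  # False
```

A Pod would only be deleted once `node_holds_shards` returns False for its node and the cluster health has been checked again, which narrows, but does not fully close, the window in which a new primary can land on that node.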

@pebrc pebrc added >enhancement Enhancement of existing functionality and removed >non-issue labels Feb 3, 2020
@pebrc
Collaborator

pebrc commented Feb 24, 2020

Closing in favour of #2600

@pebrc pebrc closed this as completed Feb 24, 2020