
Patching of v1beta1 ES resource with v1 on k8s 1.12 fails #2308

Closed
pebrc opened this issue Dec 19, 2019 · 12 comments

pebrc commented Dec 19, 2019

k8s 1.12 running ECK 1.0.0-rc3, upgraded from v1beta1

When upgrading what was created in v1beta1 as

apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: es-apm-sample
spec:
  version: 7.4.0
  nodeSets:
    - name: default
      count: 3
      config:
        # This setting could have performance implications for production clusters.
        # See: https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-virtual-memory.html
        node.store.allow_mmap: false

with

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-apm-sample
spec:
  version: 7.5.0
  nodeSets:
    - name: default
      count: 3
      config:
        # This setting could have performance implications for production clusters.
        # See: https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-virtual-memory.html
        node.store.allow_mmap: false

I run into the following error:

Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"elasticsearch.k8s.elastic.co/v1\",\"kind\":\"Elasticsearch\",\"metadata\":{\"annotations\":{},\"name\":\"es-apm-sample\",\"namespace\":\"default\"},\"spec\":{\"nodeSets\":[{\"config\":{\"node.store.allow_mmap\":false},\"count\":3,\"name\":\"default\"}],\"version\":\"7.5.0\"}}\n"}},"spec":{"nodeSets":[{"config":{"node.store.allow_mmap":false},"count":3,"name":"default"}],"version":"7.5.0"}}
to:
Resource: "elasticsearch.k8s.elastic.co/v1, Resource=elasticsearches", GroupVersionKind: "elasticsearch.k8s.elastic.co/v1, Kind=Elasticsearch"
Name: "es-apm-sample", Namespace: "default"
Object: &{map["apiVersion":"elasticsearch.k8s.elastic.co/v1" "kind":"Elasticsearch" "metadata":map["annotations":map["common.k8s.elastic.co/controller-version":"1.0.0-beta1" "elasticsearch.k8s.elastic.co/cluster-uuid":"3S6znhdkQqK72AXerue8ow" "kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"elasticsearch.k8s.elastic.co/v1beta1\",\"kind\":\"Elasticsearch\",\"metadata\":{\"annotations\":{},\"name\":\"es-apm-sample\",\"namespace\":\"default\"},\"spec\":{\"nodeSets\":[{\"config\":{\"node.store.allow_mmap\":false},\"count\":3,\"name\":\"default\"}],\"version\":\"7.4.0\"}}\n"] "creationTimestamp":"2019-12-19T14:19:20Z" "finalizers":["finalizer.elasticsearch.k8s.elastic.co/observer" "finalizer.elasticsearch.k8s.elastic.co/secure-settings-secret" "finalizer.elasticsearch.k8s.elastic.co/http-certificates-secret"] "generation":'\x02' "name":"es-apm-sample" "namespace":"default" "resourceVersion":"6295" "selfLink":"/apis/elasticsearch.k8s.elastic.co/v1/namespaces/default/elasticsearches/es-apm-sample" "uid":"8c7defe8-226a-11ea-a356-42010a8e00ac"] "spec":map["http":map["service":map["metadata":map["creationTimestamp":<nil>] "spec":map[]] "tls":map["certificate":map[]]] "nodeSets":[map["config":map["node.store.allow_mmap":%!q(bool=false)] "count":'\x03' "name":"default" "podTemplate":map["metadata":map["creationTimestamp":<nil>] "spec":map["containers":<nil>]]]] "updateStrategy":map["changeBudget":map[]] "version":"7.4.0"] "status":map["availableNodes":'\x03' "health":"green" "phase":"Ready"]]}
for: "https://gist.githubusercontent.com/barkbay/f42682b17a41a9aca4c041c218d7b631/raw/477a4b4e73bc47d7c3b99b4d2c7b0121c6096ff0/03-update-v1beta1-with-v1.yml": Internal error occurred: no kind "Elasticsearch" is registered for version "elasticsearch.k8s.elastic.co/v1" in scheme "k8s.io/apiextensions-apiserver/pkg/apiserver/apiserver.go:51"
pebrc added the >bug Something isn't working label Dec 19, 2019
pebrc commented Dec 19, 2019

Changing v1beta1 to v1 in the upgrade YAML (second example above) works.

Also, it seems that once the resource has been updated by the v1 operator (which presumably changes the storage version to v1), the problem goes away; see the check below.

So this is IMO not a blocker.
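
For reference, the version the apiserver currently stores for the CRD can be checked directly. A minimal sketch, assuming the default CRD name installed by the ECK manifests:

kubectl get crd elasticsearches.elasticsearch.k8s.elastic.co \
  -o jsonpath='{.spec.versions[*].name}{"\n"}'       # versions served by the CRD
kubectl get crd elasticsearches.elasticsearch.k8s.elastic.co \
  -o jsonpath='{.status.storedVersions}{"\n"}'       # versions ever persisted to etcd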

pebrc commented Dec 19, 2019

Retested, and it is not a caching issue. It does not go away even after 30 minutes.

Another couple of insights:

  • The problem manifests itself immediately after the operator upgrade from v1beta1 to v1, with the operator immediately failing to remove the finalizers and the same error appearing in the operator logs (see the check below).
  • I also tested with k8s 1.13 and the problem does not occur there.
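
A quick way to observe both symptoms (a hedged sketch: it assumes the default elastic-system/elastic-operator StatefulSet install and the example resource from this issue; adjust names to your setup):

kubectl get elasticsearch es-apm-sample -n default \
  -o jsonpath='{.metadata.finalizers}{"\n"}'         # finalizers the operator fails to remove
kubectl logs statefulset/elastic-operator -n elastic-system --tail=200 \
  | grep 'no kind "Elasticsearch" is registered'     # same scheme error as above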

barkbay commented Dec 31, 2019

I got the same issue on EKS 1.12

barkbay commented Dec 31, 2019

I reverted the version to v1beta1 with kubectl and it seems to unlock the situation; I'm not sure I understand why.

Also, it only affects Elasticsearch, not Kibana or the APMServer.

barkbay commented Dec 31, 2019

Likely caused by kubernetes/kubernetes#73752; removing the webhook solves the issue.

The fix has been backported to 1.13 (kubernetes/kubernetes#79495) but not to 1.12.
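
A hedged sketch of that workaround; the webhook configuration name below is an assumption, so list the existing configurations first and delete the one created by the ECK manifests:

kubectl get validatingwebhookconfigurations
# name below is an assumption; use whatever the previous command shows for ECK
kubectl delete validatingwebhookconfiguration elastic-webhook.k8s.elastic.co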

barkbay commented Jan 2, 2020

I was wondering why I didn't get the issue on OpenShift 3.11. I did some additional tests and just discovered that ... the ValidatingAdmissionWebhook admission plugin is not enabled by default in the API server of OpenShift 3.11 😕

Once it is enabled, I get the exact same issue.
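
For anyone checking their own cluster, a hedged way to confirm whether the plugin is configured is to grep the master configuration on an OpenShift 3.11 master node (path per the OpenShift 3.x defaults):

grep -A 6 'ValidatingAdmissionWebhook' /etc/origin/master/master-config.yaml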

barkbay self-assigned this Jan 2, 2020
barkbay commented Jan 3, 2020

I reverted the version to v1beta1 with kubectl and it seems to unlock the situation

Here is a one-liner to apply the aforementioned workaround to all the Elasticsearch resources in a cluster:

# For every namespace, rewrite the apiVersion of each Elasticsearch resource
# from v1 back to v1beta1 and re-apply it, skipping client-side validation.
# The trailing $ in the sed pattern leaves resources that are already
# v1beta1 untouched.
for ns in `kubectl get ns --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}'`; do \
for es in `kubectl get elasticsearch.elasticsearch.k8s.elastic.co --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' -n ${ns}`; do \
kubectl get elasticsearch.elasticsearch.k8s.elastic.co/${es} -n ${ns} -o yaml | \
  sed -E 's/apiVersion: elasticsearch.k8s.elastic.co\/v1$/apiVersion: elasticsearch.k8s.elastic.co\/v1beta1/' | \
  kubectl apply --validate=false -f -
 done
done

kubectl patch can't be used because it does some validation on the client side that prevents the patch from being applied.

barkbay commented Jan 3, 2020

Summary of the different options that have been discussed offline:

  • Since the issue happens on K8S < 1.13 (incl. OpenShift 3.11) when v1beta1 resources are deployed, we can exclude these versions from the upgrade process. We could document that if a user has deployed Elasticsearch resources with a previous release and is running on an impacted K8S version, those resources must be deleted before upgrading.

  • Provide a second installation manifest without the webhook.

  • Document a workaround, either by applying the command above or by asking the user to manually delete the webhook.

In any case, we should document that once ECK has been upgraded to 1.0 on K8S < 1.13, only v1 resources must be deployed.

cc @agup006

agup006 commented Jan 6, 2020

Looking at the workaround, I'm worried some folks will run into unforeseen issues, or find it problematic to run the script on OpenShift 3.11 and K8s 1.13. As we still have some flexibility going from beta -> GA, the cleaner option is to tell folks on those platforms to uninstall and reinstall the operator.

barkbay commented Jan 7, 2020

Superseded by #2357

@iahmad-khan

Same issue after upgrading the operator

agup006 commented Feb 3, 2020

@iahmad-khan , did you uninstall and reinstall the operator going from beta to GA?
