
Prevent unintentional removal of all clusters if CRD is removed #5117

Closed
naemono opened this issue Dec 5, 2021 · 3 comments · Fixed by #7811
Labels: discuss (We need to figure this out), >feature (Adds or discusses adding a feature to the product)

Comments

naemono (Contributor) commented Dec 5, 2021

Proposal

We should prevent users from accidentally purging all Elasticsearch clusters (or any Elastic custom resource) by deleting the CustomResourceDefinition itself: deleting a CRD cascades and removes every custom resource of that type, along with the workloads the operator manages for them, as the scenario below demonstrates. This can happen when a user assumes during maintenance that the operator can be removed and reinstalled without side effects, or during a Helm upgrade where the user wants to completely remove the ECK operator and reinstall it, not understanding the consequences.

Example of the issue to prevent:

❯ kind create cluster --config config/samples/elasticsearch/custom/kind-cluster.yaml --wait 10m --retain
Creating cluster "kind" ...

❯ helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace -f ./eck-operator-values.yaml
NAME: elastic-operator
LAST DEPLOYED: Sun Dec  5 14:56:39 2021
NAMESPACE: elastic-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Inspect the operator logs by running the following command:
   kubectl logs -n elastic-system sts/elastic-operator

❯ cat ./eck-operator-values.yaml
webhook:
  enabled: true
managedNamespaces:
- elastic

❯ kubectl apply -f config/samples/elasticsearch/custom/elasticsearch-small-1-member.yaml -n elastic

❯ cat config/samples/elasticsearch/custom/elasticsearch-small-1-member.yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: testing
spec:
  version: 7.15.1
  nodeSets:
  - name: masters
    count: 1
    config:
      node.roles: ["master", "data"]
      node.store.allow_mmap: false

❯ kubectl get es -n elastic
NAME      HEALTH   NODES   VERSION   PHASE   AGE
testing   green    1       7.15.1    Ready   96s

❯ kubectl delete crd elasticsearches.elasticsearch.k8s.elastic.co
customresourcedefinition.apiextensions.k8s.io "elasticsearches.elasticsearch.k8s.elastic.co" deleted

❯ kubectl get pod -n elastic
NAME                   READY   STATUS        RESTARTS   AGE
testing-es-masters-0   1/1     Terminating   0          2m3s

Possible prevention path

We could likely add a ValidatingWebhookConfiguration that routes all CRD deletion requests through a validating webhook, such as:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: elastic-crd-deletion-prevention.k8s.elastic.co
webhooks:
- name: elastic-crd-deletion-prevention.k8s.elastic.co
  admissionReviewVersions:
  - v1
  sideEffects: None
  failurePolicy: Fail
  rules:
  - apiGroups:
    - apiextensions.k8s.io
    apiVersions:
    - v1
    operations:
    - DELETE
    resources:
    - customresourcedefinitions
  clientConfig:
    service:
      namespace: webhooks
      name: eck-webhook-server
      path: /crd

Where the webhook would:

  1. Ignore all non-Elastic CRDs.
  2. Query for any instances of the CRD type in the cluster and, if any exist, refuse the request with a detailed explanation of the reasoning (a sketch of such a handler follows this list).
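For illustration, here is a minimal sketch of what the handler behind the /crd path could look like, using controller-runtime's admission package. The crdDeletionValidator type and its wiring are hypothetical, not actual ECK code, but the two steps above map directly onto it:

package crdwebhook

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// crdDeletionValidator denies DELETE requests for Elastic CRDs that still
// have live instances in the cluster. (Hypothetical sketch, not ECK code.)
type crdDeletionValidator struct {
	client client.Client
}

func (v *crdDeletionValidator) Handle(ctx context.Context, req admission.Request) admission.Response {
	// Step 1: ignore all non-Elastic CRDs.
	if !strings.HasSuffix(req.Name, ".k8s.elastic.co") {
		return admission.Allowed("not an Elastic CRD")
	}

	// For DELETE operations, the object being removed arrives in OldObject.
	var crd apiextensionsv1.CustomResourceDefinition
	if err := json.Unmarshal(req.OldObject.Raw, &crd); err != nil {
		return admission.Errored(http.StatusBadRequest, err)
	}

	// Pick the storage version so the list request targets a served version.
	version := crd.Spec.Versions[0].Name
	for _, ver := range crd.Spec.Versions {
		if ver.Storage {
			version = ver.Name
		}
	}

	// Step 2: query for any instances of the CRD type across all namespaces.
	list := &unstructured.UnstructuredList{}
	list.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   crd.Spec.Group,
		Version: version,
		Kind:    crd.Spec.Names.ListKind,
	})
	if err := v.client.List(ctx, list); err != nil {
		return admission.Errored(http.StatusInternalServerError, err)
	}

	// Refuse the request if any instances exist, explaining the reasoning.
	if n := len(list.Items); n > 0 {
		return admission.Denied(fmt.Sprintf(
			"deleting CRD %s would cascade-delete %d existing resource(s); delete them first",
			req.Name, n))
	}
	return admission.Allowed("no instances of this CRD exist")
}

The handler could then be registered on the operator's existing webhook server, e.g. mgr.GetWebhookServer().Register("/crd", &webhook.Admission{Handler: &crdDeletionValidator{client: mgr.GetClient()}}), which is what the clientConfig path in the configuration above assumes.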
naemono added the discuss label on Dec 5, 2021
botelastic bot added the triage label on Dec 5, 2021
mtparet commented Dec 6, 2021

🙏 I experienced this too (#4734), caused by k3s-io/helm-controller#110.

thbkrkr added the >feature label on Dec 6, 2021
botelastic bot removed the triage label on Dec 6, 2021
jpuskar commented Mar 5, 2023

I ran into this too when migrating from direct Helm installs to Argo CD, because the migration involves removing the Helm chart. It bit me pretty hard, and this behavior is inconsistent with every other operator Helm chart I've used.

What I expected was to be able to remove the Helm release and immediately reapply it without consequences, because the state would be stored in the custom resources.

The current behavior is extremely dangerous.

jtele2 commented Nov 15, 2023

This also bit me hard with flux. This needs to be fixed ASAP.
