
Unable to update Cassandra cluster once it gets into unhealthy state #334

Open
hoyhbx opened this issue May 19, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@hoyhbx

hoyhbx commented May 19, 2022

What happened?

We found that when we specify bad values for the field spec.nodeAffinityLabels, we cannot later correct them: the statefulSet stays stuck with the invalid spec.nodeAffinityLabels even after the CR is fixed. The only way to recover is to manually modify the statefulSet spec to delete the invalid nodeAffinityLabels, for example as sketched below.
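A sketch of that manual correction (the statefulSet name and namespace here are taken from the reproduction steps below; adjust as needed):

```sh
# Remove the bad nodeAffinity from the statefulSet's pod template directly
kubectl patch statefulset cluster1-cassandra-datacenter-default-sts \
  --namespace=cass-operator --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/affinity/nodeAffinity"}]'
```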

The root cause of this issue is similar to the previous one: #324
In that issue we were told that blocking statefulSet reconciliation while an update is in progress is by design.
We still want to report this incident to help document the potential bad consequences: in this case it caused a deadlock that requires a restart or a manual statefulSet correction to resolve.

Did you expect to see something different?

The node affinity config on the pods should be updated or removed when users update or remove the invalid nodeAffinityLabels settings in the CR.

How to reproduce it (as minimally and precisely as possible):

  1. Install cert-manager and the operator:
     ```sh
     kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.yaml
     kubectl apply -f init.yaml
     kubectl apply --force-conflicts --server-side -k 'github.com/k8ssandra/cass-operator/config/deployments/cluster?ref=v1.10.3'
     ```
  2. Deploy the CassandraDB cluster with an invalid nodeAffinityLabels value: `kubectl apply -f cr1.yaml`
     **cr1.yaml**
     ```yaml
     apiVersion: cassandra.datastax.com/v1beta1
     kind: CassandraDatacenter
     metadata:
       name: cassandra-datacenter
     spec:
       clusterName: cluster1
       serverType: cassandra
       serverVersion: 3.11.7
       nodeAffinityLabels:
         dc: someLabel1
       managementApiAuth:
         insecure: {}
       size: 1
       storageConfig:
         cassandraDataVolumeClaimSpec:
           storageClassName: server-storage
           accessModes:
           - ReadWriteOnce
           resources:
             requests:
               storage: 3Gi
       config:
         cassandra-yaml:
           authenticator: org.apache.cassandra.auth.PasswordAuthenticator
           authorizer: org.apache.cassandra.auth.CassandraAuthorizer
           role_manager: org.apache.cassandra.auth.CassandraRoleManager
         jvm-options:
           initial_heap_size: 800M
           max_heap_size: 800M
     ```
  3. Observe that the affinity on the statefulSet's pods is set according to the CR: `kubectl get pods cluster1-cassandra-datacenter-default-sts-0 --namespace=cass-operator -o=jsonpath='{.spec.affinity.nodeAffinity}'`
    ```json
    {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {
                    "matchExpressions": [
                        {
                            "key": "dc",
                            "operator": "In",
                            "values": [
                                "someLabel1"
                            ]
                        }
                    ]
                }
            ]
        }
    }
    ```
  4. Update the custom resource to remove the nodeAffinityLabels: `kubectl apply -f cr2.yaml`
    **cr2.yaml**
    ```yaml
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: cassandra-datacenter
    spec:
      clusterName: cluster1
      serverType: cassandra
      serverVersion: 3.11.7
      managementApiAuth:
        insecure: {}
      size: 1
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: server-storage
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 3Gi
      config:
        cassandra-yaml:
          authenticator: org.apache.cassandra.auth.PasswordAuthenticator
          authorizer: org.apache.cassandra.auth.CassandraAuthorizer
          role_manager: org.apache.cassandra.auth.CassandraRoleManager
        jvm-options:
          initial_heap_size: 800M
          max_heap_size: 800M
    ```
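  5. Observe that the statefulSet is stuck: re-running the jsonpath query from step 3 should still return the stale nodeAffinity block (same pod name as in step 3):
     ```sh
     kubectl get pods cluster1-cassandra-datacenter-default-sts-0 --namespace=cass-operator -o=jsonpath='{.spec.affinity.nodeAffinity}'
     ```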

Environment

  • Cass Operator version:

    docker.io/k8ssandra/cass-operator@sha256:fb9d9822fceda0057a1de39b690a5cfe570980a93e3782948482ccf68c3683bc

  • Kubernetes version information: v1.21.1

  • Kubernetes cluster kind: kind v0.11.1 go1.16.4 linux/amd64

  • Manifests:

    init.yaml:
    ```yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      # Changing the name to server-storage is the only change we have made compared to upstream
      name: server-storage
    provisioner: rancher.io/local-path
    volumeBindingMode: WaitForFirstConsumer
    reclaimPolicy: Delete
    ```




┆Issue is synchronized with this [Jira Story](https://datastax.jira.com/browse/CASS-43) by [Unito](https://www.unito.io)
┆Issue Number: CASS-43
@hoyhbx hoyhbx added the bug Something isn't working label May 19, 2022
@sync-by-unito sync-by-unito bot changed the title Unable to update nodeAffinityLabels once setted with a bad value K8SSAND-1520 ⁃ Unable to update nodeAffinityLabels once setted with a bad value May 19, 2022
@jsanda
Contributor

jsanda commented May 19, 2022

This is different from #324 in that there is no scaling involved. This is the expected behavior and has been for some time; however, I do consider it a bug and think it should be changed.

Any changes to the podTemplateSpec property of the underlying StatefulSet(s) will not be applied unless all Cassandra pods are in the ready state. Chicken, meet egg :)

The workaround that I typically recommend is to set stopped: true in your CassandraDatacenter spec. This will scale the StatefulSets down to zero pods. Then, if you apply a change that involves an update to the podTemplateSpec, it will be applied because there aren't any pods that are not ready. Lastly, after applying the changes, set stopped: false to scale the StatefulSets back up. Note that this will not result in any data loss; PVCs aren't touched by this process.
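For concreteness, a minimal sketch of that sequence (assuming the datacenter name and namespace from the reproduction steps above):

```sh
# 1. Stop the datacenter; the operator scales the StatefulSets to zero (PVCs are kept)
kubectl patch cassandradatacenter cassandra-datacenter --namespace=cass-operator \
  --type=merge -p '{"spec": {"stopped": true}}'
# 2. Apply the corrected spec, e.g. kubectl apply -f cr2.yaml
# 3. Start the datacenter again
kubectl patch cassandradatacenter cassandra-datacenter --namespace=cass-operator \
  --type=merge -p '{"spec": {"stopped": false}}'
```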

@hoyhbx
Author

hoyhbx commented May 23, 2022

Got it, thanks for the confirmation! We are also happy to contribute if you have plans to fix it :)

@hoyhbx hoyhbx changed the title K8SSAND-1520 ⁃ Unable to update nodeAffinityLabels once setted with a bad value K8SSAND-1520 ⁃ Unable to update Cassandra cluster once it gets into unhealthy state Apr 15, 2023
@sync-by-unito sync-by-unito bot changed the title K8SSAND-1520 ⁃ Unable to update Cassandra cluster once it gets into unhealthy state Unable to update Cassandra cluster once it gets into unhealthy state Oct 11, 2024