Cass-operator crashes when using configSecret for Cassandra configuration #705

Closed
kos-team opened this issue Sep 13, 2024 · 0 comments · Fixed by #706
Labels
bug Something isn't working

kos-team commented Sep 13, 2024

What happened?

When the CassandraDatacenter CR is reconfigured so that spec.configSecret points to a secret that has no annotations (metadata.annotations is absent), cass-operator panics and restarts.

What did you expect to happen?

We expect the Cassandra container to switch its application config to the one pointed to by configSecret and to keep running.

How can we reproduce it (as minimally and precisely as possible)?

This bug can be reproduced by first deploying cass-operator and then running these steps:

  1. Save this secret yaml into a file called secret.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: test-config
type: Opaque
stringData:
  config: |-
    {
      "cassandra-yaml": {
        "read_request_timeout": "5000ms"
      },
      "jvm-options": {
        "initial_heap_size": "512M",
        "max_heap_size": "512M"
      }
    }
  1. Create this secret with this command:
kubectl -n cass-operator create -f secret.yaml
  1. Apply this CR. It creates a Cassandra cluster whose configuration is specified by the property spec.config:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: test-cluster
spec:
  clusterName: development
  config:
    cassandra-yaml:
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      num_tokens: 16
      role_manager: CassandraRoleManager
    jvm-server-options:
      initial_heap_size: 1G
      max_heap_size: 1G
  managementApiAuth:
    insecure: {}
  racks:
  - name: rack1
  - name: rack2
  - name: rack3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
  serverType: cassandra
  serverVersion: 4.1.2
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: standard
  1. Apply this updated CR. It adds the property spec.configSecret, which points to the secret created above:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: test-cluster
spec:
  clusterName: development
  config:
    cassandra-yaml:
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      num_tokens: 16
      role_manager: CassandraRoleManager
    jvm-server-options:
      initial_heap_size: 1G
      max_heap_size: 1G
  configSecret: test-config
  managementApiAuth:
    insecure: {}
  racks:
  - name: rack1
  - name: rack2
  - name: rack3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
  serverType: cassandra
  serverVersion: 4.1.2
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: standard

cass-operator version

k8ssandra/cass-operator:v1.22.1

Kubernetes version

1.29.1

Method of installation

Helm

Anything else we need to know?

Root Cause
The operator's log points to the function checkDatacenterNameAnnotation() in reconcile_configsecret.go; the panic happens at line 93. Go reports "assignment to entry in nil map": the function writes to secret.Annotations, which is nil when the secret has no annotations. Annotations are not required for a secret to be valid, so the operator cannot assume the map exists, and writing to it while it is nil crashes the operator.
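
To illustrate the failure mode described above, here is a minimal, self-contained Go sketch. It is not the operator's code: the secretMeta type, both helper functions, and the annotation key are hypothetical stand-ins. It only shows that assigning to a nil map panics with this exact message, and that allocating the map first avoids it.

```go
package main

import "fmt"

// secretMeta is a hypothetical stand-in for a Secret's metadata.
// Its Annotations map is nil when the secret carries no annotations.
type secretMeta struct {
	Annotations map[string]string
}

// unsafeSetAnnotation writes to the map without a nil check.
// It recovers the resulting panic and returns its message so the
// program can continue and show what went wrong.
func unsafeSetAnnotation(m *secretMeta, key, value string) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	m.Annotations[key] = value
	return ""
}

// setAnnotation allocates the map when it is nil before writing.
func setAnnotation(m *secretMeta, key, value string) {
	if m.Annotations == nil {
		m.Annotations = make(map[string]string)
	}
	m.Annotations[key] = value
}

func main() {
	// No annotations, like the secret in the reproduction steps.
	s := &secretMeta{}

	// The key below is an illustrative placeholder, not the operator's real key.
	fmt.Println("unsafe write:", unsafeSetAnnotation(s, "example.com/datacenter", "test-cluster"))

	setAnnotation(s, "example.com/datacenter", "test-cluster")
	fmt.Println("safe write:", s.Annotations["example.com/datacenter"])
}
```

Reading from a nil map is safe in Go; only writes panic, which is why a lookup on secret.Annotations could succeed while the subsequent assignment crashes.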

Cass-operator panic log:
        "2024-09-11T22:51:44.074Z\tINFO\tObserved a panic in reconciler: assignment to entry in nil map\t{\"controller\": \"cassandradatacenter_controller\", \"controllerGroup\": \"cassandra.datastax.com\", \"controllerKind\": \"CassandraDatacenter\", \"CassandraDatacenter\": {\"name\":\"test-cluster\",\"namespace\":\"cass-operator\"}, \"namespace\": \"cass-operator\", \"name\": \"test-cluster\", \"reconcileID\": \"bc2600de-09f6-4dcd-9624-410317a4987b\"}",
        "panic: assignment to entry in nil map [recovered]",
        "\tpanic: assignment to entry in nil map",
        "",
        "goroutine 222 [running]:",
        "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()",
        "\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.4/pkg/internal/controller/controller.go:116 +0x1e5",
        "panic({0x1744620?, 0x1bc11c0?})",
        "\t/usr/local/go/src/runtime/panic.go:770 +0x132",
        "github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).checkDatacenterNameAnnotation(0xc0002b4280, 0xc0007103c0)",
        "\t/workspace/pkg/reconciliation/reconcile_configsecret.go:93 +0x147",
        "github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).CheckConfigSecret(0xc0002b4280)",
        "\t/workspace/pkg/reconciliation/reconcile_configsecret.go:41 +0x1ca",
        "github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).ReconcileAllRacks(0xc0002b4280)",
        "\t/workspace/pkg/reconciliation/reconcile_racks.go:2454 +0x546",
        "github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).CalculateReconciliationActions(0xc0002b4280)",
        "\t/workspace/pkg/reconciliation/handler.go:68 +0x105",
        "github.com/k8ssandra/cass-operator/internal/controllers/cassandra.(*CassandraDatacenterReconciler).Reconcile(0xc000291270, {0x1bdd8b0, 0xc0007a97a0}, {{{0xc000480840, 0xd}, {0xc00048084e, 0xc}}})",
        "\t/workspace/internal/controllers/cassandra/cassandradatacenter_controller.go:147 +0xa69",
        "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1be1768?, {0x1bdd8b0?, 0xc0007a97a0?}, {{{0xc000480840?, 0xb?}, {0xc00048084e?, 0x0?}}})",
        "\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.4/pkg/internal/controller/controller.go:119 +0xb7",
        "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000fc0a0, {0x1bdd8e8, 0xc0002911d0}, {0x17bf9c0, 0xc00078a2e0})",
        "\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.4/pkg/internal/controller/controller.go:316 +0x3bc",
        "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000fc0a0, {0x1bdd8e8, 0xc0002911d0})",
        "\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.4/pkg/internal/controller/controller.go:266 +0x1be",
        "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()",
        "\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.4/pkg/internal/controller/controller.go:227 +0x79",
        "created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 151",
        "\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.4/pkg/internal/controller/controller.go:223 +0x50c"

Note that, according to cass-operator's CRD documentation, cass-operator prefers configSecret over config when both are specified. Applying the second CR therefore asks cass-operator to switch from config to configSecret as the source of the application's config.
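
The precedence rule can be sketched as follows. This is only an illustration of the documented behavior under simplified assumptions: dcSpec and configSource are hypothetical names, and the real operator resolves the config differently in detail.

```go
package main

import "fmt"

// dcSpec is a hypothetical, reduced view of the two config fields
// on the CassandraDatacenter spec.
type dcSpec struct {
	Config       string // inline config (spec.config), empty if unset
	ConfigSecret string // secret name (spec.configSecret), empty if unset
}

// configSource reports which field is used: configSecret wins
// whenever it is set, regardless of whether config is also set.
func configSource(s dcSpec) string {
	if s.ConfigSecret != "" {
		return "configSecret"
	}
	return "config"
}

func main() {
	first := dcSpec{Config: `{"cassandra-yaml":{"num_tokens":16}}`}
	second := dcSpec{Config: first.Config, ConfigSecret: "test-config"}

	fmt.Println(configSource(first))  // the first CR uses config
	fmt.Println(configSource(second)) // the second CR switches to configSecret
}
```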

┆Issue is synchronized with this Jira Story by Unito
┆Reviewer: Alexander Dejanovski
┆Issue Number: CASS-66
