Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing storageClassName in CR does not change the PVC resource #39

Closed
tylergu opened this issue Mar 14, 2022 · 8 comments
Closed

Changing storageClassName in CR does not change the PVC resource #39

tylergu opened this issue Mar 14, 2022 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@tylergu
Copy link
Member

tylergu commented Mar 14, 2022

Describe the bug

I was trying to change the persistence/storageClassName field in my rabbitmq-cluster's CR, but changing the persistence/storageClassName has no effect on the PVC used by the statefulSet.
The persistence/storageClassName field was initially not specified so the operator used the default storage class "standard". Then I created a new storage class following instruction here https://github.com/rabbitmq/cluster-operator/blob/main/docs/examples/production-ready/ssd-gke.yaml, and changed persistence/storageClassName from null to ssd. This change failed silently.

To reproduce

Steps to reproduce this behavior:

  1. Deploy cluster-operator using the command:
    kubectl apply -f https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml
  2. Deploy rabbitmqCluster with the following yaml file:
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: sample
spec:
  image: "rabbitmq:3.8.21-management"
  persistence:
    storage: 20Gi
  replicas: 2
  1. Create new storage class:
    kubectl apply -f https://github.com/rabbitmq/cluster-operator/blob/main/docs/examples/production-ready/ssd-gke.yaml
  2. Change the rabbitmqCluster's persistence/storageClassName to ssd and apply
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: sample
spec:
  image: "rabbitmq:3.8.21-management"
  persistence:
    storage: 20Gi
    storageClassName: ssd
  replicas: 2
  1. Check the PVC used by rabbitmq cluster, the storageClassName change is not applied. It's still "standard" instead of "ssd":
"spec": {
    "accessModes": [
        "ReadWriteOnce"
    ],
    "resources": {
        "requests": {
            "storage": "20Gi"
        }
    },
    "storageClassName": "standard",
    "volumeMode": "Filesystem",
    "volumeName": "pvc-43d66309-2090-4de3-bc82-9edcd0a69361"
}

Version and environment information

  • rabbitmq: rabbitmq:3.8.21-management
  • rabbitmq cluster operator: rabbitmqoperator/cluster-operator:1.10.0
  • Kubernetes: 1.21.1
  • Running on Kind cluster

Addition context

This bug is caused because the operator only reconciles the PVC's storage capacity, but does not reconcile the storageClassName here: https://github.com/rabbitmq/cluster-operator/blob/d657ffb516f948aaffd252794e3ed5e75e352d3d/controllers/reconcile_persistence.go#L15.

Possible fix is to create the desired storage type and migrate the data over. Or report an error message like PVC scaling down here: https://github.com/rabbitmq/cluster-operator/blob/d657ffb516f948aaffd252794e3ed5e75e352d3d/internal/scaling/scaling.go#L51

@tylergu tylergu added the bug Something isn't working label Mar 14, 2022
@tylergu tylergu changed the title Changing PVC's storageClassName does not change the PVC Changing storageClassName in CR does not change the PVC resource Mar 14, 2022
@tylergu
Copy link
Member Author

tylergu commented Mar 15, 2022

This bug is found automatically by Acto.

In one of the mutations, Acto changed the CR's persistence/storageClassName from null to random. Then Acto didn't found the matching change in the system state delta.
Though random is not a valid storageClassName, this alarm still exposed the bug in rabbitmq-operator.

@tianyin
Copy link
Member

tianyin commented Mar 15, 2022

@tylergu I have a naive question -- why it has to reconcile the name?

In other words, why is not reconciliating the name a bug (beyond looking good)?

@tylergu
Copy link
Member Author

tylergu commented Mar 15, 2022

@tianyin , the operator should not only reconcile the name, but also change the type of persistence volume it uses.

Kubernetes provides a concept called storage class which allows users to have different types of storage from different provisioners. For example, I can create a storage class called ssd which specifically uses fast SSDs instead of default storage.

Here, the rabbitmq-operator should actually restart the Persistent Volumn Claim and migrate the data over, since we as the user changed the storage class we want to use.

@tianyin
Copy link
Member

tianyin commented Mar 16, 2022

@tylergu did you report it?

@tylergu
Copy link
Member Author

tylergu commented Mar 16, 2022

I added a possible fix and issued: rabbitmq/cluster-operator#992

@tianyin
Copy link
Member

tianyin commented Mar 16, 2022

nice!

@tylergu tylergu mentioned this issue Mar 16, 2022
@tylergu
Copy link
Member Author

tylergu commented Mar 21, 2022

I think the developers confirmed this bug, and they think it's an interesting(hard) feature to implement. There are other operators which actually support changing storage class on the fly: https://www.percona.com/blog/2021/04/20/change-storage-class-on-kubernetes-on-the-fly/

@tylergu tylergu closed this as completed May 19, 2022
@tianyin
Copy link
Member

tianyin commented May 19, 2022

There are other operators which actually support changing storage class on the fly: https://www.percona.com/blog/2021/04/20/change-storage-class-on-kubernetes-on-the-fly/

This is a priceless point!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants