[BUG]: CSM Operator 1.5.0 crashloopbackoff #1637
Comments
@OA72280: Thank you for submitting this issue! The issue is currently awaiting triage. Please make sure you have given us as much context as possible. If the maintainers determine this is a relevant issue, they will remove the needs-triage label and respond appropriately. We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at container.storage.modules@dell.com.
@OA72280 I am not able to understand the issue.
The customer reports that the operator came up long enough to roll out the CSI driver without the snapshotter sidecar, and is now crashing again.
/sync
How was the PowerFlex driver installed: with the CSI or CSM Operator, or by other means? Was PowerFlex already installed before the Operator was installed? Please get us an accurate timeline of what was done. We can try to reproduce, but the logs that you have and the description of the problem are vague. Thanks.
The CSI driver was deployed and functional. The problem surfaced only when the customer went to remove the snapshotter: they edited the CR to set disable: true for the snapshotter and tried to apply it, but nothing happened. They then checked the status of the operator and learned it was failing. It is not clear how long it had been failing (hours, days, or weeks); since the CSI driver was working, no one had noticed.
/sync
link: 31238
@OA72280 Could you share the following data as well:
Bug Description
The customer has attempted to deploy CSM 1.5.0 on OpenShift 4.14.x using both Red Hat OperatorHub (mirrored) and the offline installer process. In both cases the operator loads momentarily and then goes into CrashLoopBackOff.
Note: The user's environment already had the PowerFlex CSI driver deployed, with the snapshotter sidecar. After they updated their CR file and ran "oc replace -f cr-file.yaml", nothing happened; the snapshotter was not removed.
Validated that the CR file is not the cause. Took a look at the environment: the crashing operator is evidently why nothing happens when the updated CR is applied.
Cleared out the CSI driver and the operator, but the operator is now unable to deploy.
Logs
2024-12-10T22:29:56.977Z DEBUG workspace/main.go:87 Operator Version {"TraceId": "main", "Version": "1.5.0", "Commit ID": "5a711be500ab85c70be0dea037f3c97ba77d20b2", "Commit SHA": "Tue, 19 Mar 2024 14:31:11 UTC"}
2024-12-10T22:29:56.978Z DEBUG workspace/main.go:88 Go Version: go1.22.1 {"TraceId": "main"}
2024-12-10T22:29:56.978Z DEBUG workspace/main.go:89 Go OS/Arch: linux/amd64 {"TraceId": "main"}
2024-12-10T22:29:57.180Z INFO workspace/main.go:101 Openshift environment {"TraceId": "main"}
2024-12-10T22:29:57.184Z INFO workspace/main.go:140 Current kubernetes version is 1.27 which is a supported version {"TraceId": "main"}
2024-12-10T22:29:57.184Z INFO workspace/main.go:151 Use ConfigDirectory /etc/config/dell-csm-operator {"TraceId": "main"}
2024-12-10T22:29:57Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8082"}
2024-12-10T22:29:57Z INFO setup starting manager
2024-12-10T22:29:57Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-12-10T22:29:57Z INFO starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8082"}
I1210 22:29:57.276792 1 leaderelection.go:245] attempting to acquire leader lease -dell-csm-operator-168737/090cae6a.dell.com...
I1210 22:30:14.787155 1 leaderelection.go:255] successfully acquired lease -dell-csm-operator-168737/090cae6a.dell.com
2024-12-10T22:30:14Z DEBUG events dell-csm-operator-controller-manager-fc9c5f796-njg5x_0b2eccbf-e65c-4861-9257-9d55487737df became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"-dell-csm-operator-168737","name":"090cae6a.dell.com","uid":"2e6e47e1-23bf-4021-86b1-7bd8e5453ee9","apiVersion":"coordination.k8s.io/v1","resourceVersion":"720572476"}, "reason": "LeaderElection"}
2024-12-10T22:30:14Z INFO Starting EventSource {"controller": "containerstoragemodule", "controllerGroup": "storage.dell.com", "controllerKind": "ContainerStorageModule", "source": "kind source: *v1.ContainerStorageModule"}
2024-12-10T22:30:14Z INFO Starting Controller {"controller": "containerstoragemodule", "controllerGroup": "storage.dell.com", "controllerKind": "ContainerStorageModule"}
2024-12-10T22:30:14Z INFO Starting EventSource {"controller": "apexconnectivityclient", "controllerGroup": "storage.dell.com", "controllerKind": "ApexConnectivityClient", "source": "kind source: *v1.ApexConnectivityClient"}
2024-12-10T22:30:14Z INFO Starting Controller {"controller": "apexconnectivityclient", "controllerGroup": "storage.dell.com", "controllerKind": "ApexConnectivityClient"}
2024-12-10T22:30:14Z INFO Starting workers {"controller": "containerstoragemodule", "controllerGroup": "storage.dell.com", "controllerKind": "ContainerStorageModule", "worker count": 1}
2024-12-10T22:30:14Z INFO Starting workers {"controller": "apexconnectivityclient", "controllerGroup": "storage.dell.com", "controllerKind": "ApexConnectivityClient", "worker count": 1}
2024-12-10T22:30:14.891Z INFO controllers/csm_controller.go:240 ################Starting Reconcile############## {"TraceId": "powerflex-1"}
2024-12-10T22:30:14.891Z INFO controllers/csm_controller.go:243 reconcile for {"TraceId": "powerflex-1", "Namespace": "-powerflex-168737", "Name": "powerflex", "Attempt": 1}
--Additional logs from previous pods--
I1210 21:40:32.681491 1 trace.go:236] Trace[1662988865]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.2/tools/cache/reflector.go:231 (10-Dec-2024 21:40:22.275) (total time: 10406ms):
Trace[1662988865]: ---"Objects listed" error: 10405ms (21:40:32.680)
2024-12-10T21:42:38.573Z INFO utils/status.go:304 error from getDeploymentStatus: Deployment.apps "powerflex-controller" not found {"TraceId": "powerflex-0"}
2024-12-10T21:46:29.776Z INFO utils/status.go:315 calculate Daemonseterror msg [DaemonSet.apps "powerflex-node" not found] {"TraceId": "powerflex-1"}
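For anyone triaging longer captures of these logs, the "not found" entries from the previous pods can be pulled out mechanically. The following is only a rough sketch: it assumes the timestamped line layout seen above (timestamp, level, then message), and the function name find_missing_resources and both regular expressions are illustrative, not part of the operator or any Dell tooling.

```python
import re

# Assumed layout of the operator's timestamped lines, based on the excerpt above:
#   <RFC 3339 timestamp> <LEVEL> <source and message ...>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<rest>.*)$"
)

# Matches Kubernetes API "not found" errors such as:
#   Deployment.apps "powerflex-controller" not found
NOT_FOUND_RE = re.compile(r'(?P<kind>[A-Za-z.]+) "(?P<name>[^"]+)" not found')


def find_missing_resources(log_text):
    """Return (timestamp, kind, name) for each 'not found' log entry."""
    missing = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            # klog-style lines (I1210 ...) and trace continuations are skipped.
            continue
        nf = NOT_FOUND_RE.search(m.group("rest"))
        if nf:
            missing.append((m.group("ts"), nf.group("kind"), nf.group("name")))
    return missing
```

Run against the two status.go lines above, this would report the powerflex-controller Deployment and the powerflex-node DaemonSet as missing, which is consistent with the driver having been cleared out before the operator was redeployed.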
Screenshots
N/A
Additional Environment Information
N/A
Steps to Reproduce
N/A
Expected Behavior
N/A
CSM Driver(s)
CSM 1.5.0 with PowerFlex 2.10.0 CR
Installation Type
Offline Installer and OperatorHub
Container Storage Modules Enabled
Authorization
Enabled and disabled; it does not matter, as either setting results in the same outcome.
Container Orchestrator
OpenShift
Operating System
RHCOS