Trident CSI Node plugin (csi.trident.netapp.io) on one node is now unregistered after the Kubernetes version was updated from v1.18.9 to v1.19.4. Pods on this node can no longer mount and unmount Trident volumes.
Error messages
We see the following messages in the kubelet log.
csi.trident.netapp.io was unregistered since the registration socket (/var/lib/kubelet/plugins_registry/csi.trident.netapp.io-reg.sock) had been removed.
I1119 05:47:54.246972 6550 plugin_watcher.go:212] Removing socket path /var/lib/kubelet/plugins_registry/csi.trident.netapp.io-reg.sock from desired state cache
I1119 05:47:53.162305 6550 reconciler.go:139] operationExecutor.UnregisterPlugin started for plugin at "/var/lib/kubelet/plugins_registry/csi.trident.netapp.io-reg.sock" (plugin details: &{/var/lib/kubelet/plugins_registry/csi.trident.netapp.io-reg.sock 2020-11-04 05:08:19.553684094 +0000 UTC m=+38.893901704 0x704c200 csi.trident.netapp.io})
I1119 05:47:53.163390 6550 csi_plugin.go:177] kubernetes.io/csi: registrationHandler.DeRegisterPlugin request for plugin csi.trident.netapp.io
The pod could not unmount the volume because csi.trident.netapp.io was not found.
E1119 09:02:52.819122 6550 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.trident.netapp.io^pvc-75a6fd7f-7aee-45e8-a5fa-d4500272528e podName:ad18a7d1-4090-4e0c-9e71-cba46dfc3657 nodeName:}" failed. No retries permitted until 2020-11-19 09:04:54.819071328 +0000 UTC m=+1310234.159288938 (durationBeforeRetry 2m2s). Error: "UnmountVolume.TearDown failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-75a6fd7f-7aee-45e8-a5fa-d4500272528e") pod "ad18a7d1-4090-4e0c-9e71-cba46dfc3657" (UID: "ad18a7d1-4090-4e0c-9e71-cba46dfc3657") : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name csi.trident.netapp.io not found in the list of registered CSI drivers"
Two trident-csi were running simultaneously
We found that two trident-csi (Node Plugin) pods on this node were running simultaneously for a very short time, and that the old driver-registrar had stopped after a new one had started.
driver-registrar removes the registration socket (/var/lib/kubelet/plugins_registry/csi.trident.netapp.io-reg.sock) when it receives SIGTERM (node_register.go#L113-L116). Because the old and new pods share the same socket path, the old driver-registrar removed the socket that the new one was already using. Removing the socket causes the kubelet to unregister the Trident plugin. I believe this is the cause of the problem.
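The ordering problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Registrar` class and `socket_survives` function are hypothetical stand-ins, a plain file takes the place of the real Unix socket, and the start-up behavior (clearing a stale file before creating a new one) is assumed for illustration rather than taken from the driver-registrar source.

```python
import os
import tempfile


class Registrar:
    """Toy stand-in for a driver-registrar container (hypothetical, not the real sidecar)."""

    def __init__(self, socket_path):
        self.socket_path = socket_path

    def start(self):
        # Assumed start-up behavior: clear any stale file, then create the socket file.
        if os.path.exists(self.socket_path):
            os.remove(self.socket_path)
        with open(self.socket_path, "w"):
            pass

    def on_sigterm(self):
        # Cleanup on termination removes the path without checking who created it.
        if os.path.exists(self.socket_path):
            os.remove(self.socket_path)


def socket_survives(old_stops_before_new_starts):
    """Return True if the registration socket still exists after the hand-over."""
    with tempfile.TemporaryDirectory() as registry:
        path = os.path.join(registry, "csi.trident.netapp.io-reg.sock")
        old, new = Registrar(path), Registrar(path)
        old.start()
        if old_stops_before_new_starts:
            old.on_sigterm()  # old pod fully terminated first
            new.start()
        else:
            new.start()       # new pod already running
            old.on_sigterm()  # late cleanup deletes the socket the new pod relies on
        return os.path.exists(path)
```

In this model, `socket_survives(True)` leaves the socket in place, while `socket_survives(False)` ends with no socket at all, which corresponds to the state in which the kubelet unregisters the plugin.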
DaemonSet was recreated after updating
The trident-csi (Node Plugin) pods are managed by a DaemonSet, so normally only one pod runs on each node. After Kubernetes was updated, however, the trident-csi DaemonSet was deleted and recreated by trident-operator. Because the new DaemonSet can start its pod while the pod of the deleted DaemonSet is still terminating, two pods (old and new) can run simultaneously.
After Kubernetes was updated, the shouldUpdate flag was set to true (controller.go#L1110). It seems that the shouldUpdate flag causes the trident-csi DaemonSet to be deleted (installer.go#L1489-L1494).
Updating the Kubernetes version may reproduce this problem. Because updating Kubernetes takes a long time and does not always trigger the problem, we confirmed the behaviors that cause it through separate demonstrations.
Two trident-csi pods cause the kubelet to unregister the Trident plugin
Confirm that the Trident CSI driver is registered on the node.
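One way to script this check is to inspect the node's CSINode object, whose spec.drivers list names the CSI drivers the kubelet has registered. The sketch below assumes the object was already fetched as JSON (for example with `kubectl get csinode <node-name> -o json`) and parsed into a dictionary; the function name `driver_registered` and the sample data are hypothetical and hand-written for illustration.

```python
import json


def driver_registered(csinode, driver_name="csi.trident.netapp.io"):
    """Return True if driver_name appears in the CSINode's spec.drivers list."""
    drivers = csinode.get("spec", {}).get("drivers") or []
    return any(d.get("name") == driver_name for d in drivers)


# Hand-written sample objects (not real cluster output).
healthy = json.loads(
    '{"kind": "CSINode", "metadata": {"name": "node-1"},'
    ' "spec": {"drivers": [{"name": "csi.trident.netapp.io", "nodeID": "node-1"}]}}'
)
broken = json.loads(
    '{"kind": "CSINode", "metadata": {"name": "node-2"}, "spec": {"drivers": null}}'
)

print(driver_registered(healthy))
print(driver_registered(broken))
```

A node affected by this issue would be expected to look like the second sample, with the Trident driver missing from the list.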
Thank you for providing the details of this issue and looking closely at the underlying cause; your analysis is very helpful. The window between the termination of the DaemonSet's pods and their recreation is critical, and the latter should occur only when the former has completed. Therefore, the operator should ensure that all pods belonging to the previous DaemonSet are deleted before it creates a new DaemonSet.
Out of curiosity, do you mind me asking the number of clusters that have run into this issue during an upgrade?
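The ordering suggested in this reply (delete, wait until the old pods are gone, then create) could look roughly like the following. This is a minimal sketch, not the operator's actual code: `recreate_after_pods_gone` and its three callables (`delete_daemonset`, `list_old_pods`, `create_daemonset`) are hypothetical placeholders for whatever cluster calls the caller supplies, and the timeout values are arbitrary.

```python
import time


def recreate_after_pods_gone(delete_daemonset, list_old_pods, create_daemonset,
                             timeout=120.0, interval=1.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Delete the old DaemonSet, wait for its pods to disappear, then create the new one.

    All three callables are hypothetical hooks supplied by the caller.
    """
    delete_daemonset()
    deadline = clock() + timeout
    # Poll until no pod of the previous DaemonSet remains on any node.
    while list_old_pods():
        if clock() >= deadline:
            raise TimeoutError("old DaemonSet pods still present")
        sleep(interval)
    # Only now is it safe to start new pods that reuse the same registration socket.
    create_daemonset()
```

Passing `sleep` and `clock` as parameters keeps the polling loop easy to exercise with fakes, without waiting on a real cluster.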
Describe the bug
Trident CSI Node plugin (csi.trident.netapp.io) on one node is now unregistered after the Kubernetes version was updated from v1.18.9 to v1.19.4. Pods on this node can no longer mount and unmount Trident volumes.
Error messages
We see the following messages in the kubelet log.
csi.trident.netapp.io was unregistered since the registration socket (/var/lib/kubelet/plugins_registry/csi.trident.netapp.io-reg.sock) had been removed.
The pod could not unmount the volume because csi.trident.netapp.io was not found.
Two trident-csi were running simultaneously
We found that two trident-csi (Node Plugin) pods on this node were running simultaneously for a very short time, and that the old driver-registrar had stopped after a new one had started.
driver-registrar removes the registration socket (/var/lib/kubelet/plugins_registry/csi.trident.netapp.io-reg.sock) when it receives SIGTERM (node_register.go#L113-L116). Removing the socket causes the kubelet to unregister the Trident plugin. I believe this is the cause of the problem.
DaemonSet was recreated after updating
Trident-csi (Node Plugin) pods are managed by DaemonSet. Normally, only one pod runs on every node. But after Kubernetes was updated, trident-csi DaemonSet was recreated by trident-operator. Deleting DaemonSet allows two pods (old and new) to run simultaneously.
We confirmed this on the trident-operator log.
Here, the trident-csi DaemonSet was deleted.
The trident-csi DaemonSet was then created soon after.
After Kubernetes was updated, the shouldUpdate flag was set to true (controller.go#L1110). It seems that the shouldUpdate flag causes the trident-csi DaemonSet to be deleted (installer.go#L1489-L1494).
Environment
silenceAutosupport: true (Trident Operator)
To Reproduce
Updating the Kubernetes version may reproduce this problem. Because updating Kubernetes takes a long time and does not always trigger the problem, we confirmed the behaviors that cause it through separate demonstrations.
Two trident-csi pods cause the kubelet to unregister the Trident plugin
[...] trident-csi DaemonSet to run two trident-csi pods on each node.
[...] trident-csi-2 DaemonSet.
Recreating DaemonSet allows two pods (old and new) to run simultaneously
[...] trident-csi DaemonSet. The DaemonSet will be recreated soon after by the trident-operator.
[...] trident-csi pods on each node.
Expected behavior
Pods can mount and unmount Trident volumes after the Kubernetes version is updated.
Additional context
None