Container Volumes not showing in vCenter after CSI driver upgrade #2863
Update: after a day or so, newly created volumes also disappeared from the vCenter container volumes list.
Probably the same as #2802. This may be related to csi-provisioner enabling topology support by default: kubernetes-csi/external-provisioner#1167
It seems that topology support is now enabled by default, but according to the docs:
So this looks like a breaking change. Can we disable topology? @divyenpatel, any idea?
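If the provisioner feature gate really is the cause, here is a sketch of how to inspect and flip it on the controller deployment (the `vmware-system-csi` namespace and container names assume the default vanilla manifests, and the exact flag should be verified against your external-provisioner release):

```bash
# Show the current csi-provisioner sidecar args on the CSI controller.
kubectl -n vmware-system-csi get deployment vsphere-csi-controller \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="csi-provisioner")].args}'

# Disabling topology would mean setting the feature gate back off, e.g. an arg like:
#   --feature-gates=Topology=false
# (flag name per kubernetes-csi/external-provisioner; confirm for your version)
kubectl -n vmware-system-csi edit deployment vsphere-csi-controller
```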
Update: after trying a lot of different things to recover the volume catalog in vCenter, we just gave up and manually migrated our data from the broken vSphere volumes to other kinds of storage. Then we did a full purge of the vSphere CPI/CSI and all the vSphere volumes from that cluster, and re-installed it all using the Rancher Helm charts. So far it's working. We were completely clueless about topology settings being the root cause, since we never used them and none of the error messages we found led us this way.
Hi, we hit this issue today after enabling topology to work around another issue. The fingerprint of this problem is basically:

```
topology.csi.vmware.com/cluster-category: yourClusterCategoryTag
topology.csi.vmware.com/datacenter-category: yourDatacenterCategoryTag
name: nodeName001
```

What we found was that these were not set for some nodes (NOTE: DO NOT SET THEM MANUALLY). The issue can be fixed by:
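Before attempting any fix, one way to see which nodes are missing the labels (a sketch; the label keys are the ones from the snippet above, and `-L` just prints them as extra columns):

```bash
# Print each node with the two vSphere topology labels as columns;
# nodes missing the labels show empty cells.
kubectl get nodes \
  -L topology.csi.vmware.com/cluster-category \
  -L topology.csi.vmware.com/datacenter-category

# The topology keys the CSI driver actually registered per node:
kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.drivers[*].topologyKeys}{"\n"}{end}'
```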
/kind bug
What happened:
We have 5 Kubernetes clusters running on Rancher/RKE with the vSphere CSI driver installed. We recently upgraded our clusters to Kubernetes 1.24. We then decided to upgrade the CPI to 1.24 and the vSphere CSI driver from 2.6.4 to 3.0.3. Most clusters seem to be running fine after the upgrade, but in one specific cluster we ran into trouble after a few days. We realized some StatefulSets had stopped working with errors like "Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[dshm data]: timed out waiting for the condition". While checking the Monitor tab in vCenter, we realized all PVs from that specific cluster were no longer being displayed. The vsphere-syncer container shows the following errors in its log:
Looking back at vCenter events, right after the update we started getting lots of "Sync volume" and "Attach container volume" events from the nodes belonging to the problematic cluster.
I googled similar errors that occurred in much older versions of the CSI driver, but none of the workarounds, like reconciling the CNS catalog, worked for me. Also, from what I understand, all in-tree volumes should have been migrated to CSI long before this last update, since we were using version 2.6.4 with migration enabled.
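To double-check that point, here is a quick way to list each PV with its CSI migration annotation (a sketch assuming `jq` is available; `pv.kubernetes.io/migrated-to` is the standard annotation set on in-tree volumes handed over to a CSI driver):

```bash
# Name, CSI driver (if any), and migration annotation for every PV.
kubectl get pv -o json | jq -r '
  .items[] |
  [.metadata.name,
   (.spec.csi.driver // "in-tree"),
   (.metadata.annotations["pv.kubernetes.io/migrated-to"] // "-")] |
  @tsv'
```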
I tried updating CSI to 3.0.3rc1, but the only thing that changed was the frequency of the full-sync calls by the syncer.
I'm able to create new volumes, and they all show up in vCenter normally.
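For comparing what CNS reports against what Kubernetes has, independently of the vCenter UI, something like the following should work (assumes `govc` is installed and pointed at your vCenter; `volume.ls` is its CNS volume listing subcommand, and the connection values below are placeholders):

```bash
# CNS container volumes as vCenter sees them:
export GOVC_URL='https://vcenter.example.com' GOVC_USERNAME='admin@vsphere.local' GOVC_PASSWORD='...'
govc volume.ls

# Volume handles Kubernetes thinks exist, for diffing against the list above:
kubectl get pv -o jsonpath='{range .items[*]}{.spec.csi.volumeHandle}{"\n"}{end}'
```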
What you expected to happen:
CSI migration to run smoothly.
How to reproduce it (as minimally and precisely as possible):
Upgraded the CPI to 1.24 via the Helm chart, and then the CSI driver to 3.0.3 by applying the manifests according to the VMware documentation.
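Roughly, the upgrade steps looked like this (a sketch; the chart repo and manifest URL are assumptions based on the public cloud-provider-vsphere chart and the upstream vsphere-csi-driver manifests):

```bash
# CPI via the Helm chart (repo/chart names assumed):
helm repo add vsphere-cpi https://kubernetes.github.io/cloud-provider-vsphere
helm upgrade --install vsphere-cpi vsphere-cpi/vsphere-cpi -n kube-system

# CSI driver by applying the upstream manifests for the target version:
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.0.3/manifests/vanilla/vsphere-csi-driver.yaml
```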
Anything else we need to know?:
Attacher logs:
csi-controller logs:
Environment:
Kernel (e.g. `uname -a`): 3.10.0-1160.11.1.el7.x86_64