
Non-Azure Kubernetes cluster: PVC creation fails when connecting to Azure Storage using the CSI driver NFS protocol #689

Closed
anjalisajith opened this issue Jun 7, 2022 · 11 comments


@anjalisajith

Error:
Provisioning failed: blob.csi.azure.com.xyz failed to provision volume with StorageClass "abc":
rpc error: code = Internal desc = update service endpoints failed with error: SubnetClient is nil

When we implemented the same solution in AKS, Contributor permission was granted on the VNet resource group and Storage Account Contributor on the storage account.

Since this is a non-Azure cluster, please advise whether this is a similar permissions issue. If so, please guide us on how to enable connectivity to Azure storage for non-Azure clusters.

@andyzhangx
Member

I'm not sure whether this driver supports non-Azure clusters, since NFS requires the agent node and the storage account to be in the same VNet. As for the issue, could you provide the driver controller logs by following: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md#case1-volume-createdelete-issue

This driver requires the Azure cloud config; have you set it up? Follow: https://github.com/kubernetes-sigs/blob-csi-driver#option1-provide-cloud-provider-config-with-azure-credentials
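For illustration, the cloud config can be supplied as a secret the driver reads at startup. This is only a sketch: all values below are placeholders, and the field names and the `cloud-config` secret key are assumed from the cloud-provider-azure config format the driver documentation points to.

```shell
# Write a minimal Azure cloud config (placeholder values; field names
# assumed from the cloud-provider-azure config format).
cat > azure.json <<'EOF'
{
  "cloud": "AzurePublicCloud",
  "tenantId": "<tenant-id>",
  "subscriptionId": "<subscription-id>",
  "aadClientId": "<client-id>",
  "aadClientSecret": "<client-secret>",
  "resourceGroup": "<resource-group>",
  "location": "<region>",
  "vnetName": "<vnet-name>",
  "vnetResourceGroup": "<vnet-resource-group>",
  "subnetName": "<subnet-name>"
}
EOF

# Store it as the secret the driver attempts to read in kube-system.
kubectl create secret generic azure-cloud-provider \
  --from-file=cloud-config=azure.json -n kube-system
```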

@anjalisajith
Author

Hi,

I tried to test the connection between an on-premises Rancher cluster and Azure Blob storage. I configured the Azure cloud config when deploying the CSI driver, and was able to create the PV and PVC successfully. But when mounting the backend in a test nginx container, it throws the error "driver name not found in the list of registered csi drivers".

Could you please help?

@andyzhangx
Member


@anjalisajith that means the driver daemonset was not installed successfully on the agent node; follow the guide here to troubleshoot: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md#case2-volume-mountunmount-failed
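The daemonset check above can be sketched as a few kubectl commands. The pod label, container name, and namespace are assumptions based on a default chart installation and may differ in your deployment:

```shell
# Check that the node daemonset pods are running on every agent node.
kubectl get pods -n kube-system -l app=csi-blob-node -o wide

# The node-driver-registrar sidecar is what registers the driver name
# with kubelet; its logs usually show why registration failed.
kubectl logs -n kube-system -l app=csi-blob-node -c node-driver-registrar

# Verify the driver object and per-node registration status.
kubectl get csidriver blob.csi.azure.com
kubectl get csinode
```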

@anjalisajith
Author


The csi-blob-node and csi-blob-controller pods are stuck in CrashLoopBackOff.

kubectl logs daemonset/csi-blob-node -c blob -n kube-system -f

This command gives the error below:

F0613: no credentials provided for azure cloud provider

We configured the Azure config details as a secret and passed them during the CSI driver Helm installation. The Azure config file was created with the following details: tenantID, subscription ID, rg, location, subnet, msg, vnet and vnet RG.

Please suggest if we missed any details.

@andyzhangx
Member

@anjalisajith which version are you using? From v1.7.0 on, we allow an empty cloud config: #562

@andyzhangx
Member

Could you provide more detailed logs?

@andyzhangx
Member

I see where the problem is: you should set --allow-empty-cloud-config=true in both the driver controller deployment and the driver daemonset:

- "--allow-empty-cloud-config={{ .Values.controller.allowEmptyCloudConfig }}"
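When installing via Helm, this flag can likely be set through chart values instead of editing the rendered manifests. `controller.allowEmptyCloudConfig` appears in the template above; the matching `node.allowEmptyCloudConfig` value is an assumption about the chart, so verify it against your chart version:

```shell
# Sketch: enable empty cloud config on both controller and node
# via chart values (value names assumed from the template above).
helm install blob-csi-driver blob-csi-driver/blob-csi-driver \
  --namespace kube-system \
  --set controller.allowEmptyCloudConfig=true \
  --set node.allowEmptyCloudConfig=true
```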

@anjalisajith
Author


Hi Andy,

These files were created as part of the CSI driver installation; are you suggesting we update them?
Could you please share a sample document or detailed steps?

@andyzhangx
Member

From your logs, I think the issue is that the secret kube-system/azure-cloud-provider contains an invalid Azure cloud provider config. Could you remove that config and then restart all blob driver pods?

  • Normal driver logs when no cloud provider config is provided; the driver should run well without any Azure cloud config:
I0614 06:44:45.841900       1 main.go:113] set up prometheus server on [::]:29634
I0614 06:44:45.842137       1 blob.go:216]
DRIVER INFORMATION:
-------------------
Build Date: "2022-06-14T02:07:04Z"
Compiler: gc
Driver Name: blob.csi.azure.com
Driver Version: v1.14.0
Git Commit: 8b5f47d4af866b3308075c964245e966b2074199
Go Version: go1.18.3
Platform: linux/amd64

Streaming logs below:
I0614 06:44:45.842147       1 blob.go:219] driver userAgent: blob.csi.azure.com/v1.14.0 gc/go1.18.3 (amd64-linux) OSS-kubectl
I0614 06:44:45.842480       1 azure.go:78] reading cloud config from secret kube-system/azure-cloud-provider
I0614 06:44:45.911238       1 azure.go:85] InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" not found
I0614 06:44:45.911260       1 azure.go:90] could not read cloud config from secret kube-system/azure-cloud-provider
I0614 06:44:45.911268       1 azure.go:96] use default AZURE_CREDENTIAL_FILE env var: /etc/kubernetes/azure.json
W0614 06:44:45.911283       1 azure.go:101] load azure config from file(/etc/kubernetes/azure.json) failed with open /etc/kubernetes/azure.json: no such file or directory
I0614 06:44:45.911292       1 azure.go:113] no cloud config provided, error: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" not found, driver will run without cloud config
I0614 06:44:45.911301       1 azure.go:138] starting controller server...
I0614 06:44:45.911308       1 blob.go:224] cloud: , location: , rg: , VnetName: , VnetResourceGroup: , SubnetName:
I0614 06:44:45.911377       1 mount_linux.go:208] Detected OS without systemd
I0614 06:44:45.911387       1 driver.go:80] Enabling controller service capability: CREATE_DELETE_VOLUME
I0614 06:44:45.911392       1 driver.go:80] Enabling controller service capability: EXPAND_VOLUME
I0614 06:44:45.911396       1 driver.go:80] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0614 06:44:45.911401       1 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0614 06:44:45.911405       1 driver.go:99] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0614 06:44:45.911409       1 driver.go:99] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0614 06:44:45.911413       1 driver.go:99] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0614 06:44:45.911417       1 driver.go:99] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0614 06:44:45.911421       1 driver.go:99] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0614 06:44:45.911425       1 driver.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0614 06:44:45.911429       1 driver.go:90] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0614 06:44:45.911434       1 driver.go:90] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
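The cleanup suggested above can be sketched as follows. The deployment and daemonset names are assumptions based on a default installation:

```shell
# Remove the invalid cloud config secret the driver keeps reading.
kubectl delete secret azure-cloud-provider -n kube-system

# Restart the controller deployment and node daemonset so the driver
# starts again without a cloud config.
kubectl rollout restart deployment/csi-blob-controller -n kube-system
kubectl rollout restart daemonset/csi-blob-node -n kube-system
```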

@anjalisajith
Author


Hi Andy,
As advised, I uninstalled the CSI driver and reinstalled it using the Helm command below, without providing the Azure config file. CSI driver version: v1.13.0.
Kubernetes cluster: Rancher

Scenario 1:
helm install blob-csi-driver/blob-csi-driver --set node.enableBlobfuseProxy=true --namespace kube-system

In this case too, both csi-blob-node and csi-blob-controller go into CrashLoopBackOff.

Describing csi-blob-node shows:
Error: failed to start container "install-blobfuse-proxy": Error response from daemon: OCI runtime create failed: starting container process caused: exec "/blobfuse-proxy/init.sh": stat /blobfuse-proxy/init.sh: no such file or directory

Scenario 2:
We also tried changing the path from /var/lib/kubelet to /opt/rke/var/lib/kubelet, again installing the chart without the Azure config file.

In this case too, blob-controller, blobfuse-proxy, and blob-node fail with CrashLoopBackOff.

csi-blob-node error:
Liveness probe failed: kubelet plugin registration hasn't succeeded yet; file /opt/rke/var/lib/kubelet/plugins/blob.csi.azure.com/registration doesn't exist.

csi-blobfuse-proxy and csi-blob-controller show "back-off restarting failed container".

Could you please advise on this?
Both scenarios above were tried without providing the Azure config file.
If we do need to provide the Azure config file, please confirm whether it requires an app registration in Azure.
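The kubelet path change in Scenario 2 is usually done through a chart value rather than by editing manifests. The `linux.kubelet` value name below is an assumption about the blob-csi-driver chart; check your chart version's values before using it:

```shell
# Sketch: point the chart at RKE's non-default kubelet root directory
# (value name assumed; verify against the chart's values.yaml).
helm install blob-csi-driver blob-csi-driver/blob-csi-driver \
  --namespace kube-system \
  --set linux.kubelet="/opt/rke/var/lib/kubelet"
```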

@andyzhangx
Member

Please don't use the master branch to install the CSI driver, since it uses the latest image tag, which changes over time. Please follow the released-version install method: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/install-csi-driver-v1.14.0.md
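The released-version install doc linked above pins both the script and the driver version, along these lines (exact invocation may differ slightly by release; check the linked doc):

```shell
# Install the pinned v1.14.0 release instead of the master branch.
curl -skSL https://raw.githubusercontent.com/kubernetes-sigs/blob-csi-driver/v1.14.0/deploy/install-driver.sh \
  | bash -s v1.14.0 --
```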
