disabled dataLIFs are picked up by controller during ControllerPublishVolume #524

Closed
travisghansen opened this issue Feb 9, 2021 · 4 comments

Comments

@travisghansen

Describe the bug
We have a situation where 2 of our 4 dataLIFs have been disabled. Trident continues to return all 4 of them during ControllerPublishVolume, which in turn passes them to NodeStage, where they are used to initiate iSCSI connections. The connections to the disabled LIFs fail (as expected) and we end up in a bad state.

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: 20.10, 21.01
  • Trident installation flags used: nothing special
  • Container runtime: 19.03, 18.09
  • Kubernetes version: 1.19
  • Kubernetes orchestrator: Rancher v2.x
  • Kubernetes enabled feature gates: none
  • OS: CentOS7 with elrepo kernel
  • NetApp backend types: ontap-san, ontap-nas
  • Other:

To Reproduce
Disable a dataLIF, then attach a volume.

Expected behavior
The disabled LIF is not picked up or used.
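
A minimal sketch of the expected behavior, for illustration only. The `DataLIF` record and `usable_addresses` helper are hypothetical names, not Trident's types or API, and the choice of which two LIFs are disabled is an assumption based on the later log in this report, where only .215 and .216 remain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLIF:
    """Hypothetical LIF record; not Trident's actual data structure."""
    address: str
    admin_up: bool  # False when the LIF has been administratively disabled
    oper_up: bool   # False when the LIF is not operationally reachable


def usable_addresses(lifs):
    """Return only the addresses that are safe to hand to the node for iSCSI login."""
    return [lif.address for lif in lifs if lif.admin_up and lif.oper_up]


# Assumed state: .211 and .212 disabled, .215 and .216 healthy.
lifs = [
    DataLIF("172.28.61.211", admin_up=False, oper_up=False),
    DataLIF("172.28.61.212", admin_up=False, oper_up=False),
    DataLIF("172.28.61.215", admin_up=True, oper_up=True),
    DataLIF("172.28.61.216", admin_up=True, oper_up=True),
]

print(usable_addresses(lifs))
```

With a filter like this applied before publishing, only the two healthy portals would reach NodeStage.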

Additional context

time="2021-02-07T20:32:01Z" level=debug msg="Data LIFs with reporting nodes" reportedDataLIFs="[172.28.61.211 172.28.61.212 172.28.61.215 172.28.61.216]" requestID=e7bb2fc9-d680-469e-8e34-6cb186987531 requestSource=CSI
time="2021-02-07T20:32:01Z" level=debug msg="Attempting volume publish." backend=ontapsan_172.28.61.211 backendUUID=c9e1539e-b4d2-43f2-9444-7c518f1f9d54 requestID=a335277c-a53d-4c90-a1ed-ec5ac1a45548 requestSource=CSI volume=pvc-734edbb5-c733-4297-aa75-a910509d797d volumeInternal=trident_dev_pvc_734edbb5_c733_4297_aa75_a910509d797d
time="2021-02-07T20:32:01Z" level=debug msg="<<<< ControllerPublishVolume" Method=ControllerPublishVolume Type=CSI_Controller requestID=e7bb2fc9-d680-469e-8e34-6cb186987531 requestSource=CSI

After completely deleting the dataLIFs we get this in the logs:

time="2021-02-08T17:44:32Z" level=debug msg="Data LIFs" dataLIFs="[172.28.61.215 172.28.61.216]" requestID=d0c9ef3f-e1b7-4090-91e0-70c315fd900d requestSource=Internal
time="2021-02-08T17:44:32Z" level=debug msg="Found iSCSI LIFs." dataLIFs="[172.28.61.215 172.28.61.216]" requestID=d0c9ef3f-e1b7-4090-91e0-70c315fd900d requestSource=Internal
time="2021-02-08T17:44:32Z" level=debug msg="Read storage pools assigned to SVM." pools="[SSD_7600_01 SSD_7600_02 SSD_3800_03 SSD_3800_04]" requestID=d0c9ef3f-e1b7-4090-91e0-70c315fd900d requestSource=Internal svm=hq00nvsp07
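
To compare the two log excerpts above, a small script can pull the LIF lists out of each line and show which addresses were reported at publish time but are gone after the LIFs were deleted. This is only a sketch: `lifs_in_line` is a hypothetical helper, and the log lines are shortened copies of the ones in this report.

```python
import re


def lifs_in_line(line, key):
    """Extract the space-separated addresses from key="[a b c]" in a log line."""
    match = re.search(rf'{re.escape(key)}="\[([^\]]*)\]"', line)
    return match.group(1).split() if match else []


# Shortened copies of the log lines quoted in this issue.
before = ('time="2021-02-07T20:32:01Z" level=debug msg="Data LIFs with reporting nodes" '
          'reportedDataLIFs="[172.28.61.211 172.28.61.212 172.28.61.215 172.28.61.216]"')
after = ('time="2021-02-08T17:44:32Z" level=debug msg="Found iSCSI LIFs." '
         'dataLIFs="[172.28.61.215 172.28.61.216]"')

reported = lifs_in_line(before, "reportedDataLIFs")
remaining = lifs_in_line(after, "dataLIFs")

# Addresses handed out during publish that no longer exist on the SVM.
stale = [ip for ip in reported if ip not in remaining]
print(stale)
```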
@balaramesh
Contributor

One way to address this for all ONTAP drivers is to use ONTAP's on-box DNS load balancing (or round-robin DNS, for that matter) and explicitly set the dataLIF in backend.json. The advantage of on-box DNS load balancing is that, I would assume, it handles LIFs that are down, which would avoid this issue altogether.
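
As a rough illustration of that suggestion, a backend file with an explicit dataLIF might look like the fragment below. Every value is a placeholder: the hostname, management address, SVM name, and credentials are assumptions, `ontap-nas` is chosen only as an example driver, and whether a DNS name is accepted for dataLIF should be checked against the documentation for the Trident version and driver in use.

```json
{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "managementLIF": "10.0.0.1",
  "dataLIF": "svm-data.example.com",
  "svm": "svm_name",
  "username": "admin",
  "password": "secret"
}
```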

@travisghansen
Author

@balaramesh I believe if you explicitly set dataLIF you lose multipath support.

@balaramesh
Contributor

Yep, you are right.

@gnarl
Contributor

gnarl commented Apr 23, 2021

This is fixed with commit abefaa and will be included in the Trident 21.04 release.

@gnarl gnarl closed this as completed Apr 23, 2021