Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PV was re-formated and deleted data in volume #418

Closed
ysakashita opened this issue Jul 9, 2020 · 26 comments
Closed

PV was re-formated and deleted data in volume #418

ysakashita opened this issue Jul 9, 2020 · 26 comments

Comments

@ysakashita
Copy link

Describe the bug

We are using StatefulSet to create 100 Pods that mount 10G PV.
Then, we operated a rolling update StatefulSet and moving nodes.

As a result, only one PV was re-formated and deleted data in volume.

Environment

  • Trident version: [20.04]
  • Trident installation flags used: [-n trident]
  • Container runtime: [Docker 19.03.11-CE]
  • Kubernetes version: [1.17.4]
  • Kubernetes orchestrator: [none]
  • Kubernetes enabled feature gates: [none]
  • OS: [Ubuntu18.04]
  • NetApp backend types: [ONTAP AFF 9.5]
  • Other:
    • open-iscsi: 2.0.874-5ubuntu2.10 (in Ubuntu 18.04)
    • /etc/iscsi/iscsid.conf : `update node.session.scan = manual and node.session.timeo.replacement_timeout = 5

To Reproduce

Rolling update of StatefulSet and Node.
But, we don't know clealy how to reproduce way yet.

Additional context

The trident log (Debug) is below.

time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.getFSType" device=/dev/dm-0
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-0]" command=blkid timeoutSeconds=5s
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="exit status 2" output=
time="2020-07-09T01:25:40Z" level=debug msg="Could not get FSType for device." device=/dev/dm-0
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.formatVolume" device=/dev/dm-0 fsType=ext4
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.execCommand." args="[-F /dev/dm-0]" command=mkfs.ext4
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.execCommand." command=mkfs.ext4 error="exit status 1" output="mke2fs 1.44.1 (24-Mar-2018)\nmkfs.ext4: Device size reported to be zero.  Invalid partition specified, or\n\tpartition table wasn't reread after running fdisk, due to\n\ta modified partition being busy and in use.  You may need to reboot\n\tto re-read your partition table.\n"
time="2020-07-09T01:25:40Z" level=debug msg="Format failed, retrying." increment=1.837401191s
time="2020-07-09T01:25:42Z" level=debug msg=">>>> osutils.execCommand." args="[-F /dev/dm-0]" command=mkfs.ext4
time="2020-07-09T01:25:49Z" level=debug msg="<<<< osutils.execCommand." command=mkfs.ext4 error="<nil>" output="mke2fs 1.44.1 (24-Mar-2018)\nDiscarding device blocks:     4096/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b 1052672/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b15208448/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b20975616/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b22548480/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b23597056/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b24645632/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b                 \b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\bdone                            \nCreating filesystem with 26214400 4k blocks and 6553600 inodes\nFilesystem UUID: 28739ebf-963e-4fcd-8ff6-21a2f9233624\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, \n\t4096000, 7962624, 11239424, 20480000, 23887872\n\nAllocating group tables:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone                            \nWriting inode tables:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone                            \nCreating journal (131072 blocks): done\nWriting superblocks and filesystem accounting information:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone\n"
time="2020-07-09T01:25:49Z" level=info msg="Device formatted." device=/dev/dm-0 fsType=ext4

I checked source code.

https://github.com/NetApp/trident/blob/v20.04.0/utils/osutils.go#L1968-L1984

I think trident will add error case(exit code 2) patterns.
Also fsType := "" may be dangerous.
Because, when an error occurs, the default valus "" returns from getFSType().
Therefore, trident call to format commands.

https://github.com/NetApp/trident/blob/v20.04.0/utils/osutils.go#L199-L202

In this case, the data in PV(Volume) is deleted.
We would like to avoid to delete data in volume.

(Another Issue may be better,) it's better that the debug log messages, which are the error of blkid and the retring format, change from Debug to Info level.

@ysakashita ysakashita added the bug label Jul 9, 2020
@gnarl
Copy link
Contributor

gnarl commented Jul 9, 2020

Hi @ysakashita,

Can you open a NetApp support case and provide the Trident logs? We need the logs since there is not a way to reproduce the issue.

@ysakashita
Copy link
Author

@gnarl
We already send all logs to NetApp(Japan).
Could you contact to them?

@ysakashita
Copy link
Author

We tried to deep dive the issue.

I show logs below.

  • Trident log
time="2020-07-09T01:25:39Z" level=debug msg="<<<< osutils.findMultipathDeviceForDevice" device=sdb
time="2020-07-09T01:25:39Z" level=debug msg="Multipath device not yet present, waiting." increment=1.093384791s
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.findMultipathDeviceForDevice" device=sda
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.findMultipathDeviceForDevice" device=sda
time="2020-07-09T01:25:40Z" level=debug msg="Multipath device found." multipathDevice=dm-0
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.waitForMultipathDeviceForDevices" devices="[sda sdb]"
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.waitForMultipathDeviceForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.4c78b5baa19b11eab5f0d039ea431d1b:vs.4" lunID=25
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.getDeviceInfoForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.4c78b5baa19b11eab5f0d039ea431d1b:vs.4" lunID=25
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.GetISCSIHostSessionMapForTarget" iSCSINodeName="iqn.1992-08.com.netapp:sn.4c78b5baa19b11eab5f0d039ea431d1b:vs.4"
time="2020-07-09T01:25:40Z" level=debug msg="Found iSCSI host/session." hostNumber=2 sessionNumber=1
time="2020-07-09T01:25:40Z" level=debug msg="Found iSCSI host/session." hostNumber=3 sessionNumber=2
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.GetISCSIHostSessionMapForTarget" iSCSINodeName="iqn.1992-08.com.netapp:sn.4c78b5baa19b11eab5f0d039ea431d1b:vs.4"
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.findMultipathDeviceForDevice" device=sda
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.findMultipathDeviceForDevice" device=sda
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.getFSType" device=/dev/dm-0
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-0]" command=blkid timeoutSeconds=5s
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="exit status 2" output=
time="2020-07-09T01:25:40Z" level=debug msg="Could not get FSType for device." device=/dev/dm-0
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.getFSType"
time="2020-07-09T01:25:40Z" level=debug msg="Found SCSI device." LUN=25 deviceNames="[sda sdb]" fsType= hostSessionMap="map[2:1 3:2]" multipathDevice=dm-0
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.getDeviceInfoForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.4c78b5baa19b11eab5f0d039ea431d1b:vs.4" lunID=25
time="2020-07-09T01:25:40Z" level=debug msg="Found device." devices="[sda sdb]" fsType= iqn="iqn.1992-08.com.netapp:sn.4c78b5baa19b11eab5f0d039ea431d1b:vs.4" multipathDevice=dm-0 scsiLun=25
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.waitForDevice" device=/dev/dm-0
time="2020-07-09T01:25:40Z" level=debug msg="Device found." device=/dev/dm-0
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.waitForDevice" device=/dev/dm-0
time="2020-07-09T01:25:40Z" level=debug msg="Formatting LUN." fstype=ext4 volume=pssk_04_anemos_k8s_bp_pssk01_caas_ssk_zcp_yahoo_co_jp_pvc_097fbe03_2149_4c00_8b9c_75fc43bbeaed
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.formatVolume" device=/dev/dm-0 fsType=ext4
time="2020-07-09T01:25:40Z" level=debug msg=">>>> osutils.execCommand." args="[-F /dev/dm-0]" command=mkfs.ext4
time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.execCommand." command=mkfs.ext4 error="exit status 1" output="mke2fs 1.44.1 (24-Mar-2018)\nmkfs.ext4: Device size reported to be zero.  Invalid partition specified, or\n\tpartition table wasn't reread after running fdisk, due to\n\ta modified partition being busy and in use.  You may need to reboot\n\tto re-read your partition table.\n"
time="2020-07-09T01:25:40Z" level=debug msg="Format failed, retrying." increment=1.837401191s
time="2020-07-09T01:25:42Z" level=debug msg=">>>> osutils.execCommand." args="[-F /dev/dm-0]" command=mkfs.ext4
time="2020-07-09T01:25:49Z" level=debug msg="<<<< osutils.execCommand." command=mkfs.ext4 error="<nil>" output="mke2fs 1.44.1 (24-Mar-2018)\nDiscarding device blocks:     4096/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b 1052672/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b15208448/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b20975616/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b22548480/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b23597056/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b24645632/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b                 \b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\bdone                            \nCreating filesystem with 26214400 4k blocks and 6553600 inodes\nFilesystem UUID: 28739ebf-963e-4fcd-8ff6-21a2f9233624\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, \n\t4096000, 7962624, 11239424, 20480000, 23887872\n\nAllocating group tables:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone                            \nWriting inode tables:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone                            \nCreating journal (131072 blocks): done\nWriting superblocks and filesystem accounting information:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone\n"
time="2020-07-09T01:25:49Z" level=info msg="Device formatted." device=/dev/dm-0 fsType=ext4
time="2020-07-09T01:25:49Z" level=debug msg="<<<< osutils.formatVolume" device=/dev/dm-0 fsType=ext4
time="2020-07-09T01:25:49Z" level=debug msg="<<<< osutils.AttachISCSIVolume"
time="2020-07-09T01:25:49Z" level=debug msg="<<<< NodeStageVolume" Method=NodeStageVolume Type=CSI_Node
time="2020-07-09T01:25:49Z" level=debug msg="Released shared lock (NodeStageVolume-pvc-097fbe03-2149-4c00-8b9c-75fc43bbeaed)." lock=csi_node_server
time="2020-07-09T01:25:49Z" level=debug msg="GRPC response: "

In this case, the cause why format of volume is exit status 2 from blkid command.

time="2020-07-09T01:25:40Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="exit status 2" output=
time="2020-07-09T01:25:40Z" level=debug msg="Could not get FSType for device." device=/dev/dm-0

As you may know, blkid command get device information from /proc/partitions.
Then, it get fstype from super block in device(e.g. dm-0).
If the MultiPathd device is in an incomplete state, the blkid return exit status 2.

Therefore, We checked journal log.

  • journal log (journalctl -u multipathd)
-- Logs begin at Thu 2020-07-09 01:12:16 UTC, end at Thu 2020-07-09 05:14:55 UTC. --
Jul 09 01:12:16 ubuntu-commit-e69aa7a systemd[1]: Starting Device-Mapper Multipath Device Controller...
Jul 09 01:12:16 ubuntu-commit-e69aa7a multipathd[655]: --------start up--------
Jul 09 01:12:16 ubuntu-commit-e69aa7a multipathd[655]: read /etc/multipath.conf
Jul 09 01:12:16 ubuntu-commit-e69aa7a multipathd[655]: path checkers start up
Jul 09 01:12:16 ubuntu-commit-e69aa7a systemd[1]: Started Device-Mapper Multipath Device Controller.
Jul 09 01:25:38 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: sda: add path (uevent)
Jul 09 01:25:39 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: sdb: add path (uevent)
Jul 09 01:25:39 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: sdb: duration = 0 (ms)
Jul 09 01:25:39 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: sdb: persistent reservation in: requested 8200 bytes but got 8 bytes)
Jul 09 01:25:39 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: sdb: No key found. Device may not be registered.
Jul 09 01:25:39 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: mpatha: retry domap for addition of new path sdb
Jul 09 01:25:40 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: mpatha: load table [0 209715200 multipath 3 pg_init_retries 50 queue_if_no_path 0 2 1 service-time 0 1 1 8:0 1 service-time 0 1 1 8:16 1
Jul 09 01:25:40 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: ]
Jul 09 01:25:40 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: mpatha: event checker started
Jul 09 01:25:40 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: sdb [8:16]: path added to devmap mpatha

At just the same time that the error is occering in Trident(blkid command), a retry process is being executed in multipathd too.

Jul 09 01:25:39 anemos-pssk-04-worker-default-ef7c0d40-slq2h multipathd[655]: mpatha: retry domap for addition of new path sdb

The retry process is implemented as follows.

main.c(L1005-L1011) in multipathd

So, I see the comments below...

https://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commit;h=1c50cd320724e60fde08d496ead8a71e19189fa1

It becomes retry domap() when flock of device cannot be acquired for some reason.
Therefore, I believe that the cause of the error in blkid command(in Trident) is the retry processing in multipathd.

IMO, as a fix plan for this problem, there is a correction for error(exit status 2) handling of the blkid command in Trident.

https://github.com/NetApp/trident/blob/v20.04.0/utils/osutils.go#L1968-L1984

I think it's so serious bug for storage.
Could you check&fix the issue in your team ASAP?

@ysakashita
Copy link
Author

ysakashita commented Jul 10, 2020

We are able to reproduce the issue.
# I think it's not the smart way....

  • Just one node in Kubernetes worker.
  • Try to cerate 10 PVs in the worker using StatefulSet
  • Create and delete many times until it happens

A point of the way is to concentrate the load on a multipathd and OS.
As result, it creates a situation that delays the release of flock in device.

(e.g)

$ kubectl get node
NAME                                             STATUS   ROLES    AGE    VERSION
demo-netapptest-ingress-default-70c71976-qkwzj   Ready    <none>   31h    v1.18.3
demo-netapptest-master-default-feee7e44-2std8    Ready    <none>   31h    v1.18.3
demo-netapptest-worker-default-cc7f3c51-fdw49    Ready    <none>   2d5h   v1.18.3

$ cat test.yaml 
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
spec:
  serviceName: test
  replicas: 10
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: ubuntu:18.04
        volumeMounts:
        - name: block
          mountPath: /mnt/block
        command:
        - sleep
        - infinity
  volumeClaimTemplates:
  - metadata:
      name: block
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: ontap-block
      resources:
        requests:
          storage: 100G

$ kubectl apply -f test.yaml 
statefulset.apps/test created

# Check trident's log dualing creation of Pod/PVC/PV.

$ kubectl logs -f trident-csi-fbjxp trident-main
...
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.multipathdIsRunning"
time="2020-07-09T09:58:40Z" level=debug msg="Skipping multipath discovery, multipathd isn't running."
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.waitForMultipathDeviceForDevices" devices="[sda sdb]"
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.waitForMultipathDeviceForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.914417f3498011ea832900a098ea6221:vs.8" lunID=0
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.getDeviceInfoForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.914417f3498011ea832900a098ea6221:vs.8" lunID=0
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.GetISCSIHostSessionMapForTarget" iSCSINodeName="iqn.1992-08.com.netapp:sn.914417f3498011ea832900a098ea6221:vs.8"
time="2020-07-09T09:58:40Z" level=debug msg="Found iSCSI host/session." hostNumber=2 sessionNumber=1
time="2020-07-09T09:58:40Z" level=debug msg="Found iSCSI host/session." hostNumber=3 sessionNumber=2
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.GetISCSIHostSessionMapForTarget" iSCSINodeName="iqn.1992-08.com.netapp:sn.914417f3498011ea832900a098ea6221:vs.8"
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.findMultipathDeviceForDevice" device=sda
time="2020-07-09T09:58:40Z" level=debug msg="Could not find multipath device for device." device=sda
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.findMultipathDeviceForDevice" device=sda
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.findMultipathDeviceForDevice" device=sdb
time="2020-07-09T09:58:40Z" level=debug msg="Could not find multipath device for device." device=sdb
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.findMultipathDeviceForDevice" device=sdb
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.getFSType" device=/dev/sda
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/sda]" command=blkid timeoutSeconds=5s
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="exit status 2" output=
time="2020-07-09T09:58:40Z" level=debug msg="Could not get FSType for device." device=/dev/sda
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.getFSType"
time="2020-07-09T09:58:40Z" level=debug msg="Found SCSI device." LUN=0 deviceNames="[sda sdb]" fsType= hostSessionMap="map[2:1 3:2]" multipathDevice=
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.getDeviceInfoForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.914417f3498011ea832900a098ea6221:vs.8" lunID=0
time="2020-07-09T09:58:40Z" level=debug msg="Found device." devices="[sda sdb]" fsType= iqn="iqn.1992-08.com.netapp:sn.914417f3498011ea832900a098ea6221:vs.8" multipathDevice= scsiLun=0
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.waitForDevice" device=/dev/sda
time="2020-07-09T09:58:40Z" level=debug msg="Device found." device=/dev/sda
time="2020-07-09T09:58:40Z" level=debug msg="<<<< osutils.waitForDevice" device=/dev/sda
time="2020-07-09T09:58:40Z" level=debug msg="Formatting LUN." fstype=ext4 volume=netapptest_pvc_a6a49a9b_0bb4_4e90_830d_d7cc673d86f9
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.formatVolume" device=/dev/sda fsType=ext4
time="2020-07-09T09:58:40Z" level=debug msg=">>>> osutils.execCommand." args="[-F /dev/sda]" command=mkfs.ext4
...

At the same time, the retry message appare in the journal log.

Jul 09 09:58:40 demo-netapptest-worker-default-cc7f3c51-fdw49 multipathd[655]: mpatha: retry domap for addition of new path sdb

@gnarl
Copy link
Contributor

gnarl commented Jul 10, 2020

@ysakashita, thank you for the update.

@rohit-arora-dev
Copy link
Contributor

rohit-arora-dev commented Jul 20, 2020

Hello @ysakashita commit 9bc9efb was pushed that addresses the issue.

@ysakashita
Copy link
Author

ysakashita commented Jul 20, 2020

@ntap-arorar Thank you for the commit. What version do you apply the bug fix? (20.04.01 or 20.07.0)

@ysakashita
Copy link
Author

@gnarl @ntap-arorar
Unfortunately, the commit(9bc9efb) is not enough for the issue.
When tested in our environment, the PV was re-formatted.


environment

  • Kubernetes v1.18.6
    • worker x 1
  • Ubuntu is K8S node was configured node.session.scan = manual of open-iscsi
ubuntu@demo-netapptest-worker-default-fffd01e0-x25gt:~$ sudo cat /etc/iscsi/iscsid.conf 
...
node.session.scan = manual

make build for trident

  1. Checkout latest netapp/trident changes using git clone https://github.com/NetApp/trident.git command.
  2. Go to trident directory and make sure user is in master branch and also make sure user can see the 4 commits mentioned below in the log using git log.
  3. User would then need to checkout 20.04.0 release in a separate branch using git checkout tags/v20.04.0 -b v20.04.0 command.
  4. Now after user is in 20.04.0 branch, user should cherry pick below commits using git command git cherry-pick.

Commit 1 [hash: 3478163, title: skip fstype if it is not used, date: May 13, 2020]
Commit 2 [hash: 4e18565, title: added logging statement around size, date: June 4, 2020]
Commit 3 [hash: 329044c, title: Set the session scan to manual automatically, date: June 9, 2020]
Commit 4 [hash: 9bc9efb, title: Add waitForDevice call before calling the blkid, date: July 16, 2020]

e.g. git cherry-pick 3478163

  1. After cherry-picking above commits, please ensure you see these commits in your branch logs using git log command.
  2. make build
  3. Setup NetApp/Trident v20.04.0-custom
  4. Check trident version
$ ./tridentctl version -n trident
+---------------------------------------------------------+---------------------------------------------------------+
|                     SERVER VERSION                      |                     CLIENT VERSION                      |
+---------------------------------------------------------+---------------------------------------------------------+
| 20.04.0-custom+c1dbff50c6daa5cbf7438447d9c188b50e49fb46 | 20.04.0-custom+c1dbff50c6daa5cbf7438447d9c188b50e49fb46 |
+---------------------------------------------------------+---------------------------------------------------------+

test

I try to repdoduce using #418 (comment) .
In addition to above way, I recreated the worker node 3 times without deleting PVC/PV.

As result,

$ kubectl logs trident-csi-v2dv7 trident-main  |grep blkid
time="2020-07-21T06:04:30Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-0]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:04:30Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-0: UUID=\"e2d1176f-b220-42f6-919a-ce6bcc5c6a0b\" TYPE=\"ext4\""
time="2020-07-21T06:05:48Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-1]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:05:48Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-1: UUID=\"9e557b81-886a-4780-aaf0-318d342e81f8\" TYPE=\"ext4\""
time="2020-07-21T06:06:55Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-2]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:06:55Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-2: UUID=\"ed707ccc-0608-4ceb-a84d-df44209d89de\" TYPE=\"ext4\""
time="2020-07-21T06:08:04Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-3]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:08:04Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-3: UUID=\"aeda2184-4d98-45b3-a380-ad587ef41999\" TYPE=\"ext4\""
time="2020-07-21T06:09:11Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-4]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:09:11Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-4: UUID=\"d15bd4d6-c2ba-46fd-9c51-c3986988b65e\" TYPE=\"ext4\""
time="2020-07-21T06:09:48Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-5]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:09:48Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-5: UUID=\"73ce4c88-e728-409d-a957-7440cc1a1496\" TYPE=\"ext4\""
time="2020-07-21T06:10:55Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-6]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:10:55Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-6: UUID=\"76432877-590e-44ff-94c4-665b0e603e9f\" TYPE=\"ext4\""
time="2020-07-21T06:12:03Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-7]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:12:03Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" output="/dev/dm-7: UUID=\"50861d1d-7347-49e5-bcce-86ef92c836b3\" TYPE=\"ext4\""
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-8]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="exit status 2" output=
time="2020-07-21T06:14:53Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-9]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:14:53Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="exit status 2" output=

PVs (Vol: /dev/dm-0 to /dev/dm-8) don't have any issue.
However, /dev/dm-8 and dm-9 were occered 'exit status 2' in blkid command.
Then the PVs were formated.

Below is all trident's log (loglevel: debug)

trident-csi.log

Also I show a part of the log.

time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.GetISCSIHostSessionMapForTarget" iSCSINodeName="iqn.1992-08.com.netapp:sn.b9a2cb12b64811e8894e00a098ea6221:vs.5"
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.findMultipathDeviceForDevice" device=sdr
time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.findMultipathDeviceForDevice" device=sdr
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.getFSType" device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.waitForDevice" device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg="Device found." device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.waitForDevice" device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-8]" command=blkid timeoutSeconds=5s
time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="exit status 2" output=
time="2020-07-21T06:13:10Z" level=info msg="Could not get FSType for device; err: exit status 2" device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.getFSType"
time="2020-07-21T06:13:10Z" level=debug msg="Found SCSI device." LUN=8 deviceNames="[sdr sdq]" fsType= hostSessionMap="map[2:1 3:2]" multipathDevice=dm-8
time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.getDeviceInfoForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.b9a2cb12b64811e8894e00a098ea6221:vs.5" lunID=8 needFSType=true
time="2020-07-21T06:13:10Z" level=debug msg="Found device." devices="[sdr sdq]" fsType= iqn="iqn.1992-08.com.netapp:sn.b9a2cb12b64811e8894e00a098ea6221:vs.5" multipathDevice=dm-8 scsiLun=8
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.waitForDevice" device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg="Device found." device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg="<<<< osutils.waitForDevice" device=/dev/dm-8
time="2020-07-21T06:13:10Z" level=debug msg="Formatting LUN." fstype=ext4 volume=netapptest_pvc_e8395ed6_9f93_4b58_8bea_dd8afb8e56f2
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.formatVolume" device=/dev/dm-8 fsType=ext4
time="2020-07-21T06:13:10Z" level=debug msg=">>>> osutils.execCommand." args="[-F /dev/dm-8]" command=mkfs.ext4

I found osutils.waitForDevice" device=/dev/dm-8 message in the log.
So I believe that the trident already applied the commit(9bc9efb).

@gnarl
Copy link
Contributor

gnarl commented Jul 29, 2020

@ysakashita,

Kindly confirm that your issue is resolved with commit ba749c5. This commit will be included in the Trident 20.07 release.

@ysakashita
Copy link
Author

@gnarl
I appreciate that you fixed the bug so quickly.
We applied the following commits to the v20.04.0.
Then we tried to test it.
So far, we haven't seen re-format in our test environment.

Commit 1 [hash: 3478163, title: skip fstype if it is not used, date: May 13, 2020]
Commit 2 [hash: 4e18565, title: added logging statement around size, date: June 4, 2020]
Commit 3 [hash: 329044c, title: Set the session scan to manual automatically, date: June 9, 2020]
Commit 4 [hash: 9bc9efb, title: Add waitForDevice call before calling the blkid, date: July 16, 2020]
Commit 5 [hash: bc101ff, title: Safety check to ensure an unformatted device, date: July 24, 2020]
Commit 6 [hash: ba749c5, title: Ensure length of untrimmed "dd" bytes is valid, daye: July 28, 2020 ]

@gnarl gnarl closed this as completed Jul 30, 2020
netapp-ci pushed a commit that referenced this issue Oct 16, 2020
* adds period to allowed ontap storage prefix characters

* updates changelog

Fixes: #455
@ffilippopoulos
Copy link

This is still present for us on Trident 20.10.1 (running on kube 1.19.2, Flatcar Container Linux 2605.11.0, 5.4.87-flatcar)
In particular we can see that the FsType cannot be determined on some cases and trident still ends up reformatting our pvcs, which deletes all the data:

time="2021-01-26T11:21:40Z" level=debug msg="Found device." devices="[sdd sde]" fsType= iqn="iqn.1992-08.com.netapp:sn.0205ffce026911ebb4d9d039ea1a7953:vs.9" multipathDevice=dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI scsiLun=65
time="2021-01-26T11:21:40Z" level=debug msg=">>>> osutils.waitForDevice" device=/dev/dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="Device found." device=/dev/dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="<<<< osutils.waitForDevice" device=/dev/dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="Formatting LUN." fstype=ext4 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI volume=trident_pvc_db68511a_0d70_4845_a0c8_5221122674a9
time="2021-01-26T11:21:40Z" level=debug msg=">>>> osutils.formatVolume" device=/dev/dm-1 fsType=ext4 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg=">>>> osutils.execCommand." args="[-F /dev/dm-1]" command=mkfs.ext4 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:48Z" level=debug msg="<<<< osutils.execCommand." command=mkfs.ext4 error="<nil>" output="mke2fs 1.45.5 (07-Jan-2020)\nDiscarding device blocks:     4096/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b  528384/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b 1576960/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b 3149824/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b10489856/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b12062720/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b13635584/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b15208448/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b16781312/26214400\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b                 \b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\bdone                            \nCreating filesystem with 26214400 4k blocks and 6553600 inodes\nFilesystem UUID: 346e0291-5523-4a1e-8c85-9cd2611f3e01\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, \n\t4096000, 7962624, 11239424, 20480000, 23887872\n\nAllocating group tables:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone                            \nWriting inode tables:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone                            \nCreating journal (131072 blocks): done\nWriting superblocks and filesystem accounting information:   0/800\b\b\b\b\b\b\b       \b\b\b\b\b\b\bdone\n" requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:48Z" level=info msg="Device formatted." device=/dev/dm-1 fsType=ext4 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI

Is multipathd playing any role here? This is affecting crucial deployments for us so we could consider giving up on multipaths if that is to blame here.
Could this issue be re-opened since it is still affecting us, or shall I create a new one? @gnarl

@ffilippopoulos
Copy link

ffilippopoulos commented Jan 26, 2021

Furthermore, I believe that what is happening on our case is that blikid is not returning with an error code, but the response doesn't have any info for the fstype:

time="2021-01-26T11:21:40Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-1]" command=blkid requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI timeoutSeconds=5s
time="2021-01-26T11:21:40Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="<<<< osutils.getFSType" requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="Found SCSI device." LUN=65 deviceNames="[sdd sde]" fsType= hostSessionMap="map[6:3 7:4]" multipathDevice=dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e re
questSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="<<<< osutils.getDeviceInfoForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.0205ffce026911ebb4d9d039ea1a7953:vs.9" lunID=65 needFSType=true requestID=1ceb4302-9
be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="Found device." devices="[sdd sde]" fsType= iqn="iqn.1992-08.com.netapp:sn.0205ffce026911ebb4d9d039ea1a7953:vs.9" multipathDevice=dm-1 requestID=1ceb4302-9be7-4
af2-a463-81200d165d4e requestSource=CSI scsiLun=65
time="2021-01-26T11:21:40Z" level=debug msg=">>>> osutils.waitForDevice" device=/dev/dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="Device found." device=/dev/dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="<<<< osutils.waitForDevice" device=/dev/dm-1 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI
time="2021-01-26T11:21:40Z" level=debug msg="Formatting LUN." fstype=ext4 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI volume=trident_pvc_db68511a_0d70_4845_a0c8_5221122674a9
time="2021-01-26T11:21:40Z" level=debug msg=">>>> osutils.formatVolume" device=/dev/dm-1 fsType=ext4 requestID=1ceb4302-9be7-4af2-a463-81200d165d4e requestSource=CSI

so it cannot be grabbed from the respective code:

if strings.Contains(string(out), "TYPE=") {
and trident decides to reformat the device.

We've seen cases where blkid can return empty, but not with an error code (based on issuing commands on our hosts). For example:

core@worker-9 ~ $ blkid /dev/sdb   
core@worker-9 ~ $ echo $?
0

and presumably it could trigger that behaviour?

@gnarl gnarl reopened this Jan 26, 2021
@bswartz
Copy link
Contributor

bswartz commented Jan 26, 2021

We could put the dd call before the blkid call in case this is just strange behavior from blkid. We could call blkid only after dd indicated the volume contained data. However the other possibility (suggested above) is that blkid is behaving correctly and, at least at the moment when it was called, it really did read all zeros. This suggests something went wrong with the multipathing setup, and we will need to figure out how to reproduce that issue.

@bswartz
Copy link
Contributor

bswartz commented Jan 26, 2021

We've seen cases where blkid can return empty, but not with an error code (based on issuing commands on our hosts). For example:

core@worker-9 ~ $ blkid /dev/sdb   
core@worker-9 ~ $ echo $?
0

and presumably it could trigger that behaviour?

This specific behavior happens when you run blkid without sudo. Non-root users aren't able to read block devices typically, so blkid can't work. Trident always runs the node-plugin as root so it would not face this particular issue.

@ffilippopoulos
Copy link

@bswartz yup, my bad on the blkid command example. Any hint on how to debug this further and see if multipathd is behind this (I guess disable and test again would be one way). Since this is blocking us atm and could use a workaround while we wait for a fix.

@ffilippopoulos
Copy link

Also, in general trident could probably have a much more defensive strategy and never decide on formatting a volume that contains data, since it relies on calling os commands that could fail for a number of reasons. @bswartz

@bswartz
Copy link
Contributor

bswartz commented Jan 26, 2021

@bswartz yup, my bad on the blkid command example. Any hint on how to debug this further and see if multipathd is behind this (I guess disable and test again would be one way). Since this is blocking us atm and could use a workaround while we wait for a fix.

I would want to see a more complete log snippit that covers the whole time from iscsi login to formatting. We are already quite defensive in the existing code and it's hard to imagine a sequence of actions that could cause us to believe that a volume which had data was in fact empty and in need of formatting. The original fix for this issue addresses all of the possibilities that we're aware of, so there must be some other possibility we've never thought of or seen before. Personally, I'm suspicious about the unusual Linux distro. I've never worked with "Flatcar Container Linux" and I'm wondering if that distro does anything strange which could account for this behavior. We've certainly never tested on that distro.

@bswartz
Copy link
Contributor

bswartz commented Jan 26, 2021

Also, in general trident could probably have a much more defensive strategy and never decide on formatting a volume that contains data, since it relies on calling os commands that could fail for a number of reasons. @bswartz

The challenge here is that, due to the CSI architecture, nodes can't tell the different between a newly created volume, which always needs to be formatted, and a existing (previously-formatted) volume that's getting re-used. Nodes are expected to figure out which situation they're in on their own, which is why we do the blkid check (many other CSI drivers work like this too).

Nodes are stateless and not in a position to remember anything about volumes. The only other approach is to use the REST backchannel to talk to the controller plugin and query the state there. We haven't wanted to do that in the past because it will create new reliability and scaling issues that will harm users running very large clusters. Once we understand the root cause of the problem in your situation, that other approach might be indicated, but I don't want to jump to that solution right away because of the downsides it carries. Let's try to understand the root cause of your specific problem.

@ffilippopoulos
Copy link

I would want to see a more complete log snippit that covers the whole time from iscsi login to formatting.

I dumped the logs in a gist: https://gist.github.com/ffilippopoulos/40747b8982d174b99b88099d9a6770f4

I've never worked with "Flatcar Container Linux" and I'm wondering if that distro does anything strange which could account for this behavior

That is basically a fork of coreos since they announced eol.

@ffilippopoulos
Copy link

Also, I do not know whether that helps or not, I've built am image off v20.10.1 with just the following additional log line:

git diff
diff --git a/utils/osutils.go b/utils/osutils.go
index 0b3c36a..5bba907 100644
--- a/utils/osutils.go
+++ b/utils/osutils.go
@@ -2371,6 +2371,7 @@ func getFSType(ctx context.Context, device string) (string, error) {
                return "", err
        }
 
+       Logc(ctx).WithField("device", device).Debugf("blkid command output: %s", string(out))
        var fsType string
 
        if strings.Contains(string(out), "TYPE=") {

and I was able to reproduce this quickly and see that the blkid command output appeards to be empty indeed:

time="2021-01-26T16:15:41Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/dev/dm-1]" command=blkid requestID=77e396ff-4532-41a3-ae4a-1a90f779d316 requestSource=CSI timeoutSeconds=5s
time="2021-01-26T16:15:41Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=blkid error="<nil>" requestID=77e396ff-4532-41a3-ae4a-1a90f779d316 requestSource=CSI
time="2021-01-26T16:15:41Z" level=debug msg="blkid command output: " device=/dev/dm-1 requestID=77e396ff-4532-41a3-ae4a-1a90f779d316 requestSource=CSI
time="2021-01-26T16:15:41Z" level=debug msg="<<<< osutils.getFSType" requestID=77e396ff-4532-41a3-ae4a-1a90f779d316 requestSource=CSI

@george-angel
Copy link

Hey @bswartz it is indeed a fork of CoreOS Container Linux: (https://coreos.com/) which was EOLed and the CoreOS company was bought by Red Hat. The team went on to work on "Fedora CoreOS" https://getfedora.org/coreos - but that project deviated in design from the original.

And so Flatcar Linux (https://www.flatcar-linux.org/) is the spiritual successor to the original CoreOS. The idea here being a very "minimal" distribution with a big differentiation is that it doesn't have a package manager unlike most Linux distributions.

We are not ruling out that OS itself can be at fault here. We will do our best to debug and provide all the information.

$ blkid --version
blkid from util-linux 2.33  (libblkid 2.33.0, 06-Nov-2018)

Hope some of that information helps.

@gnarl gnarl added the tracked label Jan 26, 2021
@ffilippopoulos
Copy link

We could put the dd call before the blkid call in case this is just strange behavior from blkid.
I tried something similar to that:

diff --git a/utils/osutils.go b/utils/osutils.go                                                                                                                                                     [0/608]
index 0b3c36a..4444048 100644
--- a/utils/osutils.go
+++ b/utils/osutils.go
@@ -2371,6 +2371,7 @@ func getFSType(ctx context.Context, device string) (string, error) {
                return "", err
        }

+       Logc(ctx).WithField("device", device).Debugf("blkid command output: %s", string(out))
        var fsType string

        if strings.Contains(string(out), "TYPE=") {
@@ -2382,6 +2383,13 @@ func getFSType(ctx context.Context, device string) (string, error) {
                        }
                }
        }
+       // Ensure device is unformatted in case fs type could not be determined via `blkid`
+       if fsType == "" {
+               if err = ensureDeviceUnformatted(ctx, device); err != nil {
+                       Logc(ctx).WithField("device", device).Errorf("found empty fsType but device may not be unformatted; err: %v", err)
+                       return "", err
+               }
+       }
        return fsType, nil
 }

and got some good first results. It looks like we still get a few blkid empty responses, but then the mount errors and is tried again until it succeeds:

time="2021-01-26T17:50:02Z" level=debug msg="blkid command output: " device=/dev/dm-1 requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=debug msg=">>>> osutils.ensureDeviceUnformatted" device=/dev/dm-1 requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[if=/dev/dm-1 bs=4096 count=512 status=none]" command=dd requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSour
ce=CSI timeoutSeconds=5s
time="2021-01-26T17:50:02Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=dd error="<nil>" requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=debug msg="Verified correct number of bytes read." device=/dev/dm-1 requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=error msg="device contains non-zero values" device=/dev/dm-1 error="<nil>" requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=debug msg="<<<< osutils.ensureDeviceUnformatted" requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=error msg="found empty fsType but device may not be unformatted; err: device /dev/dm-1 is not unformatted" device=/dev/dm-1 requestID=e9ab4117-71e9-4064-a470-caf91ec3d883
 requestSource=CSI
time="2021-01-26T17:50:02Z" level=debug msg="<<<< osutils.getFSType" requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=debug msg="<<<< osutils.getDeviceInfoForLUN" iSCSINodeName="iqn.1992-08.com.netapp:sn.0205ffce026911ebb4d9d039ea1a7953:vs.9" lunID=67 needFSType=true requestID=e9ab4117-7
1e9-4064-a470-caf91ec3d883 requestSource=CSI
time="2021-01-26T17:50:02Z" level=debug msg="<<<< osutils.AttachISCSIVolume" requestID=e9ab4117-71e9-4064-a470-caf91ec3d883 requestSource=CSI

and from the pod perspective it looks that we haven't lost any data (obviously this is not thoroughly checked but it looks promising).

Hope some of that helps :)) We can also try soaking changes on our environment and raise a PR if you guys want.

@rohit-arora-dev
Copy link
Contributor

Root of the issue does appear to be with the blkid command based on the output that you have posted, it is missing the TYPE field.

blkid command output: " device=/dev/dm-1 requestID=77e396ff-4532-41a3-ae4a-1a90f779d316 requestSource=CSI

Regardless, this should be properly handled.

@bswartz
Copy link
Contributor

bswartz commented Jan 26, 2021

Based on the fact that the root cause is the different output format for blkid on this particular Linux variant, I think the correct fix is to cope with strange blkid variants better and not to add a heavyweight backchannel mechanism to statefully track volume formatting.

Edit: Sorry, there was a misunderstanding. I thought that we confirmed that blkid merely had a different output format, but it turns out that the output is empty, and the version of blkid is the regular util-linux version which should have an output format we can parse. I would still like to figure out whether the problem is caused by a permission issue, or a device-not-found problem, or something else entirely. It's easy to fix the accidental-reformat problem that can occur in this case, but I'm worried that the fix could break existing things that currently work if blkid is behaving strangely and we don't know why.

@bswartz
Copy link
Contributor

bswartz commented Jan 28, 2021

We are going to address the blkid/format issue separately from the multipath issue. The fix for the inappropriate format when blikd misbehaves will be to add more checks and verification so we never rely on blkid alone to determine that a volume is not formatted. This will prevent corruption in the case when a volume that already contains data is attached to a pod and blkid gives us bad information.

While it seems clear that something about your multipath config is the underlying cause for the blkid problem, we're tracking that as a separate problem. I expect that the fix for this bug, combined with a solution to your multipath problems, will give you a completely working setup. Until the multipath problem is solved, at least the fix for this bug will avoid any possibility of data corruption.

@bswartz
Copy link
Contributor

bswartz commented Jan 29, 2021

Closing per this commit:

787aa54

@bswartz bswartz closed this as completed Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants