Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A wanring message 'Failed to expand filesystem' is logged during successful file system expansion. #560

Closed
yasuoohno opened this issue Mar 30, 2021 · 7 comments

Comments

@yasuoohno
Copy link

Describe the bug

A wanring message is logged during successful file system expansion with incorrect filesystem size information.

trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=warning msg="Failed to expand filesystem; size=132224544768" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI

Environment

  • Trident version: 21.01.1
  • Trident installation flags used: -n trident
  • Container runtime: containerd
  • Kubernetes version: 1.20.5
  • Kubernetes orchestrator: kubeadm
  • Kubernetes enabled feature gates: default.
  • OS: Debian10
  • NetApp backend types: ontap-san-economy
  • Other:
    Our customer reporeted with
    • Trident version: 20.07.0
    • Kubernetes version: OpenShift 4.4.6 (v1.17.1+f63db30)
    • NetApp backend types: ontap-san-economy

To Reproduce

  • create a pod with mounted a pvc.
  • increase pvc size.
  • wait for the file sytem expantion to complete.

Expected behavior

After expanding the file system, check the file system size correctly and do not issue a warning message.

Additional context

The message is logged here.

Logc(ctx).Warnf("Failed to expand filesystem; size=%d", postExpandSize)

trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=warning msg="Failed to expand filesystem; size=132224544768" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI

The reported size is for the root '/' filesystem.

Here are full debug logs.

trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.ISCSIRescanDevices" lunID=0 requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI targetIQN="iqn.1992-08.com.netapp:sn.cc907975906211eba7e1d3d487c5a521:vs.2"
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.ExpandISCSIFilesystem" devicePath=/dev/dm-0 filesystemType=ext4 mountOptions= requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI stagedTargetPath=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.mountAndExpandFilesystem" devicePath=/dev/dm-0 mountOptions= requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI stagedTargetPath=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.MountDevice" device=/dev/dm-0 mountpoint=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt options= requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.IsMounted" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI source=/dev/dm-0 target=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="Source device: /dev/dm-0, Target: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt, is mounted: false" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.IsMounted" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI source=/dev/dm-0 target=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="Already mounted: false, mountpoint exists: false" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.execCommand." args="[/dev/dm-0 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt]" command=mount requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.execCommand." command=mount error="<nil>" output= requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.MountDevice" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.mountAndExpandFilesystem" devicePath=/dev/dm-0 mountOptions= requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI stagedTargetPath=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.expandFilesystem" cmd=resize2fs cmdArguments=/dev/dm-0 requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI tmpMountPoint=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils_linux.getFilesystemSize" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils_linux.GetFilesystemStats" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="Filesystem size information" avail=29437619 blocks=32281383 bsize=4096 free=31088179 path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI size=132224544768
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils_linux.GetFilesystemStats" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils_linux.getFilesystemSize" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.execCommand." args="[/dev/dm-0]" command=resize2fs requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.execCommand." command=resize2fs error="<nil>" output="resize2fs 1.44.5 (15-Dec-2018)\nFilesystem at /dev/dm-0 is mounted on /var/lib/kubelet/pods/8b8de3d6-fd5e-4300-9f23-5d8379ffc3b3/volumes/kubernetes.io~csi/pvc-eee6497c-9670-46eb-96ed-2061141e421a/mount; on-line resizing required\nold_desc_blocks = 1, new_desc_blocks = 1\nThe filesystem on /dev/dm-0 is now 786432 (4k) blocks long.\n" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils_linux.getFilesystemSize" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils_linux.GetFilesystemStats" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="Filesystem size information" avail=29437618 blocks=32281383 bsize=4096 free=31088178 path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI size=132224544768
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils_linux.GetFilesystemStats" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils_linux.getFilesystemSize" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=warning msg="Failed to expand filesystem; size=132224544768" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.expandFilesystem" cmd=resize2fs cmdArguments=/dev/dm-0 requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI tmpMountPoint=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.removeMountPoint" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.Umount" mountpoint=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg=">>>> osutils.execCommandWithTimeout." args="[/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount/tmp_mnt]" command=umount requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI timeoutSeconds=10s
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.execCommandWithTimeout." command=umount error="<nil>" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.Umount" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.removeMountPoint" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.ExpandISCSIFilesystem" devicePath=/dev/dm-0 filesystemType=ext4 mountOptions= requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI stagedTargetPath=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-eee6497c-9670-46eb-96ed-2061141e421a/globalmount
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="Filesystem size after expand." filesystemSize=132224544768 limitBytes=0 requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI requiredBytes=3221225472
trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="Filesystem expansion completed." requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI volumeId=pvc-eee6497c-9670-46eb-96ed-2061141e421a volumePath="/var/lib/kubelet/pods/8b8de3d6-fd5e-4300-9f23-5d8379ffc3b3/volumes/kubernetes.io~csi/pvc-eee6497c-9670-46eb-96ed-2061141e421a/mount"
@yasuoohno yasuoohno added the bug label Mar 30, 2021
@gnarl gnarl added the tracked label Apr 5, 2021
@ffilippopoulos
Copy link

Was the volume expansion successful? We ran into that issue today as well with trident 21.04.0 while resizing a volume and seems like the issue was because the volume was corrupted during the resizing.
@yasuoohno Do you have any manual steps to work around this issue?

@ffilippopoulos
Copy link

Also, I found this https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Trident_Openshift/Resize_of_PVC_leaves_it_in_a_faulted_state and I am wondering if this the same issue expressed via trident logs?

@yasuoohno
Copy link
Author

Hi @ffilippopoulos,
Yes, they were successful with both openshift 4.4.6 and kubernetes 1.20.5.

Was the message "There are errors in the filesystem, so online resizing is not allowed" logged on your case?
If so, it is a different case than I was reporting. I think.

In my case, this line records that online expansion was completed successfully.

trident-csi-m2bmv trident-main time="2021-03-30T05:58:28Z" level=debug msg="<<<< osutils.execCommand." command=resize2fs error="" output="resize2fs 1.44.5 (15-Dec-2018)\nFilesystem at /dev/dm-0 is mounted on /var/lib/kubelet/pods/8b8de3d6-fd5e-4300-9f23-5d8379ffc3b3/volumes/kubernetes.io~csi/pvc-eee6497c-9670-46eb-96ed-2061141e421a/mount; on-line resizing required\nold_desc_blocks = 1, new_desc_blocks = 1\nThe filesystem on /dev/dm-0 is now 786432 (4k) blocks long.\n" requestID=2789d22d-e0d5-4fcd-a911-64e0407534bf requestSource=CSI

IMHO, Trident just called os utilitty 'resize2fs' for already mounted file system.
If it is corrupted, there also may be a problem on the OS side.

@george-angel
Copy link

because the volume was corrupted during the resizing

I don't think we know this.

What we saw was a container failing to start, the logs on the driver:

time="2021-05-10T15:20:57Z" level=debug msg="<<<< osutils.execCommand." command=resize2fs error="exit status 1" output="resize2fs 1.45.5 (07-Jan-2020)\nresize2fs: Permission denied to resize filesystem\nF
ilesystem at /dev/dm-6 is mounted on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-535bc123-9570-48b7-966d-992dc94c016d/globalmount/tmp_mnt; on-line resizing required\nold_desc_blocks = 1, new_desc_bl
ocks = 3" requestID=09ed29f6-ff75-4227-99da-3e598182515b requestSource=CSI                                                                                                                                  
time="2021-05-10T15:20:57Z" level=error msg="Expanding filesystem failed; exit status 1" requestID=09ed29f6-ff75-4227-99da-3e598182515b requestSource=CSI

After we performed manual actions the filesystem ended up in a corrupt state.

However we have also successfully recovered a volume from the above state, driver logs:

time="2021-05-11T10:59:30Z" level=debug msg="<<<< osutils.execCommand." command=resize2fs error="exit status 1" output="resize2fs 1.45.5 (07-Jan-2020)\nresize2fs: Permission denied to resize filesystem\nF
ilesystem at /dev/dm-5 is mounted on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3d788681-b597-4642-bddf-90d1674b9e6f/globalmount/tmp_mnt; on-line resizing required\nold_desc_blocks = 1, new_desc_bl
ocks = 1" requestID=9ae22755-9f05-40fc-8927-eec5fc3a5f4e requestSource=CSI

tmp_mnt did not exist:

core@worker-9 ~ $ sudo ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3d788681-b597-4642-bddf-90d1674b9e6f/globalmount/
total 4
-rw-------. 1 root root 311 May 11 11:01 volumePublishInfo.json

We suspect Trident unmounts the volume after a failed resize.

We then ran e2fsck:

$ sudo e2fsck -y /dev/mapper/mpathl
e2fsck 1.45.5 (07-Jan-2020)
/dev/mapper/mpathl contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/mpathl: 1373/131072 files (1.8% non-contiguous), 493893/524288 blocks

$ sudo e2fsck -y /dev/mapper/mpathl
e2fsck 1.45.5 (07-Jan-2020)
/dev/mapper/mpathl: clean, 1373/131072 files, 493893/524288 blocks

And we able to successfully mount the volume after.

@yasuoohno
Copy link
Author

I will tell you first, this is my private opinion because I'm not a developer nor a engineer of trident.

tmp_mnt did not exist:
We suspect Trident unmounts the volume after a failed resize.

https://github.com/NetApp/trident/blob/master/utils/osutils.go#L1280-L1285

Once temporary mount(of tmp_dir) is succeeded, trident calls removeMountPoint as a defer func(L1285) regardless of success/failure resizing.
When calling the mount/umount of tmp_dir, the PV is already using by a Pod with another mount point in my case,
So I think the resizing operation is safe and the corruption is from another reason.
I recommend you to open a support case.

@gnarl
Copy link
Contributor

gnarl commented May 27, 2021

We investigated resize2fs and it will fail with "Permission denied to resize filesystem" when the filesystem is in an inconsistent state during online resize. It is very likely that the filesystem was in an inconsistent state before the resize operation occurred.

ntap-troy pushed a commit to ntap-troy/trident that referenced this issue Aug 25, 2021
Fixes issue NetApp#507

* Makes DOCKER_PLUGIN_MODE settable to allow users to upgrade from <=20.07
* Allows users to set config with absolute path
* Ensures utilities are in PATH
@clintonk
Copy link
Contributor

clintonk commented Oct 22, 2021

This was fixed in commit 16e84e6 and will be included in the Trident 21.10.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants