Critical bug: zpool messes up partition table: zfs_member occupies the whole disk instead of being constrained to a partition #9105
Comments
There is a serious bug affecting zfs 0.8.1-1 (tested on the latest Manjaro running Linux kernel 4.19). This bug has been reported in different forums in different contexts:
https://gitlab.gnome.org/GNOME/gparted/issues/14
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114
https://bbs.archlinux.org/viewtopic.php?id=206121
https://bbs.archlinux.org/viewtopic.php?id=202587
Any help on how to wipe the signature block information from /dev/nvme0n1 would be welcome. I have not yet tried zpool labelclear because, as far as I understand, it would wipe out the entire disk.
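Before wiping anything, a read-only way to see which signatures libblkid detects, and at what offsets, is wipefs in no-act mode. This is only a sketch using the device names from this report; it makes no changes on its own:

wipefs --no-act /dev/nvme0n1      # list the signatures (and offsets) seen on the whole disk, read-only
wipefs --no-act /dev/nvme0n1p5    # compare with the signatures reported for the ZFS partition itself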
The problem is that blkid, when looking at the whole disk, sees the ZFS uberblocks at the end of the nvme0n1p5 partition (which is also the end of the disk) and then concludes that the whole disk must be a zfs member. It is wrong about that, and the same misdetection affects zpool import.

The solution is a small, empty partition (some 10 MiB) at the end of the drive (zpool create, when given a whole drive, does this itself by creating a small 'partition 9' at the end), so that blkid and zpool import no longer see the uberblocks at the end of the actual ZFS partition when looking at the whole disk; they see the empty space of that last partition instead.

Do not run zpool labelclear on the whole drive. It will not solve the problem (the uberblocks are rewritten, round-robin, on every txg) but has a fair chance of destroying your pool and even the whole partition table (including the backup GPT at the end of the drive). The best option is to back up the contents of the pool, destroy it, reduce the size of the last partition by some 10-20 MiB, create a partition at the end that protects this freed space (and dd if=/dev/zero over that one, to get rid of the uberblocks in that area), then recreate the pool and restore the backup.
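A rough sketch of that procedure with GPT tools, assuming sgdisk is available and reusing the device names from this report. The sector values are placeholders that must be adapted to the actual layout (keep partition 5's original start sector and move its end up by 10-20 MiB), and the pool contents must be backed up first:

zpool destroy tank                                                   # only after the contents have been backed up
sgdisk --delete=5 --new=5:SAME_START:NEW_SMALLER_END /dev/nvme0n1    # recreate partition 5 slightly smaller
sgdisk --new=6:0:0 /dev/nvme0n1                                      # protection partition in the freed space at the end
partprobe /dev/nvme0n1                                               # re-read the partition table
dd if=/dev/zero of=/dev/nvme0n1p6 bs=1M                              # zero the freed area so no old uberblocks remain
zpool create -o ashift=13 -m /mnt/tank tank /dev/nvme0n1p5           # recreate the pool, then restore the backup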
Do you think that …
In case nvme0n1p6 is the new (protection) partition you just created, the dd doesn't need the …
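In other words, because the partition device maps exactly onto the freed area at the end of the disk, a plain invocation is enough. A sketch, assuming nvme0n1p6 really is that small protection partition:

dd if=/dev/zero of=/dev/nvme0n1p6 bs=1M status=progress   # zero the whole protection partition; no offset arguments needed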
I also ran into this problem in my growlight project: dankamongmen/growlight#4
I filed a bug against upstream, but have heard nothing (filed 2019-08): https://www.spinics.net/lists/util-linux-ng/msg15811.html
I detail how I worked around it here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114 (last comment) and in the growlight issue linked above.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
Has this been fixed? |
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
Stale bot should not close defects. |
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
@behlendorf defect or not? |
Bumping this because it seems like a terrible data corruption bug that needs to be fixed. |
Upstream root cause and fix:
I see, so zfs never actually touched the partition table at all. Either way, glad it's fixed, and this issue should be closed. |
System information
Describe the problem you're observing
Critical bug: zpool has modified the disk signature information, and the zfs_member signature now occupies the whole disk /dev/nvme0n1 instead of being constrained to one partition: /dev/nvme0n1p5
#lsblk -f
NAME FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
nvme0n1 zfs_member tank 9593340550191022900
├─nvme0n1p1 vfat ESP F8E8-2918 738,1M 5% /boot/efi
├─nvme0n1p2 vfat OS 5224-C2FA
├─nvme0n1p3 ext4 UBUNTU bed2f845-754b-477b-8bdb-3cba7d56fae3
├─nvme0n1p4 ext4 MANJARO 3134ceb0-795e-4f51-a6fb-ba172fac0312 75,5G 16% /
└─nvme0n1p5 zfs_member tank 9593340550191022900
#lsblk -a
nvme0n1 259:0 0 953,9G 0 disk
├─nvme0n1p1 259:1 0 780M 0 part /boot/efi
├─nvme0n1p2 259:2 0 5G 0 part
├─nvme0n1p3 259:3 0 97,7G 0 part
├─nvme0n1p4 259:4 0 97,7G 0 part /
└─nvme0n1p5 259:5 0 752,8G 0 part
#blkid /dev/nvme0n1
/dev/nvme0n1: LABEL="tank" UUID="9593340550191022900" UUID_SUB="541976190045946664" TYPE="zfs_member" PTUUID="e7762bd0-453e-4900-b428-26f1b11c22b5" PTTYPE="gpt"
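To confirm the geometry that triggers the misdetection, i.e. that partition 5 ends flush against the end of the disk so its trailing ZFS labels are also the last sectors of the whole device, read-only checks like the following can be used (a sketch, not part of the original report):

parted /dev/nvme0n1 unit s print free    # shows partition start/end sectors and any free space left at the end of the disk
blkid --probe /dev/nvme0n1               # low-level probe of the whole device, bypassing the blkid cache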
Describe how to reproduce the problem
Followed instructions on https://wiki.archlinux.org/index.php/ZFS
The zpool was created using the id from ls -lh /dev/disk/by-id/:
sudo zpool create -f -o ashift=13 -m /mnt/tank tank nvme-PC401_NVMe_SK_hynix_1TB_MI93T003810403E62-part5
Note that the pool was created with defaults on a single partition (i.e. without redundancy or RAID).
Interestingly, gparted (and thus the disk signatures) showed the partition table correctly after installation. Everything got messed up after enabling zfs.target, zfs-mount, zfs-import.target and zfs-import-cache and rebooting.
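For comparison, when zpool create is given a whole disk instead of an existing partition, it partitions the disk itself and leaves a small ninth partition at the end, which is exactly the buffer described in the comments above. This only applies when ZFS can own the entire disk, which is not the case for the multi-boot layout in this report; the device name below is a placeholder:

zpool create -o ashift=13 tank /dev/disk/by-id/nvme-EXAMPLE_DISK   # whole-disk vdev: ZFS creates -part1 (data) and a small -part9 buffer
lsblk /dev/disk/by-id/nvme-EXAMPLE_DISK                            # the resulting -part1 and -part9 partitions are visible here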
Include any warning/errors/backtraces from the system logs
This is a critical issue.
The boot log is now messed up:
juil. 31 07:07:37 XPS13 systemd[1]: systemd-firstboot.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start First Boot Wizard.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-sysusers.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Create System Users.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-fsck-root.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start File System Check on Root Device.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-binfmt.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Set Up Additional Binary Formats.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Failed with result 'start-limit-hit'.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start systemd-guest-user.service.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-hwdb-update.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Rebuild Hardware Database.
juil. 31 07:07:37 XPS13 systemd[1]: sys-fs-fuse-connections.mount: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to mount FUSE Control File System.
juil. 31 07:07:37 XPS13 systemd-udevd[300]: Process '/usr/bin/alsactl restore 0' failed with exit code 99.