
Partition confusion when block device is identified as filesystem by libblkid #4

Closed
dankamongmen opened this issue Jul 20, 2019 · 18 comments

@dankamongmen

I added two 1TB Western Digital Black NVMe M.2 drives to my machine, one of them requiring an Asus Hyper PCIe card. I carved out a third partition (part3) on each:

nvme1n1:

Command (? for help): p
Disk /dev/nvme1n1: 1953525168 sectors, 931.5 GiB
Model: WDS100T3X0C-00SJG0                      
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): BF1C425C-02B7-4C9D-9E37-402BE6F84798
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 209717214 sectors (100.0 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   3       209717248      1953525134   831.5 GiB   BF01  Solaris /usr & Mac ZFS

Command (? for help): 

nvme2n1:

Command (? for help): p
Disk /dev/nvme2n1: 1953525168 sectors, 931.5 GiB
Model: WDS100T3X0C-00SJG0                      
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): E5B3CDE0-CB86-4347-B302-4A229C8928CB
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 209717214 sectors (100.0 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   3       209717248      1953525134   831.5 GiB   BF01  Solaris /usr & Mac ZFS

Command (? for help): 

and enrolled them into a mirrored ZFS vdev, zhomez.

(screenshot: 2019-07-19-200224_1122x2127_scrot)

Here's zpool status zhomez:

  pool: zhomez
 state: ONLINE
  scan: resilvered 120G in 0 days 00:02:02 with 0 errors on Thu Jul 18 16:51:30 2019
config:

	NAME                                            STATE     READ WRITE CKSUM
	zhomez                                          ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    nvme-WDS100T3X0C-00SJG0_1908E1801188-part3  ONLINE       0     0     0
	    nvme-WDS100T3X0C-00SJG0_1908E1805012-part3  ONLINE       0     0     0

errors: No known data errors

Upon loading growlight, the inline display of both devices is screwed up (it shows almost entirely empty space, and the '3' partition only gets a single character), and the details box also has some problems.

Not sure whether the former is due to discontiguous partitions or what, but it's gotta get fixed.

See what can be done about the detail view, as well.

@dankamongmen

Whoa, for that matter, why are there partition details on the top of both nvme devices? It's like they're both actively selected, despite only one being active (and there only being one possible active device). This also happens if neither is active!

@dankamongmen

This is all managed by print_blockbar() in ncurses.c. I inserted some diagnostics: *** means selstr is being shown for an unpartitioned device; &&& means selstr is being shown for a partitioned device, with an arrow indicating the direction of print. When I move around within one of these devices, I get the following:

│2019-08-04 05:06:00 *** selstr is md127                                     │││
│2019-08-04 05:06:01 &&& -> selstr is nvme1n1                                │┘│
│2019-08-04 05:06:01 &&& -> selstr is empty space                            │┐│
│2019-08-04 05:06:06 &&& -> selstr is nvme1n1                                │││
│2019-08-04 05:06:06 &&& <- selstr is nvme1n1p1                              │┘│
│2019-08-04 05:06:06 &&& -> selstr is nvme1n1                                │─╯
│2019-08-04 05:06:06 &&& <- selstr is nvme1n1p2                              │
│2019-08-04 05:06:07 &&& -> selstr is nvme1n1                                │─╮
│2019-08-04 05:06:07 &&& <- selstr is nvme1n1p3                              │┐│
│2019-08-04 05:06:07 &&& -> selstr is nvme1n1                                │││
│2019-08-04 05:06:07 &&& <- selstr is partition table metadata               │┘│
│2019-08-04 05:06:08 &&& -> selstr is empty space                            │─╯
│2019-08-04 05:06:09 &&& -> selstr is nvme1n1                                │
╰────────────────────────────────────────────────────────────────────────────╯
growlight 1.0.6.1 (6) &&& <- selstr is nvme1n1p1               

So it looks like we're always electing to print nvme1n1 going to the right, and then we print what we're actually on, which is usually going to the right due to the bloat on the empty space. We're not printing the 'nvme1n1' when we're not actively on the device, but I think each device becomes active as it's discovered? So that presumably gets done once (ctrl-'L' does not make them go away)...
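
If the root cause is that every device keeps a stale "selected" flag from discovery, one option is to key the header line off the single device the UI currently has focused rather than off a per-device flag. A minimal sketch with hypothetical names (this is not print_blockbar()'s actual signature, nor growlight's real structures):

```c
#include <ncurses.h>

/* Hypothetical device record; growlight's real struct differs. */
struct dev {
	const char *name;      /* e.g. "nvme1n1" */
	int selected_at_probe; /* per-device flag left over from enumeration */
};

/* Draw the selection string only for the one device the UI currently has
 * focused, ignoring any stale per-device flag set during discovery. */
static void draw_selstr(WINDOW *w, const struct dev *d, const struct dev *focused){
	if(d != focused){
		return; /* not the active selection: no header at all */
	}
	mvwprintw(w, 0, 1, "%s", d->name);
}
```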

@dankamongmen

╭─press 'v' to dismiss details───────────────────────────────────────────────╮─╯
│Sandisk Corp WD Black 2018/PC SN720 NVMe SSD                                │
│Firmware: 2101 BIOS: American Megatrends Inc. Load: 0bps                    │─╮
│nvme1n1: WDS100T3X0C-00SJG0 (931.51GiB) S/N: 1908E1801188 WC- WRVx RO-      │┐│
│Sectors: 1953525168 (512B logical / 512B physical) NVMe connect             │││
│Partitioning: gpt I/O scheduler: [none] mq-deadline kyber                   │┘│
│16383.99TB 1953525169→2047 unpartitioned space                              │─╯
│                                                                            │
╰────────────────────────────────────────────────────────────────────────────╯

Pretty certain that the 1953525169→2047 range is what's causing our main display problem.
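
That range is inverted -- the "first" sector is past the last usable sector -- so any length computed from it is garbage, hence the 16383.99TB. A minimal sketch, using hypothetical names rather than growlight's actual zone-building code, of the sanity check that should reject such a range before it ever reaches the display:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical zone record; growlight's real structures differ. */
typedef struct zone {
	uint64_t fsector; /* first sector of the zone */
	uint64_t lsector; /* last sector of the zone (inclusive) */
} zone;

/* Reject inverted ranges like 1953525169→2047 before sizing and drawing
 * them; an unsigned (lsector - fsector + 1) would otherwise wrap around
 * to an enormous value. */
static int zone_is_sane(const zone *z, uint64_t last_usable_sector){
	if(z->fsector > z->lsector){
		fprintf(stderr, "dropping inverted zone %ju→%ju\n",
		        (uintmax_t)z->fsector, (uintmax_t)z->lsector);
		return 0;
	}
	if(z->lsector > last_usable_sector){
		return 0; /* zone extends past the device */
	}
	return 1;
}
```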

@dankamongmen

growlight-readline gets this right:

[growlight](-1)> blockdev detail nvme1n1
nvme1n1    WDS100T3X0C-00SJ  n/a   1.00T  512B ✓.... gpt   1908E1801188     NVMe
Unused sectors 0:2047 (1023.46Ki)
nvme1n1p1  08cbe0c1-6771-4574-8ddb-913b24278f92   1.07G Lnx  Linux filesystem
nvme1n1p2  dbe112e1-aa09-4cb1-bf18-e3f01531b942 106.30G Oth  Linux RAID
nvme1n1p3  d1cc37e8-0879-4b51-92b5-845dd89b930f 892.82G Oth  Solaris /usr & Mac ZFS
Unused sectors 1953525135:1953525168 (16.46Ki)

@dankamongmen

│2019-08-04 05:17:02 (null) WITH FSEC 1953525169 LSEC 2047                   │┘│
│2019-08-04 05:17:02 nvme0n1p1 WITH FSEC 2048 LSEC 2099199                   │┐│
│2019-08-04 05:17:02 nvme0n1p2 WITH FSEC 2099200 LSEC 209717247              │││
│2019-08-04 05:17:02 nvme0n1p3 WITH FSEC 209717248 LSEC 1953525134           │┘│
│2019-08-04 05:17:02 (null) WITH FSEC 1953525135 LSEC 1953525167      

@dankamongmen

It's because these have d->mnttype set to ZFS_MEMBER. This appears to differ between our SSDs and spinning disks:

[schwarzgerat](0) $ sudo blkid /dev/nvme1n1
/dev/nvme1n1: LABEL="zhomez" UUID="7730803059136165722" UUID_SUB="6260524301877159837" TYPE="zfs_member" PTUUID="e5b3cde0-cb86-4347-b302-4a229c8928cb" PTTYPE="gpt"
[schwarzgerat](0) $ sudo blkid /dev/sdf
/dev/sdf: PTUUID="5df293fc-a619-104c-b8ba-02d4cd945ebe" PTTYPE="gpt"
[schwarzgerat](0) $ sudo blkid /dev/nvme1n1p1
/dev/nvme1n1p1: PARTLABEL="Linux filesystem" PARTUUID="08cbe0c1-6771-4574-8ddb-913b24278f92"
[schwarzgerat](0) $ sudo blkid /dev/nvme1n1p2
/dev/nvme1n1p2: UUID="0d232f32-f8c8-9170-bc2a-ca93782d54af" UUID_SUB="2b553a31-e482-66e5-0d20-ecc9b4f57537" LABEL="schwarzgerat:root" TYPE="linux_raid_member" PARTLABEL="Linux RAID" PARTUUID="dbe112e1-aa09-4cb1-bf18-e3f01531b942"
[schwarzgerat](0) $ sudo blkid /dev/nvme1n1p3
/dev/nvme1n1p3: LABEL="zhomez" UUID="7730803059136165722" UUID_SUB="6260524301877159837" TYPE="zfs_member" PARTLABEL="Solaris /usr & Mac ZFS" PARTUUID="d1cc37e8-0879-4b51-92b5-845dd89b930f"
[schwarzgerat](0) $ 

For the record, /dev/nvme1n1 is not a "zfs_member" so far as I'm concerned, no more than /dev/sdf is.

@dankamongmen


The relevant util-linux code is probe_zfs() in superblocks/zfs.c. Essentially, it detects a ZFS filesystem on the whole block device in addition to the partition where the filesystem actually lives. Note that PTTYPE is properly detected as GPT.

| Tag    | SSD | Rust | SSD partition | Rust partition |
|--------|-----|------|---------------|----------------|
| LABEL  | Y   | N    | Y             | Y              |
| UUID   | Y   | N    | Y             | Y              |
| TYPE   | Y   | N    | Y             | Y              |
| PTUUID | Y   | Y    | N             | N              |
| PTTYPE | Y   | Y    | N             | N              |

Note that being GPT doesn't preclude the device from being ZFS -- ZFS on Linux always creates a GPT table (see openzfs/zfs#6277, openzfs/zfs#94, and openzfs/zfs#1162), at least through 0.8.0.
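
For reference, the whole-device result can be reproduced directly with libblkid's probing API. Here's a small standalone sketch (not growlight's code) that prints both the superblock TYPE and the partition-table PTTYPE for a device given on the command line; run against nvme1n1 it should report TYPE=zfs_member alongside PTTYPE=gpt:

```c
/* cc -o blkprobe blkprobe.c -lblkid  (needs read access to the device) */
#include <blkid/blkid.h>
#include <stdio.h>

int main(int argc, char **argv){
	if(argc != 2){
		fprintf(stderr, "usage: %s /dev/device\n", argv[0]);
		return 1;
	}
	blkid_probe pr = blkid_new_probe_from_filename(argv[1]);
	if(pr == NULL){
		perror(argv[1]);
		return 1;
	}
	blkid_probe_enable_superblocks(pr, 1);
	blkid_probe_enable_partitions(pr, 1);
	if(blkid_do_safeprobe(pr) < 0){
		fprintf(stderr, "probe of %s failed\n", argv[1]);
		blkid_free_probe(pr);
		return 1;
	}
	const char *type = NULL, *pttype = NULL;
	/* On the whole nvme device this reports both TYPE and PTTYPE. */
	blkid_probe_lookup_value(pr, "TYPE", &type, NULL);
	blkid_probe_lookup_value(pr, "PTTYPE", &pttype, NULL);
	printf("TYPE=%s PTTYPE=%s\n", type ? type : "(none)",
	       pttype ? pttype : "(none)");
	blkid_free_probe(pr);
	return 0;
}
```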

Note that gparted also seems to think this is a ZFS filesystem, and that it has no partition table :(. See attached image. gdisk has no problem with it:

[schwarzgerat](0) $ sudo gdisk /dev/nvme1n1
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): p
Disk /dev/nvme1n1: 1953525168 sectors, 931.5 GiB
Model: WDS100T3X0C-00SJG0                      
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): E5B3CDE0-CB86-4347-B302-4A229C8928CB
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         2099199   1024.0 MiB  8300  Linux filesystem
   2         2099200       209717247   99.0 GiB    FD00  Linux RAID
   3       209717248      1953525134   831.5 GiB   BF01  Solaris /usr & Mac ZFS

Command (? for help): 
![gparted_000](https://user-images.githubusercontent.com/143473/62422212-bcb9bd80-b67c-11e9-803e-8afa14438296.png)

@dankamongmen

(screenshot: gparted_000)

@dankamongmen

lol look at dumbass gparted talking about the number of cylinders on my NVMe SSD 👍

@dankamongmen

Ahhh look at what we have here. They concur that this is a bug in blkid. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114

I've gone ahead and mailed the util-linux-ng mailing list, along with Karel Zak and Andreas Dilger.

@dankamongmen

https://gitlab.gnome.org/GNOME/gparted/issues/14 gparted says use wipefs, but that seems less than optimal. libblkid ought to be able to reason that this is a block device (more correctly, a namespace).

dankamongmen changed the title from "New M.2 NVMe drives in mirrored ZFS result in weird display" to "Partition confusion when block device is identified as filesystem by libblkid" on Aug 4, 2019
@dankamongmen

https://marc.info/?l=util-linux-ng&m=156491424909123&w=2 is the mail I sent to util-linux, which hasn't seen any reply yet. Either I need to fix this in libblkid, or we need to use some other method...

@dankamongmen

Branch dankamongmen/libblkid-bad-rec will have the work for this bug.

Why does growlight-readline get it right?!?

@dankamongmen

The reason growlight-readline "gets it right" is that it doesn't display unpartitioned space in its partitions output.

@dankamongmen

I think the right way to fix this is to read the actual partition table (rather than just walking through blkid), and use that to constrain any information gathered from filesystem inspection. Partitions don't announce how large they are, but filesystems typically do. The partition table, however, ought to be the authoritative word IMHO -- we are after all a blockdev tool first, and a filesystem tool second.

If we then detect a filesystem which claims to be longer (or shorter) than its containing partition boundaries (which might be implicit, as in this case -- the space is technically not on a partition), we can bring that to the user's attention. But trust the partition table.
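
libblkid's partition-list API can supply that authoritative view without hand-parsing GPT. A hedged sketch (standalone, not the eventual growlight patch) that walks the table and prints each partition's start and size in 512-byte sectors, against which any superblock-derived geometry could then be bounded:

```c
/* cc -o partwalk partwalk.c -lblkid */
#include <blkid/blkid.h>
#include <stdio.h>

int main(int argc, char **argv){
	if(argc != 2){
		fprintf(stderr, "usage: %s /dev/device\n", argv[0]);
		return 1;
	}
	blkid_probe pr = blkid_new_probe_from_filename(argv[1]);
	if(pr == NULL){
		perror(argv[1]);
		return 1;
	}
	blkid_partlist ls = blkid_probe_get_partitions(pr);
	if(ls == NULL){
		fprintf(stderr, "no partition table found on %s\n", argv[1]);
		blkid_free_probe(pr);
		return 1;
	}
	int n = blkid_partlist_numof_partitions(ls);
	for(int i = 0 ; i < n ; ++i){
		blkid_partition par = blkid_partlist_get_partition(ls, i);
		/* start and size are reported in 512-byte sectors */
		printf("partition %d: start %lld size %lld\n",
		       blkid_partition_get_partno(par),
		       (long long)blkid_partition_get_start(par),
		       (long long)blkid_partition_get_size(par));
	}
	blkid_free_probe(pr); /* the partlist is owned by the probe */
	return 0;
}
```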

@dankamongmen

I just committed a fix that makes the weird browsing line (⇗⇨⇨⇨empty space), which was lingering while unselected, go away for this case. The empty-space section is still way too big.

@dankamongmen

growlight-curses: src/ncurses.c:827: create_zobj: Assertion `lsector >= fsector' failed.

@dankamongmen

I believe this is now fixed in my branch. I'm not sure why the coloring is different between the two nvme devices, but that might not be a bug. Either way, much improved.

(screenshot: growlight-fixed-nvme)
