RFC: better import integration with systemd, hotplug et al. #4178

intelfx · 2016-01-08T09:45:49Z

First, this is not an issue report per se, but rather an "RFC issue".

Currently, zfs pool importing and mounting completely bypasses the udev and hotplug machinery. That is, even the systemd pool import unit file begins with:

Requires=systemd-udev-settle.service
After=systemd-udev-settle.service

It does not hook up itself to local-fs.target, nothing. Moreover, zfs has its own dataset mounting machinery which, again, bypasses the systemd state machine. This may sound beneficial to most due to their familiarity, but (from top of my head — the list is not too impressive, but it isn't exhaustive):

this does not allow to import pools/mount datasets after relevant devices are hot-plugged;
this does not allow to write unit files depending on zfs datasets (well, it does, but systemd won't wait for the devices to become available and will fail right away if datasets have not been mounted beforehand);
this means all of the usual problems with udevadm settle (races with disk controller init).

One of the ways to solve this, for ZFS on Linux specifically, would be to integrate nicely into the existing event/dependency infrastructure (the other way would be to devise a separate hotplug handling subsystem which does not use udevadm settle, which is likely not going to happen). In particular:

import pools as they become plugged-in (probably subject to some filtering), by triggering import from udev rules;
From the kernel driver, create "fake" block devices for each imported dataset, which then can be "mounted" from fstab (and waited for by systemd).

Or, other way around:

from udev rules, set ENV{SYSTEMD_READY}=0 on all block devices which appear to participate in zfs pools;
import pools as they become plugged-in (probably subject to some filtering), by triggering import from udev rules;
once a pool has been imported in any way, mark participating block devices as ENV{SYSTEMD_READY}=1 on all block devices of this pool;
allow mounting a zfs dataset by any of its devices and dataset name in mount options (btrfs-like).

Is anybody interested in this? Maybe there are better ways? Maybe this is already done, just not reflected in mans and anywhere?

The text was updated successfully, but these errors were encountered:

behlendorf · 2016-01-22T20:46:39Z

@Lalufu @ryao @don-brady @ColinIanKing @adilger I'm definitely supportive of better integration with systemd. What we have now is a nice first step but it can certainly be refined. We need to have a discussion with the systemd experts, zfs experts and interested parties from the distributions and agree on a solid design for how it should be on Linux. And while I think we could use some ideas from Illumos, like the cache file, I don't think we should be constrained by them. We want it to cleanly integrate with Linux and that means taking advantage of udev, libblkid, systemd, etc.

Let's try and hash out a rough design for how this could work. I'm not a systemd expert but I can definitely shed some light on what functionality exists in ZFS on Linux today which could be useful. Let me start with @intelfx comments.

import pools as they become plugged-in (probably subject to some filtering), by triggering import from udev rules;

I like this idea and I don't see why it wouldn't be workable. The hardest part would be the filtering but we already have most of the pieces. Udev will identify individual vdev members of a pool and update blkid accordingly, and the zpool import command has code to integrate with libblkid. What we need is some mechanism and policy for recognizing when enough members of a pool are available such that it can, and should, be imported. On Illumos the cache file serves this purpose which makes sense given how their boot process works.

From the kernel driver, create "fake" block devices for each imported dataset, which then can be "mounted" from fstab (and waited for by systemd).

While it's historically not the ZFS way you can add filesystems to the /etc/fstab. Just use as the identifier the pool/dataset name and the mount.zfs helper will be able to mount the filesystem as long as the pool is imported. There's no need to create "fake" block devices, or if you prefer to think about it that way we're already doing it. Just make sure you pass the zfsutil mount option.

 mount -t zfs -o zfsutil pool/dataset /mnt

allow mounting a zfs dataset by any of its devices and dataset name in mount options (btrfs-like).

This partially exists today. You can pass in any vdev member block device and it will use that to mount the pools root dataset. It would be fairly straight forward to extend that to optionally take a dataset name as well. It's just a matter of syntax and if btrfs has already established a precedent for this so much the better.

Is anybody interested in this?

From what I can tell many people are interested in having this work smoothly. It would be great if we could get a few people to volunteer to work out a good design and implement it.

intelfx · 2016-01-22T21:43:06Z

There's no need to create "fake" block devices, or if you prefer to think about it that way we're already doing it.

Sure, I understand that one can write pool/dataset in fstab and it will work from zfs' side. But from systemd side, it needs a valid device name so that it can listen to udev events for that device and wait for the "ready" state (which should be set by udev rules in form of SYSTEMD_READY variable).

Here is how it looks for btrfs: https://github.com/systemd/systemd/blob/master/rules/64-btrfs.rules

...and for LVM2: https://git.fedorahosted.org/cgit/lvm2.git/tree/udev/69-dm-lvm-metad.rules.in

behlendorf · 2016-01-22T22:35:46Z

Interesting, I wasn't aware of that's how things worked now with btrfs. Once a pool is imported creating this fake block devices would be straight forward, this infrastructure largely already exists for zvols. Is there any documentation for how these devices should be named or what ioctl they have to support?

intelfx · 2016-01-23T12:34:08Z

AFAIK the devices themselves do not need to provide anything. They just need to exist in the sense of the kernel sending out netlink events for them. That is, really a "stub device" which is no more than a "label" for kernel events.

The problem with this approach is naming: zfs allows to nest datasets, and if we will try to map dataset names to fake block device names naïvely (i. e. dataset pool/A/B/C -> device /dev/zfs-datasets/pool/A/B/C), then we won't be able to mount parent datasets as they would be represented by directories, not block devices. A possible solution is escaping, but that feels like a big kludge.

spacelama · 2016-07-04T02:14:13Z

It appears this systemd integration has killed the sysv startup? Up until a few weeks ago, I had been using the zfs package provided by zfsonlinux itself: zfs_0.6.5.4-1_amd64.deb, which had some handy /etc/init.d/zfs-{mount,import,share,zed} scripts. These have been removed now that we're installing from debian zfs: zfsutils-linux, which only provides /lib/systemd/system/zfs*, and with a reboot this morning, I discover my pools are no longer being imported and mounted.

All other debian packages function fine regardless of whether systemd or sysv are the boot system, via some standard fallbacks. You guys got any idea of what is required to make this work here too?

rlaager · 2016-07-04T02:21:37Z

@spacelama If you want sysv init support, then you need sysv init scripts. If you want Debian to ship them, then my advice would be to grab the old ZFS sysv init scripts, integrate them into the Debian packaging, and submit a packaging patch to Debian.

intelfx · 2016-07-04T13:25:27Z

@spacelama: there's no "this systemd integration". There was no work done in relation to this report since I'm not using ZFS anymore, and nobody probably cares much.

mailinglists35 · 2016-12-31T01:51:43Z

Hi,
Is there any workaround to automatically import USB pool at boot, other than touching rc.local?
I'm running git master with installation into /usr/local (make install in spl/zfs dirs) on ubuntu 16.04.

Looking at dmesg, the zfs* units and kernel modules are loaded about 10 seconds before usb drives spin up, and at that time there is nothing to import.

behlendorf · 2017-01-01T18:19:34Z

There has been some progress on this front and a proposed prototype for comment, review and testing in #4943. We'd love to get some feedback on it.

behlendorf added the Type: Feature Feature request or new feature label Jan 22, 2016

behlendorf mentioned this issue Mar 31, 2016

zfs-mount.service is called too late on Debian/Jessie with ZFS root #4474

Closed

pdf mentioned this issue Feb 8, 2017

wiki: Ubuntu 16.04 Root on ZFS /var/tmp and encrypted swap issues #5754

Closed

intelfx mentioned this issue Feb 8, 2017

[WIP] Prototype for systemd and fstab integration #4943

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: better import integration with systemd, hotplug et al. #4178

RFC: better import integration with systemd, hotplug et al. #4178

intelfx commented Jan 8, 2016

behlendorf commented Jan 22, 2016

intelfx commented Jan 22, 2016

behlendorf commented Jan 22, 2016

intelfx commented Jan 23, 2016

spacelama commented Jul 4, 2016

rlaager commented Jul 4, 2016

intelfx commented Jul 4, 2016

mailinglists35 commented Dec 31, 2016

behlendorf commented Jan 1, 2017

RFC: better import integration with systemd, hotplug et al. #4178

RFC: better import integration with systemd, hotplug et al. #4178

Comments

intelfx commented Jan 8, 2016

behlendorf commented Jan 22, 2016

intelfx commented Jan 22, 2016

behlendorf commented Jan 22, 2016

intelfx commented Jan 23, 2016

spacelama commented Jul 4, 2016

rlaager commented Jul 4, 2016

intelfx commented Jul 4, 2016

mailinglists35 commented Dec 31, 2016

behlendorf commented Jan 1, 2017