
RFC: better import integration with systemd, hotplug et al. #4178

intelfx opened this issue Jan 8, 2016 · 9 comments
Labels: good first issue, Status: Inactive, Type: Feature

Comments


intelfx commented Jan 8, 2016

First, this is not an issue report per se, but rather an "RFC issue".

Currently, zfs pool importing and mounting completely bypasses the udev and hotplug machinery. That is, even the systemd pool import unit file begins with:

Requires=systemd-udev-settle.service
After=systemd-udev-settle.service

It does not hook itself up to local-fs.target or anything else. Moreover, zfs has its own dataset-mounting machinery which, again, bypasses the systemd state machine. This may sound beneficial to most people because of its familiarity, but, just off the top of my head (this list is neither impressive nor exhaustive):

  • this does not allow pools to be imported or datasets to be mounted after the relevant devices are hot-plugged;
  • this does not allow unit files to depend on zfs datasets (well, it does, but systemd won't wait for the devices to become available and will fail right away if the datasets have not been mounted beforehand);
  • this means all of the usual problems with udevadm settle (races with disk controller initialization).

One of the ways to solve this, for ZFS on Linux specifically, would be to integrate nicely into the existing event/dependency infrastructure (the other way would be to devise a separate hotplug handling subsystem which does not use udevadm settle, which is likely not going to happen). In particular:

  1. import pools as they become plugged in (probably subject to some filtering), by triggering the import from udev rules (see the sketch after this list);
  2. from the kernel driver, create "fake" block devices for each imported dataset, which can then be "mounted" from fstab (and waited for by systemd).
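As a very rough sketch of item 1 (assuming a hypothetical rules file and a hypothetical templated zfs-import@.service; the file names, the instance-name escaping and the "enough members" policy are all glossed over here):

    # /etc/udev/rules.d/90-zfs-import.rules (hypothetical)
    # When a ZFS member device appears, ask systemd to start an import attempt
    # for the pool named in its blkid label.
    SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="zfs_member", \
      TAG+="systemd", ENV{SYSTEMD_WANTS}+="zfs-import@$env{ID_FS_LABEL}.service"

    # /etc/systemd/system/zfs-import@.service (hypothetical)
    [Unit]
    Description=Import ZFS pool %i
    DefaultDependencies=no

    [Service]
    Type=oneshot
    # -N: import without mounting any datasets; a real version would also need
    # to handle "already imported" and partially-present pools gracefully
    ExecStart=/sbin/zpool import -N %i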

Or, the other way around:

  1. from udev rules, set ENV{SYSTEMD_READY}=0 on all block devices which appear to participate in zfs pools (sketched after this list);
  2. import pools as they become plugged in (probably subject to some filtering), by triggering the import from udev rules;
  3. once a pool has been imported by any means, set ENV{SYSTEMD_READY}=1 on all block devices participating in that pool;
  4. allow mounting a zfs dataset by specifying any of its member devices plus the dataset name in the mount options (btrfs-like).
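A rough sketch of steps 1 and 3, loosely modeled on systemd's 64-btrfs.rules; the zfs_member_ready helper invoked below is purely hypothetical and would have to be written:

    # /etc/udev/rules.d/64-zfs-ready.rules (hypothetical)
    SUBSYSTEM!="block", GOTO="zfs_end"
    ACTION=="remove", GOTO="zfs_end"
    ENV{ID_FS_TYPE}!="zfs_member", GOTO="zfs_end"
    # ask a (hypothetical) helper whether this member's pool is already
    # imported; it would print ZFS_POOL_IMPORTED=0/1 for udev to import
    IMPORT{program}="zfs_member_ready $devnode"
    # until the pool is imported, hide the device from systemd's device logic
    ENV{ZFS_POOL_IMPORTED}!="1", ENV{SYSTEMD_READY}="0"
    LABEL="zfs_end"

Step 3 could then amount to re-triggering a change event (udevadm trigger --action=change) on the pool's member devices once the import succeeds, so the rule re-runs with the devices now reported as ready.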

Is anybody interested in this? Maybe there are better ways? Maybe this is already done and just not reflected in the man pages or anywhere else?

behlendorf added the Type: Feature label on Jan 22, 2016
behlendorf (Contributor) commented:

@Lalufu @ryao @don-brady @ColinIanKing @adilger I'm definitely supportive of better integration with systemd. What we have now is a nice first step but it can certainly be refined. We need to have a discussion with the systemd experts, zfs experts and interested parties from the distributions and agree on a solid design for how it should be on Linux. And while I think we could use some ideas from Illumos, like the cache file, I don't think we should be constrained by them. We want it to cleanly integrate with Linux and that means taking advantage of udev, libblkid, systemd, etc.

Let's try and hash out a rough design for how this could work. I'm not a systemd expert but I can definitely shed some light on what functionality exists in ZFS on Linux today which could be useful. Let me start with @intelfx's comments.

import pools as they become plugged-in (probably subject to some filtering), by triggering import from udev rules;

I like this idea and I don't see why it wouldn't be workable. The hardest part would be the filtering but we already have most of the pieces. Udev will identify individual vdev members of a pool and update blkid accordingly, and the zpool import command has code to integrate with libblkid. What we need is some mechanism and policy for recognizing when enough members of a pool are available such that it can, and should, be imported. On Illumos the cache file serves this purpose which makes sense given how their boot process works.
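For reference, the scan-based path that udev/blkid feeds already exists on the command line today; what's missing is the policy layer described above. A minimal example (the device directory is just a common choice, not a requirement):

    # scan the given directory for pool members and import every pool found,
    # without mounting any datasets yet
    zpool import -d /dev/disk/by-id -a -N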

From the kernel driver, create "fake" block devices for each imported dataset, which then can be "mounted" from fstab (and waited for by systemd).

While it's historically not the ZFS way, you can add these filesystems to /etc/fstab. Just use the pool/dataset name as the identifier, and the mount.zfs helper will be able to mount the filesystem as long as the pool is imported. There's no need to create "fake" block devices (or, if you prefer to think about it that way, we're already doing it). Just make sure you pass the zfsutil mount option.

 mount -t zfs -o zfsutil pool/dataset /mnt
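A matching fstab entry might then look something like this (dataset name and mountpoint are placeholders):

    # /etc/fstab
    pool/dataset  /mnt  zfs  defaults,zfsutil  0  0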

allow mounting a zfs dataset by any of its devices and dataset name in mount options (btrfs-like).

This partially exists today. You can pass in any vdev member block device and it will use that to mount the pool's root dataset. It would be fairly straightforward to extend that to optionally take a dataset name as well. It's just a matter of syntax, and if btrfs has already established a precedent for this, so much the better.
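For illustration, with a hypothetical member device /dev/sdb1 the first command below should already mount the pool's root dataset; the second shows one purely speculative shape the dataset-name extension could take:

    mount -t zfs /dev/sdb1 /mnt
    # speculative syntax only; no such mount option exists today
    mount -t zfs -o dataset=pool/home /dev/sdb1 /mnt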

Is anybody interested in this?

From what I can tell many people are interested in having this work smoothly. It would be great if we could get a few people to volunteer to work out a good design and implement it.


intelfx commented Jan 22, 2016

There's no need to create "fake" block devices, or if you prefer to think about it that way we're already doing it.

Sure, I understand that one can write pool/dataset in fstab and it will work from the zfs side. But from the systemd side, it needs a valid device name so that it can listen to udev events for that device and wait for the "ready" state (which should be set by udev rules in the form of the SYSTEMD_READY variable).

Here is how it looks for btrfs: https://github.com/systemd/systemd/blob/master/rules/64-btrfs.rules

...and for LVM2: https://git.fedorahosted.org/cgit/lvm2.git/tree/udev/69-dm-lvm-metad.rules.in
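To illustrate the systemd side: for an ordinary block-device fstab entry, systemd-fstab-generator produces a mount unit that waits for the corresponding .device unit, and that device unit only becomes active once udev has processed the device without SYSTEMD_READY=0. Roughly (unit names illustrative, dependencies simplified):

    # mnt.mount, as generated from "/dev/sdb1 /mnt ext4 defaults 0 2"
    [Unit]
    # systemd waits for dev-sdb1.device before attempting the mount;
    # a "pool/dataset" zfs entry has no device unit to wait for
    Requires=dev-sdb1.device
    After=dev-sdb1.device

    [Mount]
    What=/dev/sdb1
    Where=/mnt
    Type=ext4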

behlendorf (Contributor) commented:

Interesting, I wasn't aware that that's how things work now with btrfs. Once a pool is imported, creating these fake block devices would be straightforward; this infrastructure largely already exists for zvols. Is there any documentation for how these devices should be named or which ioctls they have to support?


intelfx commented Jan 23, 2016

AFAIK the devices themselves do not need to provide anything. They just need to exist in the sense of the kernel sending out netlink events for them. That is, really a "stub device" which is no more than a "label" for kernel events.

The problem with this approach is naming: zfs allows datasets to be nested, and if we try to map dataset names to fake block device names naïvely (i.e. dataset pool/A/B/C -> device /dev/zfs-datasets/pool/A/B/C), then we won't be able to mount the parent datasets, as they would be represented by directories rather than block devices. A possible solution is escaping, but that feels like a big kludge.
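For what it's worth, systemd already has a reversible convention for flattening paths into single names, so an escaping scheme at least has precedent (whether reusing it here would be acceptable is exactly the "kludge" question):

    # parent and child flatten to sibling names instead of a directory tree
    $ systemd-escape --path /pool/data
    pool-data
    $ systemd-escape --path /pool/data/home
    pool-data-home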

spacelama commented:
It appears this systemd integration has killed the sysv startup? Up until a few weeks ago, I had been using the zfs package provided by zfsonlinux itself: zfs_0.6.5.4-1_amd64.deb, which had some handy /etc/init.d/zfs-{mount,import,share,zed} scripts. These have been removed now that we're installing from debian zfs: zfsutils-linux, which only provides /lib/systemd/system/zfs*, and with a reboot this morning, I discover my pools are no longer being imported and mounted.

All other Debian packages function fine regardless of whether systemd or sysv init is the boot system, via some standard fallbacks. Do you have any idea what is required to make this work here too?


rlaager commented Jul 4, 2016

@spacelama If you want sysv init support, then you need sysv init scripts. If you want Debian to ship them, then my advice would be to grab the old ZFS sysv init scripts, integrate them into the Debian packaging, and submit a packaging patch to Debian.


intelfx commented Jul 4, 2016

@spacelama: there's no "this systemd integration". No work has been done in relation to this report, since I'm not using ZFS anymore and probably nobody cares much.

mailinglists35 commented:

Hi,
Is there any workaround to automatically import a USB pool at boot, other than touching rc.local?
I'm running git master installed into /usr/local (make install in the spl/zfs dirs) on Ubuntu 16.04.

Looking at dmesg, the zfs* units and kernel modules are loaded about 10 seconds before the USB drives spin up, and at that point there is nothing to import.

behlendorf (Contributor) commented:

There has been some progress on this front: a prototype has been proposed for comment, review, and testing in #4943. We'd love to get some feedback on it.
