make Linux hibernation (suspend-to-disk) more robust #12842
Comments
Root on ZFS guides for Arch Linux and NixOS, hosted at openzfs-docs, already feature instructions for suspending to LUKS2-encrypted disk partitions. It may be worth reviewing those and fixing any existing deficiencies. Hibernation also has other requirements besides file system support. I have had all sorts of problems with Ryzen APU graphics and VirtIO graphics (black screen, frozen screen on resume, etc.).
#10924: Hibernation also requires ARC to be emptied
I'm probably one of those NixOS LUKS users. I don't currently remember much of the details, but IIRC, my only major ZFS corruption to date has come from (orthogonally from the above) something like the following sequence:
I'm not familiar with ZFS's internal mechanisms, but I believe this resulted in some high-level metadata corruption, with most of the data otherwise intact but hard to recover. Improved robustness would be greatly welcome. 🎉
I think until this works, the manual should be updated to explicitly say that hibernation on ZFS/zvol does not work. Right now it doesn't say so; in fact it even mentions that you can put swap on zvol and only mentions "may lead to deadlock" when in practice people get corruption, which is much worse than deadlock!
Sorry, but I don't agree that hibernation doesn't work. All corruption issues I have followed so far (including #14118, see #14118 (comment)) were caused by the boot process importing a hibernated pool before resuming from hibernation. Importing an already imported pool of course has the potential to corrupt it. Swap must be located on a separate, non-ZFS partition for hibernation to work, but I think that's expected. I've been using hibernation on a daily basis for years without any issue so far. What could be done is documenting the caveats and ideally adding code to detect suspended pools and refuse to import them.
I think you are both stating the same thing, which is that hibernation should NOT be used with swap on a ZFS volume, and @nh2 states that THAT (NOT having swap on ZFS volume when using hibernation) should be documented.
@AttilaFueloep there might be something to it, as I've had my ZFS pool holding the OS corrupted twice by accidentally hibernating and I'm running NixOS. As it turns out NixOS, for some unknown to me reason sets
Recently they added an Which is a very backwards way of solving the issue based on what you said, as what we should be doing is disable |
@Greek64 Yes, that's true. I was trying to summarize what was already proposed here (no swap on zvol and adding a mechanism to detect and refuse importing hibernated pools). Sorry for not being clear. @jakubgs Yes, force importing the root pool isn't a good idea; I can't comment on the "backwards compatibility purposes," though. What I can't follow is the need to import the root pool during resume from hibernation. I'm using Arch, and initcpio doesn't import pools during resume from hibernation; it simply skips the ZFS part of the boot process. I think the proper fix would be to refactor the NixOS boot process to do the same. Not being familiar with NixOS, I can't tell if this is viable though. Thinking more about it, NixOS could detect if swap is on a ZVOL and disallow hibernation if so. If not, there would be no need to import pools during resume and the above would apply.
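The "detect if swap is on a ZVOL" idea above could be sketched roughly as follows. This is only an illustration, not what NixOS or any initrd actually does: the function names are made up, and it assumes the usual Linux layout where zvols show up as `/dev/zdN` block devices (with `/dev/zvol/<pool>/<name>` symlinks pointing at them) and active swap areas are listed in `/proc/swaps`.

```python
import os
import re

# Assumption: Linux zvol device nodes look like /dev/zd0, /dev/zd16, /dev/zd16p1, ...
_ZVOL_NODE = re.compile(r"/dev/zd\d+(p\d+)?")


def swap_paths(proc_swaps_text):
    """Return the device/file paths from /proc/swaps-formatted text."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header (Filename, Type, Size, Used, Priority).
    return [line.split()[0] for line in lines[1:] if line.strip()]


def zvol_swap_devices(proc_swaps_text, resolve=os.path.realpath):
    """Return the swap entries that resolve to a zvol device node.

    `resolve` is injectable so the check can be exercised without real devices.
    """
    return [
        path
        for path in swap_paths(proc_swaps_text)
        if _ZVOL_NODE.fullmatch(resolve(path))
    ]
```

On a live system one might call `zvol_swap_devices(open("/proc/swaps").read())` and refuse to hibernate if the result is non-empty. Swap that is configured but not yet active would need a separate check.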
@AttilaFueloep yes, you are correct. An actual fix is that the boot process itself would be adjusted to not import the pool at boot when resuming, but that would require a bunch of research and testing. For now I think I just want to add an |
Questions:
Do you mean as of right now? But the purpose of this issue is exactly that, to add a prevention mechanism on "suspended" pools inside zfs itself.
Going by the previous answer, until the proposal of this issue is implemented it is up to the OS/User to avoid and prevent importing "suspended" pools.
Is it allowed/safe to import a "suspended" pool as Some projects like ZFSBootMenu need to import the zfs pools on boot in order to get the initramfs file (the boot partition is part of the root zfs pool), but do so importing the pool as If the above proposed changes are implemented as is, ZFSBootMenu would cease to work with hibernation, as it would need to use the
Related to: openzfs/zfs#12842 NixOS/nixpkgs#171680 NixOS/nixpkgs#203524 Signed-off-by: Jakub Sokołowski <jakub@status.im>
According to a ZFS issue about hibernation causing data corruption: openzfs/zfs#12842 The way this happens is if the system force imports a pool that was suspended during hibernation. I've had this happen twice on NixOS and I'd like to avoid having this happen again, to me or others. To do this I've added an assertion that makes sure you can't have `forceImportRoot` or `forceImportAll` enabled with `allowHibernation`. Signed-off-by: Jakub Sokołowski <jakub@status.im>
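For readers who don't use NixOS, the rule that the assertion in this commit message enforces can be written down as a tiny predicate. The sketch below is Python rather than Nix and is only a model of the rule as described above; the function name and error text are made up for illustration, and only the three option names come from the commit message.

```python
def hibernation_config_errors(allow_hibernation, force_import_root, force_import_all):
    """Return a list of configuration errors (empty if the combination is allowed).

    Models the rule: `allowHibernation` must not be combined with
    `forceImportRoot` or `forceImportAll`, because force-importing a pool
    that belongs to a hibernated system can corrupt it.
    """
    errors = []
    if allow_hibernation and (force_import_root or force_import_all):
        errors.append(
            "allowHibernation cannot be combined with forceImportRoot or forceImportAll"
        )
    return errors
```

Note that this only rejects the unsafe combination up front; it does not make a forced import of a hibernated pool any safer.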
I do not even use ZFS for the OS, swap, or hibernation file, but I face ZFS data corruption every time I attempt to suspend. For some reason I can never get ZFS to force-unmount either, so I cannot export/import the pool when I want to put the PC to sleep.
This comment was marked as resolved.
They are saying that they do use ZFS on non-OS/non-OS swap partitions, and that ZFS corrupts even in this case.
Squashed: 6c29422 Pre-squash commits included grahamperrin@d8a86c5 (stronger wording).
@nh2 thanks for clarification. I have hidden my previous comment (resolved).
( This is a follow-up of #260 (comment) and subsequent comments. )
Background
When resuming a hibernated system, either the kernel or initrd loads the hibernation image back into RAM.
After restoring the pre-hibernation in-core state from that image, the kernel resumes operation by unfreezing kthreads and user processes.
It is not safe to use a local zpool for swap space, and hence also not for hibernation. The reason is that there's an inherent chicken-and-egg problem between freezing kernel threads (and hence the ZIO pipeline) before creating the hibernation image, and then writing the hibernation image to stable storage.
I believe it is safe to hibernate a system with imported zpools, if the swapfile/hibernation image is stored on a block device that is safe to use by the kernel's hibernation procedure. For example, a raw block device, or a LUKS volume.
However, there are several problems with such setups:
This issue proposes to address the latter category by making ZFS more robust.
Bugs in initrd scripts should not be able to cause full pool corruption as easily as they can today.
Let me quote @danielmorlock's and my findings on this issue. It was on Gentoo, but I wouldn't want to rule out that the problem is present in other distros' initrd scripts as well.
Design Proposal
(Copied from #260 (comment) )
Import of a pool that is part of a hibernated system should fail. Even `import -f` should fail. The failure should unambiguously point the user to the problem and explain how to resolve the situation.
Proposal for the hibernation workflow:
`freeze_fs` and `thaw_fs` are not useful for this.
`org.openzfs:hibernation_cookie=$cookie_value`.
`quiescing -> syncing`
`zio_suspend`/`zio_resume` machinery for this.

Proposal for the resume workflow:
`spa_t`:
`org.openzfs:hibernation_cookie` from disk.
`spa_t` with the value loaded from disk.
`quiescing -> syncing` transitions again.

To prevent accidental imports, we extend `zpool import`/`spa_import` such that they will fail by default if a hibernation cookie is present in the on-disk MOS. This behavior can be overridden by a new flag `zpool import --discard-hibernation-state-and-fail-resume`.
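To make the proposed import behavior concrete, here is a toy model of it. It is only a sketch of the idea, not ZFS code: `OnDiskPool`, `import_pool`, `resume_allowed`, and `HibernatedPoolError` are hypothetical names, a plain dict stands in for the properties stored in the on-disk MOS, and the resume-side comparison of the in-core cookie against the on-disk one is my reading of the proposal.

```python
from dataclasses import dataclass, field

COOKIE_PROP = "org.openzfs:hibernation_cookie"


class HibernatedPoolError(Exception):
    """Raised when an import would touch a pool owned by a hibernated system."""


@dataclass
class OnDiskPool:
    name: str
    # Stand-in for properties persisted in the pool's on-disk MOS.
    mos_props: dict = field(default_factory=dict)


def import_pool(pool, force=False, discard_hibernation_state=False):
    """Import `pool`, refusing if a hibernation cookie is present.

    `force` models `import -f` and deliberately does NOT bypass the check.
    Only the explicit discard flag does, and it gives up the ability to resume.
    """
    if COOKIE_PROP in pool.mos_props:
        if not discard_hibernation_state:
            raise HibernatedPoolError(
                f"pool '{pool.name}' belongs to a hibernated system; resume that "
                "system, or discard its hibernation state to import anyway"
            )
        # Discarding the cookie means a later resume can no longer match it.
        del pool.mos_props[COOKIE_PROP]
    return pool.name


def resume_allowed(pool, in_core_cookie):
    """Assumed resume-side check: the in-core cookie must match the on-disk one."""
    on_disk = pool.mos_props.get(COOKIE_PROP)
    return on_disk is not None and on_disk == in_core_cookie
```

The point of the design is that a buggy initrd script calling the equivalent of `import_pool(pool, force=True)` gets an error instead of silently corrupting the pool, while a user who really wants the pool back has to opt in to failing the resume.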