kernel panic while browsing snapshots #3257
Comments
What distro are you using? Try to boot with the backup initrd (the one ending in `.old-dkms`, if present). Then make a snapshot and, first of all, a backup of the grub directory, and try to update all the zfs and spl packages (but do not update grub without a backup!). Be sure the initramfs images are created. |
distro: custom build CRUX 3.1 https://crux.nu P.S.
|
At what exact moment did you experience this panic? Please be more specific about this issue; it may not necessarily be related to zfs/spl. Try kernel 3.18.0 and make sure all the modules are up to date. See this thread, it may be helpful. |
Here to add nothing more than a "Me too", but I am seeing the same thing on Fedora 21. I currently run no snapshots. If I take one snapshot on any volume, the kernel panics with this error at 5 min, when it tries to expire the snapshot. I also get 'too many levels of symbolic links' when listing the .zfs/snapshot directory. cd'ing to the directory relieves this, but a pwd shows: (unknown)/SNAPNAME. zpool status and scrubs report no errors. It runs fine with no snapshots; take one snapshot and it panics within 5 min. |
@Amitie10g: hmm.. while viewing snapshots via vfs_shadow_copy (Samba).
|
Per Amitie10g's suggestion above of booting an older kernel, I tried that on Fedora 21. After taking a snapshot I get "ZFS: Unable to automount tank/data@2015_04_07_Tues at /tank/data/.zfs/snapshot/2015_04_07_Tues: 512". If I cd into the snapshot directory, the 2015_04_07_Tues directory is there, but it has no contents. I do not get the kernel panic, but I don't get a snapshot either, and zfs never attempts to expire the snapshot. This looks like the same problem as #3030 and #2841, which seem to be related to the 3.18 kernel. So I am really out of luck right now: kernel 3.18 doesn't automount snapshots and kernel 3.19 panics after taking one. |
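A note on the trailing 512 in the automount message above: the thread does not say what the number means. Assuming (and this is only an assumption) that it is a raw wait(2)-style status from the user-space mount helper, the exit code sits in the high byte. The helper `decode_wait_status` below is a hypothetical name used purely for illustration:

```python
def decode_wait_status(status):
    """Split a raw wait(2)-style status into (exit_code, signal_number).

    Assumption: the number printed after the colon in the
    "Unable to automount" message is such a status.
    """
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the child
    signal_number = status & 0x7F      # low 7 bits: terminating signal, if any
    return exit_code, signal_number


print(decode_wait_status(512))  # (2, 0)
```

If that reading is right, 512 corresponds to the helper exiting with code 2, which mount(8) documents as a system error, rather than to a numeric ZFS error.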
downgrade All OK |
Upgraded to the 0.6.4-1 pkgs from zol.org and upgraded the pool, but got the same results. Fedora 21 results here: 5 minutes later, kernel panic when zfs attempts to auto-umount the snapshot. |
I have the same, kernel panic in guest Arch Linux running under kvm. |
I had that a couple weeks ago when browsing snapshots via samba. datastore4 ~ # cat /sys/module/{spl,zfs}/version Here is the kernel panic: |
In case this is useful: prior to kernel panic, I am able to use the system for a little while and also start browsing snapshots, however obviously there is something amiss, see example below:
Note: error report when changing directory to |
FWIW I captured the following in serial console
Please note, crash happens only some time after I've used ZFS snapshots |
This does not happen on kernel version 3.18.11 . Instead in this kernel I am unable to mount the snapshots (but no crash). This is the same issue as #3030 . I will try bisecting the kernel to find which commit changed kernel behaviour from
to
@cooper75 this is the same as you are reporting above |
@Bronek, yes exact same issue/problem. |
It is possible for an automounted snapshot to expire, trigger the auto-unmount, successfully unmount but still return EBUSY. This can occur if the unmount command returns an unexpected error code. If, concurrently with this, another process triggers the snapshot auto-mount, the snapshot will be remounted. This is correct and desirable behavior, but it breaks the assumption that it's always safe to re-add the snapshot to the AVL tree. Therefore, we must check the AVL snapshot tree before assuming it's safe to add the snapshot.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3257
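The race described in this commit message can be illustrated with a toy model. This is a sketch only: a Python dict stands in for the AVL tree, and `SnapshotRegistry`, `add_if_absent` and the other names are hypothetical, not taken from the ZFS sources.

```python
class SnapshotRegistry:
    """Toy stand-in for the tree of auto-mounted snapshots (a dict, not an AVL tree)."""

    def __init__(self):
        self._entries = {}

    def add(self, name, entry):
        # Strict insert: a duplicate key is treated as a fatal bug, which is
        # the situation the commit message describes for the AVL tree.
        if name in self._entries:
            raise RuntimeError("entry already present: " + name)
        self._entries[name] = entry

    def remove(self, name):
        return self._entries.pop(name)

    def find(self, name):
        return self._entries.get(name)

    def add_if_absent(self, name, entry):
        # The fix in miniature: look the name up before re-adding it.
        if self.find(name) is not None:
            return False
        self.add(name, entry)
        return True


reg = SnapshotRegistry()
reg.add("tank/data@snap", "first mount")
old = reg.remove("tank/data@snap")          # expiry task takes the entry out
reg.add("tank/data@snap", "second mount")   # concurrent auto-mount registers it again

try:
    reg.add("tank/data@snap", old)          # unchecked re-add after the "failed" unmount
except RuntimeError as err:
    print("unchecked re-add:", err)

print("checked re-add inserted:", reg.add_if_absent("tank/data@snap", old))
```

The unchecked re-add hits the duplicate, while the checked variant leaves the newer entry in place and reports that nothing was inserted.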
Could someone who's able to trigger this issue please verify that #3287 resolves the issue. I haven't been able to reliably reproduce the issue. |
@behlendorf I'd love to help, but I am out of my depth with git. How do I go about pulling this commit into a local tree? |
@behlendorf I applied the patch on top of 0.6.3-1.3 and ran it with kernel 3.19.3. It still does not quite work, i.e. commands which trigger automount report errors, and then some commands will work and others will not. I spoke too soon: some time after browsing a few snapshots I got this:
However the machine is still responsive; it's not a kernel panic this time. Also, when trying to unload zfs module (after I successfully executed Just got the same crash (but not kernel panic) with patched version 0.6.4 running under 3.19.3 When testing patched versions 0.6.3-1.3 and 0.6.4 against 3.18.11 I got the same as already documented in #3030 |
@behlendorf I downloaded your tree, and My results were similar to @Bronek. It no longer panics after the 5min auto-umount expires, but it eventually does lock up the machine. I was unable to do anything but hard reset.
$ uname -r $ modinfo zfs|head |
Thanks for the quick test guys. It looks like I'll need to reproduce this locally. |
@behlendorf FWIW, I'm using almost vanilla (only patch is change of log severity) kernel from ArchLinux running under kvm/qemu. |
I wrote simple wrapper for /bin/mount
kernel 3.17.8 - "mount" called once, all OK
Now, on 3.19.x:
all OK if we are INSIDE (after cd ).
The mount just disappears when exiting from the mounted snapshot. And again, within 5 minutes we get a kernel panic:
Digging through the kernel sources, I got stuck somewhere in fs/namei.c ( |
I also had a similar kernel panic with Ubuntu vivid, Linux 3.19.0-15-generic and zfs 0.6.4.1-1
The machine always crashed after around 36 hours without doing anything. The strange thing was that I have two new, identically set up machines. One crashed, the other did not. After digging around I found the difference.
One zfs filesystem had set the Still not sure if removing the property will fix the problem, but I hope so. Time will tell. |
"36 hours" cron job? Some updatedb(8) ? Traversing fs tree, visible is traversed. |
I had the same crash as [https://github.com//issues/3257#issuecomment-92294234] on Debian unstable running 4.0.0-1, and snapdir was also set to 'visible' for me. Now testing whether changing this to hidden helps. |
Setting this to hidden didn't help for me. When you go into any .zfs directory, the server crashes. As long as you don't do that, it doesn't crash. But without going into any .zfs directory, zfs is not really usable :-(. So the combination of Ubuntu vivid, Linux 3.19.0 and zfs 0.6.4.1-1 doesn't really work. I went back to Ubuntu trusty, Linux 3.16.0 and zfs 0.6.4.1-1. Perhaps it's too early to switch to 3.19 and systemd. |
I found what seems to be a related bug in Linux mainline, which has been fixed in torvalds/linux@8f502d5 and merged into 4.0.2. Can anyone verify whether this crash still happens in this version? |
Just a quick update. I'm still working on resolving this cleanly, but I haven't had a ton of time to devote to it as I'm helping to get some other large changes reviewed, finalized, and merged. |
@behlendorf no worries, your work (also on these other large changes) is hugely appreciated. |
I am seeing an issue similar to @Bronek, and I have a core dump from one of the crashes. Details are on #3243 (comment). |
root@amnesiac:/# modinfo zfs root@amnesiac:/# uname -a root@amnesiac:/home/ftp/.zfs/snapshot# ls -laR ./GMT-2015.06.17-07.41.05: (((((( P.S. |
For reference, I found some occurrences going back to 2008 and earlier: http://zfs-discuss.opensolaris.narkive.com/VjPjoypg/panic-avl-find-succeeded-inside-avl-add The last one references a fix on FreeBSD, and the one on FreeBSD seems completely unrelated to this. Interesting... Anyway, I've been running #3344 for some time and haven't seen this error (yet), knock on wood. Thanks for it! |
I was bitten by this today, when I upgraded to kernel 4.1.3 and forgot to include patch #3344 into my build of ZoL. It would be nice to have this fixed already, without the need for extra patches ... |
+1, ran into the issue with a stock Arch Linux 4.1.2 kernel… |
Agreed. It's one of the few remaining blockers for 0.6.5 so expect it to be fixed fairly soon. |
+1 had this issue. |
spl-8ac6ffecaf
and panic after few minutes |
Re-factor the .zfs/snapshot auto-mounting code to take into account changes made to the upstream kernels, and to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements.

* All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot without needing access to the parent dataset.
* Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need for complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures.
* References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding.
* The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows the kernel to unmount idle auto-mounted snapshots if needed, removing the need for the zfsctl_unmount_snapshots() function.
* Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds, the auto-unmount expiration timer will be extended.
* Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invalidating all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3589
Issue openzfs#3344
Issue openzfs#3295
Issue openzfs#3257
Issue openzfs#3243
Issue openzfs#3030
Issue openzfs#2841
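Two of the points in this commit message, the double index and the expiration timer that is pushed back on revalidation, can be sketched in a few lines. This is a toy model under stated assumptions, not the ZFS implementation: dicts replace the two trees, reference counting is left out, time is passed in explicitly, and `MountedSnapshots`, `SnapEntry` and `expire_seconds` are hypothetical names. The 300-second value is chosen only to match the five-minute delay reported in this thread.

```python
class SnapEntry:
    def __init__(self, name, objsetid, deadline):
        self.name = name
        self.objsetid = objsetid
        self.deadline = deadline  # time after which auto-unmount may run


class MountedSnapshots:
    """Tracks auto-mounted snapshots under two keys: name and objset id."""

    def __init__(self, expire_seconds=300):
        self.expire_seconds = expire_seconds
        self.by_name = {}
        self.by_objsetid = {}

    def mount(self, name, objsetid, now):
        # Register the snapshot in both indexes when the auto-mount happens.
        entry = self.by_name.get(name)
        if entry is None:
            entry = SnapEntry(name, objsetid, now + self.expire_seconds)
            self.by_name[name] = entry
            self.by_objsetid[objsetid] = entry
        return entry

    def unmount(self, name):
        # Removal happens as part of the unmount, from both indexes at once.
        entry = self.by_name.pop(name, None)
        if entry is not None:
            del self.by_objsetid[entry.objsetid]
        return entry

    def revalidate(self, name, now):
        # A snapshot in active use gets its expiration pushed back.
        entry = self.by_name.get(name)
        if entry is not None:
            entry.deadline = now + self.expire_seconds

    def expired(self, now):
        return sorted(n for n, e in self.by_name.items() if e.deadline <= now)


snaps = MountedSnapshots(expire_seconds=300)
snaps.mount("tank/data@monday", objsetid=42, now=0)
snaps.mount("tank/data@tuesday", objsetid=43, now=0)
snaps.revalidate("tank/data@monday", now=200)   # still being browsed

print(snaps.expired(now=300))                   # ['tank/data@tuesday']
for name in snaps.expired(now=300):
    snaps.unmount(name)
print(sorted(snaps.by_objsetid))                # [42]
```

Keeping both indexes in step inside `mount` and `unmount` is what lets a lookup by either key succeed without consulting the parent dataset.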
This seems to be fixed by #3718 , thanks @behlendorf ! |
Re-factor the .zfs/snapshot auto-mounting code to take into account changes made to the upstream kernels, and to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements.

* All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot without needing access to the parent dataset.
* Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need for complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures.
* References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding.
* The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows the kernel to unmount idle auto-mounted snapshots if needed, removing the need for the zfsctl_unmount_snapshots() function.
* Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds, the auto-unmount expiration timer will be extended.
* Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invalidating all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#3589
Closes openzfs#3344
Closes openzfs#3295
Closes openzfs#3257
Closes openzfs#3243
Closes openzfs#3030
Closes openzfs#2841

Conflicts:
config/kernel.m4
module/zfs/zfs_ctldir.c
http://i.imgur.com/hEaJDSy.png