I've been running main + some patches (see below) for two weeks now with major snapshot transfer traffic. No kernel panics or dataset busy errors! It appears this issue is fixed.
[root@dtc-san2 spl]# git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
nothing to commit (working directory clean)
[root@dtc-san2 spl]# git log --oneline | head -n 4
75a9afb Merge branch 'kernel-slab' of https://github.com/DeHackEd/spl
4c99541 Remove incorrect use of EXTRA_DIST for man pages
1f3c06f Use Linux slab allocator for small allocations
d58a99a Define the needed ISA types for Sparc
[root@dtc-san2 spl]# cd ../zfs
[root@dtc-san2 zfs]# git status
On branch master
Your branch is ahead of 'origin/master' by 17 commits.
nothing to commit (working directory clean)
[root@dtc-san2 zfs]# git log --oneline | head -n 20
05c3ca1 Merge branch 'arc-mutexes' of https://github.com/DeHackEd/zfs
cad1342 Merge branch 'issue-2094' of https://github.com/behlendorf/zfs
dc47d0c Workaround for pools affected by #2094 erratum
02c0d35 Revert changes to zbookmark_t
bff1dc8 Add zimport.sh compatibility test script
92d5283 Set "arc_meta_limit" to 3/4 arc_c_max by default
520f372 Split "data_size" into "meta" and "data"
513198f Prioritize "metadata" in arc_get_data_buf
863f942 Remove "arc_meta_used" from arc_adjust calculation
d03b1ff Prune metadata from ghost lists in arc_adjust_meta
cd661cd Revert "Return -1 from arc_shrinker_func()"
f5ee929 Disable arc_p adapt dampener by default
fdd21ca Allow "arc_p" to drop to zero or grow to "arc_c"
00fc78f Disable aggressive arc_p growth by default
7364874 Adjust arc_p based on "bytes" in arc_shrink
8e82b07 Set zfs_arc_min to 4MB
98fad86 Propagate errors when registering "relatime" property callback.
99d3ece Add default FILEDIR path to zpool-config scripts
f12971e Add explicit Conflicts for zfs-fuse packages
0820ec6 Fix zconfig.sh test 9
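For completeness, the patched SPL/ZFS trees above get rebuilt and installed along these general lines (a rough sketch; the exact paths, configure options, and install method used on this box are assumptions):

# Rough rebuild of the patched stacks shown above; paths are assumptions.
cd /root/spl && ./autogen.sh && ./configure && make && make install
cd /root/zfs && ./autogen.sh && ./configure --with-spl=/root/spl && make && make install
# Reload the modules (or reboot) so the patched builds take effect.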
Same setup as outlined in issue #1761. Rebooted a server (hostname ctc-san2) affected by the "dataset busy" error at roughly 16:17 local time so that snapshot send/recv could continue. At 16:45 local time, a scripted zfs send cron task kicked off from another host, with ctc-san2 as the "zfs recv" side. It was a fairly large snapshot stream of around 20 incremental snapshots. Logs indicate that about eight snapshots were received successfully, up to 17:26 local time. Shortly after that (17:34 local time) the system dumped core. (It was a large core file, so the dump took a while; once it finished, the system rebooted and continued with normal operation.)
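For context, each cycle of that scripted transfer amounts to roughly the following on the sending host (an illustrative sketch only; the snapshot names, hostname, and receive flags are assumptions, not the actual cron script):

# Hypothetical incremental send/recv step; snapshot names follow the
# hour-YYYYMMDDHHMMSS pattern visible in the zpool history further down.
zfs send -i pool/chicago@hour-20131016160101 pool/chicago@hour-20131016170101 | \
    ssh ctc-san2 zfs recv -F pool/chicago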
Comments / suggestions most welcome. =)
Debug output from crash utility:
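(The session below is from the crash utility run against the resulting vmcore; the invocation was along these lines, with the debuginfo and dump paths being assumptions:)

# Illustrative only; the actual vmcore path will differ.
crash /usr/lib/debug/lib/modules/2.6.32-358.18.1.el6.x86_64/vmlinux /var/crash/<dump-dir>/vmcore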
crash> log
...
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff8150f621>] mutex_lock+0x21/0x50
PGD 108383d067 PUD 1186785067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/block/zd16/dev
CPU 1
Modules linked in: mptctl mptbase drbd(U) autofs4 sunrpc cpufreq_ondemand freq_table pcc_cpufreq bonding 8021q garp stp llc ipv6 uinput hpilo hpwdt qlcnic zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode serio_raw iTCO_wdt iTCO_vendor_support ioatdma dca sg power_meter bnx2x libcrc32c mdio tg3 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom hpsa pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 14962, comm: zvol_id Tainted: P --------------- 2.6.32-358.18.1.el6.x86_64 #1 HP ProLiant DL380p Gen8
RIP: 0010:[<ffffffff8150f621>]  [<ffffffff8150f621>] mutex_lock+0x21/0x50
RSP: 0018:ffff88107f9cdbe8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000058 RCX: ffff8816f611d9e8
RDX: 0000000000000000 RSI: ffffffffa038d628 RDI: 0000000000000058
RBP: ffff88107f9cdbf8 R08: ffff88107f9cdb20 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa038d628
R13: 000000000000001d R14: ffff8817c3f05be0 R15: ffff8810803ce000
FS: 00007f961929d740(0000) GS:ffff880094820000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000058 CR3: 000000108118f000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zvol_id (pid: 14962, threadinfo ffff88107f9cc000, task ffff88181b29a040)
Stack:
0000000000000000 ffffffffa038d628 ffff88107f9cdc18 ffffffffa02dca75
ffff8816fffc6000 ffffffffa038d628 ffff88107f9cdc38 ffffffffa0380c6c
ffff8816fffc6000 0000000000000001 ffff88107f9cdc58 ffffffffa0380d58
Call Trace:
[] dmu_buf_rele+0x25/0x40 [zfs]
[] zvol_last_close+0x3c/0xb0 [zfs]
[] zvol_release+0x78/0x80 [zfs]
[] __blkdev_put+0x1a6/0x1b0
[] __blkdev_get+0x2e9/0x3b0
[] ? blkdev_open+0x0/0xc0
[] blkdev_get+0x10/0x20
[] blkdev_open+0x71/0xc0
[] __dentry_open+0x10a/0x360
[] ? security_inode_permission+0x1f/0x30
[] nameidata_to_filp+0x54/0x70
[] do_filp_open+0x6d0/0xdc0
[] ? __do_page_fault+0x1ec/0x480
[] ? alloc_fd+0x92/0x160
[] do_sys_open+0x69/0x140
[] sys_open+0x20/0x30
[] system_call_fastpath+0x16/0x1b
Code: ff ff 85 c0 75 ed eb d6 66 90 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 0f 1f 44 00 00 48 89 fb e8 a2 f4 ff ff 48 89 df ff 0f 79 05 e8 25 00 00 00 65 48 8b 04 25 c8 cb 00 00 48 2d
RIP [<ffffffff8150f621>] mutex_lock+0x21/0x50
RSP <ffff88107f9cdbe8>
CR2: 0000000000000058
Backtrace:
crash> bt
PID: 14962 TASK: ffff88181b29a040 CPU: 1 COMMAND: "zvol_id"
#0 [ffff88107f9cd7b0] machine_kexec at ffffffff81035d6b
#1 [ffff88107f9cd810] crash_kexec at ffffffff810c0e22
#2 [ffff88107f9cd8e0] oops_end at ffffffff81511c20
#3 [ffff88107f9cd910] no_context at ffffffff81046c1b
#4 [ffff88107f9cd960] __bad_area_nosemaphore at ffffffff81046ea5
#5 [ffff88107f9cd9b0] bad_area at ffffffff81046fce
#6 [ffff88107f9cd9e0] __do_page_fault at ffffffff81047780
#7 [ffff88107f9cdb00] do_page_fault at ffffffff81513b6e
#8 [ffff88107f9cdb30] page_fault at ffffffff81510f25
[exception RIP: mutex_lock+33]
RIP: ffffffff8150f621 RSP: ffff88107f9cdbe8 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000058 RCX: ffff8816f611d9e8
RDX: 0000000000000000 RSI: ffffffffa038d628 RDI: 0000000000000058
RBP: ffff88107f9cdbf8 R8: ffff88107f9cdb20 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa038d628
R13: 000000000000001d R14: ffff8817c3f05be0 R15: ffff8810803ce000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88107f9cdc00] dmu_buf_rele at ffffffffa02dca75 [zfs]
#10 [ffff88107f9cdc20] zvol_last_close at ffffffffa0380c6c [zfs]
#11 [ffff88107f9cdc40] zvol_release at ffffffffa0380d58 [zfs]
#12 [ffff88107f9cdc60] __blkdev_put at ffffffff811bcdb6
#13 [ffff88107f9cdcb0] __blkdev_get at ffffffff811bd169
#14 [ffff88107f9cdd10] blkdev_get at ffffffff811bd240
#15 [ffff88107f9cdd20] blkdev_open at ffffffff811bd2c1
#16 [ffff88107f9cdd50] __dentry_open at ffffffff8117e49a
#17 [ffff88107f9cddb0] nameidata_to_filp at ffffffff8117e804
#18 [ffff88107f9cddd0] do_filp_open at ffffffff81193260
#19 [ffff88107f9cdf20] do_sys_open at ffffffff8117e249
#20 [ffff88107f9cdf70] sys_open at ffffffff8117e360
#21 [ffff88107f9cdf80] system_call_fastpath at ffffffff8100b072
Subset of zpool history -i:
2013-10-16.17:05:09 [internal destroy txg:4673114] dataset = 45990
2013-10-16.17:05:14 zfs destroy pool/chicago@hour-20131016130101
2013-10-16.17:07:23 [internal snapshot txg:4673142] dataset = 61585
2013-10-16.17:07:23 [internal destroy txg:4673143] dataset = 61535
2013-10-16.17:07:23 [internal property set txg:4673143] reservation=0 dataset = 61535
2013-10-16.17:07:23 [internal replay_inc_sync txg:4673144] dataset = 61591
2013-10-16.17:11:46 [internal snapshot txg:4673198] dataset = 61615
2013-10-16.17:11:46 [internal destroy txg:4673199] dataset = 61591
2013-10-16.17:11:46 [internal property set txg:4673199] reservation=0 dataset = 61591
2013-10-16.17:11:47 [internal replay_inc_sync txg:4673200] dataset = 61621
2013-10-16.17:21:05 [internal snapshot txg:4673313] dataset = 61645
2013-10-16.17:21:05 [internal destroy txg:4673314] dataset = 61621
2013-10-16.17:21:05 [internal property set txg:4673314] reservation=0 dataset = 61621
2013-10-16.17:21:06 [internal replay_inc_sync txg:4673315] dataset = 61651
2013-10-16.17:22:15 [internal snapshot txg:4673330] dataset = 61670
2013-10-16.17:22:15 [internal destroy txg:4673331] dataset = 61651
2013-10-16.17:22:15 [internal property set txg:4673331] reservation=0 dataset = 61651
2013-10-16.17:22:16 [internal replay_inc_sync txg:4673332] dataset = 61676
2013-10-16.17:25:35 [internal snapshot txg:4673373] dataset = 61702
2013-10-16.17:25:36 [internal destroy txg:4673374] dataset = 61676
2013-10-16.17:25:36 [internal property set txg:4673374] reservation=0 dataset = 61676
2013-10-16.17:25:36 [internal replay_inc_sync txg:4673375] dataset = 61708
2013-10-16.17:26:32 [internal snapshot txg:4673388] dataset = 61731
2013-10-16.17:26:32 [internal destroy txg:4673389] dataset = 61708
2013-10-16.17:26:32 [internal property set txg:4673389] reservation=0 dataset = 61708
2013-10-16.17:26:33 [internal replay_inc_sync txg:4673390] dataset = 61737
2013-10-16.17:34:19 [internal snapshot txg:4673485] dataset = 61764
2013-10-16.17:34:19 [internal destroy txg:4673486] dataset = 61737
2013-10-16.17:34:19 [internal property set txg:4673486] reservation=0 dataset = 61737
2013-10-16.19:01:01 [internal snapshot txg:4673546] dataset = 61443
2013-10-16.19:01:06 zfs snapshot pool/chicago@hour-20131016190101