ZFS hangs when hotplugging disk #2185
Comments
I got more logs from a kernel with debugging enabled. After SysRq + W: [ 400.980936] zd0: unknown partition table, and soft lockup bugs:
Things have improved considerably in the master source. I would be very interested to see if you can reproduce this issue with the latest code.
Hello,
I'm using ZoL 0.6.2 with Linux kernel 3.10.30. I've seen similar issues reported with SCST-patched kernels, but this kernel has no SCST patches.
I created three zpools on three different drives, with one zvol on each pool, then shut down the system and unplugged all three disks. Then I booted the system again. "zpool status" listed all three pools, and they were marked as unavailable. I plugged in all three disks one by one and ran "zpool status" again. The command hung and I couldn't kill the process. I displayed the blocked tasks with SysRq + W:
[ 348.498695] SysRq : Show Blocked State
[ 348.498705] task PC stack pid father
[ 348.498786] zpool D 0000000000000000 0 27672 26976 0x00000000
[ 348.498794] ffff880073476e80 0000000000000002 ffffffff816c50a0 ffff88007e387500
[ 348.498801] ffff88007b22ffd8 ffff88007b22ffd8 ffff88007b22ffd8 ffff880073476e80
[ 348.498807] ffff88007b4cc148 ffff880073476ec8 ffff88007b4cc5fa ffffffff8107365f
[ 348.498813] Call Trace:
[ 348.498828] [] ? check_preempt_wakeup+0x12f/0x270
[ 348.498837] [] ? _raw_spin_unlock_irqrestore+0x8/0x10
[ 348.498844] [] ? try_to_wake_up+0xcb/0x290
[ 348.498851] [] ? set_user_nice+0xd5/0x190
[ 348.498871] [] ? taskq_create+0x2e5/0x4d0 [spl]
[ 348.498880] [] ? abort_exclusive_wait+0xb0/0xb0
[ 348.498935] [] ? spa_activate+0x2f3/0x400 [zfs]
[ 348.498986] [] ? spa_open_common+0x137/0x380 [zfs]
[ 348.499041] [] ? pool_status_check+0x41/0xa0 [zfs]
[ 348.499094] [] ? zfsdev_ioctl+0x1a5/0x1b0 [zfs]
[ 348.499101] [] ? do_vfs_ioctl+0x8a/0x4f0
[ 348.499108] [] ? get_vtime_delta+0x16/0x80
[ 348.499114] [] ? vtime_account_user+0x50/0x70
[ 348.499120] [] ? SyS_ioctl+0xa0/0xc0
[ 348.499127] [] ? tracesys+0xdd/0xe2
[ 348.499154] txg_sync D 0000000000000001 0 27891 2 0x00000000
[ 348.499161] ffff8800730ef500 0000000000000002 ffff880000000000 ffff8800730ee180
[ 348.499166] ffff88007177dfd8 ffff88007177dfd8 ffff88007177dfd8 ffff8800730ef500
[ 348.499172] ffff88007e412c40 0000000000000000 0000000000000000 ffffffff8106c95c
[ 348.499177] Call Trace:
[ 348.499185] [] ? check_preempt_curr+0x7c/0x90
[ 348.499192] [] ? ttwu_do_wakeup+0x11/0x90
[ 348.499198] [] ? try_to_wake_up+0xcb/0x290
[ 348.499204] [] ? __wake_up_common+0x4f/0x80
[ 348.499211] [] ? io_schedule+0x56/0x80
[ 348.499228] [] ? cv_wait_common+0xa5/0x1c0 [spl]
[ 348.499241] [] ? taskq_dispatch_ent+0x6d/0x1c0 [spl]
[ 348.499249] [] ? abort_exclusive_wait+0xb0/0xb0
[ 348.499306] [] ? zio_wait+0xeb/0x1a0 [zfs]
[ 348.499352] [] ? dsl_pool_sync+0xe6/0x530 [zfs]
[ 348.499401] [] ? spa_sync+0x3f9/0xac0 [zfs]
[ 348.499416] [] ? __gethrtime+0xc/0x30 [spl]
[ 348.499423] [] ? ktime_get_ts+0x3d/0xd0
[ 348.499473] [] ? txg_sync_thread+0x2eb/0x550 [zfs]
[ 348.499481] [] ? sched_clock+0x5/0x10
[ 348.499532] [] ? txg_thread_wait.isra.2+0x30/0x30 [zfs]
[ 348.499545] [] ? thread_generic_wrapper+0x75/0x90 [spl]
[ 348.499559] [] ? __thread_create+0x310/0x310 [spl]
[ 348.499565] [] ? kthread+0xb3/0xc0
[ 348.499571] [] ? alloc_pid+0x1e0/0x490
[ 348.499578] [] ? kthread_freezable_should_stop+0x60/0x60
[ 348.499584] [] ? ret_from_fork+0x7c/0xb0
[ 348.499591] [] ? kthread_freezable_should_stop+0x60/0x60
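For anyone triaging a dump like the one above, here is a minimal Python sketch that lists the tasks in D state and the functions in their call traces. It assumes the line layout shown in this report (timestamp, task name, `D`, then pid and parent pid; frames as `func+0xoff/0xlen [module]`). The names `parse_blocked_tasks` and `module_frames` are made up for illustration and are not part of any ZFS or kernel tooling.

```python
import re

# Leading dmesg timestamp, e.g. "[ 348.498786] "
TS = r"^\[\s*\d+\.\d+\]\s*"

# Task header line from "Show Blocked State": name, state D, pc, stack, pid, father, flags
TASK_RE = re.compile(
    TS + r"(?P<comm>\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)

# Call-trace frame; the address inside [] may be present or stripped
FRAME_RE = re.compile(
    TS + r"\[(?:<[0-9a-f]+>)?\]\s+(?:\?\s+)?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?\s*$"
)


def parse_blocked_tasks(log_text):
    """Return a list of blocked tasks, each with its (function, module) frames."""
    tasks = []
    for line in log_text.splitlines():
        m = TASK_RE.match(line)
        if m:
            tasks.append({"comm": m.group("comm"), "pid": int(m.group("pid")), "frames": []})
            continue
        m = FRAME_RE.match(line)
        if m and tasks:
            # Attach the frame to the most recently seen task header
            tasks[-1]["frames"].append((m.group("func"), m.group("mod")))
    return tasks


def module_frames(task, module):
    """Return only the function names that belong to the given kernel module."""
    return [func for func, mod in task["frames"] if mod == module]


# A few lines copied from the dump above, used as sample input
SAMPLE = """\
[ 348.498786] zpool D 0000000000000000 0 27672 26976 0x00000000
[ 348.498871] [] ? taskq_create+0x2e5/0x4d0 [spl]
[ 348.498935] [] ? spa_activate+0x2f3/0x400 [zfs]
[ 348.499154] txg_sync D 0000000000000001 0 27891 2 0x00000000
[ 348.499228] [] ? cv_wait_common+0xa5/0x1c0 [spl]
[ 348.499306] [] ? zio_wait+0xeb/0x1a0 [zfs]
"""

if __name__ == "__main__":
    for task in parse_blocked_tasks(SAMPLE):
        print(task["comm"], task["pid"], module_frames(task, "zfs"))
```

Filtering by module makes it quicker to see which zfs and spl functions each blocked task was in, for example spa_activate for zpool and zio_wait for txg_sync.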
After 40 seconds I got an RCU stall:
[ 386.240004] INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=3771 c=3770 q=0)
[ 386.240005] sending NMI to all CPUs:
[ 386.240005] NMI backtrace for cpu 0
[ 386.240005] CPU: 0 PID: 27908 Comm: vol_id Tainted: P O 3.10.30-oe64-00000-g8b3a21b #50
[ 386.240005] Hardware name: To be filled by O.E.M. WDBLGT0080KBK-40/GA8-IBLV, BIOS 4.6.4 10/18/2011
[ 386.240005] task: ffff880071f15b00 ti: ffff880075c6e000 task.ti: ffff880075c6e000
[ 386.240005] RIP: 0010:[] [] kasprintf+0x50/0x50
[ 386.240005] RSP: 0000:ffff88007e403e40 EFLAGS: 00000096
[ 386.240005] RAX: 0000000000000c00 RBX: 0000000000002710 RCX: 0000000000000006
[ 386.240005] RDX: 0000000000000007 RSI: 0000000000000080 RDI: ffffffff81a19b60
[ 386.240005] RBP: ffffffff8198dd40 R08: 20676e69646e6573 R09: 61206f7420494d4e
[ 386.240005] R10: 00000000000003a5 R11: 3a73555043206c6c R12: ffffffff8198dd40
[ 386.240005] R13: 0000000000000000 R14: ffffffff81a1a040 R15: ffff880075c6e000
[ 386.240005] FS: 0000000000000000(0000) GS:ffff88007e400000(0063) knlGS:00000000f75e66c0
[ 386.240005] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 386.240005] CR2: 00000000f76a5230 CR3: 0000000075c39000 CR4: 00000000000007f0
[ 386.240005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 386.240005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 386.240005] Stack:
[ 386.240005] ffffffff8102d548 ffff88007e40e640 ffffffff810b1dbb ffff88007e412c40
[ 386.240005] 0000000000000000 ffffffff810121af ffffffff81012215 ffffffff81070b85
[ 386.240005] ffff880071f15b00 0000000000000000 0000000000000000 ffff88007e40e198
[ 386.240005] Call Trace:
[ 386.240005]
[ 386.240005] [] ? arch_trigger_all_cpu_backtrace+0x78/0x90
[ 386.240005] [] ? rcu_check_callbacks+0x1eb/0x5b0
[ 386.240005] [] ? native_sched_clock+0xf/0x70
[ 386.240005] [] ? sched_clock+0x5/0x10
[ 386.240005] [] ? sched_clock_local+0x15/0x80
[ 386.240005] [] ? update_process_times+0x3f/0x80
[ 386.240005] [] ? tick_sched_timer+0x66/0x90
[ 386.240005] [] ? __run_hrtimer.isra.31+0x4e/0xe0
[ 386.240005] [] ? hrtimer_interrupt+0xfc/0x250
[ 386.240005] [] ? smp_apic_timer_interrupt+0x63/0xa0
[ 386.240005] [] ? apic_timer_interrupt+0x6d/0x80
[ 386.240005]
[ 386.240005] [] ? try_module_get+0x3/0x20
[ 386.240005] [] ? get_disk+0x2e/0x80
[ 386.240005] [] ? exact_lock+0xc/0x20
[ 386.240005] [] ? kobj_lookup+0xd9/0x160
[ 386.240005] [] ? disk_map_sector_rcu+0x70/0x70
[ 386.240005] [] ? get_gendisk+0x34/0x120
[ 386.240005] [] ? __blkdev_get+0x11a/0x410
[ 386.240005] [] ? blkdev_get_block+0x20/0x20
[ 386.240005] [] ? blkdev_get+0x4b/0x2f0
[ 386.240005] [] ? unlock_new_inode+0x3a/0x60
[ 386.240005] [] ? bdget+0x112/0x130
[ 386.240005] [] ? blkdev_get+0x2f0/0x2f0
[ 386.240005] [] ? do_dentry_open+0x235/0x2b0
[ 386.240005] [] ? finish_open+0x28/0x40
[ 386.240005] [] ? do_last+0x7ca/0xe30
[ 386.240005] [] ? __inode_permission+0x29/0xa0
[ 386.240005] [] ? link_path_walk+0x245/0x920
[ 386.240005] [] ? flush_tlb_page+0x42/0xb0
[ 386.240005] [] ? path_openat+0xc6/0x500
[ 386.240005] [] ? handle_pte_fault+0xaa/0x930
[ 386.240005] [] ? do_filp_open+0x45/0xb0
[ 386.240005] [] ? __alloc_fd+0x3d/0x110
[ 386.240005] [] ? do_sys_open+0xfe/0x1f0
[ 386.240005] [] ? compat_SyS_open+0x6c/0x110
[ 386.240005] [] ? get_vtime_delta+0x16/0x80
[ 386.240005] [] ? vtime_account_user+0x50/0x70
[ 386.240005] [] ? syscall_trace_enter+0x1a/0x1f0
[ 386.240005] [] ? ia32_do_call+0x13/0x13
[ 386.240005] Code: 24 20 4c 89 4c 24 48 c7 44 24 08 10 00 00 00 48 89 44 24 18 e8 22 ff ff ff 48 83 c4 58 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 <8d> 4e 3f 85 f6 0f 49 ce c1 f9 06 85 c9 7e 65 31 c0 48 83 3f 00
[ 386.240553] NMI backtrace for cpu 2
[ 386.240566] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P O 3.10.30-oe64-00000-g8b3a21b #50
[ 386.240571] Hardware name: To be filled by O.E.M. WDBLGT0080KBK-40/GA8-IBLV, BIOS 4.6.4 10/18/2011
[ 386.240578] task: ffff88007e387500 ti: ffff88007e3c8000 task.ti: ffff88007e3c8000
[ 386.240583] RIP: 0010:[] [] intel_idle+0xc7/0x140
[ 386.240597] RSP: 0000:ffff88007e3c9e48 EFLAGS: 00000046
[ 386.240602] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000001
[ 386.240607] RDX: 0000000000000000 RSI: ffff88007e3c9fd8 RDI: 0000000000000002
[ 386.240611] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000582f10
[ 386.240616] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 386.240620] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88007e3c9fd8
[ 386.240626] FS: 0000000000000000(0000) GS:ffff88007e500000(0000) knlGS:0000000000000000
[ 386.240631] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 386.240636] CR2: 00000000f7762000 CR3: 000000007aedf000 CR4: 00000000000007e0
[ 386.240641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 386.240645] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 386.240649] Stack:
[ 386.240652] 0000000000000000 000000020a61a3ff ffffffff8157ae3e ffff88007e518800
[ 386.240660] 00000059ed731d9a ffffffff819a36b0 ffffffff819a3640 ffffffff8157ae2c
[ 386.240667] 0000000000000000 000000000a61a3ff 0000000000000000 ffff88007e518800
[ 386.240674] Call Trace:
[ 386.240686] [] ? cpuidle_enter_state+0x5e/0xf0
[ 386.240693] [] ? cpuidle_enter_state+0x4c/0xf0
[ 386.240701] [] ? cpuidle_idle_call+0xa1/0x150
[ 386.240710] [] ? arch_cpu_idle+0x9/0x30
[ 386.240718] [] ? cpu_startup_entry+0x83/0x170
[ 386.240723] Code: 48 8b 34 25 f0 c5 00 00 48 89 d1 48 8d 86 38 e0 ff ff 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <85> 1d d3 b7 63 00 75 0f 48 8d 74 24 0c bf 05 00 00 00 e8 52 a0
[ 386.240795] NMI backtrace for cpu 3
[ 386.240803] CPU: 3 PID: 0 Comm: swapper/3 Tainted: P O 3.10.30-oe64-00000-g8b3a21b #50
[ 386.240808] Hardware name: To be filled by O.E.M. WDBLGT0080KBK-40/GA8-IBLV, BIOS 4.6.4 10/18/2011
[ 386.240814] task: ffff88007e3d0000 ti: ffff88007e3ca000 task.ti: ffff88007e3ca000
[ 386.240819] RIP: 0010:[] [] interruptible_sleep_on+0x20/0x20
[ 386.240829] RSP: 0000:ffff88007e3cbf00 EFLAGS: 00000296
[ 386.240834] RAX: ffff88007e3d0000 RBX: ffff88007e3cbfd8 RCX: 0000000000000020
[ 386.240838] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000082
[ 386.240843] RBP: ffffffff81a1a030 R08: 0000000000000000 R09: 0000000000000000
[ 386.240847] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88007e3cbfd8
[ 386.240852] R13: ffff88007e3cbfd8 R14: ffff88007e3cbfd8 R15: ffff88007e3cbfd8
[ 386.240857] FS: 0000000000000000(0000) GS:ffff88007e580000(0000) knlGS:0000000000000000
[ 386.240862] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 386.240867] CR2: ffffffffff600400 CR3: 000000007a3fd000 CR4: 00000000000007e0
[ 386.240871] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 386.240876] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 386.240880] Stack:
[ 386.240883] ffffffff816b17c9 ffff88007e3cbfd8 ffffffff8107c44c 0000000000000001
[ 386.240890] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 386.240896] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 386.240903] Call Trace:
[ 386.240912] [] ? schedule_preempt_disabled+0x9/0x10
[ 386.240920] [] ? cpu_startup_entry+0x12c/0x170
[ 386.240927] Code: 00 00 00 e9 f3 fe ff ff 0f 1f 00 48 ba ff ff ff ff ff ff ff 7f be 01 00 00 00 e9 dc fe ff ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 <41> 57 65 48 8b 04 25 f0 c5 00 00 41 56 65 48 8b 14 25 00 c6 00
[ 386.240011] NMI backtrace for cpu 1
[ 386.240011] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 3.10.30-oe64-00000-g8b3a21b #50
[ 386.240011] Hardware name: To be filled by O.E.M. WDBLGT0080KBK-40/GA8-IBLV, BIOS 4.6.4 10/18/2011
[ 386.240011] task: ffff88007e386e80 ti: ffff88007e3be000 task.ti: ffff88007e3be000
[ 386.240011] RIP: 0010:[] [] intel_idle+0xc7/0x140
[ 386.240011] RSP: 0018:ffff88007e3bfe48 EFLAGS: 00000046
[ 386.240011] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000001
[ 386.240011] RDX: 0000000000000000 RSI: ffff88007e3bffd8 RDI: 0000000000000001
[ 386.240011] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000066d51
[ 386.240011] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 386.240011] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88007e3bffd8
[ 386.240011] FS: 0000000000000000(0000) GS:ffff88007e480000(0000) knlGS:0000000000000000
[ 386.240011] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 386.240011] CR2: 00000000f77a2000 CR3: 0000000072108000 CR4: 00000000000007e0
[ 386.240011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 386.240011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 386.240011] Stack:
[ 386.240011] 0000000000000000 0000000101ad1892 ffffffff8157ae3e ffff88007e498800
[ 386.240011] 00000059ed1aabee ffffffff819a36b0 ffffffff819a3640 ffffffff8157ae2c
[ 386.240011] 0000000000000000 0000000001ad1892 0000000000000000 ffff88007e498800
[ 386.240011] Call Trace:
[ 386.240011] [] ? cpuidle_enter_state+0x5e/0xf0
[ 386.240011] [] ? cpuidle_enter_state+0x4c/0xf0
[ 386.240011] [] ? cpuidle_idle_call+0xa1/0x150
[ 386.240011] [] ? arch_cpu_idle+0x9/0x30
[ 386.240011] [] ? cpu_startup_entry+0x83/0x170
[ 386.240011] Code: 48 8b 34 25 f0 c5 00 00 48 89 d1 48 8d 86 38 e0 ff ff 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <85> 1d d3 b7 63 00 75 0f 48 8d 74 24 0c bf 05 00 00 00 e8 52 a0
This is 100% reproducible with 3 drives.
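The setup described above can be written down as a rough dry-run sketch. It only builds the command lines and executes nothing. The device paths, the pool and zvol names, and the 1G volume size are assumptions for illustration; the report does not state them. The shutdown, unplug, reboot and replug steps are manual and appear only as comments.

```python
import shlex


def repro_commands(devices, vol_size="1G"):
    """Build (but do not run) the commands for the setup described in the report.

    devices: one block device path per pool (hypothetical paths).
    vol_size: zvol size; not given in the report, so this value is an assumption.
    """
    cmds = []
    for i, dev in enumerate(devices):
        pool = f"pool{i}"  # hypothetical pool name
        cmds.append(["zpool", "create", pool, dev])
        cmds.append(["zfs", "create", "-V", vol_size, f"{pool}/vol{i}"])
    # Manual steps, not scripted here:
    #   1. shut down and unplug all disks
    #   2. boot again; the pools show up as unavailable
    #   3. plug the disks back in one by one
    cmds.append(["zpool", "status"])  # the command that hung in the report
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in cmds]


if __name__ == "__main__":
    # Placeholder device names; substitute the real drives
    for line in repro_commands(["/dev/sdb", "/dev/sdc", "/dev/sdd"]):
        print(line)
```

If "zpool status" hangs, the blocked tasks can be dumped without a keyboard by writing w to /proc/sysrq-trigger as root, provided SysRq is enabled on the system.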