Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hung/Crash for unknown reason #2247

Closed
jby opened this issue Apr 8, 2014 · 20 comments
Closed

Hung/Crash for unknown reason #2247

jby opened this issue Apr 8, 2014 · 20 comments
Milestone

Comments

@jby
Copy link

jby commented Apr 8, 2014

There are a couple of rsync tasks at the end, they might not be interesting but then again they might be. The hung rsync tasks are caused by the previous hangs.

Don't know if it's 100% ZFS related since it starts off with the hung kthreadd.

Apr  8 02:23:58 psysbkp2 kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
Apr  8 02:23:58 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:58 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:58 psysbkp2 kernel: kthreadd      D 0000000000000003     0     2      0 0x00000000
Apr  8 02:23:58 psysbkp2 kernel: ffff88043a1e15a0 0000000000000046 0000000000000000 ffff880400000005
Apr  8 02:23:58 psysbkp2 kernel: 0000000500000044 0000000000000001 ffff88043a1e1640 0000000000000086
Apr  8 02:23:58 psysbkp2 kernel: ffff88043a1d9058 ffff88043a1e1fd8 000000000000fbc8 ffff88043a1d9058
Apr  8 02:23:58 psysbkp2 kernel: Call Trace:
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5e8ae>] dmu_tx_assign+0x8e/0x4e0 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0dd9866>] zfs_inactive+0x186/0x220 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0dec11e>] zpl_clear_inode+0xe/0x10 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff811a5bcc>] clear_inode+0xac/0x140
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff811a5ca0>] dispose_list+0x40/0x120
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff811a5ff4>] shrink_icache_memory+0x274/0x2e0
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff81138aca>] shrink_slab+0x12a/0x1a0
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8113b6ce>] zone_reclaim+0x3ae/0x650
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8112d82c>] get_page_from_freelist+0x6ac/0x870
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8112f393>] __alloc_pages_nodemask+0x113/0x8d0
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8116ec8b>] ? cache_alloc_refill+0x15b/0x240
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff810153a3>] ? native_sched_clock+0x13/0x80
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff81014969>] ? sched_clock+0x9/0x10
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff810a2525>] ? sched_clock_local+0x25/0x90
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff81167a8a>] alloc_pages_current+0xaa/0x110
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8112cf2e>] __get_free_pages+0xe/0x50
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8106fa46>] copy_process+0x126/0x1450
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff81070e11>] do_fork+0xa1/0x480
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8100c1a2>] kernel_thread+0x82/0xe0
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8109ae15>] ? kthreadd+0x185/0x1c0
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8109ac90>] ? kthreadd+0x0/0x1c0
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr  8 02:23:58 psysbkp2 kernel: INFO: task spl_system_task:841 blocked for more than 120 seconds.
Apr  8 02:23:58 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:58 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:58 psysbkp2 kernel: spl_system_ta D 0000000000000000     0   841      2 0x00000000
Apr  8 02:23:58 psysbkp2 kernel: ffff880436a731f0 0000000000000046 0000000000000000 ffff88010005f200
Apr  8 02:23:58 psysbkp2 kernel: ffffffffa0df39c0 0000000000000000 ffff880436a73190 ffffffffa0de2b72
Apr  8 02:23:58 psysbkp2 kernel: ffff8804356cfab8 ffff880436a73fd8 000000000000fbc8 ffff8804356cfab8
Apr  8 02:23:58 psysbkp2 kernel: Call Trace:
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0de2b72>] ? zio_vdev_io_done+0x42/0x190 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5cdb5>] traverse_prefetcher+0xa5/0x150 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffff8152921e>] ? mutex_lock+0x1e/0x50
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d149>] traverse_visitbp+0x2e9/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d42c60>] ? arc_getbuf_func+0x0/0x80 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0de3de3>] ? zio_buf_alloc+0x23/0x30 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5daa4>] traverse_dnode+0x74/0x100 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d473>] traverse_visitbp+0x613/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:58 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0de3de3>] ? zio_buf_alloc+0x23/0x30 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d45557>] ? arc_get_data_buf+0x307/0x480 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5daa4>] traverse_dnode+0x74/0x100 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5d531>] traverse_visitbp+0x6d1/0x830 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8105a570>] ? __dequeue_entity+0x30/0x50
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5dbb3>] traverse_prefetch_thread+0x83/0xc0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5cd10>] ? traverse_prefetcher+0x0/0x150 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c86628>] taskq_thread+0x218/0x4b0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c86410>] ? taskq_thread+0x0/0x4b0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr  8 02:23:59 psysbkp2 kernel: INFO: task z_rd_int/1:1100 blocked for more than 120 seconds.
Apr  8 02:23:59 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:59 psysbkp2 kernel: z_rd_int/1    D 0000000000000001     0  1100      2 0x00000000
Apr  8 02:23:59 psysbkp2 kernel: ffff88042c115c30 0000000000000046 0000000000000048 0000000000000282
Apr  8 02:23:59 psysbkp2 kernel: ffff88042c115cc0 ffffffffa0c7c9e7 ffff880435418060 ffff88042c115c00
Apr  8 02:23:59 psysbkp2 kernel: ffff880435bbd098 ffff88042c115fd8 000000000000fbc8 ffff880435bbd098
Apr  8 02:23:59 psysbkp2 kernel: Call Trace:
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c7c9e7>] ? spl_debug_msg+0x427/0x9d0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8152938e>] __mutex_lock_slowpath+0x13e/0x180
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8152922b>] mutex_lock+0x2b/0x50
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d8c249>] spa_async_request+0x49/0xb0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0da1f5b>] vdev_probe+0x20b/0x210 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff811c4ab3>] ? check_disk_change+0x33/0x80
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0de2c0e>] zio_vdev_io_done+0xde/0x190 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0de3ab3>] zio_execute+0xb3/0x140 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c86628>] taskq_thread+0x218/0x4b0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81527c20>] ? thread_return+0x4e/0x76e
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c86410>] ? taskq_thread+0x0/0x4b0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr  8 02:23:59 psysbkp2 kernel: INFO: task txg_sync:1181 blocked for more than 120 seconds.
Apr  8 02:23:59 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:59 psysbkp2 kernel: txg_sync      D 0000000000000006     0  1181      2 0x00000000
Apr  8 02:23:59 psysbkp2 kernel: ffff88042a4d1a70 0000000000000046 ffff88042a4d19f0 ffff88042cc4d578
Apr  8 02:23:59 psysbkp2 kernel: 0000000000000000 ffff8800283968a8 0000000000000001 ffff880028396840
Apr  8 02:23:59 psysbkp2 kernel: ffff88042cc4daf8 ffff88042a4d1fd8 000000000000fbc8 ffff88042cc4daf8
Apr  8 02:23:59 psysbkp2 kernel: Call Trace:
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81528ac5>] schedule_timeout+0x215/0x2e0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81068ff5>] ? enqueue_entity+0x125/0x450
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81069384>] ? enqueue_task_fair+0x64/0x100
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81528743>] wait_for_common+0x123/0x180
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d910f0>] ? spa_async_thread+0x0/0x280 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8152885d>] wait_for_completion+0x1d/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b059>] kthread_create+0x99/0x120
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c85410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c81d0b>] ? kmem_free_debug+0x4b/0x150 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c851dc>] __thread_create+0x11c/0x350 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8152921e>] ? mutex_lock+0x1e/0x50
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d8bcac>] spa_sync+0x90c/0xa80 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d9e707>] txg_sync_thread+0x307/0x590 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81059329>] ? set_user_nice+0xc9/0x130
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d9e400>] ? txg_sync_thread+0x0/0x590 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c85478>] thread_generic_wrapper+0x68/0x80 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c85410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr  8 02:23:59 psysbkp2 kernel: INFO: task rsync:28118 blocked for more than 120 seconds.
Apr  8 02:23:59 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:59 psysbkp2 kernel: rsync         D 0000000000000005     0 28118  28117 0x00000080
Apr  8 02:23:59 psysbkp2 kernel: ffff88024ed43b28 0000000000000082 0000000000000055 ffff880100000001
Apr  8 02:23:59 psysbkp2 kernel: 0000000100000044 0000000000000001 ffff88024ed43bc8 0000000000000086
Apr  8 02:23:59 psysbkp2 kernel: ffff8804344ea638 ffff88024ed43fd8 000000000000fbc8 ffff8804344ea638
Apr  8 02:23:59 psysbkp2 kernel: Call Trace:
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff812263b6>] ? security_file_permission+0x16/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81188f68>] vfs_write+0xb8/0x1a0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81189861>] sys_write+0x51/0x90
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr  8 02:23:59 psysbkp2 kernel: INFO: task rsync:28180 blocked for more than 120 seconds.
Apr  8 02:23:59 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:59 psysbkp2 kernel: rsync         D 0000000000000003     0 28180  28179 0x00000080
Apr  8 02:23:59 psysbkp2 kernel: ffff88031e7bda88 0000000000000086 0000000000000055 ffff880400000005
Apr  8 02:23:59 psysbkp2 kernel: 0000000500000044 0000000000000001 ffff88031e7bdb28 0000000000000082
Apr  8 02:23:59 psysbkp2 kernel: ffff880435eec638 ffff88031e7bdfd8 000000000000fbc8 ffff880435eec638
Apr  8 02:23:59 psysbkp2 kernel: Call Trace:
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0dd8d7c>] zfs_rename+0x43c/0xda0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8122751f>] ? security_inode_permission+0x1f/0x30
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb9ae>] zpl_rename+0x5e/0x90 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81197ef9>] vfs_rename+0x419/0x480
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8119a4f9>] sys_renameat+0x309/0x3a0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff811aacd4>] ? mnt_drop_write+0x34/0x40
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff812826f5>] ? _atomic_dec_and_lock+0x55/0x80
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff811aaa10>] ? mntput_no_expire+0x30/0x110
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff811974f1>] ? path_put+0x31/0x40
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff810e2057>] ? audit_syscall_entry+0x1d7/0x200
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8119a5ab>] sys_rename+0x1b/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr  8 02:23:59 psysbkp2 kernel: INFO: task rsync:28367 blocked for more than 120 seconds.
Apr  8 02:23:59 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:59 psysbkp2 kernel: rsync         D 0000000000000007     0 28367  28366 0x00000080
Apr  8 02:23:59 psysbkp2 kernel: ffff88017e27fb28 0000000000000086 0000000000000055 ffff880100000001
Apr  8 02:23:59 psysbkp2 kernel: 0000000100000044 0000000000000001 ffff88017e27fbc8 0000000000000086
Apr  8 02:23:59 psysbkp2 kernel: ffff880434ff1af8 ffff88017e27ffd8 000000000000fbc8 ffff880434ff1af8
Apr  8 02:23:59 psysbkp2 kernel: Call Trace:
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff812263b6>] ? security_file_permission+0x16/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81188f68>] vfs_write+0xb8/0x1a0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81189861>] sys_write+0x51/0x90
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr  8 02:23:59 psysbkp2 kernel: INFO: task rsync:28371 blocked for more than 120 seconds.
Apr  8 02:23:59 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:59 psysbkp2 kernel: rsync         D 0000000000000005     0 28371  28369 0x00000080
Apr  8 02:23:59 psysbkp2 kernel: ffff8802ccb23b28 0000000000000086 0000000000000055 ffff880100000001
Apr  8 02:23:59 psysbkp2 kernel: 0000000100000044 0000000000000001 ffff8802ccb23bc8 0000000000000086
Apr  8 02:23:59 psysbkp2 kernel: ffff8804342e65f8 ffff8802ccb23fd8 000000000000fbc8 ffff8804342e65f8
Apr  8 02:23:59 psysbkp2 kernel: Call Trace:
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff812263b6>] ? security_file_permission+0x16/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81188f68>] vfs_write+0xb8/0x1a0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81189861>] sys_write+0x51/0x90
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr  8 02:23:59 psysbkp2 kernel: INFO: task rsync:28373 blocked for more than 120 seconds.
Apr  8 02:23:59 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:23:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:23:59 psysbkp2 kernel: rsync         D 0000000000000005     0 28373  28372 0x00000080
Apr  8 02:23:59 psysbkp2 kernel: ffff8803f3739b28 0000000000000086 0000000000000055 ffff880100000006
Apr  8 02:23:59 psysbkp2 kernel: 0000000600000044 0000000000000001 ffff8803f3739bc8 0000000000000086
Apr  8 02:23:59 psysbkp2 kernel: ffff8804346ffaf8 ffff8803f3739fd8 000000000000fbc8 ffff8804346ffaf8
Apr  8 02:23:59 psysbkp2 kernel: Call Trace:
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff812263b6>] ? security_file_permission+0x16/0x20
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81188f68>] vfs_write+0xb8/0x1a0
Apr  8 02:23:59 psysbkp2 kernel: [<ffffffff81189861>] sys_write+0x51/0x90
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr  8 02:24:00 psysbkp2 kernel: INFO: task rsync:28380 blocked for more than 120 seconds.
Apr  8 02:24:00 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.5.1.el6.x86_64 #1
Apr  8 02:24:00 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 02:24:00 psysbkp2 kernel: rsync         D 0000000000000002     0 28380  28379 0x00000080
Apr  8 02:24:00 psysbkp2 kernel: ffff88029b995b28 0000000000000082 0000000000000055 ffff880100000005
Apr  8 02:24:00 psysbkp2 kernel: 0000000500000044 0000000000000001 ffff88029b995bc8 0000000000000086
Apr  8 02:24:00 psysbkp2 kernel: ffff880427ecfab8 ffff88029b995fd8 000000000000fbc8 ffff880427ecfab8
Apr  8 02:24:00 psysbkp2 kernel: Call Trace:
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff8144b4d4>] ? release_sock+0xe4/0xf0
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff812263b6>] ? security_file_permission+0x16/0x20
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff81188f68>] vfs_write+0xb8/0x1a0
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff81189861>] sys_write+0x51/0x90
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr  8 02:24:00 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
@behlendorf
Copy link
Contributor

I'm sorry to say it does look zfs related. However, based in the stacks this might be a known issue which was addressed in the latest code.

@behlendorf behlendorf added this to the 0.6.4 milestone Apr 14, 2014
@behlendorf behlendorf added the Bug label Apr 14, 2014
@jby
Copy link
Author

jby commented Apr 16, 2014

Any guess as to when the next release is expected?

@jby
Copy link
Author

jby commented Apr 17, 2014

Had another early this morning, don't know if it's the same or not.

Apr 17 07:15:35 psysbkp2 kernel: INFO: task spl_system_task:843 blocked for more than 120 seconds.
Apr 17 07:15:35 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:15:35 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:15:35 psysbkp2 kernel: spl_system_ta D 0000000000000003     0   843      2 0x00000000
Apr 17 07:15:35 psysbkp2 kernel: ffff8804386f9110 0000000000000046 0000000000000000 ffff88009e8e64e0
Apr 17 07:15:35 psysbkp2 kernel: ffffffffa0df39c0 0000000000000000 ffff8804386f90b0 ffffffffa0de2b72
Apr 17 07:15:35 psysbkp2 kernel: ffff8804371cfab8 ffff8804386f9fd8 000000000000fbc8 ffff8804371cfab8
Apr 17 07:15:35 psysbkp2 kernel: Call Trace:
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0de2b72>] ? zio_vdev_io_done+0x42/0x190 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5cdb5>] traverse_prefetcher+0xa5/0x150 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffff8152952e>] ? mutex_lock+0x1e/0x50
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d149>] traverse_visitbp+0x2e9/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d42c60>] ? arc_getbuf_func+0x0/0x80 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffff8152952e>] ? mutex_lock+0x1e/0x50
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5daa4>] traverse_dnode+0x74/0x100 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d473>] traverse_visitbp+0x613/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0de3de3>] ? zio_buf_alloc+0x23/0x30 [zfs]
Apr 17 07:15:35 psysbkp2 kernel: [<ffffffffa0d45557>] ? arc_get_data_buf+0x307/0x480 [zfs]
Apr 17 07:15:44 psysbkp2 kernel: [<ffffffffa0d5daa4>] traverse_dnode+0x74/0x100 [zfs]
Apr 17 07:15:44 psysbkp2 kernel: [<ffffffffa0d5d531>] traverse_visitbp+0x6d1/0x830 [zfs]
Apr 17 07:15:44 psysbkp2 kernel: [<ffffffff81069ccb>] ? dequeue_task_fair+0x12b/0x130
Apr 17 07:15:44 psysbkp2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Apr 17 07:15:44 psysbkp2 kernel: [<ffffffffa0d5dbb3>] traverse_prefetch_thread+0x83/0xc0 [zfs]
Apr 17 07:15:44 psysbkp2 kernel: [<ffffffffa0d5cd10>] ? traverse_prefetcher+0x0/0x150 [zfs]
Apr 17 07:15:44 psysbkp2 kernel: [<ffffffffa0c86628>] taskq_thread+0x218/0x4b0 [spl]
Apr 17 07:15:50 psysbkp2 kernel: [<ffffffff81527f30>] ? thread_return+0x4e/0x76e
Apr 17 07:15:50 psysbkp2 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr 17 07:15:50 psysbkp2 kernel: [<ffffffffa0c86410>] ? taskq_thread+0x0/0x4b0 [spl]
Apr 17 07:15:50 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 17 07:15:50 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 17 07:15:50 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr 17 07:15:50 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 17 07:15:50 psysbkp2 kernel: INFO: task txg_sync:1188 blocked for more than 120 seconds.
Apr 17 07:15:50 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:15:53 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:15:53 psysbkp2 kernel: txg_sync      D 0000000000000004     0  1188      2 0x00000000
Apr 17 07:15:53 psysbkp2 kernel: ffff880428fbdb60 0000000000000046 0000000000000001 ffff88043597ac70
Apr 17 07:15:53 psysbkp2 kernel: 0000000000000000 0000000000000000 ffff880428fbdae0 ffffffff81065e02
Apr 17 07:15:53 psysbkp2 kernel: ffff880435e77af8 ffff880428fbdfd8 000000000000fbc8 ffff880435e77af8
Apr 17 07:15:53 psysbkp2 kernel: Call Trace:
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffff81065e02>] ? default_wake_function+0x12/0x20
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffff810a7091>] ? ktime_get_ts+0xb1/0xf0
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffff815286c3>] io_schedule+0x73/0xc0
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0c8d41c>] cv_wait_common+0xac/0x1c0 [spl]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0de3a00>] ? zio_execute+0x0/0x140 [zfs]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0c8d548>] __cv_wait_io+0x18/0x20 [spl]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0de3c3b>] zio_wait+0xfb/0x1b0 [zfs]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0d77235>] dsl_pool_sync+0x2f5/0x540 [zfs]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0d8b7ae>] spa_sync+0x40e/0xa80 [zfs]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0d9e707>] txg_sync_thread+0x307/0x590 [zfs]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffff81059329>] ? set_user_nice+0xc9/0x130
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0d9e400>] ? txg_sync_thread+0x0/0x590 [zfs]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0c85478>] thread_generic_wrapper+0x68/0x80 [spl]
Apr 17 07:15:53 psysbkp2 kernel: [<ffffffffa0c85410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 17 07:16:04 psysbkp2 kernel: INFO: task txg_sync:1376 blocked for more than 120 seconds.
Apr 17 07:16:04 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:16:04 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:16:04 psysbkp2 kernel: txg_sync      D 0000000000000001     0  1376      2 0x00000000
Apr 17 07:16:04 psysbkp2 kernel: ffff880434405b60 0000000000000046 0000000000000001 ffff880427a53e30
Apr 17 07:16:04 psysbkp2 kernel: 0000000000000000 0000000000000000 ffff880434405ae0 ffffffff81065e02
Apr 17 07:16:04 psysbkp2 kernel: ffff880436467ab8 ffff880434405fd8 000000000000fbc8 ffff880436467ab8
Apr 17 07:16:04 psysbkp2 kernel: Call Trace:
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff81065e02>] ? default_wake_function+0x12/0x20
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff810a7091>] ? ktime_get_ts+0xb1/0xf0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff815286c3>] io_schedule+0x73/0xc0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0c8d41c>] cv_wait_common+0xac/0x1c0 [spl]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0de3a00>] ? zio_execute+0x0/0x140 [zfs]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0c8d548>] __cv_wait_io+0x18/0x20 [spl]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0de3c3b>] zio_wait+0xfb/0x1b0 [zfs]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0d77235>] dsl_pool_sync+0x2f5/0x540 [zfs]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0d8b7ae>] spa_sync+0x40e/0xa80 [zfs]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0d9e707>] txg_sync_thread+0x307/0x590 [zfs]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff81059329>] ? set_user_nice+0xc9/0x130
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0d9e400>] ? txg_sync_thread+0x0/0x590 [zfs]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0c85478>] thread_generic_wrapper+0x68/0x80 [spl]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffffa0c85410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 17 07:16:04 psysbkp2 kernel: INFO: task jbd2/sda1-8:1425 blocked for more than 120 seconds.
Apr 17 07:16:04 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:16:04 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:16:04 psysbkp2 kernel: jbd2/sda1-8   D 0000000000000005     0  1425      2 0x00000000
Apr 17 07:16:04 psysbkp2 kernel: ffff88013ddefa80 0000000000000046 0000000000000000 ffff88013ddefa44
Apr 17 07:16:04 psysbkp2 kernel: ffff88013ddef9f0 ffffffff810149b9 ffff88013ddefa30 ffffffff810a7091
Apr 17 07:16:04 psysbkp2 kernel: ffff88043381f058 ffff88013ddeffd8 000000000000fbc8 ffff88043381f058
Apr 17 07:16:04 psysbkp2 kernel: Call Trace:
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff810a7091>] ? ktime_get_ts+0xb1/0xf0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff810a7091>] ? ktime_get_ts+0xb1/0xf0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8111f920>] ? sync_page+0x0/0x50
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff815286c3>] io_schedule+0x73/0xc0
Apr 17 07:16:04 psysbkp2 kernel: [<ffffffff8111f95d>] sync_page+0x3d/0x50
Apr 17 07:16:08 psysbkp2 kernel: [<ffffffff81528f5a>] __wait_on_bit_lock+0x5a/0xc0
Apr 17 07:16:08 psysbkp2 kernel: [<ffffffff8111f8f7>] __lock_page+0x67/0x70
Apr 17 07:16:08 psysbkp2 kernel: [<ffffffff8109b310>] ? wake_bit_function+0x0/0x50
Apr 17 07:16:08 psysbkp2 kernel: [<ffffffff81135c05>] ? pagevec_lookup_tag+0x25/0x40
Apr 17 07:16:08 psysbkp2 kernel: [<ffffffff81134b5d>] write_cache_pages+0x3cd/0x4c0
Apr 17 07:16:08 psysbkp2 kernel: [<ffffffff811336b0>] ? __writepage+0x0/0x40
Apr 17 07:16:08 psysbkp2 kernel: [<ffffffff81134c74>] generic_writepages+0x24/0x30
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0bad4d7>] journal_submit_inode_data_buffers+0x47/0x50 [jbd2]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0bad9ed>] jbd2_journal_commit_transaction+0x37d/0x1500 [jbd2]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8108410c>] ? lock_timer_base+0x3c/0x70
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0bb3a48>] kjournald2+0xb8/0x220 [jbd2]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0bb3990>] ? kjournald2+0x0/0x220 [jbd2]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 17 07:16:26 psysbkp2 kernel: INFO: task zfs:10153 blocked for more than 120 seconds.
Apr 17 07:16:26 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:16:26 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:16:26 psysbkp2 kernel: zfs           D 0000000000000003     0 10153   9874 0x00000080
Apr 17 07:16:26 psysbkp2 kernel: ffff880241648f98 0000000000000082 ffff880241648ef8 ffffffff81065e02
Apr 17 07:16:26 psysbkp2 kernel: ffff880241648f48 0000000000000000 ffff88043466a2c0 ffff880241648f84
Apr 17 07:16:26 psysbkp2 kernel: ffff8803ab8a9ab8 ffff880241649fd8 000000000000fbc8 ffff8803ab8a9ab8
Apr 17 07:16:26 psysbkp2 kernel: Call Trace:
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff81065e02>] ? default_wake_function+0x12/0x20
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0c86285>] taskq_wait_id+0x65/0xa0 [spl]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d5b830>] ? dump_bytes_strategy+0x0/0xc0 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d93bfa>] spa_taskq_dispatch_sync+0x7a/0xa0 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d5b932>] dump_bytes+0x42/0x50 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d5c3d5>] backup_cb+0x385/0x6c0 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d5d149>] traverse_visitbp+0x2e9/0x830 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d3f9a0>] ? remove_reference+0xa0/0xc0 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d42c60>] ? arc_getbuf_func+0x0/0x80 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:26 psysbkp2 kernel: [<ffffffffa0d45437>] ? arc_get_data_buf+0x1e7/0x480 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5daa4>] traverse_dnode+0x74/0x100 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5d473>] traverse_visitbp+0x613/0x830 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d46609>] ? arc_read+0x6b9/0x8b0 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d5d2ed>] traverse_visitbp+0x48d/0x830 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0de3de3>] ? zio_buf_alloc+0x23/0x30 [zfs]
Apr 17 07:16:39 psysbkp2 kernel: [<ffffffffa0d45557>] ? arc_get_data_buf+0x307/0x480 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0d5daa4>] traverse_dnode+0x74/0x100 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0d5d531>] traverse_visitbp+0x6d1/0x830 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff81058d53>] ? __wake_up+0x53/0x70
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0d5db30>] ? traverse_prefetch_thread+0x0/0xc0 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0d5d802>] traverse_impl+0x172/0x310 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0d5da25>] traverse_dataset+0x45/0x50 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0d5c050>] ? backup_cb+0x0/0x6c0 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0d5be8e>] dmu_send+0x32e/0x4f0 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0dc84a2>] zfs_ioc_send+0x302/0x390 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff81059216>] ? enqueue_task+0x66/0x80
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0dc4a9d>] zfsdev_ioctl+0xfd/0x1d0 [zfs]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff8119dc12>] vfs_ioctl+0x22/0xa0
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff81070ead>] ? do_fork+0x13d/0x480
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff8119ddb4>] do_vfs_ioctl+0x84/0x580
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff8119e331>] sys_ioctl+0x81/0xa0
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff8152b15e>] ? do_device_not_available+0xe/0x10
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 17 07:16:47 psysbkp2 kernel: INFO: task rsync:23114 blocked for more than 120 seconds.
Apr 17 07:16:47 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:16:47 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:16:47 psysbkp2 kernel: rsync         D 0000000000000000     0 23114  23113 0x00000080
Apr 17 07:16:47 psysbkp2 kernel: ffff880315dc7b28 0000000000000082 0000000000000055 ffff8802a8b5b9a0
Apr 17 07:16:47 psysbkp2 kernel: 0000000000000044 0000000000000286 ffff880315dc7bc8 ffffffffa0c7c9e7
Apr 17 07:16:47 psysbkp2 kernel: ffff88043471d058 ffff880315dc7fd8 000000000000fbc8 ffff88043471d058
Apr 17 07:16:47 psysbkp2 kernel: Call Trace:
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0c7c9e7>] ? spl_debug_msg+0x427/0x9d0 [spl]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:16:47 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff81226496>] ? security_file_permission+0x16/0x20
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff81189048>] vfs_write+0xb8/0x1a0
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff81189941>] sys_write+0x51/0x90
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 17 07:16:55 psysbkp2 kernel: INFO: task rsync:6862 blocked for more than 120 seconds.
Apr 17 07:16:55 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:16:55 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:16:55 psysbkp2 kernel: rsync         D 0000000000000004     0  6862   6861 0x00000080
Apr 17 07:16:55 psysbkp2 kernel: ffff8800692e5a88 0000000000000082 0000000000000055 ffff880200000000
Apr 17 07:16:55 psysbkp2 kernel: 0000000000000044 0000000000000001 ffff8800692e5b28 0000000000000082
Apr 17 07:16:55 psysbkp2 kernel: ffff88005e0a3af8 ffff8800692e5fd8 000000000000fbc8 ffff88005e0a3af8
Apr 17 07:16:55 psysbkp2 kernel: Call Trace:
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0dd8d7c>] zfs_rename+0x43c/0xda0 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff811a328b>] ? __d_instantiate+0xbb/0x110
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffffa0deb9ae>] zpl_rename+0x5e/0x90 [zfs]
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff81197fd9>] vfs_rename+0x419/0x480
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff8119a5d9>] sys_renameat+0x309/0x3a0
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff811aadb4>] ? mnt_drop_write+0x34/0x40
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff81282935>] ? _atomic_dec_and_lock+0x55/0x80
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff811aaaf0>] ? mntput_no_expire+0x30/0x110
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff811975d1>] ? path_put+0x31/0x40
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff810e2057>] ? audit_syscall_entry+0x1d7/0x200
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff8119a68b>] sys_rename+0x1b/0x20
Apr 17 07:16:55 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 17 07:16:55 psysbkp2 kernel: INFO: task rsync:7611 blocked for more than 120 seconds.
Apr 17 07:16:55 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:16:59 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:16:59 psysbkp2 kernel: rsync         D 0000000000000006     0  7611   7610 0x00000080
Apr 17 07:16:59 psysbkp2 kernel: ffff8800a7ac7b28 0000000000000086 0000000000000055 ffff880200000000
Apr 17 07:16:59 psysbkp2 kernel: 0000000000000044 0000000000000001 ffff8800a7ac7bc8 0000000000000086
Apr 17 07:16:59 psysbkp2 kernel: ffff880171c8a638 ffff8800a7ac7fd8 000000000000fbc8 ffff880171c8a638
Apr 17 07:16:59 psysbkp2 kernel: Call Trace:
Apr 17 07:16:59 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr 17 07:17:21 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr 17 07:17:21 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:17:21 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr 17 07:17:21 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr 17 07:17:21 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr 17 07:17:21 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr 17 07:17:21 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffff81226496>] ? security_file_permission+0x16/0x20
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffff81189048>] vfs_write+0xb8/0x1a0
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffff81189941>] sys_write+0x51/0x90
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 17 07:17:40 psysbkp2 kernel: INFO: task rsync:7997 blocked for more than 120 seconds.
Apr 17 07:17:40 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:17:40 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:17:40 psysbkp2 kernel: rsync         D 0000000000000002     0  7997   7996 0x00000080
Apr 17 07:17:40 psysbkp2 kernel: ffff88007fe5bb28 0000000000000086 0000000000000055 ffff880200000000
Apr 17 07:17:40 psysbkp2 kernel: 0000000000000044 0000000000000001 ffff88007fe5bbc8 0000000000000086
Apr 17 07:17:40 psysbkp2 kernel: ffff88013dcadaf8 ffff88007fe5bfd8 000000000000fbc8 ffff88013dcadaf8
Apr 17 07:17:40 psysbkp2 kernel: Call Trace:
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr 17 07:17:40 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff81226496>] ? security_file_permission+0x16/0x20
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff81189048>] vfs_write+0xb8/0x1a0
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff81189941>] sys_write+0x51/0x90
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 17 07:17:55 psysbkp2 kernel: INFO: task dd:8369 blocked for more than 120 seconds.
Apr 17 07:17:55 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 17 07:17:55 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 07:17:55 psysbkp2 kernel: dd            D 0000000000000002     0  8369   8368 0x00000080
Apr 17 07:17:55 psysbkp2 kernel: ffff88006cc49b28 0000000000000082 0000000000000055 ffff8802a6a74b20
Apr 17 07:17:55 psysbkp2 kernel: 0000000000000044 0000000000000286 ffff88006cc49bc8 ffffffffa0c7c9e7
Apr 17 07:17:55 psysbkp2 kernel: ffff880433987af8 ffff88006cc49fd8 000000000000fbc8 ffff880433987af8
Apr 17 07:17:55 psysbkp2 kernel: Call Trace:
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0c7c9e7>] ? spl_debug_msg+0x427/0x9d0 [spl]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff8109b4de>] ? prepare_to_wait_exclusive+0x4e/0x80
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0c8d475>] cv_wait_common+0x105/0x1c0 [spl]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0c8d585>] __cv_wait+0x15/0x20 [spl]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0d9ddbb>] txg_wait_open+0x8b/0x110 [zfs]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0d5e7cd>] dmu_tx_wait+0xed/0xf0 [zfs]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0dd784e>] zfs_write+0x3be/0xca0 [zfs]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0deb2c2>] zpl_write_common+0x52/0x70 [zfs]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffffa0deb348>] zpl_write+0x68/0xa0 [zfs]
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff81226496>] ? security_file_permission+0x16/0x20
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff81189048>] vfs_write+0xb8/0x1a0
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff81189941>] sys_write+0x51/0x90
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
Apr 17 07:17:55 psysbkp2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

@behlendorf
Copy link
Contributor

Yes, it looks the same. We'll get the next release out once we resolve the remaining outstanding issues associated with it.

https://github.com/zfsonlinux/zfs/issues?labels=&milestone=23&page=1&state=open

@jby
Copy link
Author

jby commented Apr 18, 2014

Any tips on what to do in the meantime, to try to minimize the occurrence of this?

@behlendorf
Copy link
Contributor

@jby If your comfortable patching and rebuilding ZFS my suggestion would be to try cherry-picking a168788 from the master branch. There's a good chance it will resolve your issue.

@jby
Copy link
Author

jby commented Apr 22, 2014

OK, are there any more detailed install instructions somewhere, or is it very straight forward to just run autogen.sh and build from there?

Any prerequisites or requirements that I need to be aware of?

@jby
Copy link
Author

jby commented Apr 22, 2014

Never mind, found this: http://zfsonlinux.org/generic-rpm.html

@jby
Copy link
Author

jby commented Apr 22, 2014

Hmm, do I perhaps need something more than a168788 ?

../../module/zfs/vdev_mirror.c: In function 'vdev_mirror_pending':
../../module/zfs/vdev_mirror.c:92: error: 'vdev_queue_t' has no member named 'vq_pending_tree'
make[5]: *** [vdev_mirror.lo] Error 1
make[5]: Leaving directory `/tmp/zfs-build-jonasb-JQ1sPKJt/BUILD/zfs-0.6.2/lib/libzpool'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/tmp/zfs-build-jonasb-JQ1sPKJt/BUILD/zfs-0.6.2/lib'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/tmp/zfs-build-jonasb-JQ1sPKJt/BUILD/zfs-0.6.2'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/zfs-build-jonasb-JQ1sPKJt/BUILD/zfs-0.6.2'
error: Bad exit status from /tmp/zfs-build-jonasb-JQ1sPKJt/TMP/rpm-tmp.wIRkUv (%build)


RPM build errors:
    Bad exit status from /tmp/zfs-build-jonasb-JQ1sPKJt/TMP/rpm-tmp.wIRkUv (%build)
make[1]: *** [rpm-common] Error 1
make[1]: Leaving directory `/home/jonasb/src/zfs'
make: *** [rpm-utils] Error 2

@jby
Copy link
Author

jby commented Apr 22, 2014

Or maybe I shouldn't try to cherry-pick that onto master, since it seems to have been updated with something newer.

Onto what should I cherry-pick a168788 ?

@behlendorf
Copy link
Contributor

@jby That fix is already applied to master. My suggestion would be just to run master, or cherry-pick that back to 0.6.2. Although it may not apply cleanly so just building packages using master is probably easiest.

@jby
Copy link
Author

jby commented Apr 23, 2014

Thanks, I'll go with master then.

@jby
Copy link
Author

jby commented Apr 24, 2014

Hmm,

This is from very early this morning - running master from tuesday (zfs-0.6.2-259_gde39ec1.el6.x86_64):

Apr 24 02:10:04 psysbkp2 kernel: INFO: task txg_sync:1884 blocked for more than 120 seconds.
Apr 24 02:10:04 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 24 02:10:04 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 02:10:04 psysbkp2 kernel: txg_sync      D 0000000000000005     0  1884      2 0x00000000
Apr 24 02:10:04 psysbkp2 kernel: ffff880408895b70 0000000000000046 ffff880408895b38 ffff880408895b34
Apr 24 02:10:04 psysbkp2 kernel: 0000000000000000 ffff88043fc24b00 ffff880028316840 0000000000000400
Apr 24 02:10:04 psysbkp2 kernel: ffff88040cf7c5f8 ffff880408895fd8 000000000000fbc8 ffff88040cf7c5f8
Apr 24 02:10:04 psysbkp2 kernel: Call Trace:
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffff815286c3>] io_schedule+0x73/0xc0
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0cdc69c>] cv_wait_common+0xac/0x1c0 [spl]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0e412f0>] ? zio_execute+0x0/0x140 [zfs]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0cdc7c8>] __cv_wait_io+0x18/0x20 [spl]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0e4152b>] zio_wait+0xfb/0x1b0 [zfs]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0dd27e3>] dsl_pool_sync+0xb3/0x440 [zfs]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0de664b>] spa_sync+0x40b/0xae0 [zfs]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0dfab74>] txg_sync_thread+0x384/0x5e0 [zfs]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffff81059329>] ? set_user_nice+0xc9/0x130
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0dfa7f0>] ? txg_sync_thread+0x0/0x5e0 [zfs]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0cd4478>] thread_generic_wrapper+0x68/0x80 [spl]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffffa0cd4410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr 24 02:10:04 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 24 06:30:04 psysbkp2 kernel: INFO: task txg_sync:1884 blocked for more than 120 seconds.
Apr 24 06:30:04 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 24 06:30:04 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 06:30:04 psysbkp2 kernel: txg_sync      D 0000000000000006     0  1884      2 0x00000000
Apr 24 06:30:04 psysbkp2 kernel: ffff880408895b70 0000000000000046 0000000000000001 ffff880434f92a30
Apr 24 06:30:06 psysbkp2 kernel: 0000000000000000 0000000000000000 ffff880408895af0 ffffffff81065e02
Apr 24 06:30:06 psysbkp2 kernel: ffff88040cf7c5f8 ffff880408895fd8 000000000000fbc8 ffff88040cf7c5f8
Apr 24 06:30:06 psysbkp2 kernel: Call Trace:
Apr 24 06:30:06 psysbkp2 kernel: [<ffffffff81065e02>] ? default_wake_function+0x12/0x20
Apr 24 06:30:06 psysbkp2 kernel: [<ffffffff810a7091>] ? ktime_get_ts+0xb1/0xf0
Apr 24 06:30:06 psysbkp2 kernel: [<ffffffff815286c3>] io_schedule+0x73/0xc0
Apr 24 06:30:06 psysbkp2 kernel: [<ffffffffa0cdc69c>] cv_wait_common+0xac/0x1c0 [spl]
Apr 24 06:30:06 psysbkp2 kernel: [<ffffffffa0e412f0>] ? zio_execute+0x0/0x140 [zfs]
Apr 24 06:30:06 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 24 06:30:06 psysbkp2 kernel: [<ffffffffa0cdc7c8>] __cv_wait_io+0x18/0x20 [spl]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffffa0e4152b>] zio_wait+0xfb/0x1b0 [zfs]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffffa0dd27e3>] dsl_pool_sync+0xb3/0x440 [zfs]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffffa0de664b>] spa_sync+0x40b/0xae0 [zfs]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffffa0dfab74>] txg_sync_thread+0x384/0x5e0 [zfs]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffff81059329>] ? set_user_nice+0xc9/0x130
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffffa0dfa7f0>] ? txg_sync_thread+0x0/0x5e0 [zfs]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffffa0cd4478>] thread_generic_wrapper+0x68/0x80 [spl]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffffa0cd4410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr 24 06:30:07 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 24 08:02:05 psysbkp2 kernel: INFO: task txg_sync:1884 blocked for more than 120 seconds.
Apr 24 08:02:05 psysbkp2 kernel:      Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1
Apr 24 08:02:05 psysbkp2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 08:02:05 psysbkp2 kernel: txg_sync      D 0000000000000007     0  1884      2 0x00000000
Apr 24 08:02:05 psysbkp2 kernel: ffff880408895b70 0000000000000046 0000000000000001 ffff880434f92a30
Apr 24 08:02:05 psysbkp2 kernel: 0000000000000000 0000000000000000 ffff880408895af0 ffffffff81065e02
Apr 24 08:02:05 psysbkp2 kernel: ffff88040cf7c5f8 ffff880408895fd8 000000000000fbc8 ffff88040cf7c5f8
Apr 24 08:02:05 psysbkp2 kernel: Call Trace:
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff81065e02>] ? default_wake_function+0x12/0x20
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff810a7091>] ? ktime_get_ts+0xb1/0xf0
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff815286c3>] io_schedule+0x73/0xc0
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0cdc69c>] cv_wait_common+0xac/0x1c0 [spl]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0e412f0>] ? zio_execute+0x0/0x140 [zfs]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0cdc7c8>] __cv_wait_io+0x18/0x20 [spl]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0e4152b>] zio_wait+0xfb/0x1b0 [zfs]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0dd27e3>] dsl_pool_sync+0xb3/0x440 [zfs]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0de664b>] spa_sync+0x40b/0xae0 [zfs]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0dfab74>] txg_sync_thread+0x384/0x5e0 [zfs]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff81059329>] ? set_user_nice+0xc9/0x130
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0dfa7f0>] ? txg_sync_thread+0x0/0x5e0 [zfs]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0cd4478>] thread_generic_wrapper+0x68/0x80 [spl]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffffa0cd4410>] ? thread_generic_wrapper+0x0/0x80 [spl]
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff8109aee6>] kthread+0x96/0xa0
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
Apr 24 08:02:05 psysbkp2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

@behlendorf
Copy link
Contributor

There's a decent chance this is also a manifestation of #2523 which has not yet been addressed. @jby by chance have you observed this same issue in the 0.6.3 code?

@jby
Copy link
Author

jby commented Nov 18, 2014

Yes, but only a couple of times - and all within the same hour.

I built and installed master on April 22 (commit de39ec1) and have had the

INFO: task txg_sync:1130 blocked for more than 120 seconds.

happen 10 times within an hour early in the morning on October 18.

That's it.

I've been planning to upgrade to an official release, but since it's been running pretty well since I installed from master I've tried to leave it alone.

behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 19, 2014
It is known that mutexes in Linux are not safe when using them to
synchronize the freeing of object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue any many like it are linked off the openzfs/zfs#2523
issue.  Specifically this fix resolves at least the following
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #421
@behlendorf
Copy link
Contributor

This issue which is a duplicate of #2523 was resolved by the following commit. Full details can be found in the commit message and related lwn article.

openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races

dajhorn added a commit to zfsonlinux/pkg-spl that referenced this issue Dec 20, 2014
Commit: openzfs/zfs@a3c1eb7
From: Chunwei Chen <tuxoko@gmail.com>
Date: Fri, 19 Dec 2014 11:31:59 +0800
Subject: mutex: force serialization on mutex_exit() to fix races

It is known that mutexes in Linux are not safe when using them to
synchronize the freeing of object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue any many like it are linked off the openzfs/zfs#2523
issue.  Specifically this fix resolves at least the following
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Backported-by: Darik Horn <dajhorn@vanadac.com>
Closes #421

Conflicts:
        include/sys/mutex.h
behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 23, 2014
It is known that mutexes in Linux are not safe when using them to
synchronize the freeing of object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue any many like it are linked off the openzfs/zfs#2523
issue.  Specifically this fix resolves at least the following
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #421
@ryao
Copy link
Contributor

ryao commented Feb 10, 2015

@behlendorf There is a class of issues that looked like the #2523 issue, but are a different problem that affects NUMA systems according to #3091. In reviewing the issues marked fixed by the #2523 fix, this one appears to be distinct.

@jby Did you try using a newer version of ZoL that had the patch that fixed #2523 and did the issue go away when you did?

@ryao
Copy link
Contributor

ryao commented Feb 10, 2015

After looking again, the problem is that the stack traces just don't resolve the RIP value. This looks exactly like #3091.

@jby
Copy link
Author

jby commented Feb 17, 2015

I've been running zfs-0.6.3-1.2.el6.x86_64 for quite a while now, without any problems.
My previous comment still stands about when I last had a crash.

@behlendorf
Copy link
Contributor

@ryao #3091 definitely is a legitimate issue, however from what I've seen reported thus far it is far less likely to occur that #2523 which has been fixed. Since it's very hard to tell the different between the two issues I suspect most of the existing issues I closer were caused by #2523. However, it's entirely possible a few people saw both problems. However, I thank we can and should track any remaining issues similar to this under #3091.

ryao pushed a commit to ryao/spl that referenced this issue Feb 19, 2015
It is known that mutexes in Linux are not safe when using them to
synchronize the freeing of object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue any many like it are linked off the openzfs/zfs#2523
issue.  Specifically this fix resolves at least the following
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes openzfs#421

Conflicts:
	include/sys/mutex.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants