
task txg_sync:19098 blocked for more than 120 seconds. #5692

Closed
krichter722 opened this issue Jan 29, 2017 · 13 comments

Labels
Type: Performance Performance improvement or performance problem

Comments


krichter722 commented Jan 29, 2017

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 16.10
Linux Kernel 4.8.0-34-generic
Architecture amd64
ZFS Version 0.7.0-rc3_2_gbb7ffda
SPL Version 0.7.0-rc3

Describe the problem you're observing

After 10 to 30 minutes of heavy read and write operations, I/O for all processes alternates between a state where it proceeds slowly and a state where it stalls entirely (or progresses in barely noticeable increments) for seconds up to several minutes. This results in freezes of GUI and console programs.

In iotop I see that read and write rates rarely exceed a few 100 KB/s. If the system rests for several minutes with no heavy I/O requested, the read and write rates suddenly go back up to the usual ~100 MB/s, the delays disappear and the situation recovers. Generating another burst of heavy read and write requests puts the system back into the same state, and it recovers with the same symptoms.

While I/O alternates between slow and stalled, I see > 100 z_fr_iss_[n] threads in iotop taking 100 % of the I/O capacity. Since my HDD is quite loud during intensive head movement, I can hear when the system starts recovering. Is it really a good idea to have so many threads running, given that they all need to be managed and that a thread could serve more than one task by exchanging thread stacks?
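
A rough way to observe this state from the command line (a sketch; iostat comes from the sysstat package and the 2-second interval is arbitrary):

# count the ZFS free-issue taskq threads currently present
ps -eLo comm | grep -c '^z_fr_iss'

# watch per-device utilization while the stall is happening
iostat -x 2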

Describe how to reproduce the problem

Start 2 or more VirtualBox VMs for example.
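
If no VMs are at hand, a synthetic load generator such as fio should produce a comparable mix of heavy reads and writes (a sketch, not the reproducer I used; /tank/scratch is a placeholder for a directory on the pool):

# sustained mixed random read/write load on the pool for 10 minutes
fio --name=heavyload --directory=/tank/scratch --rw=randrw --bs=4k \
    --size=2G --numjobs=4 --time_based --runtime=600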

Include any warning/errors/backtraces from the system logs

dmesg contains

[Jan29 11:13] INFO: task txg_sync:19098 blocked for more than 120 seconds.
[  +0,000004]       Tainted: P           OE   4.8.0-34-generic #36-Ubuntu
[  +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000002] txg_sync        D ffff9072ed32bb48     0 19098      2 0x00000000
[  +0,000003]  ffff9072ed32bb48 00ff9072ed32bb90 ffff9073ace71d00 ffff9073136d9d00
[  +0,000002]  ffff90732598c1c0 ffff9072ed32c000 ffff9073bf259300 7fffffffffffffff
[  +0,000002]  ffff9071dd8b1b28 0000000000000001 ffff9072ed32bb60 ffffffffb4697b15
[  +0,000002] Call Trace:
[  +0,000005]  [<ffffffffb4697b15>] schedule+0x35/0x80
[  +0,000002]  [<ffffffffb469b2ca>] schedule_timeout+0x22a/0x3f0
[  +0,000046]  [<ffffffffc0ae0d50>] ? zio_reexecute+0x380/0x380 [zfs]
[  +0,000002]  [<ffffffffb3efcb21>] ? ktime_get+0x41/0xb0
[  +0,000002]  [<ffffffffb4697304>] io_schedule_timeout+0xa4/0x110
[  +0,000006]  [<ffffffffc08c1b32>] cv_wait_common+0xb2/0x130 [spl]
[  +0,000002]  [<ffffffffb3ec7560>] ? wake_atomic_t_function+0x60/0x60
[  +0,000004]  [<ffffffffc08c1c08>] __cv_wait_io+0x18/0x20 [spl]
[  +0,000035]  [<ffffffffc0ae38ec>] zio_wait+0xec/0x190 [zfs]
[  +0,000030]  [<ffffffffc0a825f3>] spa_sync+0x453/0xd70 [zfs]
[  +0,000032]  [<ffffffffc0a93894>] txg_sync_thread+0x2c4/0x480 [zfs]
[  +0,000032]  [<ffffffffc0a935d0>] ? txg_delay+0x160/0x160 [zfs]
[  +0,000004]  [<ffffffffc08bc6d1>] thread_generic_wrapper+0x71/0x80 [spl]
[  +0,000003]  [<ffffffffc08bc660>] ? __thread_exit+0x20/0x20 [spl]
[  +0,000002]  [<ffffffffb3ea4128>] kthread+0xd8/0xf0
[  +0,000002]  [<ffffffffb469c4df>] ret_from_fork+0x1f/0x40
[  +0,000001]  [<ffffffffb3ea4050>] ? kthread_create_on_node+0x1e0/0x1e0

This is logged every 2 minutes (+/- a few ms).

I built spl and zfs from source.

The pool consists of a single partition, with one cache partition attached.
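
For reference, a pool with that layout would be created roughly like this (a sketch only; the pool name and partition paths are placeholders, not my actual devices):

# single-partition pool with an L2ARC cache partition
zpool create tank /dev/sda2 cache /dev/sdb1
zpool status tank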

@behlendorf behlendorf added the Type: Performance Performance improvement or performance problem label Jan 30, 2017
@behlendorf
Contributor

@krichter722 thank you for the detailed issue. In general, the large number of z_fr_iss threads which get created allows ZFS to handle the outstanding work more efficiently. However, there may be cases like the one you're seeing which can be improved.

From the description you've provided it sounds like you've run into the issue described in #5449. Internally ZFS is allowing too many free requests to be assigned to a single TXG. This effectively stalls the writes until all the frees are processed.

If you're comfortable with git and building ZFS from source I'd encourage you to try applying the patch in #5449 to verify that it resolves the issue.
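
The general procedure would look roughly like the following (a sketch; it assumes a matching spl build is already installed and the kernel headers are available):

# fetch the branch behind PR #5449 and build the patched module
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git fetch origin pull/5449/head:pr-5449
git checkout pr-5449
./autogen.sh && ./configure && make -j"$(nproc)"
sudo make install
# reload the zfs/spl modules (or simply reboot) so the patched code is used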

@avnik

avnik commented Jan 31, 2017

@behlendorf could the traces which I posted to zfs-discuss a couple of days ago be related to the same problem? http://list.zfsonlinux.org/pipermail/zfs-discuss/2017-January/027254.html

Replies on the list hint at hardware issues, but it may also be a different symptom of the same problem.

@behlendorf
Contributor

Yes, it's possible, but hard to say for certain.

@krichter722
Author

From the description you've provided it sounds like you've run into the issue described in #5449. Internally ZFS is allowing too many free requests to be assigned to a single TXG. This effectively stalls the writes until all the frees are processed.

If you're comfortable with git and building ZFS from source I'd encourage you to try applying the patch in #5449 to verify that it resolves the issue.

After a superficial investigation with the kind of workload that would have caused the issue in the previously used version, spl-0.7.0-rc3-2-g97048200 and zfs-0.7.0-rc3-57-g544b8053db don't exhibit this behaviour. Thank you.

@behlendorf
Contributor

@krichter722 thanks for verifying that the patch does address the issue. I'll close this issue then.

@avnik

avnik commented Feb 2, 2017

@behlendorf this possibly also fixes my case.

(Although it was not a very clean experiment; I upgraded the kernel to 4.9.6 at the same time.)

UPDATE: It behaves more smoothly, but the printk's still show up.

@krichter722
Author

krichter722 commented Feb 9, 2017

After a superficial investigation with the kind of workload that would have caused the issue in the previously used version, spl-0.7.0-rc3-2-g97048200 and zfs-0.7.0-rc3-57-g544b8053db don't exhibit this behaviour. Thank you.

So much for the superficial investigation. The mentioned versions only delay the described behaviour; under heavy load I still experience the same behaviour and see the same stack trace in dmesg. I assume that ZFS is supposed to perform adequately even under heavy load (I'm exaggerating now with 3 VMs, a browser, a mail client, a video player and the NetBeans IDE running in parallel) on a not overly sophisticated setup (one HDD partition with one SSD cache on one machine, which admittedly already contradicts the recommendation to use whole devices), but FYI the issue persists.

I guess that ext4 would handle this well - maybe not with the same I/O rates, and certainly without any of ZFS's features, but it wouldn't starve processes of I/O this much.

@avnik did you ever see the issue again in the meantime?

@avnik

avnik commented Feb 9, 2017

@krichter722 I've also seen them, but fewer, and they stopped soft-locking the system after I picked 0a252da. I also experience low linear read performance (not enough for 720p video playback).

@behlendorf
Contributor

It sounds as if under certain workloads we're allowing too much I/O to be added to a transaction group. @krichter722 just so I understand, in spite of the stacks the system does recover after it clears the I/O backlog, correct? @avnik are you saying that after 0a252da performance has degraded?
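
If it helps with diagnosing this, the per-TXG history can be watched through the SPL kstats while a stall happens (a sketch; 'tank' is a placeholder pool name):

# recent transaction groups: bytes read/written, queue/wait/sync times
cat /proc/spl/kstat/zfs/tank/txgs

# general ARC statistics for context
head -n 25 /proc/spl/kstat/zfs/arcstats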

@avnik

avnik commented Feb 9, 2017

@behlendorf in my case linear read performance (video playback) degraded after moving from xfs on the same hardware.

0a252da improves the situation we are discussing here, so a big I/O backlog no longer freezes the whole system. (Sorry for the unclear previous message.)

@behlendorf
Contributor

@avnik thanks for the clarification. You might try setting the recordsize property to 1M to improve your sequential read performance for large files. Keep in mind you'll need to copy the file for it to make use of the larger block size.
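
Concretely, something along these lines (a sketch; the dataset and file names are placeholders, and recordsize=1M needs the large_blocks pool feature):

# raise the record size for the dataset holding the large files
zfs set recordsize=1M tank/videos
zfs get recordsize tank/videos

# existing files keep their old block size; rewrite them so the new size takes effect
cp /tank/videos/movie.mkv /tank/videos/movie.mkv.new
mv /tank/videos/movie.mkv.new /tank/videos/movie.mkv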

@krichter722
Author

@krichter722 just so I understand, in spite of the stacks the system does recover after it clears the I/O backlog, correct?

@behlendorf The system always recovers and I didn't experience any hard lockups. Some applications crash because they hit a timeout error or experience other trouble (e.g. VirtualBox sends an ATA bus error and the guest OS sets its filesystem read-only), but that's not ZFS's business.

I'm now using zfs-0.7.0-rc3-81-g5f0517cadd, where the issue occurs less frequently, as described above.

@krichter722
Author

You might try setting the recordsize property to 1M to improve your sequential read performance for large files. Keep in mind you'll need to copy the file for it to make use of the larger block size.

It doesn't help in my case (in case that was directed at me as well) when using VirtualBox vdi images, even after changing the property and then copying the files.
