task txg_sync:19098 blocked for more than 120 seconds. #5692
Comments
@krichter722 thank you for the detailed issue. In general, the large number of z_fr_iss threads is expected. From the description you've provided it sounds like you've run into the issue described in #5449. Internally ZFS is allowing too many free requests to be assigned to a single TXG. This effectively stalls the writes until all the frees are processed. If you're comfortable with git and building ZFS from source, I'd encourage you to try applying the patch in #5449 to verify it resolves the issue.
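For anyone who wants to try that, here is a minimal sketch of one way to test the pull-request branch, assuming a matching spl build is already installed and the usual ZFS build prerequisites are present (the local branch name is arbitrary):

```sh
# Fetch the ZFS source and check out the head of pull request #5449
# (pull/<N>/head is GitHub's generic ref for a PR).
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git fetch origin pull/5449/head:pr-5449
git checkout pr-5449

# Build and install; configure flags and prerequisites depend on the distro.
sh autogen.sh
./configure
make -j"$(nproc)"
sudo make install

# Reload the rebuilt module (export pools first so nothing is using it).
sudo modprobe -r zfs && sudo modprobe zfs
```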
@behlendorf can the traces which I posted to zfs-discuss a couple of days ago be related to the same problem? http://list.zfsonlinux.org/pipermail/zfs-discuss/2017-January/027254.html Replies on the list point me towards hardware issues, but they may also be different symptoms of the same problem.
Yes, it's possible, but hard to say for certain.
spl-0.7.0-rc3-2-g97048200 and zfs-0.7.0-rc3-57-g544b8053db don't exhibit this behaviour after a superficial investigation with a workload which would have caused the issue in the previously used version. Thank you.
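As a side note, a quick way to confirm which spl/zfs builds are actually loaded is to read the version nodes the modules expose (a sketch using standard sysfs paths and the module load messages):

```sh
# Version strings of the currently loaded kernel modules.
cat /sys/module/spl/version
cat /sys/module/zfs/version

# The module load messages in the kernel log carry the same information.
dmesg | grep -iE 'spl: loaded|zfs: loaded'
```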
@krichter722 thanks for verifying the patch does address the issue. Then I'm going to close this issue. |
@behlendorf it possibly fixes my case as well. (Although it was not a very clean experiment, as I upgraded the kernel to 4.9.6 at the same time.) UPDATE: it behaves more smoothly, but the printk warnings still show up.
So much for my superficial investigation. The mentioned versions only delay the described behaviour; under heavy load I still experience the same behaviour and see the same stacktrace in dmesg. I guess that ext4 would handle this well - maybe not with the same I/O rates and without any of ZFS's features, but it wouldn't starve the I/O of processes that much. @avnik did you ever see the issue again in the meantime?
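For context, the "blocked for more than 120 seconds" messages come from the kernel's hung-task watchdog rather than from ZFS itself; a sketch of inspecting its settings with standard procfs knobs:

```sh
# How long a task may sit in uninterruptible sleep before the warning fires
# (120 seconds is the common default, matching the message in this issue).
cat /proc/sys/kernel/hung_task_timeout_secs

# Number of warnings the watchdog prints before silencing itself.
cat /proc/sys/kernel/hung_task_warnings
```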
@krichter722 I have also seen them, but fewer, and they stop soft-locking the system after I picked 0a252da. I also experience low linear read performance (not enough for 720p video playback).
It sounds as if under certain workloads we're allowing too much I/O to be added to a transaction group. @krichter722 just so I understand, in spite of the stacks the system does recover after it clears the I/O backlog, correct? @avnik are you saying that after 0a252da performance has degraded?
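One way to see how much work ends up in each transaction group is the per-pool txgs kstat that ZFS on Linux exposes (a sketch; "tank" is a placeholder pool name, not one from this thread):

```sh
# Recent transaction group history: dirty bytes, reads/writes, and the time
# each TXG spent open, quiescing, and syncing. Long sync times suggest
# overloaded transaction groups.
cat /proc/spl/kstat/zfs/tank/txgs
```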
@behlendorf in my case linear read performance (video playback) degraded after moving from XFS on the same hardware. 0a252da improves the situation we're discussing here, so a big I/O backlog no longer freezes the whole system. (Sorry for the unclear previous message.)
@avnik thanks for the clarification. You might try setting the
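For reference, ZFS on Linux exposes its runtime tunables under /sys/module/zfs/parameters; a generic sketch of reading and changing one (zfs_txg_timeout is shown only to illustrate the mechanism and is not necessarily the setting meant above):

```sh
# List available ZFS module parameters with their current values.
grep -H . /sys/module/zfs/parameters/* | head

# Read and change a single parameter at runtime.
cat /sys/module/zfs/parameters/zfs_txg_timeout
echo 5 | sudo tee /sys/module/zfs/parameters/zfs_txg_timeout
```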
@behlendorf The system always recovers and I didn't experience any hard lockups. Some applications crash because they hit a timeout or run into other trouble (e.g. VirtualBox raises an ATA bus error and the guest OS sets its filesystem read-only), but that's not ZFS's business. I'm now using zfs-0.7.0-rc3-81-g5f0517cadd, where the issue occurs less frequently, as described above.
Doesn't help in my case (in case that suggestion was directed at me as well) when using VirtualBox vdi images (after changing the property and then copying the files).
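For completeness, a sketch of the change-a-property-then-copy workflow mentioned above; the dataset name is a placeholder and recordsize is used purely as a stand-in for whichever property was suggested, since most properties only affect newly written data:

```sh
# Illustrative only: dataset and property are placeholders.
sudo zfs set recordsize=64K tank/vmimages

# Properties apply to new writes, so rewrite the image to pick them up.
cp /tank/vmimages/disk.vdi /tank/vmimages/disk.vdi.new
mv /tank/vmimages/disk.vdi.new /tank/vmimages/disk.vdi
```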
System information
Describe the problem you're observing
After a while (10 to 30 minutes) of heavy read and write operations, the I/O of all processes alternates between a state where it works slowly and one where it doesn't proceed at all, or proceeds only in barely noticeable portions, for seconds up to several minutes. This results in freezes of GUI and console programs.
In iotop I see that read and write rates rarely exceed some 100 KB/s. If the system rests for several minutes with no heavy I/O requested, the read and write rates suddenly go up to the usual ~100 MB/s and the delays disappear/the situation recovers. After causing a high load of read and write requests the system gets into that state and recovers again with the same symptoms.

During the alternation between slow and stalled I/O I see > 100 z_fr_iss_[n] threads in iotop taking 100 % of the I/O capacity. Since my HDD is quite loud (during intensive head movement) I can hear when the system starts recovering. Is it really a good idea to have so many threads running, given that they need to be managed and that a thread can be used for more than one task by exchanging thread stacks?

Describe how to reproduce the problem
Start 2 or more VirtualBox VMs for example.
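As one concrete way to reproduce and observe this (a sketch; the VM names are placeholders and it assumes VirtualBox and iotop are installed):

```sh
# Start two VMs headless to generate sustained read/write load.
VBoxManage startvm "vm1" --type headless
VBoxManage startvm "vm2" --type headless

# Watch per-process I/O; during the stalls the z_fr_iss_* kernel threads
# dominate while user processes barely make progress.
sudo iotop -o

# Count the ZFS free-issue threads currently running.
ps -eLo comm | grep -c '^z_fr_iss'
```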
Include any warning/errors/backtraces from the system logs
dmesg contains the txg_sync blocked-task warning and its stacktrace, logged every 2 minutes (+/- some ms).
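A sketch of pulling those warnings with timestamps out of the kernel log to confirm the cadence (standard dmesg/journalctl options, nothing specific to this report):

```sh
# Human-readable timestamps for each hung-task warning.
dmesg -T | grep 'blocked for more than 120 seconds'

# Or the full stacktraces with a little context from the journal.
journalctl -k | grep -B 1 -A 15 'txg_sync.*blocked'
```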
I built spl and zfs from source.
I have a single-partition pool with one cache partition attached.
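For reference, a sketch of how such a layout is typically inspected or created; the pool and device names below are placeholders, not taken from this report:

```sh
# Show the pool layout, including the cache (L2ARC) device.
zpool status

# Roughly how a single-partition pool with a cache partition is created.
sudo zpool create tank /dev/sda2 cache /dev/sdb1
```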