Make ZIL operations on zvols use _by_dnode routines #6058
Conversation
@ryao, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @ahrens and @tuxoko to be potential reviewers.
I am working on bigger changes, but this should help #4880.
Force-pushed from f489809 to 33c316d
@behlendorf The failure is #5195, a pre-existing issue unrelated to the patch here. Also, preliminary testing at Prophetstor with a SLOG device shows a 30% improvement in IOPS from this patch alone. Edit: I had overlooked that I also backported the changes that use _by_dnode for zvol_write and zvol_read. They made little difference in my initial tests (without a SLOG), but I kept them in my local branch. The dnode_hold in dmu_write_bio definitely would have contended with the dnode_hold in the ZIL path. I no longer have access to the SLOG machine used at work, so I cannot isolate this change from those, but I imagine the actual improvement is somewhat less than 30%. Nevertheless, this is a step in the right direction. I believe the improvement comes mainly from letting ZIL batch processing iterate faster.
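In sketch form, the contention being described looks like this, assuming the DMU interfaces of that era (dmu_read_impl here is a hypothetical common helper; the real code may be factored differently). dmu_read() takes and drops a dnode hold on every call, while the _by_dnode variant lets a caller that already holds the dnode, as a zvol does for its whole lifetime, skip the lookup:

/* Per-call path: every read hashes into the dnode and takes a hold. */
int
dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
    void *buf, uint32_t flags)
{
	dnode_t *dn;
	int err;

	err = dnode_hold(os, object, FTAG, &dn);	/* contended point */
	if (err != 0)
		return (err);
	err = dmu_read_impl(dn, offset, size, buf, flags);
	dnode_rele(dn, FTAG);
	return (err);
}

/* Held path: the zvol passes its long-lived dnode straight through. */
int
dmu_read_by_dnode(dnode_t *dn, uint64_t offset, uint64_t size, void *buf,
    uint32_t flags)
{
	return (dmu_read_impl(dn, offset, size, buf, flags));
}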
@sempervictus If you have a synchronous IO intensive workload, this might give you a small boost. You might want to try testing it.
Oh yeah, going into the weekend stack for sure. Thank you sir :-)
@ryao definitely progress in the right direction! There was discussion in PR4802 of taking this further and replacing the dbuf_t with a dnode_t in the zvol_state_t. This would remove the need for the _by_dbuf wrappers, and it might buy you a little more performance since you'll be able to eliminate the DB_DNODE_ENTER/EXIT.
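A sketch of the pattern that would go away (the function names here are hypothetical, and the macro semantics are paraphrased from dbuf.h as commonly understood). Going through the dbuf costs two zrlock operations on the dnode handle per I/O:

static int
zvol_read_via_dbuf(zvol_state_t *zv, uio_t *uio, uint64_t size)
{
	dmu_buf_impl_t *db = (dmu_buf_impl_t *)zv->zv_dbuf;
	dnode_t *dn;
	int err;

	DB_DNODE_ENTER(db);	/* zrl_add() on the dnode handle */
	dn = DB_DNODE(db);	/* dereference the handle */
	err = dmu_read_uio_dnode(dn, uio, size);
	DB_DNODE_EXIT(db);	/* zrl_remove() on the dnode handle */
	return (err);
}

/* With a dnode_t cached in zvol_state_t, the wrapper collapses: */
static int
zvol_read_via_dnode(zvol_state_t *zv, uio_t *uio, uint64_t size)
{
	return (dmu_read_uio_dnode(zv->zv_dn, uio, size));
}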
@ryao let's do this by switching the dbuf_t to a dnode_t in the zvol_state_t and converting all the existing holds.
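In outline, that conversion would look something like the following (the field and tag names are assumptions for illustration, not the final code): swap the struct member, hold the dnode once at open, and drop it at close.

typedef struct zvol_state {
	/* ... */
	dnode_t		*zv_dn;		/* was: dmu_buf_t *zv_dbuf */
	/* ... */
} zvol_state_t;

/* On first open: one long-lived hold instead of a bonus-buffer hold. */
error = dnode_hold(os, ZVOL_OBJ, zv, &zv->zv_dn);

/* Per I/O: hand the held dnode to the _by_dnode routines directly. */
error = dmu_read_by_dnode(zv->zv_dn, offset, bytes, buf, 0);

/* On last close: drop the hold. */
dnode_rele(zv->zv_dn, zv);
zv->zv_dn = NULL;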
@ryao if you want these improvements to make 0.7.0, can you address the review feedback? We're trying to wrap up the release.
@behlendorf I like the idea of switching to the _by_dnode routines. This is now refreshed.
Force-pushed from 0efd61c to 3f8d153
Thanks, looks good. This ends up being a nice simplification too.
@ryao one last rebase to resolve a merge conflict and this should be good to go.
@behlendorf It is rebased.
@ryao thanks, I'll get this merged after the buildbot finishes its work.
Those test failures don't look too good, crash seems to be coming from:
which seems related to this :)
@sempervictus I can see why you thought this might be related since it panicked in
Yeah... that ZIL PR should probably be backed out or fixed ASAP. My builders are all OK, and the first ztest is usually fine, but zloops are all hanging with no stack trace, often on the fletcher inc test according to the log (well, the last output is fletcher; the next one would be fletcher inc). Something in there is not playing well on Linux in ways more subtle than initial automated testing could detect. It also conflicts in what appear to be key areas with @ryao's ZIL PR, which as I understand it has been beaten up on Linux pretty heavily. Maybe we switch em out and do the conflict resolution from upstream to ZoL, as opposed to making @ryao change his code to match something that's broken?
@sempervictus are you able to reproduce those
module/zfs/dmu.c (outdated)
@@ -965,6 +965,9 @@ int
 dmu_read_by_dnode(dnode_t *dn, uint64_t offset, uint64_t size, void *buf,
     uint32_t flags)
 {
+	if (size == 0)
+		return (0);
Why was this added? I don't see why it should be needed here or in the write path.
The previous dmu_read_uio_dbuf() function did it. Upon review, it turns out to be unnecessary. I'll remove it.
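For context, the guard being referred to sat in the old wrapper roughly like this (the body is reconstructed from memory, so treat the exact shape as an assumption). It existed to skip the DB_DNODE_ENTER/EXIT cost for zero-length I/O; since the underlying copy loop already does nothing when size is zero, a _by_dnode entry point doesn't need it:

int
dmu_read_uio_dbuf(dmu_buf_t *zdb, uio_t *uio, uint64_t size)
{
	dmu_buf_impl_t *db = (dmu_buf_impl_t *)zdb;
	dnode_t *dn;
	int err;

	if (size == 0)		/* skip DB_DNODE_ENTER/EXIT for no-op I/O */
		return (0);

	DB_DNODE_ENTER(db);
	dn = DB_DNODE(db);
	err = dmu_read_uio_dnode(dn, uio, size);
	DB_DNODE_EXIT(db);
	return (err);
}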
This continues what was started in 0eef1bd by fully converting zvols to avoid unnecessary dnode_hold() calls. This saves a small amount of CPU time and slightly improves latencies of operations on zvols.
Signed-off-by: Richard Yao <richard.yao@prophetstor.com>
@behlendorf The failures seem to have disappeared. Why they appeared when I pushed this initially is something of a mystery, but we are passing now. I have made all of the changes you requested. It should be fine to merge now.
Thanks @ryao, I'll get this merged. It turns out the original failure reported by @sempervictus wasn't introduced by this PR; I've seen it one other time against master. I believe it was introduced by either commit 1b7c1e5 or possibly 38240eb. As soon as we can determine which commit actually introduced it, we're going to want to either revert that change or, if it's straightforward, fix it. I also resubmitted the two failed test runs for the updated version of this patch since they hit unrelated issues; I just wanted to confirm that before merging.
Description
This continues what was started in 0eef1bd by fully converting zvols to avoid unnecessary dnode_hold() calls. This saves a small amount of CPU time and slightly improves latencies of synchronous operations on zvols.
Motivation and Context
I am working on zvol performance at work. While working on larger changes, I noticed unnecessary overhead that is easy to eliminate. Eliminating it yields small latency improvements.
How Has This Been Tested?
It has been tested against Prophetstor's internal 0.6.5.9 branch using fio:
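The job file itself isn't reproduced in the thread; a synchronous random-write job along these lines (the device path, block size, and queue depths are placeholders, not the original settings) is the kind of workload that exercises the ZIL path this patch touches:

[zvol-sync-randwrite]
; hypothetical stand-in for the unpublished job file
; placeholder zvol device node
filename=/dev/zd0
rw=randwrite
bs=4k
ioengine=libaio
iodepth=32
numjobs=4
direct=1
; O_SYNC: every write commits through the ZIL
sync=1
runtime=60
time_based=1
group_reporting=1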
The test hardware is 12x M630-DC SSDs, 256GB of RAM, and dual Intel(R) Xeon(R) E5-2630 v4 CPUs (20 cores total). I tested on a 500GB zvol. I was careful to wait until the cache was hot enough that we were no longer reading metadata before evaluating the impact.
The branch uses a variation of #5824 (one that predates my first day at my new employer last week), so the numbers are not directly applicable to head, but average latencies dropped by ~15%.