zfs list slow #450
Comments
That's a lot of snapshots, but 20 minutes does seem unreasonably long. Can you try running …
It seems that no zfs process is CPU bound. You will find an extract of iostat -kx 5:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

Device:  rrqm/s  wrqm/s  r/s  w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util

Can it have anything to do with this bug: http://wesunsolve.net/bugid/id/6755389 ?
It sounds like it may be caused by a similar issue. However, the Linux port does already contain the fixes referenced in that issue. We'll need to dig into it to determine exactly what's causing the slowdown.
Pawel Jakub Dawidek of the FreeBSD port took a look at this same issue and came up with the following patch. It still needs to be reviewed and ported to Linux but it looks very promising. http://people.freebsd.org/~pjd/patches/zfs_list_name.2.patch
Indeed, it looks promising. Keep up the good work!
Looks good to me too.
My zfs list processes are I/O bound. They do this a few times per second while they appear to be reading data (almost a megabyte every few seconds) from the disk, according to atop:

ioctl(6, 0x5a15, 0x7fffd7e20c80) = 0
That I/O would likely be ZFS prefetching the additional list data. It should be reasonably straightforward to port Pawel's patch to disable this behavior if you want to try that.
@Rudd-O Please try this patch and see if it resolves your issue. It's a port of Pawel's FreeBSD change for ZoL; note it doesn't impact the default behavior of …
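For context, the invocation the fast path is aimed at (taken from the FreeBSD commit message quoted later in this thread), compared with the default listing, looks roughly like this:

# Name-only listing sorted by name: with the patch applied, no per-snapshot
# properties need to be read, so this is the case that gets dramatically faster.
zfs list -t snapshot -o name -s name

# Default listing: all properties are still fetched for every snapshot, so the
# patch does not change its behavior or speed.
zfs list -t snapshot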
@dajhorn You might want to look at this as well for zfs-auto-snapshot. If simply the list of snapshot names is enough, then this should considerably speed things up. The change is safe enough, so assuming it tests well for others it could be merged in soon.
@behlendorf The zfs-auto-snapshot for ZoL does not call …
Some more background ... two identical-spec systems:

$ time sudo zfs list -t snapshot | wc
    142     142    7375

real    0m6.451s
user    0m0.036s
sys     0m0.408s

# time sudo zfs list -t snapshot | wc
    148     740   14217

real    0m0.253s
user    0m0.012s
sys     0m0.188s

The first system has a lot more data in its snapshots. The first system is ~8% CPU to 'zfs' and ~1% in SYS (watching top).

gdb --args zfs list -t snapshot
(gdb) run
CTRL + C
(gdb) bt
#0  0x00007ffff6f8f747 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff747b4da in ?? () from /lib/libzfs.so.1
#2  0x00007ffff747f222 in zfs_iter_snapshots () from /lib/libzfs.so.1
#3  0x00000000004047d0 in ?? ()
#4  0x00007ffff747f0a0 in zfs_iter_filesystems () from /lib/libzfs.so.1
#5  0x00000000004047e8 in ?? ()
#6  0x00007ffff747f0a0 in zfs_iter_filesystems () from /lib/libzfs.so.1
#7  0x00000000004047e8 in ?? ()
#8  0x00007ffff747b099 in zfs_iter_root () from /lib/libzfs.so.1
...

NB: version 31 refers to "support for improved 'zfs list' performance.":
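A quick way to confirm that the time goes into one /dev/zfs ioctl per snapshot, as the backtrace above suggests, is to count the ioctl calls; this is just a generic strace invocation, not something from the original reports:

# Summarize ioctl usage for a single listing; with per-snapshot property
# fetching, the call count should grow roughly linearly with the snapshot count.
strace -c -e trace=ioctl zfs list -t snapshot > /dev/null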
The patch won't work for me because I need at least one property (namely the snapshot date, which I use to sort the sync ops). BTW, github.com/Rudd-O/zfs-tools: check the zreplicate command. I am so proud of it.
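One possible compromise, untested and assuming only a handful of snapshots per dataset actually need their timestamps for the sync ordering, would be to take the fast name-only listing and then query the creation property for just those few:

# Name-only listing (fast with the patch), then creation times for the few
# snapshots that matter; "pool/xxxx" and the tail -n 5 selection are placeholders.
zfs list -H -t snapshot -o name -s name -r pool/xxxx | tail -n 5 |
while read -r snap; do
    # -p prints creation as a Unix timestamp, convenient for numeric sorting
    printf '%s\t%s\n' "$(zfs get -H -p -o value creation "$snap")" "$snap"
done | sort -n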
Thanks a lot, the patch really helps. Now zfs list -t snapshot -o name -r pool/xxxx returns in several seconds. Great job!
FreeBSD #xxx: Dramatically optimize listing snapshots when the user requests only snapshot names and wants to sort them by name, i.e. when executing:

# zfs list -t snapshot -o name -s name

Because only the name is needed we don't have to read all snapshot properties.

Below you can find how long it takes to list 34509 snapshots from a single-disk pool before and after this change, with cold and warm cache:

before:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 525s
warm cache: 218s

after:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 1.7s
warm cache: 1.1s

NOTE: This patch only appears in FreeBSD. If/when Illumos picks up the change we may want to drop this patch and adopt their version. However, for now this addresses a real issue.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#450
Thanks for the above information, it really helped improve my script speeds. Getting volume snapshot information went from 7 minutes (for example, on a small system with 2,600 snapshots) down to 30 seconds. I think it's important to note, however, that this is not a ZoL (ZFS on Linux) issue; rather, it is a ZFS issue. I see this same snapshot list speed on Solaris ZFS systems. When there is a large number of snapshots (over 1000 on a pool), scripts managing the snaps take longer and average system output is significantly reduced. BK
I'd be curious to hear how your original scripts perform on the latest Solaris. They claim significant speed-ups here in zpool version 31, but I've never seen any hard data.
Sorry if I misled you; perhaps I should have been clearer that the Solaris machine was a Nexenta system. So it's also running version 28. BK
@byteharmony You could ask @mmatuska what the upstream status of that patch is. He tends to do the work to get FreeBSD improvements to ZFS sent back to Illumos, which would be needed before it goes into Nexenta.
zfs list is very slow if it needs to retrieve properties other than name. See openzfs/zfs#450 for reference. This change uses only the name of the snapshot, uses awk to extract the snapshot date from the name, and uses sort to sort it (a rough sketch of the pipeline appears after this comment):

1. get all snapshot names, sorted by name
2. use grep to filter out any snapshot that isn't correctly prefixed (maybe a check for the correct date format could also be added here)
3. use awk to extract the date and put the date before the snapshot
4. use sort to reverse-sort this, first by date, then by snapshot name
5. use awk to remove the date, so that just the snapshot name is left

This significantly speeds it up on my system (/root/zfs-auto-snapshot is the changed one, running with -n to get only the times for snapshot retrieval):

root@tatooine:/root# time /root/zfs-auto-snapshot -d -v // --label=frequent --keep=4 -n
Debug: Including rpool for regular snapshot.
Debug: Including rpool/DATA for regular snapshot.
Debug: Including rpool/DATA/ldap for recursive snapshot.
Debug: Including rpool/DATA/postgresql for recursive snapshot.
Debug: Including rpool/DATA/vm for regular snapshot.
Debug: Including rpool/DATA/vm/alderaan.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/bespin.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/coruscant.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/dagobah.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/dantooine.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/dev.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/monitor.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/office.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/test.example.net for recursive snapshot.
Debug: Including rpool/ROOT for recursive snapshot.
Debug: Excluding rpool/ROOT/ubuntu-1 because rpool/ROOT includes it recursively.
Doing regular snapshots of rpool rpool/DATA rpool/DATA/vm
Doing recursive snapshots of rpool/DATA/ldap rpool/DATA/postgresql rpool/DATA/vm/alderaan.example.net rpool/DATA/vm/bespin.example.net rpool/DATA/vm/coruscant.example.net rpool/DATA/vm/dagobah.example.net rpool/DATA/vm/dantooine.example.net rpool/DATA/vm/dev.example.net rpool/DATA/vm/monitor.example.net rpool/DATA/vm/office.example.net rpool/DATA/vm/test.example.net rpool/ROOT
Doing a dry run. Not running these commands...
zfs snapshot -o com.sun:auto-snapshot-desc='-' 'rpool@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy 'rpool@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' 'rpool/DATA@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy 'rpool/DATA@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' 'rpool/DATA/vm@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy 'rpool/DATA/vm@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/ldap@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/ldap@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/postgresql@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/postgresql@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/alderaan.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/alderaan.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/bespin.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/bespin.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/coruscant.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/coruscant.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/dagobah.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/dagobah.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/dantooine.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/dantooine.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/dev.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/dev.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/monitor.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/monitor.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/office.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/office.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/test.example.net@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/DATA/vm/test.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/ROOT@zfs-auto-snap_frequent-2013-04-16-2344'
zfs destroy -r 'rpool/ROOT@zfs-auto-snap_frequent-2013-04-16-2315'
@zfs-auto-snap_frequent-2013-04-16-2344, 15 created, 15 destroyed, 0 warnings.

real    0m6.936s
user    0m0.076s
sys     0m0.184s

root@tatooine:/root# time /sbin/zfs-auto-snapshot -d -v // --label=frequent --keep=4 -n
Debug: Including rpool for regular snapshot.
Debug: Including rpool/DATA for regular snapshot.
Debug: Including rpool/DATA/ldap for recursive snapshot.
Debug: Including rpool/DATA/postgresql for recursive snapshot.
Debug: Including rpool/DATA/vm for regular snapshot.
Debug: Including rpool/DATA/vm/alderaan.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/bespin.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/coruscant.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/dagobah.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/dantooine.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/dev.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/monitor.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/office.example.net for recursive snapshot.
Debug: Including rpool/DATA/vm/test.example.net for recursive snapshot.
Debug: Including rpool/ROOT for recursive snapshot.
Debug: Excluding rpool/ROOT/ubuntu-1 because rpool/ROOT includes it recursively.
Doing regular snapshots of rpool rpool/DATA rpool/DATA/vm
Doing recursive snapshots of rpool/DATA/ldap rpool/DATA/postgresql rpool/DATA/vm/alderaan.example.net rpool/DATA/vm/bespin.example.net rpool/DATA/vm/coruscant.example.net rpool/DATA/vm/dagobah.example.net rpool/DATA/vm/dantooine.example.net rpool/DATA/vm/dev.example.net rpool/DATA/vm/monitor.example.net rpool/DATA/vm/office.example.net rpool/DATA/vm/test.example.net rpool/ROOT
Doing a dry run. Not running these commands...
zfs snapshot -o com.sun:auto-snapshot-desc='-' 'rpool@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy 'rpool@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' 'rpool/DATA@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy 'rpool/DATA@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' 'rpool/DATA/vm@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy 'rpool/DATA/vm@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/ldap@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/ldap@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/postgresql@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/postgresql@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/alderaan.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/alderaan.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/bespin.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/bespin.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/coruscant.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/coruscant.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/dagobah.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/dagobah.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/dantooine.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/dantooine.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/dev.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/dev.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/monitor.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/monitor.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/office.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/office.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/DATA/vm/test.example.net@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/DATA/vm/test.example.net@zfs-auto-snap_frequent-2013-04-16-2315'
zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/ROOT@zfs-auto-snap_frequent-2013-04-16-2348'
zfs destroy -r 'rpool/ROOT@zfs-auto-snap_frequent-2013-04-16-2315'
@zfs-auto-snap_frequent-2013-04-16-2348, 15 created, 15 destroyed, 0 warnings.

real    3m30.995s
user    0m0.152s
sys     0m0.792s

root@tatooine:/root#

I'm not an awk god, so I tried a bit until I got a working version. There might be better ways. I also don't know if this catches every edge case, but it is a start for improvement. :)
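For reference, a minimal sketch of the pipeline described in steps 1-5 above; the PREFIX variable and the exact awk field handling are my own assumptions for illustration, not the actual zfs-auto-snapshot code:

#!/bin/sh
# Hypothetical snapshot-name prefix; zfs-auto-snapshot names look like
#   dataset@zfs-auto-snap_frequent-2013-04-16-2344
PREFIX='zfs-auto-snap_frequent'

zfs list -H -t snapshot -o name -s name |   # 1. names only, sorted by name (the fast path)
grep "@${PREFIX}-" |                        # 2. drop snapshots without the expected prefix
awk -v p="@${PREFIX}-" '{                   # 3. put the date in front of the name
    i = index($0, p); print substr($0, i + length(p)), $0
}' |
sort -r -k1,1 -k2,2 |                       # 4. reverse sort by date, then by snapshot name
awk '{ print $2 }'                          # 5. strip the date, keep only the name

Since snapshot names contain no whitespace, printing the name as a second field and later selecting $2 is safe; a date-format sanity check could be added at the grep stage, as the comment above suggests.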
C type coercion rules require that negative numbers be converted into positive numbers via wraparound such that a negative -1 becomes a positive 1. This causes vn_getf to return a file handle when it should return NULL whenever a positive file descriptor existed with the same value. We should check for a negative file descriptor and return NULL instead.

This was caught by ClusterHQ's unit testing.

Reference: http://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#450
Merge remote-tracking branch '6.0/stage' into 'master'
Hi,
The command zfs list -t snapshot -r pool/xxxx is very, very slow.
I have approx. 1200 snapshots under pool/xxxx; the command takes approx. 20 minutes.
Any clues?
Thanks Lee