-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ZFS_DEBUG_MODIFY assert in arc_buf_try_copy_decompressed_data #10943
Fix ZFS_DEBUG_MODIFY assert in arc_buf_try_copy_decompressed_data #10943
Conversation
The assert does not account for the case where there is a single buffer in the chain that is decompressed and has a valid checksum. Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me. But it would be good to understand when this is possible. As I understand it you've only observed this with an encrypted pool?
That is correct. I don't know what is different about FreeBSD. Perhaps abd? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there is more to this problem than what is being addressed here. It appears that we've called arc_buf_try_copy_decompressed_data
with a buffer which has been frozen and this function is designed to overwrite the data. I don't feel like we need to relax the assertion without understanding why we have a frozen arc_buf_t. It feels like the real fix should be to thaw the arc_buf_t before calling this function but we should understand where the real bug lives.
if (zfs_flags & ZFS_DEBUG_MODIFY) { | ||
if (nbufs) | ||
EQUIV(!copied, hdr->b_l1hdr.b_freeze_cksum == NULL); | ||
else if (!ARC_BUF_COMPRESSED(buf) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!ARC_BUF_COMPRESSED(buf)
is unnecessary since that is already asserted on line 1712.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there is more to this problem than what is being addressed here. It appears that we've called
arc_buf_try_copy_decompressed_data
with a buffer which has been frozen and this function is designed to overwrite the data. I don't feel like we need to relax the assertion without understanding why we have a frozen arc_buf_t. It feels like the real fix should be to thaw the arc_buf_t before calling this function but we should understand where the real bug lives.
Any suggestions on diagnosing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a stack trace from the assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grwilson This is with latest master. Let me know if there's any other state you'd like to see. Thanks.
#0 __curthread () at /usr/home/matt/devel/freebsd/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=-2121375832) at /usr/home/matt/devel/freebsd/sys/kern/kern_shutdown.c:394
#2 0xffffffff8049fcbe in db_fncall_generic (addr=<optimized out>, rv=<optimized out>, nargs=0,
args=<optimized out>) at /usr/home/matt/devel/freebsd/sys/ddb/db_command.c:610
#3 db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
at /usr/home/matt/devel/freebsd/sys/ddb/db_command.c:658
#4 0xffffffff8049f7f0 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=1)
at /usr/home/matt/devel/freebsd/sys/ddb/db_command.c:482
#5 0xffffffff8049f54d in db_command_loop () at /usr/home/matt/devel/freebsd/sys/ddb/db_command.c:535
#6 0xffffffff804a2866 in db_trap (type=<optimized out>, code=<optimized out>)
at /usr/home/matt/devel/freebsd/sys/ddb/db_main.c:270
#7 0xffffffff80c22a24 in kdb_trap (type=3, code=0, tf=<optimized out>)
at /usr/home/matt/devel/freebsd/sys/kern/subr_kdb.c:699
#8 0xffffffff8101df9e in trap (frame=0xfffffe08730aee70)
at /usr/home/matt/devel/freebsd/sys/amd64/amd64/trap.c:576
#9 <signal handler called>
#10 kdb_enter (why=0xffffffff811ec400 "panic", msg=<optimized out>)
at /usr/home/matt/devel/freebsd/sys/kern/subr_kdb.c:486
#11 0xffffffff80bd728e in vpanic (fmt=<optimized out>, ap=<optimized out>)
at /usr/home/matt/devel/freebsd/sys/kern/kern_shutdown.c:902
#12 0xffffffff8235c0ef in spl_panic (file=0xffffffff8275d461 "/home/matt/devel/ZoF/module/zfs/arc.c",
func=0xffffffff8279036c "arc_buf_try_copy_decompressed_data", line=1732,
fmt=0xffffffff82742983 "(!copied) is equivalent to (hdr->b_l1hdr.b_freeze_cksum == NULL)")
at /home/matt/devel/ZoF/module/os/freebsd/spl/spl_misc.c:107
#13 0xffffffff823ef79d in arc_buf_try_copy_decompressed_data (buf=0xfffff8008df33f00)
at /home/matt/devel/ZoF/module/zfs/arc.c:1732
#14 0xffffffff823d61aa in arc_buf_fill (buf=0xfffff8008df33f00, spa=0xfffffe0872fc8000, zb=0xfffffe08730af288,
flags=(unknown: 0)) at /home/matt/devel/ZoF/module/zfs/arc.c:2149
#15 0xffffffff823d55a5 in arc_untransform (buf=0xfffff8008df33f00, spa=0xfffffe0872fc8000,
zb=0xfffffe08730af288, in_place=0) at /home/matt/devel/ZoF/module/zfs/arc.c:2207
#16 0xffffffff82403fb4 in dbuf_read (db=0xfffff8008dac8990, zio=0x0, flags=10)
at /home/matt/devel/ZoF/module/zfs/dbuf.c:1637
#17 0xffffffff824278fe in dmu_buf_hold (os=0xfffff8008df5e000, object=1, offset=0, tag=0xffffffff8277e210,
dbp=0xfffffe08730af3d8, flags=1) at /home/matt/devel/ZoF/module/zfs/dmu.c:243
#18 0xffffffff825f629f in zap_lockdir (os=0xfffff8008df5e000, obj=1, tx=0x0, lti=RW_READER, fatreader=1,
adding=0, tag=0xffffffff8277e210, zapp=0xfffffe08730af478)
at /home/matt/devel/ZoF/module/zfs/zap_micro.c:613
#19 0xffffffff825f7c96 in zap_lookup_norm (os=0xfffff8008df5e000, zapobj=1,
name=0xffffffff827596ca "normalization", integer_size=8, num_integers=1, buf=0xfffffe08730af5d8,
mt=(unknown: 0), realname=0x0, rn_len=0, ncp=0x0) at /home/matt/devel/ZoF/module/zfs/zap_micro.c:1015
#20 0xffffffff825f7beb in zap_lookup (os=0xfffff8008df5e000, zapobj=1, name=0xffffffff827596ca "normalization",
integer_size=8, num_integers=1, buf=0xfffffe08730af5d8) at /home/matt/devel/ZoF/module/zfs/zap_micro.c:962
#21 0xffffffff8238ac49 in zfs_get_zplprop (os=0xfffff8008df5e000, prop=ZFS_PROP_NORMALIZE,
value=0xfffffe08730af5d8) at /home/matt/devel/ZoF/module/os/freebsd/zfs/zfs_vfsops.c:2203
#22 0xffffffff82622923 in nvl_add_zplprop (os=0xfffff8008df5e000, props=0xfffff80008d98f60,
prop=ZFS_PROP_NORMALIZE) at /home/matt/devel/ZoF/module/zfs/zfs_ioctl.c:2136
#23 0xffffffff8261c9f5 in zfs_ioc_objset_zplprops (zc=0xfffffe0876608000)
at /home/matt/devel/ZoF/module/zfs/zfs_ioctl.c:2175
#24 0xffffffff82613228 in zfsdev_ioctl_common (vecnum=19, zc=0xfffffe0876608000, flag=0)
at /home/matt/devel/ZoF/module/zfs/zfs_ioctl.c:7547
#25 0xffffffff8236b1d4 in zfsdev_ioctl (dev=0xfffff8008df8dc00, zcmd=3222821395, arg=0xfffffe08730afa20 "\017",
flag=3, td=0xfffffe0872d79000) at /home/matt/devel/ZoF/module/os/freebsd/zfs/kmod_core.c:170
#26 0xffffffff80a7b41c in devfs_ioctl (ap=0xfffffe08730af878)
at /usr/home/matt/devel/freebsd/sys/fs/devfs/devfs_vnops.c:944
#27 0xffffffff80ccc432 in vn_ioctl (fp=0xfffff8000c0066e0, com=<optimized out>, data=0xfffffe08730afa20,
active_cred=0xfffff800ad0f4c00, td=0x304f) at /usr/home/matt/devel/freebsd/sys/kern/vfs_vnops.c:1602
#28 0xffffffff80a7ba8e in devfs_ioctl_f (fp=0xffffffff81c8cbc8 <cnputs_mtx>, com=128, data=0xfffffe08730aee30,
cred=0xffffffff811d0a35, td=0xfffffe0872d79000)
at /usr/home/matt/devel/freebsd/sys/fs/devfs/devfs_vnops.c:875
#29 0xffffffff80c4a01e in fo_ioctl (fp=<optimized out>, com=3222821395, data=0x1c200020,
active_cred=0xffffffff811d0a35, td=0xfffffe0872d79000) at /usr/home/matt/devel/freebsd/sys/sys/file.h:340
#30 kern_ioctl (td=<optimized out>, fd=<optimized out>, com=<optimized out>,
data=0x1c200020 <error: Cannot access memory at address 0x1c200020>)
at /usr/home/matt/devel/freebsd/sys/kern/sys_generic.c:802
#31 0xffffffff80c49ce7 in sys_ioctl (td=0xfffffe0872d79000, uap=0xfffffe0872d793e8)
at /usr/home/matt/devel/freebsd/sys/kern/sys_generic.c:710
#32 0xffffffff8101ede0 in syscallenter (td=<optimized out>)
at /usr/home/matt/devel/freebsd/sys/amd64/amd64/../../kern/subr_syscall.c:162
#33 amd64_syscall (td=0xfffffe0872d79000, traced=0) at /usr/home/matt/devel/freebsd/sys/amd64/amd64/trap.c:1156
#34 <signal handler called>
#35 0x000000080084ccba in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffc688
#14 0xffffffff823d61aa in arc_buf_fill (buf=0xfffff8008df33f00, spa=0xfffffe0872fc8000, zb=0xfffffe08730af288,
flags=(unknown: 0)) at /home/matt/devel/ZoF/module/zfs/arc.c:2149
2149 if (arc_buf_try_copy_decompressed_data(buf)) {
2144 /*
2145 * Try copying the data from another buf which already has a
2146 * decompressed version. If that's not possible, it's time to
2147 * bite the bullet and decompress the data from the hdr.
2148 */
2149 if (arc_buf_try_copy_decompressed_data(buf)) {
2150 /* Skip byteswapping and checksumming (already done) */
2151 return (0);
2152 } else {
2153 error = zio_decompress_data(HDR_GET_COMPRESS(hdr),
#13 0xffffffff823ef79d in arc_buf_try_copy_decompressed_data (buf=0xfffff8008df33f00)
at /home/matt/devel/ZoF/module/zfs/arc.c:1732
1732 EQUIV(!copied, hdr->b_l1hdr.b_freeze_cksum == NULL);
1722 copied = B_TRUE;
1723 break;
1724 }
1725 }
1726
1727 /*
1728 * There were no decompressed bufs, so there should not be a
1729 * checksum on the hdr either.
1730 */
1731 if (zfs_flags & ZFS_DEBUG_MODIFY)
1732 EQUIV(!copied, hdr->b_l1hdr.b_freeze_cksum == NULL);
1733
1734 return (copied);
1735 }
1736
1737 /*
1738 * Allocates an ARC buf header that's in an evicted & L2-cached state.
1739 * This is used during l2arc reconstruction to make empty ARC buffers
1740 * which circumvent the regular disk->arc->l2arc path and instead come
1741 * into being in the reverse order, i.e. l2arc->arc.
$1 = 0
$2 = (zio_cksum_t *) 0xfffff80007617520
quit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grwilson poke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check to see if the dbuf
that is passed into arc_untransform
is the same as dbuf->db_dnode_handle->dnh_dnode->dn_dbuf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#17 0xffffffff824278fe in dmu_buf_hold (os=0xfffff8000d530800, object=1, offset=0, tag=0xffffffff8277e1f6,
dbp=0xfffffe08721ab3d8, flags=1) at /home/matt/devel/ZoF/module/zfs/dmu.c:243
243 err = dbuf_read(db, NULL, db_flags);
(kgdb) down
#16 0xffffffff82403fb4 in dbuf_read (db=0xfffff80187b6a4c8, zio=0x0, flags=10)
at /home/matt/devel/ZoF/module/zfs/dbuf.c:1637
1637 err = arc_untransform(db->db_buf, spa, &zb, B_FALSE);
(kgdb) list
1632 zbookmark_phys_t zb;
1633
1634 SET_BOOKMARK(&zb, dmu_objset_id(db->db_objset),
1635 db->db.db_object, db->db_level, db->db_blkid);
1636 dbuf_fix_old_data(db, spa_syncing_txg(spa));
1637 err = arc_untransform(db->db_buf, spa, &zb, B_FALSE);
1638 dbuf_set_data(db, db->db_buf);
1639 }
1640 mutex_exit(&db->db_mtx);
1641 if (err == 0 && prefetch) {
(kgdb) p db->db_dnode_handle->dnh_dnode->dn_dbuf
$3 = (struct dmu_buf_impl *) 0xfffff80187b6a660
(kgdb) p db
$4 = (dmu_buf_impl_t *) 0xfffff80187b6a4c8
`dbuf_stats_hash_table_data` can take much longer than it needs to by repeatedly bzeroing its buffer when in fact the buffer only needs to be NULL terminated. Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
@grwilson poke |
Closing at the request of the author. |
The assert does not account for the case where there is a single
buffer in the chain that is decompressed and has a valid
checksum.
The assert would be incorrectly triggered on FreeBSD by:
Signed-off-by: Matt Macy mmacy@FreeBSD.org
Motivation and Context
Description
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by
.