SPLError: 1121:0:(dnode_sync.c:484:dnode_sync_free()) VERIFY3(dn->dn_bonus == ((void *)0)) failed (ffff880812c9f1c0 = …) #2728
@seletskiy (original report): A zfs recv and zfs snapshot workload caused this to happen. After that stacktrace the system becomes unstable and I keep seeing further stacktraces (multiple times, so it's probably not a deadlock). Using zfs 0.6.3.
Comments
@seletskiy You've hit an assertion in the code which by design basically halts the thread for further debugging. Normally these are disabled in production builds, but you've clearly enabled them. In this case the ASSERT looks like it has uncovered a subtle inconsistency in the code. Someone will need to spend some time with the code to determine if there is a real issue here or if the assertion is just wrong. After a quick glance I suspect the ASSERT is just wrong: there may still be remaining holds on the bonus block and dnode which would cause this.
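For readers who have not met these macros: the following is a minimal, self-contained sketch of how a VERIFY3-style three-way assertion behaves, using a hypothetical userspace stand-in for the kernel panic path; the real macro lives in the SPL headers and differs in detail.

#include <stdio.h>
#include <stdlib.h>

/*
 * Minimal sketch of a VERIFY3-style pointer assertion (a userspace
 * stand-in, not the real SPL macro). It compares the two sides and,
 * on failure, prints the expression text together with the observed
 * raw values before halting -- which is why the report above contains
 * both the source expression and the pointer value ffff880812c9f1c0.
 */
#define VERIFY3P(left, op, right)                                       \
    do {                                                                \
        const void *l_ = (left);                                        \
        const void *r_ = (right);                                       \
        if (!(l_ op r_)) {                                              \
            fprintf(stderr, "VERIFY3(%s %s %s) failed (%p = %p)\n",     \
                #left, #op, #right, l_, r_);                            \
            abort();    /* the kernel build halts the thread instead */ \
        }                                                               \
    } while (0)

int
main(void)
{
    void *dn_bonus = (void *)0x1;   /* stand-in for a lingering bonus buffer */

    VERIFY3P(dn_bonus, ==, NULL);   /* trips, mirroring the report above */
    return (0);
}

Debug builds compile such checks in; release builds compile them out, which is why only installations running a debug build ever see the halt.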
@behlendorf: Thanks for the answer! Yep, we are running a debug build on some of the servers (maybe it will do the project some good). I've come to similar conclusions.
@seletskiy I think the debugging provides us enough to go on. We just need to determine how we should reconcile this inconsistency in the code.
I think there is a bug and the assertion is correct. We are doing a heavy zfs recv and zfs snapshot workload, and there is a problem: it's still unclear why ZFS complains about an already existing snapshot at the end of the receive operation (it looks like an invalid order of committing changes to me).
Oh, sorry. I think you can ignore my latest comment on this issue. It looks like that userspace error and this stacktrace are not linked. Please see #2739.
I'll throw in a "me too". I hit this assertion during some Lustre testing. I took a snapshot of the Lustre MDT, wrote some files to the Lustre filesystem, unmounted Lustre, rolled back the MDT to the snapshot, restarted Lustre, and crashed.
I encountered this ASSERT while running Lustre sanity test 22 (unpack tar archive as non-root user).
Crash's version of the above stack, cleaned up:

PID: 14243  TASK: ffff8800269ecae0  CPU: 0  COMMAND: "txg_sync"
More information from the sanity test 22 instance above (a simplified sketch of the structures involved follows this list):
- the dnode_sync stack frame
- the dnode passed to dnode_sync(); note that dn_zfetch.zf_dnode points back to the dnode, and the dn_bonus value matches the dn_bonus value in the VERIFY3 output
- (above dnode)->dn_phys
- (above dnode)->dn_bonus
- likely the dmu_tx_t *tx (the only pointer in the frame that points to a valid tx)
- the root_dir of the pool the tx points to
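To make the pointer relationships in those dumps easier to follow, here is a heavily reduced sketch of the structures involved; the field names follow ZFS's dnode.h and dbuf.h, but all other members are elided, so treat it as an illustration rather than the real layout.

#include <stdint.h>

/* Heavily simplified sketch of the structures named in the dump above.
 * Field names follow ZFS's dnode.h / dbuf.h; everything else is elided. */

typedef struct refcount {
    uint64_t rc_count;          /* number of outstanding holds */
} refcount_t;

typedef struct dmu_buf_impl {
    refcount_t db_holds;        /* dn_bonus->db_holds.rc_count above */
} dmu_buf_impl_t;

struct dnode;                   /* forward declaration */

typedef struct zfetch {
    struct dnode *zf_dnode;     /* points back at the owning dnode */
} zfetch_t;

typedef struct dnode {
    struct dnode_phys *dn_phys; /* on-disk portion of the dnode */
    dmu_buf_impl_t *dn_bonus;   /* bonus buffer; NULL once evicted */
    zfetch_t dn_zfetch;         /* dn_zfetch.zf_dnode == this dnode */
} dnode_t;

The back-pointer check (dn_zfetch.zf_dnode pointing at the dnode itself) is a quick sanity test that the stack frame really holds a dnode rather than unrelated memory.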
dn_bonustype = 44 (aka DMU_OT_SA)
In the sanity test example above, dnode_sync() => dnode_sync_free() => dnode_evict_dbufs() cannot evict dn->dn_bonus because dn_bonus->db_holds.rc_count == 2. The entire refcount structure's values are above. Looking into whether I can tell why the refcount is not 0.
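That reads like a reference-counting leak, and a toy model makes the control flow clear: eviction only clears dn_bonus once its hold count reaches zero, so two leftover holds leave dn_bonus set and the later VERIFY3 in dnode_sync_free() fires. The sketch below is a minimal illustration under that assumption, not the actual dnode_evict_dbufs() code.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the failure mode described above: eviction only clears
 * dn_bonus when its hold count has dropped to zero, so leaked holds
 * leave dn_bonus set and the final assertion fires. */

struct buf {
    uint64_t rc_count;          /* outstanding holds on this buffer */
};

struct dnode {
    struct buf *dn_bonus;
};

static void
evict_bonus(struct dnode *dn)
{
    /* Mirrors the reported state: rc_count == 2, so no eviction. */
    if (dn->dn_bonus != NULL && dn->dn_bonus->rc_count == 0)
        dn->dn_bonus = NULL;
}

int
main(void)
{
    struct buf bonus = { .rc_count = 2 };   /* two leftover holds */
    struct dnode dn = { .dn_bonus = &bonus };

    evict_bonus(&dn);                       /* skipped: holds remain */
    assert(dn.dn_bonus == NULL);            /* fails, like the VERIFY3 */
    return (0);
}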
Closing as stale.