
list a zfs directory hang #1930

Closed
lidaof opened this issue Dec 5, 2013 · 7 comments

Comments

@lidaof

lidaof commented Dec 5, 2013

Hi All,

I recently did a system upgrade and seem to have run into a problem with my system.
I am not able to list some of the directories, and there are lots of errors in syslog, shown below.

I am using Ubuntu 12.04.3 LTS. Does anyone have suggestions on how to fix it?
Please let me know if you need more information. Many thanks.

Dec 5 09:27:16 corona kernel: [ 956.594281] VERIFY(BSWAP_32(sa_hdr_phys->sa_magic) == SA_MAGIC) failed
Dec 5 09:27:16 corona kernel: [ 956.594720] SPLError: 3729:0:(sa.c:1303:sa_build_index()) SPL PANIC
Dec 5 09:27:16 corona kernel: [ 956.595076] SPL: Showing stack for process 3729
Dec 5 09:27:16 corona kernel: [ 956.595081] Pid: 3729, comm: ls Tainted: P O 3.2.0-23-generic #36-Ubuntu
Dec 5 09:27:16 corona kernel: [ 956.595083] Call Trace:
Dec 5 09:27:16 corona kernel: [ 956.595103] [] spl_debug_dumpstack+0x27/0x40 [spl]
Dec 5 09:27:16 corona kernel: [ 956.595111] [] spl_debug_bug+0x82/0xe0 [spl]
Dec 5 09:27:16 corona kernel: [ 956.595153] [] sa_build_index+0x10e/0x110 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595187] [] sa_handle_get_from_db+0xda/0x120 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595224] [] zfs_znode_sa_init.isra.7+0x9f/0xd0 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595260] [] zfs_znode_alloc+0xdc/0x540 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595296] [] ? zio_wait+0x12d/0x1c0 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595317] [] ? dbuf_read+0x337/0x860 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595337] [] ? dbuf_create+0x325/0x370 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595345] [] ? mutex_lock+0x1d/0x50
Dec 5 09:27:16 corona kernel: [ 956.595352] [] ? default_spin_lock_flags+0x9/0x10
Dec 5 09:27:16 corona kernel: [ 956.595375] [] ? dmu_object_info_from_dnode+0x144/0x1b0 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595411] [] zfs_zget+0x168/0x200 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595447] [] ? zap_lookup_norm+0xd1/0x1c0 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595482] [] zfs_dirent_lock+0x4c3/0x5d0 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595518] [] zfs_dirlook+0x8b/0x300 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595554] [] ? zfs_zaccess+0x9d/0x430 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595565] [] ? tsd_exit+0x2a0/0x2d0 [spl]
Dec 5 09:27:16 corona kernel: [ 956.595601] [] zfs_lookup+0x2e1/0x330 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595636] [] zpl_lookup+0x78/0xf0 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595641] [] ? _raw_spin_lock+0xe/0x20
Dec 5 09:27:16 corona kernel: [ 956.595646] [] d_alloc_and_lookup+0x45/0x90
Dec 5 09:27:16 corona kernel: [ 956.595653] [] ? d_lookup+0x35/0x60
Dec 5 09:27:16 corona kernel: [ 956.595657] [] do_lookup+0x202/0x310
Dec 5 09:27:16 corona kernel: [ 956.595661] [] ? dput+0x1e6/0x290
Dec 5 09:27:16 corona kernel: [ 956.595665] [] path_lookupat+0x11c/0x750
Dec 5 09:27:16 corona kernel: [ 956.595673] [] ? __strncpy_from_user+0x27/0x60
Dec 5 09:27:16 corona kernel: [ 956.595677] [] do_path_lookup+0x31/0xc0
Dec 5 09:27:16 corona kernel: [ 956.595681] [] user_path_at_empty+0x59/0xa0
Dec 5 09:27:16 corona kernel: [ 956.595717] [] ? zfs_getattr_fast+0xd9/0x160 [zfs]
Dec 5 09:27:16 corona kernel: [ 956.595721] [] ? _raw_spin_lock+0xe/0x20
Dec 5 09:27:16 corona kernel: [ 956.595728] [] ? cp_new_stat+0xf8/0x110
Dec 5 09:27:16 corona kernel: [ 956.595732] [] user_path_at+0x11/0x20
Dec 5 09:27:16 corona kernel: [ 956.595736] [] vfs_fstatat+0x3a/0x70
Dec 5 09:27:16 corona kernel: [ 956.595740] [] vfs_lstat+0x1e/0x20
Dec 5 09:27:16 corona kernel: [ 956.595744] [] sys_newlstat+0x1a/0x40
Dec 5 09:27:16 corona kernel: [ 956.595750] [] system_call_fastpath+0x16/0x1b

@behlendorf
Contributor

This may be related to #1890. Sorry, I don't have a quick fix for you, although I suspect this only impacts a very small number of files. If you can identify them and quarantine them for now, you can avoid this issue. This isn't a new issue, just a rare one, so I doubt it was caused by the system upgrade.

@lidaof
Author

lidaof commented Dec 5, 2013

Thanks for your response.
Previously I was using Ubuntu 12.04 LTS.
The reason I did the system upgrade is that this problem was already happening before the upgrade.

Now I have run zpool upgrade, and after rebooting twice I haven't seen this error, at least for now.
I hope this issue doesn't come back :)
Thanks.

@lidaof
Author

lidaof commented Dec 6, 2013

Update: this issue happened again after the zpool upgrade.

@lidaof
Author

lidaof commented Dec 11, 2013

@behlendorf do you know how to find those problematic directories? This is happening more frequently and is preventing us from working. Thanks.

@behlendorf
Contributor

@lidaof My only quick suggestion is to try the following patch. It detects the error and, instead of making it fatal, returns EINVAL to the higher layers. This may allow you to just get EINVAL errors for the offending files instead of crashing the node.

diff --git a/module/zfs/sa.c b/module/zfs/sa.c
index 117d386..eaedb53 100644
--- a/module/zfs/sa.c
+++ b/module/zfs/sa.c
@@ -1300,7 +1300,11 @@ sa_build_index(sa_handle_t *hdl, sa_buf_type_t buftype)
        /* only check if not old znode */
        if (IS_SA_BONUSTYPE(bonustype) && sa_hdr_phys->sa_magic != SA_MAGIC &&
            sa_hdr_phys->sa_magic != 0) {
-               VERIFY(BSWAP_32(sa_hdr_phys->sa_magic) == SA_MAGIC);
+               if (BSWAP_32(sa_hdr_phys->sa_magic) != SA_MAGIC) {
+                       mutex_exit(&sa->sa_lock);
+                       return (EINVAL);
+               }
+
                sa_byteswap(hdl, buftype);
        }

@lidaof
Author

lidaof commented Dec 20, 2013

@behlendorf Thank you very much.
I don't quite understand what this is... I guess it's a code diff.
Where is the file sa.c located, and how can I apply the changes?
Do you happen to have a tutorial on this? That would be great. Sorry for the trouble, and many thanks again.

behlendorf removed this from the 0.6.4 milestone Oct 29, 2014
@behlendorf
Contributor

There have been several fixes applied to master regarding corrupted SA which could have caused this issue. Since those problems have been resolved I'm closing this issue.
