segfault at sys_readv () from /lib64/libvma.so #969

syspro4 opened this issue Nov 8, 2021 · 6 comments


syspro4 commented Nov 8, 2021

Subject

segfault at sys_readv () from /lib64/libvma.so

Issue type

  • Bug report
  • Feature request

Configuration:

  • Product version: libvma 9.4
  • OS: Oracle Linux 8.3
  • OFED: MLNX_OFED_LINUX-5.4-1.0.3.0
  • Hardware: Mellanox Technologies MT27700 Family [ConnectX-4]

Actual behavior:

While running glusterd (GlusterFS) under LD_PRELOAD I see a segfault at sys_readv(). I enabled debug mode while compiling, but I am still not able to see the exact crash location inside the libvma code. The following is the command I used to configure the debug build.

[root@dev-mc libvma]# ./configure --with-ofed=/usr --prefix=/usr --libdir=/usr/lib64 --includedir=/usr/include --docdir=/usr/share/doc/libvma --sysconfdir=/etc --enable-debug

Crash:

#0 0x00007f92909093f0 in sys_readv () from /lib64/libvma.so
#1 0x00007f919ecc7217 in __socket_ssl_readv (this=this@entry=0x7f9194004570, opvector=opvector@entry=0x7f9194004d08, opcount=opcount@entry=1) at socket.c:568
#2 0x00007f919ecc74ea in __socket_cached_read (opcount=1, opvector=0x7f9194004d08, this=0x7f9194004570) at socket.c:652
#3 __socket_rwv (this=this@entry=0x7f9194004570, vector=<optimized out>, count=count@entry=1, pending_vector=pending_vector@entry=0x7f9194004d48, pending_count=pending_count@entry=0x7f9194004d54, bytes=bytes@entry=0x0, write=0) at socket.c:734
#4 0x00007f919ecc84ab in __socket_readv (bytes=0x0, pending_count=0x7f9194004d54, pending_vector=0x7f9194004d48, count=1, vector=<optimized out>, this=0x7f9194004570) at socket.c:2354
#5 __socket_proto_state_machine (this=this@entry=0x7f9194004570, pollin=pollin@entry=0x7f919dfa8ef0) at socket.c:2354
#6 0x00007f919eccbda4 in socket_proto_state_machine (pollin=0x7f919dfa8ef0, this=0x7f9194004570) at socket.c:2542
#7 socket_event_poll_in (notify_handled=true, this=0x7f9194004570) at socket.c:2542
#8 socket_event_handler (event_thread_died=0 '\000', poll_err=<optimized out>, poll_out=<optimized out>, poll_in=<optimized out>, data=0x7f9194004570, gen=1, idx=2, fd=56) at socket.c:2948
#9 socket_event_handler (fd=fd@entry=56, idx=idx@entry=2, gen=gen@entry=1, data=data@entry=0x7f9194004570, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=<optimized out>, event_thread_died=0 '\000') at socket.c:2868
#10 0x00007f92903099dc in event_dispatch_epoll_handler (event=0x7f919dfa8f94, event_pool=0x5598f07fab20) at event-epoll.c:692
#11 event_dispatch_epoll_worker (data=0x5598f0e03170) at event-epoll.c:803

Expected behavior:

libvma should not segfault while running with GlusterFS.

Steps to reproduce:

  1. Start glusterd in foreground with LD_PRELOAD:

    LD_PRELOAD=libvma.so /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO -N

  2. Run a gluster CLI command to configure the gluster volume:

    gluster volume info

  3. After running the CLI command, glusterd gets a segfault.
@igor-ivanov
Collaborator

Hello @syspro4
Thank you for reporting the issue.
I think this issue might happen because of a sys_readv symbol conflict: the symbol exists in both glusterfs and libvma.

glusterfs: https://github.com/gluster/glusterfs/blob/2ff6e2d5e217ab555ff63026017151edf2ba1adf/rpc/rpc-transport/socket/src/socket.c#L557

libvma:

sys_readv_fn sys_readv;

A solution will be planned.
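
To illustrate the mechanism, here is a minimal sketch with hypothetical file names, not the actual glusterfs or libvma sources: glusterfs defines a sys_readv() wrapper function, libvma defines a global function-pointer variable with the same name, and with LD_PRELOAD the dynamic linker binds the cross-library call to the preloaded data symbol, so the call jumps into data and crashes inside the preloaded library, matching the backtrace above.

    /* wrapper.c -- stands in for glusterfs's sys_readv() wrapper
     * (build: gcc -shared -fPIC wrapper.c -o libwrapper.so) */
    #include <sys/uio.h>

    ssize_t sys_readv(int fd, const struct iovec *iov, int count)
    {
        return readv(fd, iov, count);
    }

    /* preload.c -- stands in for libvma's global pointer of the same name
     * (build: gcc -shared -fPIC preload.c -o libpreload.so) */
    #include <sys/uio.h>

    typedef ssize_t (*sys_readv_fn)(int, const struct iovec *, int);
    sys_readv_fn sys_readv;   /* a global DATA symbol named sys_readv */

    /* main.c -- calls the wrapper (build: gcc main.c -L. -lwrapper -o demo)
     * Run:  LD_LIBRARY_PATH=. LD_PRELOAD=./libpreload.so ./demo
     * The dynamic linker resolves the call to libpreload.so's data symbol
     * instead of the function in libwrapper.so, so execution jumps into
     * zero-initialized data and segfaults "in sys_readv () from libpreload.so". */
    #include <sys/uio.h>

    extern ssize_t sys_readv(int fd, const struct iovec *iov, int count);

    int main(void)
    {
        char buf[8];
        struct iovec iov = { buf, sizeof(buf) };
        return (int)sys_readv(0, &iov, 1);
    }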


syspro4 commented Nov 9, 2021

Thanks for the reply!
I will change the gluster code to rename glusterfs's sys_readv to new_sys_readv() and then try to use libvma.
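
For reference, a rough sketch of the rename I have in mind (hypothetical, not the actual glusterfs change): the wrapper gets a name that no longer collides with libvma's global symbol, and its callers are updated accordingly.

    /* Hypothetical sketch of the workaround, not the actual glusterfs patch:
     * rename the wrapper so it no longer collides with libvma's sys_readv. */
    #include <sys/uio.h>

    ssize_t new_sys_readv(int fd, const struct iovec *iov, int count)
    {
        return readv(fd, iov, count);
    }

    /* call sites that previously used
     *     ret = sys_readv(sock, opvector, opcount);
     * now call
     *     ret = new_sys_readv(sock, opvector, opcount);
     */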


syspro4 commented Nov 29, 2021

I renamed glusterfs's sys_readv to new_sys_readv() and now I can start glusterd with libvma.
But now it fails to spawn a new process (glusterfsd). glusterfsd is a daemon process which does the actual I/O to the underlying file system. Does libvma support the fork()/execvp() system calls?

In the log I see the following error messages:

[2021-11-29 22:37:07.547610 +0000] I [glusterfsd.c:2418:daemonize] 0-glusterfs: Pid of current running process is 6511
[2021-11-29 22:37:10.928985 +0000] I [socket.c:929:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 103
[2021-11-29 22:37:10.929176 +0000] E [MSGID: 101187] [event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]
[2021-11-29 22:37:10.929196 +0000] W [socket.c:3779:socket_listen] 0-socket.glusterfsd: could not register socket 102 with events; closing socket
[2021-11-29 22:37:10.929218 +0000] W [rpcsvc.c:1993:rpcsvc_create_listener] 0-rpc-service: listening on transport failed
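
For reference, errno=9 here is EBADF, which epoll_ctl() reports when the fd being registered (or the epoll fd itself) is no longer a valid open descriptor. A minimal standalone sketch, hypothetical and not gluster code, that reproduces the same condition:

    /* Hypothetical sketch, not gluster/libvma code: adding an already-closed
     * fd to epoll fails with EBADF (errno=9), as in the log above. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int epfd = epoll_create1(0);
        int sock = socket(AF_UNIX, SOCK_STREAM, 0);

        close(sock);   /* simulate the fd being torn down before registration */

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = sock };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev) == -1)
            fprintf(stderr, "failed to add fd to epoll: errno=%d (%s)\n",
                    errno, strerror(errno));   /* errno=9, Bad file descriptor */

        close(epfd);
        return 0;
    }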

Thanks


igor-ivanov commented Nov 30, 2021

Nice to see that the sys_readv issue can be overcome.
libvma supports the fork()/exec() case. See 24bd173
and the related test at https://github.com/Mellanox/libvma/tree/master/tests/simple_fork.
VMA_TRACELEVEL=4 can be used to display VMA output.
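
As a concrete starting point, a minimal fork()/exec() sketch like the one below (hypothetical, written in the spirit of the simple_fork test rather than copied from it) can be run with LD_PRELOAD=libvma.so and VMA_TRACELEVEL=4 to observe how inherited sockets are handled in the child:

    /* Hypothetical sketch, not the actual simple_fork test: open a socket,
     * fork(), and exec a program in the child so fd inheritance across
     * fork()/exec() shows up in the VMA debug output. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_STREAM, 0);   /* offloadable TCP socket */
        if (sock < 0) {
            perror("socket");
            return 1;
        }

        pid_t pid = fork();
        if (pid == 0) {
            /* child: the exec'd program only keeps fds without FD_CLOEXEC */
            execlp("ls", "ls", "-l", "/proc/self/fd", (char *)NULL);
            perror("execlp");
            _exit(1);
        }

        waitpid(pid, NULL, 0);
        close(sock);
        return 0;
    }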


syspro4 commented Dec 1, 2021

Thanks for the reply.
But I am getting an error while running the Gluster services (glusterd & glusterfsd) in daemonize mode with libvma.
I always get the same error:

[event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]

Is it possible that some FDs are getting closed during fork()/exec(), and hence the epoll_ctl(..., EPOLL_CTL_ADD, fd, ...) call is failing?
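
One way to narrow that down is to probe the fd with fcntl(F_GETFD) right before the failing epoll_ctl(): EBADF means the descriptor is already closed, while FD_CLOEXEC means it would not survive an execvp(). A hypothetical diagnostic helper, not gluster or libvma code:

    /* Hypothetical diagnostic helper, not gluster/libvma code. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    static void check_fd(int fd)
    {
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1) {
            /* EBADF here means the fd was already closed */
            fprintf(stderr, "fd %d: fcntl(F_GETFD) failed: %s\n",
                    fd, strerror(errno));
            return;
        }
        if (flags & FD_CLOEXEC)
            fprintf(stderr, "fd %d has FD_CLOEXEC: it will not survive exec\n", fd);
        else
            fprintf(stderr, "fd %d is open and inheritable across exec\n", fd);
    }

Calling check_fd() on the socket fd (102 in the log above) and on the epoll fd (52) just before epoll_ctl() should show which of the two went away.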

@igor-ivanov
Collaborator

  1. I would like to note that current master should not have the symbol conflict that was initially reported.
  2. Regarding the epoll error reported above (#969 (comment)):
    Do you know whether the Gluster application uses the flow described in "VMA does not support more then 1 epfd listed" (#816)? (A minimal sketch of that multi-epfd pattern follows after this list.)
    Could you try VMA_TRACELEVEL=4 and look for suspicious VMA output around [event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]?
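
For reference, a minimal sketch, hypothetical and not gluster's event-epoll.c, of the more-than-one-epfd pattern that #816 refers to, with one epoll instance per worker thread:

    /* Hypothetical sketch, not gluster's event-epoll.c: each worker thread
     * creates its own epoll instance, so the application uses more than one
     * epfd at a time. Build with -pthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    #define WORKERS 2

    static void *worker(void *arg)
    {
        (void)arg;
        int epfd = epoll_create1(0);   /* one epfd per thread */
        printf("worker thread epfd=%d\n", epfd);
        /* ... epoll_ctl()/epoll_wait() event loop would run here ... */
        close(epfd);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[WORKERS];
        for (int i = 0; i < WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < WORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }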
