Deadlock between two receive threads when Netconf server crashes #200

Closed
parkrish opened this issue Jun 3, 2016 · 10 comments

Comments

@parkrish

parkrish commented Jun 3, 2016

Hi,
I have three threads in my Netconf client program. Two of them send and receive Netconf requests; the third is a notification thread that receives notifications.

When the Netconf server crashes, the notification thread exits as expected (because of the fix for issue #193, "Notification thread never exits on netconf server crash").

However, one of the receive threads detects the server failure and calls nc_session_close, where it gets blocked in ncntf_dispatch_stop:

(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fddde4174d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fddde417336 in __GI___pthread_mutex_lock (mutex=0x12f4798) at ../nptl/pthread_mutex_lock.c:114
#3 0x00007fddde8511e8 in ncntf_dispatch_stop () from /usr/lib/libnetconf.so.0
#4 0x00007fddde847598 in nc_session_close () from /usr/lib/libnetconf.so.0
#5 0x00007fddde84792e in nc_session_send.isra.4.part () from /usr/lib/libnetconf.so.0
#6 0x00007fddde84651b in nc_session_send_reply () from /usr/lib/libnetconf.so.0
#7 0x00007fddde846fb1 in nc_session_recv_reply () from /usr/lib/libnetconf.so.0
#8 0x00007fddde849cc3 in nc_session_send_recv () from /usr/lib/libnetconf.so.0

The other thread is also blocked, waiting for a lock:

(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fddde4174d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fddde417336 in __GI___pthread_mutex_lock (mutex=0x12f46f8) at ../nptl/pthread_mutex_lock.c:114
#3 0x00007fddde847eef in nc_session_send_rpc () from /usr/lib/libnetconf.so.0
#4 0x00007fddde849c2b in nc_session_send_recv () from /usr/lib/libnetconf.so.0

Based on the code flow, if one of the other two threads (rather than the notification thread) happens to detect the failure and initiates nc_session_close, all three threads end up deadlocked, because that thread has already acquired the lock and is then blocked in ncntf_dispatch_stop.

I guess we may have to set session->ntf_active to 0 (maybe in nc_session_close) to avoid this issue.
Could you please look into this problem and provide a solution?
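
To illustrate the idea, here is a minimal sketch of a dispatcher loop that exits once an "active" flag is cleared by the closing thread. It is not libnetconf code: the struct flag_session type and the flag_dispatch, flag_close and flag_demo functions are hypothetical names for illustration, and only the field name ntf_active is borrowed from the suggestion above. The flag is atomic so the closer does not need to take any mutex before joining the dispatcher.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the session object (not the libnetconf struct). */
struct flag_session {
    atomic_int ntf_active; /* 1 while the dispatcher should keep running */
    atomic_int polls;      /* number of loop iterations, for illustration */
};

/* Dispatcher loop: re-checks the flag on every iteration instead of
 * waiting on a lock that the closing thread might hold. */
static void *flag_dispatch(void *arg)
{
    struct flag_session *s = arg;
    struct timespec pause = { 0, 1000000 }; /* 1 ms */

    while (atomic_load(&s->ntf_active)) {
        atomic_fetch_add(&s->polls, 1);
        nanosleep(&pause, NULL);
    }
    return NULL;
}

/* Close path: clear the flag first, then wait for the dispatcher to exit. */
static void flag_close(struct flag_session *s, pthread_t dispatcher)
{
    atomic_store(&s->ntf_active, 0);
    pthread_join(dispatcher, NULL);
}

/* Starts a dispatcher, closes the session, and returns the final flag value
 * (0 on a clean shutdown, -1 if the thread could not be created). */
int flag_demo(void)
{
    struct flag_session s;
    pthread_t dispatcher;

    atomic_init(&s.ntf_active, 1);
    atomic_init(&s.polls, 0);
    if (pthread_create(&dispatcher, NULL, flag_dispatch, &s) != 0)
        return -1;
    flag_close(&s, dispatcher);
    return atomic_load(&s.ntf_active);
}
```

Calling flag_demo() from a test program should return 0 once the dispatcher has exited. Whether clearing the flag alone is sufficient in libnetconf depends on which locks the real dispatcher holds at that point, which this sketch does not model.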

Regards,
Parameswaran

@rkrejci
Contributor

rkrejci commented Jun 7, 2016

Probably duplicates #199, please check if the problem is solved with the current master (35d8dc7)

@rkrejci rkrejci closed this as completed Jun 7, 2016
@parkrish
Author

Hi,

Thanks for looking into the problem. I tested with the latest master code.
Unfortunately, my problem is not solved yet.

There is still a deadlock between the notification thread and the send/receive thread, caused by the two locks mut_ntf and mut_session.

The notification thread holds mut_ntf and is waiting for mut_session:

#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f8322cfd4d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f8322cfd336 in __GI___pthread_mutex_lock (mutex=0x269d078) at ../nptl/pthread_mutex_lock.c:114
#3 0x00007f832312e53a in nc_session_close (session=0x269cff0, reason=NC_SESSION_TERM_DROPPED) at src/session.c:1225
#4 0x00007f832312fcf6 in nc_session_receive (session=0x269cff0, timeout=0, msg=0x7f831c6e0e60) at src/session.c:2131
#5 0x00007f832313074d in nc_session_recv_msg (session=0x269cff0, timeout=0, msg=0x7f831c6e0e60) at src/session.c:2363
#6 0x00007f8323130eba in nc_session_recv_notif (session=0x269cff0, timeout=0, ntf=0x7f831c6e0ea0) at src/session.c:2542
#7 0x00007f832313daf3 in ncntf_dispatch_receive (session=0x269cff0, process_ntf=0x7f8323ce189c <notification_receiver>) at src/notifications.c:2681

The send/receive thread holds mut_session and is waiting for mut_ntf:

(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f8322cfd4d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f8322cfd336 in __GI___pthread_mutex_lock (mutex=0x269d118) at ../nptl/pthread_mutex_lock.c:114
#3 0x00007f832312e2ba in ncntf_dispatch_stop (session=0x269cff0) at src/session.c:1194
#4 0x00007f832312e582 in nc_session_close (session=0x269cff0, reason=NC_SESSION_TERM_DROPPED) at src/session.c:1234
#5 0x00007f832312ec57 in nc_session_send (session=0x269cff0, msg=0x1ea8a20) at src/session.c:1524
#6 0x00007f8323131f55 in nc_session_send_reply (session=0x269cff0, rpc=0x0, reply=0x25adf80) at src/session.c:2949
#7 0x00007f83231306a8 in nc_session_receive (session=0x269cff0, timeout=100, msg=0x7fffd7866f58) at src/session.c:2345
#8 0x00007f832313074d in nc_session_recv_msg (session=0x269cff0, timeout=100, msg=0x7fffd7866f58) at src/session.c:2363
#9 0x00007f83231308bf in nc_session_recv_reply (session=0x269cff0, timeout=-1, reply=0x7fffd7867028) at src/session.c:2409
#10 0x00007f832313228a in nc_session_send_recv (session=0x269cff0, rpc=0x2c10f40, reply=0x7fffd7867028) at src/session.c:3036
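
The two backtraces show a classic lock-order inversion: each thread holds one mutex and waits for the other. One common way to avoid this pattern is to make every code path take the two locks in the same order. The sketch below shows that technique in isolation; it is not the patch from the library, and the struct order_session type and the order_* functions are hypothetical. Only the mutex names mut_session and mut_ntf come from the traces above.

```c
#include <pthread.h>

/* Hypothetical stand-in for the session object; only the mutex names
 * mut_session and mut_ntf are taken from the backtraces above. */
struct order_session {
    pthread_mutex_t mut_session;
    pthread_mutex_t mut_ntf;
    long shared_counter; /* state that needs both locks */
};

/* The single agreed order: mut_session first, then mut_ntf. */
static void order_lock_both(struct order_session *s)
{
    pthread_mutex_lock(&s->mut_session);
    pthread_mutex_lock(&s->mut_ntf);
}

/* Release in the reverse order of acquisition. */
static void order_unlock_both(struct order_session *s)
{
    pthread_mutex_unlock(&s->mut_ntf);
    pthread_mutex_unlock(&s->mut_session);
}

#define ORDER_ROUNDS 10000

/* Both the "notification" and the "send/receive" role run this worker,
 * so neither can hold mut_ntf while waiting for mut_session. */
static void *order_worker(void *arg)
{
    struct order_session *s = arg;

    for (int i = 0; i < ORDER_ROUNDS; i++) {
        order_lock_both(s);
        s->shared_counter++;
        order_unlock_both(s);
    }
    return NULL;
}

/* Runs two contending threads and returns the final counter value. */
long order_demo(void)
{
    struct order_session s;
    pthread_t notif_thread, rpc_thread;

    pthread_mutex_init(&s.mut_session, NULL);
    pthread_mutex_init(&s.mut_ntf, NULL);
    s.shared_counter = 0;

    pthread_create(&notif_thread, NULL, order_worker, &s);
    pthread_create(&rpc_thread, NULL, order_worker, &s);
    pthread_join(notif_thread, NULL);
    pthread_join(rpc_thread, NULL);

    pthread_mutex_destroy(&s.mut_ntf);
    pthread_mutex_destroy(&s.mut_session);
    return s.shared_counter;
}
```

With a consistent order, order_demo() finishes and returns 2 * ORDER_ROUNDS. If one of the two roles took mut_ntf before mut_session instead, the same program could hang in the way the traces above show.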

Regards,
Parameswaran

@rkrejci
Contributor

rkrejci commented Jun 15, 2016

Hi, there is now a separate branch called deadlockfix with a patch. Could you please check whether that patch solves the issue?

@parkrish
Author

parkrish commented Jun 16, 2016

Hi,

I tested the code from the latest deadlockfix branch and the issue is resolved.
Thanks for the support.
When could we expect a release with this fix?

Regards,
Parameswaran

@rkrejci
Contributor

rkrejci commented Jun 16, 2016

OK, I'll wait for a response in #199, and if the fix doesn't break it, I'll merge it into master.

@parkrish
Author

Thanks. Will there be a new release from master any time soon after the deadlock fix is merged?

Regards,
Parameswaran

@rkrejci
Contributor

rkrejci commented Jun 17, 2016

What do you mean by "Release"?

@parkrish
Author

Thanks. By "Release" I meant a release branch like 0.9.0, 0.10.0, etc.

@rkrejci
Contributor

rkrejci commented Jun 23, 2016

In that sense, the master branch is actually 1.0.0: we do not add new features (which would change the API), we just fix reported bugs (our focus is now on libyang, libnetconf2 and Netopeer2).

@parkrish
Author

Thank you for the information.
