Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changes for making snmp socket non-blocking #3604

Merged

Conversation

sudhanshukumar22
Copy link
Contributor

Patch 1:
0006-changes-for-making-snmp-socket-non-blocking.patch
Description: The changes have been done to make the snmp socket
non-blocking before calling snmp_read()
FRR Pull request: FRRouting/frr#5134

Problem Description/Summary :
vtysh hangs on first try to enter after a reboot with BGP dynamic peers

Expected Behavior :
VTYSH should not hang.
When we debug more into bgpd docker by doing gdb on its threads, we find the below thread of bgpd, which is causing the issue.
Thread 1 (Thread 0x7f1e1ec46d40 (LWP 47)):

0x00007f1e1d762593 in recvfrom () from /lib/x86_64-linux-gnu/libpthread.so.0
0x00007f1e1aadd09b in netsnmp_tcpbase_recv () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
0x00007f1e1aad9617 in netsnmp_transport_recv () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
0x00007f1e1aab2c07 in _sess_read () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
0x00007f1e1aab3a29 in snmp_sess_read2 () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
0x00007f1e1aab3a7b in snmp_read2 () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
0x00007f1e1aab3acf in snmp_read () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
0x00007f1e1b44d7ec in agentx_read (t=0x7fffa75f0080) at lib/agentx.c:63
0x00007f1e1e7d6451 in thread_call (thread=0x7fffa75f0080) at lib/thread.c:1620
0x00007f1e1e770699 in frr_run (master=0x559396ea60f0) at lib/libfrr.c:1011
0x0000559395b4d953 in main (argc=5, argv=0x7fffa75f02b8) at bgpd/bgp_main.c:492

(gdb) bt

0x00007f830c89d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
0x000056450e1e8238 in vtysh_client_run (vclient=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable, callback=0x0, cbarg=0x0) at vtysh/vtysh.c:216
0x000056450e1e8c6b in vtysh_client_run_all (head_client=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable, continue_on_err=0, callback=0x0, cbarg=0x0) at vtysh/vtysh.c:356
0x000056450e1e8ddb in vtysh_client_execute (head_client=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable) at vtysh/vtysh.c:393
0x000056450e1e9c82 in vtysh_execute_func (line=0x56450e21add0 enable, pager=0) at vtysh/vtysh.c:598
0x000056450e1e9dee in vtysh_execute_no_pager (line=0x56450e21add0 enable) at vtysh/vtysh.c:619
0x000056450e1f7d48 in vtysh_read_file (confp=0x56451000a9d0, top_cfg=1) at vtysh/vtysh_config.c:494
0x000056450e1f7ef2 in vtysh_read_config (config_default_dir=0x56450e4edc20 <frr_config> /etc/frr/frr.conf, top_cfg=1) at vtysh/vtysh_config.c:522
0x000056450e1e5de4 in vtysh_apply_top_level_config () at vtysh/vtysh_main.c:301
0x000056450e1e7842 in main (argc=2, argv=0x7ffc81e6f598, env=0x7ffc81e6f5b0) at vtysh/vtysh_main.c:692

The fix has been taken from the following link.
https://sourceforge.net/p/net-snmp/patches/1348/

Patch 2:
0007-prevent-dead-fd-poll-data-port-fix-from-frr.patch
Port a fix from FRR community
donaldsharp/frr@39c93f3
If we have a case where have created a fd for i/o and we have removed the
handling thread but still have the fd in the poll data structure, there
existed a case where we would get the handle this fd return from poll but we
would immediately do nothing with it because we didn't have a thread to hand
the event to.

This leads to an infinite loop. Prevent the infinite loop
from happening and log the problem.

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@pavel-shirshov
Copy link
Contributor

Can you please split this PR into two? One PR, one patch?

…ket-issue, origin/bgp-snmp-socket-issue)

Author: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
Date:   Tue Oct 15 02:31:20 2019 -0700

    lib: changes for making snmp socket non-blocking

    Description: The changes have been done to make the snmp socket
    non-blocking before calling snmp_read()
    FRR Pull request: FRRouting/frr#5134

    Problem Description/Summary :
    vtysh hangs on first try to enter after a reboot with BGP dynamic peers

    Expected Behavior :
    VTYSH should not hang.
    When we debug more into bgpd docker by doing gdb on its threads, we find the below thread of bgpd, which is causing the issue.
    Thread 1 (Thread 0x7f1e1ec46d40 (LWP 47)):

    0x00007f1e1d762593 in recvfrom () from /lib/x86_64-linux-gnu/libpthread.so.0
    0x00007f1e1aadd09b in netsnmp_tcpbase_recv () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aad9617 in netsnmp_transport_recv () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab2c07 in _sess_read () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab3a29 in snmp_sess_read2 () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab3a7b in snmp_read2 () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab3acf in snmp_read () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1b44d7ec in agentx_read (t=0x7fffa75f0080) at lib/agentx.c:63
    0x00007f1e1e7d6451 in thread_call (thread=0x7fffa75f0080) at lib/thread.c:1620
    0x00007f1e1e770699 in frr_run (master=0x559396ea60f0) at lib/libfrr.c:1011
    0x0000559395b4d953 in main (argc=5, argv=0x7fffa75f02b8) at bgpd/bgp_main.c:492

    (gdb) bt

    0x00007f830c89d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
    0x000056450e1e8238 in vtysh_client_run (vclient=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable, callback=0x0, cbarg=0x0) at vtysh/vtysh.c:216
    0x000056450e1e8c6b in vtysh_client_run_all (head_client=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable, continue_on_err=0, callback=0x0, cbarg=0x0) at vtysh/vtysh.c:356
    0x000056450e1e8ddb in vtysh_client_execute (head_client=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable) at vtysh/vtysh.c:393
    0x000056450e1e9c82 in vtysh_execute_func (line=0x56450e21add0 enable, pager=0) at vtysh/vtysh.c:598
    0x000056450e1e9dee in vtysh_execute_no_pager (line=0x56450e21add0 enable) at vtysh/vtysh.c:619
    0x000056450e1f7d48 in vtysh_read_file (confp=0x56451000a9d0, top_cfg=1) at vtysh/vtysh_config.c:494
    0x000056450e1f7ef2 in vtysh_read_config (config_default_dir=0x56450e4edc20 <frr_config> /etc/frr/frr.conf, top_cfg=1) at vtysh/vtysh_config.c:522
    0x000056450e1e5de4 in vtysh_apply_top_level_config () at vtysh/vtysh_main.c:301
    0x000056450e1e7842 in main (argc=2, argv=0x7ffc81e6f598, env=0x7ffc81e6f5b0) at vtysh/vtysh_main.c:692

    The fix has been taken from the following link.
    https://sourceforge.net/p/net-snmp/patches/1348/
@sudhanshukumar22
Copy link
Contributor Author

Can you please split this PR into two? One PR, one patch?

Hi Pavel,
I modified this PR to include one patch.
For 2nd patch, I have created a new PR as below.
https://github.com/Azure/sonic-buildimage/pull/3614/files

@sudhanshukumar22 sudhanshukumar22 changed the title changes for making snmp socket non-blocking and port a fix from FRR changes for making snmp socket non-blocking Oct 16, 2019
@pavel-shirshov pavel-shirshov merged commit 48e4cf6 into sonic-net:master Oct 16, 2019
praveen-li pushed a commit to praveen-li/sonic-buildimage that referenced this pull request Feb 9, 2021
…ket-issue, origin/bgp-snmp-socket-issue) (sonic-net#3604)

Author: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
Date:   Tue Oct 15 02:31:20 2019 -0700

    lib: changes for making snmp socket non-blocking

    Description: The changes have been done to make the snmp socket
    non-blocking before calling snmp_read()
    FRR Pull request: FRRouting/frr#5134

    Problem Description/Summary :
    vtysh hangs on first try to enter after a reboot with BGP dynamic peers

    Expected Behavior :
    VTYSH should not hang.
    When we debug more into bgpd docker by doing gdb on its threads, we find the below thread of bgpd, which is causing the issue.
    Thread 1 (Thread 0x7f1e1ec46d40 (LWP 47)):

    0x00007f1e1d762593 in recvfrom () from /lib/x86_64-linux-gnu/libpthread.so.0
    0x00007f1e1aadd09b in netsnmp_tcpbase_recv () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aad9617 in netsnmp_transport_recv () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab2c07 in _sess_read () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab3a29 in snmp_sess_read2 () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab3a7b in snmp_read2 () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1aab3acf in snmp_read () from /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30
    0x00007f1e1b44d7ec in agentx_read (t=0x7fffa75f0080) at lib/agentx.c:63
    0x00007f1e1e7d6451 in thread_call (thread=0x7fffa75f0080) at lib/thread.c:1620
    0x00007f1e1e770699 in frr_run (master=0x559396ea60f0) at lib/libfrr.c:1011
    0x0000559395b4d953 in main (argc=5, argv=0x7fffa75f02b8) at bgpd/bgp_main.c:492

    (gdb) bt

    0x00007f830c89d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
    0x000056450e1e8238 in vtysh_client_run (vclient=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable, callback=0x0, cbarg=0x0) at vtysh/vtysh.c:216
    0x000056450e1e8c6b in vtysh_client_run_all (head_client=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable, continue_on_err=0, callback=0x0, cbarg=0x0) at vtysh/vtysh.c:356
    0x000056450e1e8ddb in vtysh_client_execute (head_client=0x56450e4a8b40 <vtysh_client+24768>, line=0x56450e21add0 enable) at vtysh/vtysh.c:393
    0x000056450e1e9c82 in vtysh_execute_func (line=0x56450e21add0 enable, pager=0) at vtysh/vtysh.c:598
    0x000056450e1e9dee in vtysh_execute_no_pager (line=0x56450e21add0 enable) at vtysh/vtysh.c:619
    0x000056450e1f7d48 in vtysh_read_file (confp=0x56451000a9d0, top_cfg=1) at vtysh/vtysh_config.c:494
    0x000056450e1f7ef2 in vtysh_read_config (config_default_dir=0x56450e4edc20 <frr_config> /etc/frr/frr.conf, top_cfg=1) at vtysh/vtysh_config.c:522
    0x000056450e1e5de4 in vtysh_apply_top_level_config () at vtysh/vtysh_main.c:301
    0x000056450e1e7842 in main (argc=2, argv=0x7ffc81e6f598, env=0x7ffc81e6f5b0) at vtysh/vtysh_main.c:692

    The fix has been taken from the following link.
    https://sourceforge.net/p/net-snmp/patches/1348/

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

RB=2119458
G=lnos-reviewers
R=pchaudhary,pmao,vapatil,samaity,zxu
A=
praveen-li pushed a commit to praveen-li/sonic-buildimage that referenced this pull request Feb 9, 2021
…snmp-socket-issue, origin/bgp-snmp-socket-issue) (sonic-net#3604)"

This reverts commit 5119971.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants