Zebra is ignoring IPv6 link-local addresses on recent Linux kernels #19

Closed
NetDEF-CI opened this issue Dec 19, 2016 · 5 comments

@NetDEF-CI
Collaborator

Issue by rwestphal
Thursday Dec 08, 2016 at 17:52 GMT
Originally opened as https://github.com/opensourcerouting/cumulus-private_quagga/issues/9


Regression introduced by commit 6a3b35 ("Zebra: Handle IPv6 address status during initialization").

More specifically, link-local addresses might not meet the following condition on some kernel versions:
https://github.com/opensourcerouting/cumulus-private_quagga/commit/6a3b35#diff-219fcf863f5de80131a9fd6c16919be1R663

Kernel version where this problem is known to happen: 4.4.0-51-generic #72-Ubuntu
Kernel version where this problem is known not to happen: 3.13.0-77-generic #121-Ubuntu

eqvinox added this to the 2.0-rc1 milestone Dec 20, 2016
@donaldsharp
Member

This bug was introduced by a patch from Vivek. I'll ask him to look at it real quick.

@rwestphal
Member

Apparently this issue is a false alarm.

Vivek's change to ignore IPv6 addresses with the IFA_F_DADFAILED/IFA_F_TENTATIVE flags is correct, as these addresses cannot be used or bound to.
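
As a rough illustration of the kind of check being discussed (not the actual zebra code), the sketch below tests an address's netlink flags for the two DAD states. The helper name `ipv6_addr_is_usable` is hypothetical; the flag names come from `<linux/if_addr.h>`, and the fallback values are only there so the snippet builds where that header is unavailable.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__has_include)
#if __has_include(<linux/if_addr.h>)
#include <linux/if_addr.h> /* IFA_F_* address flags from the Linux UAPI */
#endif
#endif

/* Fallback values matching <linux/if_addr.h>, for builds without it. */
#ifndef IFA_F_DADFAILED
#define IFA_F_DADFAILED 0x08
#endif
#ifndef IFA_F_TENTATIVE
#define IFA_F_TENTATIVE 0x40
#endif

/*
 * Hypothetical helper (not a zebra function): an address is treated as
 * usable only when DAD is neither still pending nor failed.
 */
static bool ipv6_addr_is_usable(uint32_t ifa_flags)
{
	return (ifa_flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE)) == 0;
}
```

In the `ip -6 addr show` output in this comment, 3000::1 on enp0s3 is marked both `tentative` and `dadfailed`, so a check like this would reject it, while 4000::1 on enp0s8 carries neither flag and would pass.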

I did some more testing here and the problem only happens in a very specific scenario. When you run a VirtualBox VM using a bridged network adapter based on a wlan interface, all IPv6 addresses added to this virtual adapter fail the IPv6 DAD check (and thus zebra correctly ignores them). The same doesn't happen if we use bridged mode based on a wired Ethernet interface.

Please see the output below (enp0s3 is bridged on a wlan interface and enp0s8 is bridged on a wired ethernet interface):

# ip -6 addr add 3000::1/64 dev enp0s3
# ip -6 addr add 4000::1/64 dev enp0s8
# ip -6 addr show scope global
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 3000::1/64 scope global tentative dadfailed 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 4000::1/64 scope global 
       valid_lft forever preferred_lft forever

Something really obscure is going on here, but the problem is definitely not in zebra. Using different kernel versions doesn't change anything; only the test environment matters.

@dsahern

dsahern commented Feb 2, 2017

My results vary on an Ubuntu 16.10 VM I just installed:

dsa@ubuntu-1610:~$ uname -a
Linux ubuntu-1610 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This interface is bridged to my Wi-Fi and pulls a DHCP address from the router (actual addresses obscured, but it is a valid address):
dsa@ubuntu-1610:~$ ip addr sh enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:3d:4d:e3 brd ff:ff:ff:ff:ff:ff
    inet 172.16.xx.xx/24 brd 172.16.xx.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 2601:282:800:xxxx:xxxx:xxxx:xxxx:xxxx/64 scope global mngtmpaddr dynamic
       valid_lft 600sec preferred_lft 300sec
    inet6 fe80::a00:27ff:fe3d:4de3/64 scope link
       valid_lft forever preferred_lft forever

This is using VirtualBox on a Mac running macOS 10.12.2.

@donaldsharp
Member

@rwestphal ok to close this bug then?

@rwestphal
Member

@donaldsharp sure.

cfra referenced this issue in opensourcerouting/frr Nov 29, 2018
Add check for mpls module to more places
louberger mentioned this issue May 1, 2019
ton31337 pushed a commit that referenced this issue Oct 17, 2020
When zebra is running with debugs turned on there
is a use after free reported by the address sanitizer:

2020/10/16 12:58:02 ZEBRA: rib_delnode: (0:254):4.5.6.16/32: rn 0x60b000026f20, re 0x6080000131a0, removing
2020/10/16 12:58:02 ZEBRA: rib_meta_queue_add: (0:254):4.5.6.16/32: queued rn 0x60b000026f20 into sub-queue 3
=================================================================
==3101430==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000011d28 at pc 0x555555705ab6 bp 0x7fffffffdab0 sp 0x7fffffffdaa8
READ of size 8 at 0x608000011d28 thread T0
    #0 0x555555705ab5 in re_list_const_first zebra/rib.h:222
    #1 0x555555705b54 in re_list_first zebra/rib.h:222
    #2 0x555555711a4f in process_subq_route zebra/zebra_rib.c:2248
    #3 0x555555711d2e in process_subq zebra/zebra_rib.c:2286
    #4 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320
    #5 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291
    #6 0x7ffff7450e9c in thread_call lib/thread.c:1581
    #7 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
    #8 0x55555561a578 in main zebra/main.c:455
    #9 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
    #10 0x5555555e3429 in _start (/usr/lib/frr/zebra+0x8f429)
0x608000011d28 is located 8 bytes inside of 88-byte region [0x608000011d20,0x608000011d78)
freed by thread T0 here:
    #0 0x7ffff768bb6f in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.6+0xa9b6f)
    #1 0x7ffff739ccad in qfree lib/memory.c:129
    #2 0x555555709ee4 in rib_gc_dest zebra/zebra_rib.c:746
    #3 0x55555570ca76 in rib_process zebra/zebra_rib.c:1240
    #4 0x555555711a05 in process_subq_route zebra/zebra_rib.c:2245
    #5 0x555555711d2e in process_subq zebra/zebra_rib.c:2286
    #6 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320
    #7 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291
    #8 0x7ffff7450e9c in thread_call lib/thread.c:1581
    #9 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
    #10 0x55555561a578 in main zebra/main.c:455
    #11 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
previously allocated by thread T0 here:
    #0 0x7ffff768c037 in calloc (/lib/x86_64-linux-gnu/libasan.so.6+0xaa037)
    #1 0x7ffff739cb98 in qcalloc lib/memory.c:110
    #2 0x555555712ace in zebra_rib_create_dest zebra/zebra_rib.c:2515
    #3 0x555555712c6c in rib_link zebra/zebra_rib.c:2576
    #4 0x555555712faa in rib_addnode zebra/zebra_rib.c:2607
    #5 0x555555715bf0 in rib_add_multipath_nhe zebra/zebra_rib.c:3012
    #6 0x555555715f56 in rib_add_multipath zebra/zebra_rib.c:3049
    #7 0x55555571788b in rib_add zebra/zebra_rib.c:3327
    #8 0x5555555e584a in connected_up zebra/connected.c:254
    #9 0x5555555e42ff in connected_announce zebra/connected.c:94
    #10 0x5555555e4fd3 in connected_update zebra/connected.c:195
    #11 0x5555555e61ad in connected_add_ipv4 zebra/connected.c:340
    #12 0x5555555f26f5 in netlink_interface_addr zebra/if_netlink.c:1213
    #13 0x55555560f756 in netlink_information_fetch zebra/kernel_netlink.c:350
    #14 0x555555612e49 in netlink_parse_info zebra/kernel_netlink.c:941
    #15 0x55555560f9f1 in kernel_read zebra/kernel_netlink.c:402
    #16 0x7ffff7450e9c in thread_call lib/thread.c:1581
    #17 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099
    #18 0x55555561a578 in main zebra/main.c:455
    #19 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-use-after-free zebra/rib.h:222 in re_list_const_first

This is happening because we are using the dest pointer after a call into
rib_gc_dest.  In process_subq_route, we call rib_process(), and if the
dest is deleted, the dest pointer is now garbage.  We must reload the
dest pointer in this case.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
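
The reload pattern described in this commit message can be sketched roughly as follows. This is a simplified stand-alone model, not the zebra source: `struct dest`, `struct node`, `process_node` and `process_and_reload` are hypothetical stand-ins for the route node, its dest, `rib_process()` and `process_subq_route`, and it assumes the freeing path clears the node's pointer so that a reload can detect the deletion.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for zebra's dest and route node structures. */
struct dest {
	int routes; /* number of route entries still attached */
};

struct node {
	struct dest *info; /* owning pointer; NULL once the dest is freed */
};

/*
 * Stand-in for rib_process(): it may garbage-collect the dest.
 * When it does, it frees the memory and clears node->info.
 */
static void process_node(struct node *rn)
{
	struct dest *dest = rn->info;

	if (dest != NULL && dest->routes == 0) {
		free(dest);
		rn->info = NULL;
	}
}

/* Returns true if the dest still exists and has routes after processing. */
static bool process_and_reload(struct node *rn)
{
	struct dest *dest = rn->info;

	if (dest == NULL)
		return false;

	process_node(rn);

	/* The local copy may now dangle, so reload it from the node. */
	dest = rn->info;
	if (dest == NULL)
		return false;

	return dest->routes > 0;
}
```

The point of the sketch is only the ordering: any pointer cached before the call that can free the object has to be re-read from its owner afterwards, rather than dereferenced as-is.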
ton31337 pushed a commit that referenced this issue Oct 18, 2020
chiragshah6 pushed a commit to chiragshah6/frr that referenced this issue Oct 27, 2020
louis-6wind added a commit to louis-6wind/frr that referenced this issue Dec 15, 2020
Temporal fix

Thread 2.1 "bgpd" received signal SIGSEGV, Segmentation fault.
0x00007ffff7b14180 in route_top (table=0x0) at lib/table.c:401
401		if (table->top == NULL)
(gdb) bt
#0  0x00007ffff7b14180 in route_top (table=0x0) at lib/table.c:401
#1  0x0000555555657286 in bgp_table_top (table=0x55555629c440) at ./bgpd/bgp_table.h:203
#2  0x0000555555666dd0 in bgp_soft_reconfig_table_flag (srta=0x55555bc68fd0, flag=false) at bgpd/bgp_route.c:4669
#3  0x0000555555666f5e in bgp_soft_reconfig_table_thread_cancel (nsrta=0x0, bgp=0x5555562767a0) at bgpd/bgp_route.c:4698
#4  0x00005555556e9463 in bgp_delete (bgp=0x5555562767a0) at bgpd/bgpd.c:3482
#5  0x00005555556f9ae5 in bgp_router_destroy (args=0x7fffffff6b90) at bgpd/bgp_nb_config.c:176
#6  0x00007ffff7ad985d in nb_callback_destroy (context=0x7fffffff7180, nb_node=0x555555c0c580, event=NB_EV_APPLY, dnode=0x5555563cdbf0, errmsg=0x7fffffff7190 "", errmsg_len=8192) at lib/northbound.c:970
#7  0x00007ffff7ada17a in nb_callback_configuration (context=0x7fffffff7180, event=NB_EV_APPLY, change=0x55555d5aa560, errmsg=0x7fffffff7190 "", errmsg_len=8192) at lib/northbound.c:1195
#8  0x00007ffff7ada564 in nb_transaction_process (event=NB_EV_APPLY, transaction=0x55556a6ed510, errmsg=0x7fffffff7190 "", errmsg_len=8192) at lib/northbound.c:1312
#9  0x00007ffff7ad900b in nb_candidate_commit_apply (transaction=0x55556a6ed510, save_transaction=true, transaction_id=0x0, errmsg=0x7fffffff7190 "", errmsg_len=8192) at lib/northbound.c:745
#10 0x00007ffff7ad912e in nb_candidate_commit (context=0x7fffffff7180, candidate=0x555555bddd00, save_transaction=true, comment=0x0, transaction_id=0x0, errmsg=0x7fffffff7190 "", errmsg_len=8192) at lib/northbound.c:777
#11 0x00007ffff7ae0249 in nb_cli_classic_commit (vty=0x555557b62790) at lib/northbound_cli.c:64
#12 0x00007ffff7ae0cce in nb_cli_apply_changes (vty=0x555557b62790, xpath_base_fmt=0x7fffffffb730 "/frr-routing:routing/control-plane-protocols/control-plane-protocol[type='frr-bgp:bgp'][name='bgp'][vrf='default']/frr-bgp:bgp") at lib/northbound_cli.c:281
#13 0x00005555556a01e6 in no_router_bgp (self=0x555555a28140 <no_router_bgp_cmd>, vty=0x555557b62790, argc=3, argv=0x555560be1bd0) at bgpd/bgp_vty.c:1466
#14 0x00007ffff7a90ebc in cmd_execute_command_real (vline=0x55556635c140, filter=FILTER_RELAXED, vty=0x555557b62790, cmd=0x0) at lib/command.c:938
#15 0x00007ffff7a91031 in cmd_execute_command (vline=0x55556635c140, vty=0x555557b62790, cmd=0x0, vtysh=0) at lib/command.c:997
#16 0x00007ffff7a91586 in cmd_execute (vty=0x555557b62790, cmd=0x555557b68f20 "no router bgp", matched=0x0, vtysh=0) at lib/command.c:1162
#17 0x00007ffff7b228f9 in vty_command (vty=0x555557b62790, buf=0x555557b68f20 "no router bgp") at lib/vty.c:517
#18 0x00007ffff7b2465b in vty_execute (vty=0x555557b62790) at lib/vty.c:1282
#19 0x00007ffff7b2656e in vtysh_read (thread=0x7fffffffe2e0) at lib/vty.c:2120
#20 0x00007ffff7b1bd23 in thread_call (thread=0x7fffffffe2e0) at lib/thread.c:1681
#21 0x00007ffff7ac7fc2 in frr_run (master=0x555555a6aab0) at lib/libfrr.c:1110
#22 0x00005555555d88b2 in main (argc=4, argv=0x7fffffffe518) at bgpd/bgp_main.c:523

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
sworleys pushed a commit to sworleys/frr that referenced this issue Mar 19, 2021
ranjanyash54 pushed a commit to ranjanyash54/frr that referenced this issue Aug 18, 2021
lib: fix cmgd commit-apply delay.
taspelund pushed a commit to taspelund/frr that referenced this issue Dec 15, 2022
ASAN reported the following memleak:
```
Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x4d4342 in calloc (/usr/lib/frr/bgpd+0x4d4342)
    #1 0xbc3d68 in qcalloc /home/sharpd/frr8/lib/memory.c:116:27
    #2 0xb869f7 in list_new /home/sharpd/frr8/lib/linklist.c:64:9
    #3 0x5a38bc in bgp_evpn_remote_ip_hash_alloc /home/sharpd/frr8/bgpd/bgp_evpn.c:6789:24
    #4 0xb358d3 in hash_get /home/sharpd/frr8/lib/hash.c:162:13
    #5 0x593d39 in bgp_evpn_remote_ip_hash_add /home/sharpd/frr8/bgpd/bgp_evpn.c:6881:7
    #6 0x59dbbd in install_evpn_route_entry_in_vni_common /home/sharpd/frr8/bgpd/bgp_evpn.c:3049:2
    #7 0x59cfe0 in install_evpn_route_entry_in_vni_ip /home/sharpd/frr8/bgpd/bgp_evpn.c:3126:8
    #8 0x59c6f0 in install_evpn_route_entry /home/sharpd/frr8/bgpd/bgp_evpn.c:3318:8
    #9 0x59bb52 in install_uninstall_route_in_vnis /home/sharpd/frr8/bgpd/bgp_evpn.c:3888:10
    #10 0x59b6d2 in bgp_evpn_install_uninstall_table /home/sharpd/frr8/bgpd/bgp_evpn.c:4019:5
    #11 0x578857 in install_uninstall_evpn_route /home/sharpd/frr8/bgpd/bgp_evpn.c:4051:9
    #12 0x58ada6 in bgp_evpn_import_route /home/sharpd/frr8/bgpd/bgp_evpn.c:6049:9
    #13 0x713794 in bgp_update /home/sharpd/frr8/bgpd/bgp_route.c:4842:3
    #14 0x583fa0 in process_type2_route /home/sharpd/frr8/bgpd/bgp_evpn.c:4518:9
    #15 0x5824ba in bgp_nlri_parse_evpn /home/sharpd/frr8/bgpd/bgp_evpn.c:5732:8
    #16 0x6ae6a2 in bgp_nlri_parse /home/sharpd/frr8/bgpd/bgp_packet.c:363:10
    #17 0x6be6fa in bgp_update_receive /home/sharpd/frr8/bgpd/bgp_packet.c:2020:15
    #18 0x6b7433 in bgp_process_packet /home/sharpd/frr8/bgpd/bgp_packet.c:2929:11
    #19 0xd00146 in thread_call /home/sharpd/frr8/lib/thread.c:2006:2
```

The list itself was not being cleaned up when the final list entry was
removed, so make sure we do that instead of leaking memory.

Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
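
A minimal sketch of the cleanup this commit message describes, under the assumption that a hash bucket owns a lazily allocated list of members. The types and functions (`struct bucket`, `bucket_add`, `bucket_del`) are hypothetical and use a hand-rolled list rather than FRR's `lib/linklist` API; the only point is that removing the final entry also frees the list container.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical types: a bucket owning a lazily allocated member list. */
struct member {
	struct member *next;
	int id;
};

struct member_list {
	struct member *head;
};

struct bucket {
	struct member_list *members; /* NULL while the bucket is empty */
};

/* Add an id, allocating the list container on first use. */
static bool bucket_add(struct bucket *b, int id)
{
	struct member *m;

	if (b->members == NULL) {
		b->members = calloc(1, sizeof(*b->members));
		if (b->members == NULL)
			return false;
	}

	m = calloc(1, sizeof(*m));
	if (m == NULL) {
		/* Do not leave an empty container behind on failure. */
		if (b->members->head == NULL) {
			free(b->members);
			b->members = NULL;
		}
		return false;
	}

	m->id = id;
	m->next = b->members->head;
	b->members->head = m;
	return true;
}

/* Remove an id; when the last entry goes, free the container too. */
static void bucket_del(struct bucket *b, int id)
{
	struct member **pp;

	if (b->members == NULL)
		return;

	for (pp = &b->members->head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct member *gone = *pp;

			*pp = gone->next;
			free(gone);
			break;
		}
	}

	/* Without this step the empty container itself would leak. */
	if (b->members->head == NULL) {
		free(b->members);
		b->members = NULL;
	}
}
```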
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 7, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

It happens with FRR using the kernel. During shutdown, zebra tries to
obtain the namespace identifier in order to prepare zebra dataplane
nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 7, 2024
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 7, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

It happens when FRR uses the kernel. During shutdown, zebra
tries to obtain the namespace identifier while preparing
zebra dataplane nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 8, 2024
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 8, 2024
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 8, 2024
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 8, 2024
louis-6wind pushed a commit to louis-6wind/frr that referenced this issue Oct 9, 2024
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 10, 2024
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
When a failover happens on ECMP paths that use the same
nexthop which is recursively resolved, ZEBRA replaces the
old NHG with a new one, and updates the pointer of all
routes using that nexthop.

Actually, if only the recursive nexthop changed, there is
no need to replace the old NHG.
Modify the zebra_nhg_proto_add() function, by updating
the recursive nexthop on the original NHG.
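
The difference between the two approaches can be sketched with a small Python model. This is only an illustration under stated assumptions: `Nhg`, `Route`, `replace_nhg` and `update_nhg_in_place` are hypothetical names invented here and do not correspond to zebra's C structures or functions.

```python
from dataclasses import dataclass, field


@dataclass
class Nhg:
    """Hypothetical stand-in for a nexthop group entry (not zebra's real struct)."""
    nhg_id: int
    resolved_via: list = field(default_factory=list)


@dataclass
class Route:
    """Hypothetical route that holds a reference to its nexthop group."""
    prefix: str
    nhg: Nhg


def replace_nhg(routes, old, new_resolved):
    """Replacement approach: allocate a new group, then repoint every route."""
    new = Nhg(old.nhg_id, list(new_resolved))
    touched = 0
    for route in routes:
        if route.nhg is old:
            route.nhg = new
            touched += 1
    return new, touched


def update_nhg_in_place(nhg, new_resolved):
    """In-place approach: keep the object and refresh only its resolution."""
    nhg.resolved_via = list(new_resolved)
    return nhg


if __name__ == "__main__":
    shared = Nhg(1, ["nh-a", "nh-b"])
    routes = [Route(f"10.0.{i}.0/24", shared) for i in range(4)]
    _, touched = replace_nhg(routes, shared, ["nh-b", "nh-c"])
    print("routes repointed by replacement:", touched)
```

With the in-place variant no route pointer changes, so the cost no longer grows with the number of routes that share the group.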

This change replaces the old method, which consisted of
allocating a new nhe. It triggers an ASAN error in the
bgp_nhg_zapi_scalability test, in the function
test_bgp_ipv4_simulate_r5_machine_going_down().

> ==1195107==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x55b6b7d55d8e bp 0x7fffd81977a0 sp 0x7fffd8197790
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x55b6b7d55d8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x55b6b7d55fee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x55b6b7d55fee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x55b6b7d59215 in zebra_nhg_proto_add zebra/zebra_nhg.c:3883
>     #4 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #6 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fe57a8f863b in event_call lib/event.c:1996
>     #9 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #10 0x55b6b7c40c74 in main zebra/main.c:526
>     #11 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fe57a229e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x55b6b7c43b84 in _start (/usr/lib/frr/zebra+0x1adb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fe57acb4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x55b6b7d59628 in zebra_nhg_proto_add zebra/zebra_nhg.c:3876
>     #2 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #4 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fe57a8f863b in event_call lib/event.c:1996
>     #7 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #8 0x55b6b7c40c74 in main zebra/main.c:526
>     #9 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fe57acb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fe57a83e98e in qcalloc lib/memory.c:106
>     #2 0x55b6b7d5149e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x55b6b7d5149e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x55b6b7d5181f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fe57a7fbf0d in hash_get lib/hash.c:147
>     #6 0x55b6b7d542ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x55b6b7d5495f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x55b6b7d54dcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x55b6b7d535e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x55b6b7d535e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x55b6b7d535e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x55b6b7d59884 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3738
>     #13 0x55b6b7d59884 in zebra_nhg_proto_add zebra/zebra_nhg.c:3844
>     #14 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #16 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fe57a8f863b in event_call lib/event.c:1996
>     #19 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #20 0x55b6b7c40c74 in main zebra/main.c:526
>     #21 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
> ==1195107==ABORTING
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
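
The addresses in the report above can be cross-checked by hand. The sketch below assumes the default AddressSanitizer mapping on x86-64 Linux (shadow scale 3, shadow offset 0x7fff8000); that offset is an assumption about the build, not something stated in the report. The function names are made up for this illustration.

```python
SHADOW_SCALE = 3            # one shadow byte covers 2**3 = 8 application bytes
SHADOW_OFFSET = 0x7FFF8000  # assumed: ASan default on x86-64 Linux


def shadow_address(addr: int) -> int:
    """Map an application address to the address of its shadow byte."""
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET


def offset_in_region(addr: int, start: int, end: int):
    """Return (offset of addr inside [start, end), size of the region)."""
    if not start <= addr < end:
        raise ValueError("address is outside the region")
    return addr - start, end - start


if __name__ == "__main__":
    faulting = 0x60E0000DE580
    print(hex(shadow_address(faulting)))
    # prints 0xc1c80013cb0
    print(offset_in_region(faulting, 0x60E0000DE520, 0x60E0000DE5C0))
    # prints (96, 160)
```

Under that assumption the computed shadow address is the start of the row marked with `=>` in the dump, whose first byte is `fd` (freed heap region), and the offset and size agree with the "96 bytes inside of 160-byte region" line.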
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
A general flush is done on the nhg depends of the protocol nexthop group.
However, the NHG should not be removed if there are routes attached to
it. At the same time, it seems the route count does not propagate to
the nhg_depends.

The drawback of this method is that the ASAN error is still present and,
compared with the refcount values of the old way (allocation), the count
is lower than expected for nexthop groups with route count only:

Allocation method in proto_add():

> 2024/10/14 10:57:24.915401 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:57:24.915510 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 1
> 2024/10/14 10:57:24.915513 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2012
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:57:24.915516 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915517 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:57:24.915517 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2007
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915520 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:57:24.915521 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:57:24.915522 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Method using general flush, but keeping the old pointer:

> 2024/10/14 10:51:17.229799 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:51:17.229909 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 2002
> 2024/10/14 10:51:17.229912 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2011
> 2024/10/14 10:51:17.229914 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229916 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:51:17.229916 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2006
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229919 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:51:17.229920 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:51:17.229921 ZEBRA: [VP9H1-EV2BN] 	(71428576)
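
To make the comparison between the two extracts easier to read, the `cnt` values can be pulled out and diffed. This is a minimal sketch: the log lines are copied from the extracts above (the tab-indented member lines are left out), while the regular expression and the helper name `refcounts` are assumptions made for this illustration.

```python
import re

# Matches the per-NHE result lines; the "BEFORE NHE" line is skipped on purpose.
NHE_LINE = re.compile(r"zebra_nhg_proto_add:\s+NHE (\d+), \(\S+\) cnt (\d+)")

ALLOC_LOG = """\
2024/10/14 10:57:24.915401 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
2024/10/14 10:57:24.915510 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 1
2024/10/14 10:57:24.915513 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2012
2024/10/14 10:57:24.915517 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2007
2024/10/14 10:57:24.915520 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
"""

FLUSH_LOG = """\
2024/10/14 10:51:17.229799 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
2024/10/14 10:51:17.229909 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 2002
2024/10/14 10:51:17.229912 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2011
2024/10/14 10:51:17.229916 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2006
2024/10/14 10:51:17.229919 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
"""


def refcounts(log: str) -> dict:
    """Return {NHE id: cnt} for every result line in a log extract."""
    return {int(m.group(1)): int(m.group(2)) for m in NHE_LINE.finditer(log)}


alloc = refcounts(ALLOC_LOG)
flush = refcounts(FLUSH_LOG)
# Positive means the flush method reports a higher count than allocation.
diff = {nhe: flush[nhe] - alloc[nhe] for nhe in alloc}

if __name__ == "__main__":
    print(diff)
    # prints {71428576: 2001, 49: -1, 59: -1, 65: 0}
```

The parent group keeps its count of 2002 with the flush method, while the depends 49 and 59 each end up one lower than with the allocation method.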

Resulting ASAN error when running bgp_nhg_zapi_notification, on the
test_bgp_ipv4_simulate_r5_machine_going_down() function:

> r1: zebra triggered an exception by AddressSanitizer
> AddressSanitizer error in topotest `test_bgp_nhg_zapi_scalability.py`, test `teardown_module`, router `r1`
>
> ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x558a7d98cd8e bp 0x7fff4915a6e0 sp 0x7fff4915a6d0
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x558a7d98cd8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x558a7d98cfee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x558a7d98cfee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x558a7d9901ff in zebra_nhg_proto_add zebra/zebra_nhg.c:3861
>     #4 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #6 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fa262ef863b in event_call lib/event.c:1996
>     #9 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #10 0x558a7d877c74 in main zebra/main.c:526
>     #11 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fa262829e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x558a7d87ab84 in _start (/usr/lib/frr/zebra+0x1acb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fa2632b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x558a7d9908a1 in zebra_nhg_proto_add zebra/zebra_nhg.c:3854
>     #2 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #4 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fa262ef863b in event_call lib/event.c:1996
>     #7 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #8 0x558a7d877c74 in main zebra/main.c:526
>     #9 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fa2632b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa262e3e98e in qcalloc lib/memory.c:106
>     #2 0x558a7d98849e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x558a7d98849e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x558a7d98881f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fa262dfbf0d in hash_get lib/hash.c:147
>     #6 0x558a7d98b2ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x558a7d98b95f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x558a7d98bdcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x558a7d98a5e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x558a7d98a5e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x558a7d98a5e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x558a7d990378 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3739
>     #13 0x558a7d990378 in zebra_nhg_proto_add zebra/zebra_nhg.c:3822
>     #14 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #16 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fa262ef863b in event_call lib/event.c:1996
>     #19 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #20 0x558a7d877c74 in main zebra/main.c:526
>     #21 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
When a failover happens on ECMP paths that use the same
nexthop which is recursively resolved, ZEBRA replaces the
old NHG with a new one, and updates the pointer of all
routes using that nexthop.

Actually, if only the recursive nexthop changed, there is
no need to replace the old NHG.
Modify the zebra_nhg_proto_add() function, by updating
the recursive nexthop on the original NHG.

This change replaces the old method, which consisted of
allocating a new nhe. It triggers an ASAN error in the
bgp_nhg_zapi_scalability test, in the function
test_bgp_ipv4_simulate_r5_machine_going_down().

> ==1195107==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x55b6b7d55d8e bp 0x7fffd81977a0 sp 0x7fffd8197790
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x55b6b7d55d8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x55b6b7d55fee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x55b6b7d55fee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x55b6b7d59215 in zebra_nhg_proto_add zebra/zebra_nhg.c:3883
>     #4 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #6 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fe57a8f863b in event_call lib/event.c:1996
>     #9 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #10 0x55b6b7c40c74 in main zebra/main.c:526
>     #11 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fe57a229e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x55b6b7c43b84 in _start (/usr/lib/frr/zebra+0x1adb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fe57acb4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x55b6b7d59628 in zebra_nhg_proto_add zebra/zebra_nhg.c:3876
>     #2 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #4 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fe57a8f863b in event_call lib/event.c:1996
>     #7 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #8 0x55b6b7c40c74 in main zebra/main.c:526
>     #9 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fe57acb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fe57a83e98e in qcalloc lib/memory.c:106
>     #2 0x55b6b7d5149e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x55b6b7d5149e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x55b6b7d5181f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fe57a7fbf0d in hash_get lib/hash.c:147
>     #6 0x55b6b7d542ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x55b6b7d5495f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x55b6b7d54dcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x55b6b7d535e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x55b6b7d535e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x55b6b7d535e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x55b6b7d59884 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3738
>     #13 0x55b6b7d59884 in zebra_nhg_proto_add zebra/zebra_nhg.c:3844
>     #14 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #16 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fe57a8f863b in event_call lib/event.c:1996
>     #19 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #20 0x55b6b7c40c74 in main zebra/main.c:526
>     #21 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
> ==1195107==ABORTING
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
A general flush is done on the nhg depends of the protocol nexthop group.
However, the NHG should not be removed if there are routes attached to
it. At the same time, it seems that the route count does not propagate to
the nhg_depends.

The drawback of this method is that the ASAN error is still present.
Also, compared with the refcount values of the old method (allocation),
the count is lower than expected for nexthop groups with route count only:

Allocation method in proto_add():

> 2024/10/14 10:57:24.915401 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:57:24.915510 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 1
> 2024/10/14 10:57:24.915513 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2012
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:57:24.915516 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915517 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:57:24.915517 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2007
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915520 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:57:24.915521 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:57:24.915522 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Method using general flush, but keep old pointer:

> 2024/10/14 10:51:17.229799 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:51:17.229909 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 2002
> 2024/10/14 10:51:17.229912 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2011
> 2024/10/14 10:51:17.229914 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229916 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:51:17.229916 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2006
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229919 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:51:17.229920 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:51:17.229921 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Resulting ASAN error when running bgp_nhg_zapi_notification, on the
test_bgp_ipv4_simulate_r5_machine_going_down() function:

> r1: zebra triggered an exception by AddressSanitizer
> AddressSanitizer error in topotest `test_bgp_nhg_zapi_scalability.py`, test `teardown_module`, router `r1`
>
> ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x558a7d98cd8e bp 0x7fff4915a6e0 sp 0x7fff4915a6d0
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x558a7d98cd8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x558a7d98cfee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x558a7d98cfee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x558a7d9901ff in zebra_nhg_proto_add zebra/zebra_nhg.c:3861
>     #4 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #6 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fa262ef863b in event_call lib/event.c:1996
>     #9 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #10 0x558a7d877c74 in main zebra/main.c:526
>     #11 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fa262829e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x558a7d87ab84 in _start (/usr/lib/frr/zebra+0x1acb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fa2632b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x558a7d9908a1 in zebra_nhg_proto_add zebra/zebra_nhg.c:3854
>     #2 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #4 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fa262ef863b in event_call lib/event.c:1996
>     #7 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #8 0x558a7d877c74 in main zebra/main.c:526
>     #9 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fa2632b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa262e3e98e in qcalloc lib/memory.c:106
>     #2 0x558a7d98849e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x558a7d98849e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x558a7d98881f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fa262dfbf0d in hash_get lib/hash.c:147
>     #6 0x558a7d98b2ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x558a7d98b95f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x558a7d98bdcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x558a7d98a5e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x558a7d98a5e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x558a7d98a5e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x558a7d990378 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3739
>     #13 0x558a7d990378 in zebra_nhg_proto_add zebra/zebra_nhg.c:3822
>     #14 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #16 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fa262ef863b in event_call lib/event.c:1996
>     #19 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #20 0x558a7d877c74 in main zebra/main.c:526
>     #21 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
louis-6wind pushed a commit to louis-6wind/frr that referenced this issue Oct 15, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

It happens when FRR is using the kernel. During shutdown, zebra
tries to obtain the namespace identifier in order to
prepare zebra dataplane nexthop messages.

Fix this by accessing the ns structure.
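
Judging by the frame names, the top of the trace (`dplane_ctx_ns_init` calling `ctx_info_from_zns`) copies namespace information into a dataplane context. The sketch below uses hypothetical types (`struct ns_rec`, `struct msg_ctx`) that are not zebra's. It shows that kind of copy, which is only safe while the source record is still allocated. It is an illustration of the failure mode, not the actual fix.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical namespace record (not zebra's struct zebra_ns). */
struct ns_rec {
	uint32_t ns_id;
	int is_default;
};

/* Hypothetical message context that may outlive the namespace record. */
struct msg_ctx {
	uint32_t ns_id;
	int is_default;
};

/*
 * Copy the scalar fields into the context. The ns pointer must still be
 * valid here; after this call the context no longer depends on it.
 */
static void msg_ctx_init(struct msg_ctx *ctx, const struct ns_rec *ns)
{
	assert(ctx && ns);
	ctx->ns_id = ns->ns_id;
	ctx->is_default = ns->is_default;
}
```

Calling such a function after the record has been freed reads freed memory, so the record has to stay alive until every message that needs it has been prepared.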

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
louis-6wind pushed a commit to louis-6wind/frr that referenced this issue Oct 16, 2024
mergify bot pushed a commit that referenced this issue Oct 16, 2024
(cherry picked from commit 7ae70eb)
mergify bot pushed a commit that referenced this issue Oct 16, 2024
(cherry picked from commit 7ae70eb)

# Conflicts:
#	zebra/main.c
#	zebra/zebra_ns.h
mergify bot pushed a commit that referenced this issue Oct 16, 2024
(cherry picked from commit 7ae70eb)
mergify bot pushed a commit that referenced this issue Oct 16, 2024
(cherry picked from commit 7ae70eb)

# Conflicts:
#	zebra/main.c
#	zebra/zebra_ns.h
mergify bot pushed a commit that referenced this issue Oct 16, 2024
(cherry picked from commit 7ae70eb)
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 21, 2024
When a failover happens on ECMP paths that use the same
recursively resolved nexthop, ZEBRA replaces the
old NHG with a new one and updates the pointer of all
routes using that nexthop.

If only the recursive nexthop has changed, there is
no need to replace the old NHG.
Modify the zebra_nhg_proto_add() function to update
the recursive nexthop on the original NHG instead.

This change replaces the old method, which consisted of
allocating a new nhe. It triggers an ASAN error in the
bgp_nhg_zapi_scalability test, in the
test_bgp_ipv4_simulate_r5_machine_going_down() function.

> r1: zebra triggered an exception by AddressSanitizer
> AddressSanitizer error in topotest `test_bgp_nhg_zapi_scalability.py`, test `teardown_module`, router `r1`
>
> ERROR: AddressSanitizer: heap-use-after-free on address 0x60e00230afa0 at pc 0x55bfebc9681e bp 0x7ffd657ceb40 sp 0x7ffd657ceb30
> READ of size 4 at 0x60e00230afa0 thread T0
>     #0 0x55bfebc9681d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1855
>     #1 0x55bfebc967f7 in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1868
>     #2 0x55bfebcb32f6 in route_entry_update_nhe zebra/zebra_rib.c:460
>     #3 0x55bfebcb352f in rib_handle_nhg_replace zebra/zebra_rib.c:486
>     #4 0x55bfebc99c14 in zebra_nhg_proto_add zebra/zebra_nhg.c:3836
>     #5 0x55bfebcc4035 in process_subq_nhg zebra/zebra_rib.c:2763
>     #6 0x55bfebcc4035 in process_subq zebra/zebra_rib.c:3369
>     #7 0x55bfebcc4035 in meta_queue_process zebra/zebra_rib.c:3422
>     #8 0x7f458a518bff in work_queue_run lib/workqueue.c:282
>     #9 0x7f458a4fa24b in event_call lib/event.c:2019
>     #10 0x7f458a41f717 in frr_run lib/libfrr.c:1238
>     #11 0x55bfebb82cb4 in main zebra/main.c:528
>     #12 0x7f4589e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #13 0x7f4589e29e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #14 0x55bfebb85c34 in _start (/usr/lib/frr/zebra+0x1abc34)
>
> 0x60e00230afa0 is located 96 bytes inside of 160-byte region [0x60e00230af40,0x60e00230afe0)
> freed by thread T0 here:
>     #0 0x7f458a8b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x55bfebc967f7 in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1868
>     #2 0x55bfebcb32f6 in route_entry_update_nhe zebra/zebra_rib.c:460
>     #3 0x55bfebcb352f in rib_handle_nhg_replace zebra/zebra_rib.c:486
>     #4 0x55bfebc99c14 in zebra_nhg_proto_add zebra/zebra_nhg.c:3836
>     #5 0x55bfebcc4035 in process_subq_nhg zebra/zebra_rib.c:2763
>     #6 0x55bfebcc4035 in process_subq zebra/zebra_rib.c:3369
>     #7 0x55bfebcc4035 in meta_queue_process zebra/zebra_rib.c:3422
>     #8 0x7f458a518bff in work_queue_run lib/workqueue.c:282
>     #9 0x7f458a4fa24b in event_call lib/event.c:2019
>     #10 0x7f458a41f717 in frr_run lib/libfrr.c:1238
>     #11 0x55bfebb82cb4 in main zebra/main.c:528
>     #12 0x7f4589e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7f458a8b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7f458a43fb7e in qcalloc lib/memory.c:106
>     #2 0x55bfebc91f2e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x55bfebc91f2e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x55bfebc922af in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7f458a3fd0bd in hash_get lib/hash.c:147
>     #6 0x55bfebc94d7a in zebra_nhe_find zebra/zebra_nhg.c:831
>     #7 0x55bfebc953ef in zebra_nhg_find zebra/zebra_nhg.c:1013
>     #8 0x55bfebc9585d in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1030
>     #9 0x55bfebc94078 in depends_find_recursive zebra/zebra_nhg.c:1511
>     #10 0x55bfebc94078 in depends_find zebra/zebra_nhg.c:1560
>     #11 0x55bfebc94078 in depends_find_add zebra/zebra_nhg.c:1599
>     #12 0x55bfebc99e40 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3732
>     #13 0x55bfebc99e40 in zebra_nhg_proto_add zebra/zebra_nhg.c:3819
>     #14 0x55bfebcc4035 in process_subq_nhg zebra/zebra_rib.c:2763
>     #15 0x55bfebcc4035 in process_subq zebra/zebra_rib.c:3369
>     #16 0x55bfebcc4035 in meta_queue_process zebra/zebra_rib.c:3422
>     #17 0x7f458a518bff in work_queue_run lib/workqueue.c:282
>     #18 0x7f458a4fa24b in event_call lib/event.c:2019
>     #19 0x7f458a41f717 in frr_run lib/libfrr.c:1238
>     #20 0x55bfebb82cb4 in main zebra/main.c:528
>     #21 0x7f4589e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1855 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c804595a0: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
>   0x0c1c804595b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c804595c0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c804595d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c804595e0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
> =>0x0c1c804595f0: fd fd fd fd[fd]fd fd fd fd fd fd fd fa fa fa fa
>   0x0c1c80459600: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80459610: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
>   0x0c1c80459620: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80459630: fd fd fd fa fa fa fa fa fa fa fa fa 00 00 00 00
>   0x0c1c80459640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 30, 2024
The following ASAN error can be seen.

> ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x608000036c20
>     #0 0x7f3d7a4b5425 in __interceptor_malloc_usable_size ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:198
>     #1 0x7f3d7a426a16 in __sanitizer::BufferedStackTrace::Unwind(unsigned long, unsigned long, void*, bool, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_stacktrace.h:122
>     #2 0x7f3d7a426a16 in __asan::asan_malloc_usable_size(void const*, unsigned long, unsigned long) ../../../../src/libsanitizer/asan/asan_allocator.cpp:1074
>     #3 0x7f3d7a03f330 in mt_count_free lib/memory.c:78
>     #4 0x7f3d7a03f330 in qfree lib/memory.c:130
>     #5 0x7f3d76ccf89b in bmp_peer_status_changed bgpd/bgp_bmp.c:982
>     #6 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     #7 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     #8 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     #9 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     #10 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     #11 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     #12 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     #13 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     #14 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     #15 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     #16 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     #17 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #18 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #19 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #20 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f3d79a29e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x560ae29e4ef4 in _start (/usr/lib/frr/bgpd+0x2eeef4)
>
> 0x608000036c20 is located 0 bytes inside of 81-byte region [0x608000036c20,0x608000036c71)
> freed by thread T0 here:
>     #0 0x7f3d7a4b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x7f3d76ccf85f in bmp_peer_status_changed bgpd/bgp_bmp.c:981
>     #2 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     #3 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     #4 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     #5 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     #6 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     #7 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     #8 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     #9 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     #10 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     #11 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     #12 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     #13 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #14 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #15 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #16 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7f3d7a4b4887 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
>     #1 0x7f3d7a03f0e9 in qmalloc lib/memory.c:101
>     #2 0x7f3d76cd0166 in bmp_bgp_peer_vrf bgpd/bgp_bmp.c:2194
>     #3 0x7f3d76cd0166 in bmp_bgp_update_vrf_status bgpd/bgp_bmp.c:2236
>     #4 0x7f3d76cd29b8 in bmp_vrf_state_changed bgpd/bgp_bmp.c:3479
>     #5 0x560ae2c45b34 in hook_call_bgp_instance_state bgpd/bgpd.c:88
>     #6 0x560ae2c4d158 in bgp_instance_up bgpd/bgpd.c:3936
>     #7 0x560ae29e5ed1 in bgp_vrf_enable bgpd/bgp_main.c:299
>     #8 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:286
>     #9 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:275
>     #10 0x7f3d7a12ab66 in zclient_vrf_add lib/zclient.c:2561
>     #11 0x7f3d7a12eb43 in zclient_read lib/zclient.c:4624
>     #12 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #13 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #14 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #15 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>