isis crash #4

Closed
donaldsharp opened this issue Dec 16, 2016 · 7 comments
@donaldsharp
Member

donaldsharp commented Dec 16, 2016

(gdb) bt
#0 0x00007f8a5c930067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f8a5c931448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f8a5c96e1b4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f8a5c97398e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f8a5c974696 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x000055dda631e615 in lsp_clear_data (lsp=lsp@entry=0x55dda73daaa0) at isis_lsp.c:123
#6 0x000055dda631e731 in lsp_destroy (lsp=0x55dda73daaa0) at isis_lsp.c:154
#7 0x000055dda6322ea9 in lsp_tick (thread=<optimized out>) at isis_lsp.c:2670
#8 0x00007f8a5d74539d in thread_call (thread=0x7ffde5c4a0c8) at thread.c:1462
#9 0x000055dda631d070 in main (argc=4, argv=<optimized out>, envp=<optimized out>) at isis_main.c:390
(gdb)

[6:35]

2016/12/14 23:27:32.188829 ISIS: lan hello on non broadcast circuit
2016/12/14 23:27:32.189054 ISIS: %ADJCHANGE: Adjacency to 1921.6810.0009 (swp1) changed from Unknown to Initializing, unspecified
2016/12/14 23:27:32.189064 ISIS: %ADJCHANGE: Adjacency to 1921.6810.0009 (swp1) changed from Initializing to Up, unspecified
2016/12/14 23:27:37.186777 ISIS: ISIS-Upd (FOO): LSP 0000.0000.0000.00-00 seq 0x00000001 with confused checksum received.
2016/12/14 23:27:37.186876 ISIS: ISIS-Spf: TENT is empty SPF-root:r10
2016/12/14 23:27:37.247257 ISIS: ISIS-Upd (FOO): LSP 1921.6810.0009.00-00 invalid LSP is type 0
2016/12/14 23:27:49.675359 ISIS: ISIS-Spf: TENT is empty SPF-root:r10
2016/12/14 23:27:50.250427 ISIS: ISIS-Spf: TENT is empty SPF-root:r10
2016/12/14 23:27:51.189042 ISIS: ISIS-Spf: TENT is empty SPF-root:r10
2016/12/14 23:28:02.189823 ISIS: ISIS-Spf: TENT is empty SPF-root:r10
2016/12/14 23:28:03.252071 ISIS: ISIS-Spf: TENT is empty SPF-root:r10
2016/12/14 23:28:38.532282 ISIS: ISIS-Upd (FOO): L1 LSP 0000.0000.0000.00-00 seq 0x00000001 aged out
2016/12/14 23:30:39.319943 ZEBRA: client 12 disconnected. 9 isis routes removed from the rib
2016/12/14 23:31:09.433950 ZEBRA: Terminating on signal
2016/12/14 23:31:09.433991 ZEBRA: IRDP: Received shutdown notification.

[6:35]

 r6 ---- r9 --- r10
 |\      |       |
 | \     |       |
 |  \    r8 --- r11
 |   r7
 r5  |
 | \ |
 |  r3 --- r2
 | /        |
 r4        r1

[6:36]
In the above topology we are seeing crashes in isis on r11, r4, and r7

[6:36]
config on r10:

[6:37]

interface lo
 ip router isis FOO
 isis circuit-type level-1
 isis passive
!
interface swp1
 ip router isis FOO
 isis circuit-type level-1
 isis network point-to-point
!
interface swp2
 ip router isis FOO
 isis circuit-type level-1
 isis network point-to-point
!
router isis FOO
 net 49.0003.1921.6810.0010.00
 metric-style wide
 is-type level-1
 log-adjacency-changes
!

[6:37]  
Oh yeah, there is a crash on r10 as well.
@eqvinox eqvinox added this to the 2.0-rc1 milestone Dec 16, 2016
@eqvinox
Contributor

eqvinox commented Dec 16, 2016

Do we have any pcap files for this?

@donaldsharp
Member Author

output.swp2.pcap.gz
output.swp1.pcap.gz

Multiple iterations of the crash are hopefully included in the two pcap files. These were captured on r11.

@donaldsharp
Member Author

Valgrind caught it this time:

==7946== Invalid free() / delete / delete[] / realloc()
==7946== at 0x4C29E90: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7946== by 0x119634: lsp_clear_data (isis_lsp.c:119)
==7946== by 0x119750: lsp_destroy (isis_lsp.c:150)
==7946== by 0x11DEC8: lsp_tick (isis_lsp.c:2639)
==7946== by 0x4E611CB: thread_call (thread.c:1442)
==7946== by 0x11808F: main (isis_main.c:389)
==7946== Address 0x7d94fdc is 28 bytes inside a block of size 43 alloc'd
==7946== at 0x4C28C20: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7946== by 0x4E7E9B8: qmalloc (memory.c:61)
==7946== by 0x4E69901: stream_new (stream.c:105)
==7946== by 0x4E69AB3: stream_dup (stream.c:149)
==7946== by 0x11990B: lsp_update_data (isis_lsp.c:485)
==7946== by 0x11A579: lsp_update (isis_lsp.c:554)
==7946== by 0x12628F: process_lsp (isis_pdu.c:1526)
==7946== by 0x12679F: isis_handle_pdu (isis_pdu.c:2116)
==7946== by 0x12679F: isis_receive (isis_pdu.c:2157)
==7946== by 0x4E611CB: thread_call (thread.c:1442)
==7946== by 0x11808F: main (isis_main.c:389)
==7946==
==7946== Invalid free() / delete / delete[] / realloc()
==7946== at 0x4C29E90: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7946== by 0x119657: lsp_clear_data (isis_lsp.c:121)
==7946== by 0x119750: lsp_destroy (isis_lsp.c:150)
==7946== by 0x11DEC8: lsp_tick (isis_lsp.c:2639)
==7946== by 0x4E611CB: thread_call (thread.c:1442)
==7946== by 0x11808F: main (isis_main.c:389)
==7946== Address 0x7d94fe7 is 39 bytes inside a block of size 43 alloc'd
==7946== at 0x4C28C20: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7946== by 0x4E7E9B8: qmalloc (memory.c:61)
==7946== by 0x4E69901: stream_new (stream.c:105)
==7946== by 0x4E69AB3: stream_dup (stream.c:149)
==7946== by 0x11990B: lsp_update_data (isis_lsp.c:485)
==7946== by 0x11A579: lsp_update (isis_lsp.c:554)
==7946== by 0x12628F: process_lsp (isis_pdu.c:1526)
==7946== by 0x12679F: isis_handle_pdu (isis_pdu.c:2116)
==7946== by 0x12679F: isis_receive (isis_pdu.c:2157)
==7946== by 0x4E611CB: thread_call (thread.c:1442)
==7946== by 0x11808F: main (isis_main.c:389)
==7946==

@donaldsharp
Member Author

So in isis_tlv.c we have this:

case DYNAMIC_HOSTNAME:
  *found |= TLVFLAG_DYN_HOSTNAME;

#ifdef EXTREME_TLV_DEBUG
  zlog_debug ("ISIS-TLV (%s): Dynamic Hostname length %d",
              areatag, length);
#endif /* EXTREME_TLV_DEBUG */
  if (*expected & TLVFLAG_DYN_HOSTNAME)
    {
      /* the length is also included in the pointed struct */
      tlvs->hostname = (struct hostname *) (pnt - 1);
    }
  pnt += length;
  break;

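pnt points into the stream_dup'd buffer that lsp->pdu is set from, so tlvs->hostname ends up pointing into the middle of that buffer. But we have lsp->own_lsp set to true, so lsp_clear_data frees the pointer as if it were a separate allocation, hence the crash.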
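
A minimal, self-contained sketch of this failure mode, not the actual isisd code: a parser stores a pointer into the middle of a duplicated buffer (like `(pnt - 1)` in the snippet), and the cleanup path has to consult an ownership flag before freeing it. The names `demo_lsp`, `demo_make`, `demo_parse_hostname` and `demo_clear` are hypothetical and exist only for illustration; only the shape of the bug (an interior pointer handed to `free()`) comes from the valgrind output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an LSP: pdu is the duplicated packet buffer,
 * hostname either points into pdu (borrowed) or at its own allocation. */
struct demo_lsp {
    unsigned char *pdu;
    size_t pdu_len;
    unsigned char *hostname;  /* length byte followed by the name */
    bool owns_hostname;       /* true only if hostname was malloc'd separately */
};

/* Duplicate a received packet into a freshly allocated buffer. */
static struct demo_lsp demo_make(const unsigned char *pkt, size_t len)
{
    struct demo_lsp lsp = {0};
    lsp.pdu = malloc(len);
    assert(lsp.pdu != NULL);
    memcpy(lsp.pdu, pkt, len);
    lsp.pdu_len = len;
    return lsp;
}

/* Borrow: point at the length byte just before the TLV value inside pdu. */
static void demo_parse_hostname(struct demo_lsp *lsp, size_t value_off)
{
    assert(value_off >= 1 && value_off < lsp->pdu_len);
    lsp->hostname = lsp->pdu + value_off - 1;
    lsp->owns_hostname = false;  /* interior pointer, must not be freed */
}

/* Free hostname only when it is a separate allocation, then free pdu.
 * Returns the number of free() calls that were made. */
static int demo_clear(struct demo_lsp *lsp)
{
    int frees = 0;

    if (lsp->owns_hostname && lsp->hostname != NULL) {
        free(lsp->hostname);
        frees++;
    }
    lsp->hostname = NULL;
    lsp->owns_hostname = false;

    if (lsp->pdu != NULL) {
        free(lsp->pdu);
        frees++;
    }
    lsp->pdu = NULL;
    lsp->pdu_len = 0;
    return frees;
}
```

If `owns_hostname` were wrongly left true after `demo_parse_hostname()`, `demo_clear()` would pass an address inside the `pdu` block to `free()`, which is the kind of call valgrind reports as an invalid free.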

@donaldsharp
Member Author

Commit 4fedc05 prevents the issue from happening. Before closing, though, I would like Christian to comment on the further debugging I provided, to see whether he thinks his change has sufficiently closed the loopholes.

@donaldsharp
Member Author

Actually, I was wrong. I just rechecked my test setup and am still seeing the same core files.

@donaldsharp
Member Author

Resolved via 07f2fb1

qlyoung referenced this issue in qlyoung/frr Sep 15, 2017
* commit '4709b4faa43907ed9fcaf5920e56b6664f0523cf':
  bgpd: peer hash expands until we are out of memory
qlyoung referenced this issue in qlyoung/frr Sep 15, 2017
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by:   Donald Sharp <sharpd@cumulusnetworks.com>

Ticket: CM-17175

If you are doing multipath in a VRF and bounce one of the multipaths for
a prefix, bgp is not updating the zebra entry for that prefix with the
new multipaths. We start with:

cel-redxp-10# show bgp vrf RED  ipv4 unicast 6.0.0.16/32
BGP routing table entry for 6.0.0.16/32
Paths: (4 available, best #4, table RED)
  Advertised to non peer-group peers:
  spine-1(swp1) spine-2(swp2) spine-3(swp3) spine-4(swp4)
  104 65104 65002
    fe80::202:ff:fe00:2d from spine-4(swp4) (6.0.0.12)
    (fe80::202:ff:fe00:2d) (used)
      Origin incomplete, localpref 100, valid, external, multipath, bestpath-from-AS 104
      AddPath ID: RX 0, TX 21
      Last update: Tue Aug  1 18:28:33 2017

  102 65104 65002
    fe80::202:ff:fe00:25 from spine-2(swp2) (6.0.0.10)
    (fe80::202:ff:fe00:25) (used)
      Origin incomplete, localpref 100, valid, external, multipath, bestpath-from-AS 102
      AddPath ID: RX 0, TX 20
      Last update: Tue Aug  1 18:28:33 2017

  103 65104 65002
    fe80::202:ff:fe00:29 from spine-3(swp3) (6.0.0.11)
    (fe80::202:ff:fe00:29) (used)
      Origin incomplete, localpref 100, valid, external, multipath, bestpath-from-AS 103
      AddPath ID: RX 0, TX 17
      Last update: Tue Aug  1 18:28:33 2017

  101 65104 65002
    fe80::202:ff:fe00:21 from spine-1(swp1) (6.0.0.9)
    (fe80::202:ff:fe00:21) (used)
      Origin incomplete, localpref 100, valid, external, multipath, bestpath-from-AS 101, best
      AddPath ID: RX 0, TX 8
      Last update: Tue Aug  1 18:28:33 2017

cel-redxp-10#
cel-redxp-10# show ip route vrf RED 6.0.0.16/32
Routing entry for 6.0.0.16/32
  Known via "bgp", distance 20, metric 0, vrf RED, best
  Last update 00:00:25 ago
  * fe80::202:ff:fe00:21, via swp1
  * fe80::202:ff:fe00:25, via swp2
  * fe80::202:ff:fe00:29, via swp3
  * fe80::202:ff:fe00:2d, via swp4

cel-redxp-10#

And then on spine-1 we bounce all peers:

spine-1# clear ip bgp *
spine-1#

On the leaf (cel-redxp-10) we remove the route from spine-1:

cel-redxp-10# show ip route vrf RED 6.0.0.16/32
Routing entry for 6.0.0.16/32
  Known via "bgp", distance 20, metric 0, vrf RED, best
  Last update 00:00:01 ago
  * fe80::202:ff:fe00:25, via swp2
  * fe80::202:ff:fe00:29, via swp3
  * fe80::202:ff:fe00:2d, via swp4

cel-redxp-10#

So far so good. The problem is that when the session to spine-1 comes
back up, bgp marks the path from spine-1 as multipath but does not update
zebra. We end up in a state where BGP has 4 paths flagged as multipath but
only 3 paths are in the RIB.
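
The inconsistency can be stated as a simple invariant: the set of paths BGP flags as multipath should match the set of nexthops last sent to zebra. Below is a minimal sketch of such a check. It is not bgpd's actual data model; `demo_path`, `demo_needs_zebra_update` and `demo_sync_to_zebra` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-path state: what BGP selected vs. what zebra was told. */
struct demo_path {
    const char *peer;  /* e.g. "spine-1" */
    bool multipath;    /* BGP flags this path as part of the multipath set */
    bool in_rib;       /* nexthop was included in the last update to zebra */
};

/* True when the multipath set and the installed nexthop set have diverged,
 * meaning a fresh route update should be sent to zebra. */
static bool demo_needs_zebra_update(const struct demo_path *paths, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (paths[i].multipath != paths[i].in_rib)
            return true;
    }
    return false;
}

/* Model re-announcing the route: the installed set becomes the multipath
 * set. Returns the number of nexthops now in the RIB. */
static size_t demo_sync_to_zebra(struct demo_path *paths, size_t n)
{
    size_t installed = 0;

    for (size_t i = 0; i < n; i++) {
        paths[i].in_rib = paths[i].multipath;
        if (paths[i].in_rib)
            installed++;
    }
    return installed;
}
```

In the state described above, the spine-1 entry would have `multipath` set and `in_rib` clear, so `demo_needs_zebra_update()` returns true until the route is announced to zebra again.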
qlyoung referenced this issue in qlyoung/frr Nov 6, 2017
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>

chiragshah6 pushed a commit to chiragshah6/frr that referenced this issue Jun 19, 2018
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 10, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

This happens when FRR is using the kernel. During shutdown, zebra
tries to obtain the namespace identifier while preparing zebra
dataplane nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 10, 2024
The following heap-use-after-free is reported during the teardown
of a topotest that uses protocol nexthop groups.

> ==739645==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e00004df48 at pc 0x558966dbd6d1 bp 0x7ffdfc1e0ec0 sp 0x7ffdfc1e0eb0
> READ of size 8 at 0x60e00004df48 thread T0
>     #0 0x558966dbd6d0 in dplane_ctx_route_init zebra/zebra_dplane.c:3447
>     #1 0x558966dbd8f5 in dplane_route_update_internal zebra/zebra_dplane.c:4237
>     #2 0x558966e5eb99 in rib_uninstall_kernel zebra/zebra_rib.c:778
>     #3 0x558966e685f8 in rib_process_del_fib zebra/zebra_rib.c:1023
>     #4 0x558966e685f8 in rib_process zebra/zebra_rib.c:1489
>     #5 0x558966e6ab55 in process_subq_route zebra/zebra_rib.c:2792
>     #6 0x558966e6ab55 in process_subq zebra/zebra_rib.c:3356
>     #7 0x558966e6ab55 in meta_queue_process zebra/zebra_rib.c:3395
>     #8 0x7f7fd771207f in work_queue_run lib/workqueue.c:282
>     #9 0x7f7fd76f3d3b in event_call lib/event.c:2011
>     #10 0x7f7fd761b897 in frr_run lib/libfrr.c:1212
>     #11 0x558966d270b6 in main zebra/main.c:533
>     #12 0x7f7fd7029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #13 0x7f7fd7029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #14 0x558966d29ed4 in _start (/usr/lib/frr/zebra+0x1b4ed4)
>
> 0x60e00004df48 is located 40 bytes inside of 160-byte region [0x60e00004df20,0x60e00004dfc0)
> freed by thread T0 here:
>     #0 0x7f7fd7ab4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x558966e6b38b in process_subq_nhg zebra/zebra_rib.c:2730
>     #2 0x558966e6b38b in process_subq zebra/zebra_rib.c:3342
>     #3 0x558966e6b38b in meta_queue_process zebra/zebra_rib.c:3395
>     #4 0x7f7fd771207f in work_queue_run lib/workqueue.c:282
>     #5 0x7f7fd76f3d3b in event_call lib/event.c:2011
>     #6 0x7f7fd761b897 in frr_run lib/libfrr.c:1212
>     #7 0x558966d270b6 in main zebra/main.c:533
>     #8 0x7f7fd7029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

A ROUTE_DELETE message is sent with an NHE identifier, in addition
to NHG_DELETE. The latter message triggers the deletion of the NHE,
but no check is done for the former message.

Fix this by checking if the NHE ID exists before sending it to the
dataplane.
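
The check described here can be sketched roughly as follows. This is a simplified model, not zebra's code: the table type and the `demo_*` functions are hypothetical, and using 0 to mean "no NHE ID" in the outgoing message is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NHE 8

/* Hypothetical table of NHE IDs that currently exist. */
struct demo_nhe_table {
    uint32_t ids[DEMO_MAX_NHE];
    size_t count;
};

static bool demo_nhe_exists(const struct demo_nhe_table *t, uint32_t id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->ids[i] == id)
            return true;
    }
    return false;
}

/* Register an ID; returns false if the table is full or the ID is present. */
static bool demo_nhe_add(struct demo_nhe_table *t, uint32_t id)
{
    if (t->count == DEMO_MAX_NHE || demo_nhe_exists(t, id))
        return false;
    t->ids[t->count++] = id;
    return true;
}

/* Models the effect of NHG_DELETE: the NHE stops existing. */
static bool demo_nhe_del(struct demo_nhe_table *t, uint32_t id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->ids[i] == id) {
            t->count--;
            t->ids[i] = t->ids[t->count];  /* fill the hole with the last entry */
            return true;
        }
    }
    return false;
}

/* ID to place in a route-delete message: the NHE ID if it still exists,
 * otherwise 0 (assumed here to mean "no NHE ID"). */
static uint32_t demo_route_delete_nhe_id(const struct demo_nhe_table *t,
                                         uint32_t id)
{
    return demo_nhe_exists(t, id) ? id : 0;
}
```

The point of the lookup is ordering: once the NHG_DELETE has been processed, a later ROUTE_DELETE that still carries the old ID no longer dereferences the freed entry.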

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
piotrsuchy added a commit to piotrsuchy/frr that referenced this issue Oct 11, 2024
…d: fix show bgp all with evpn

Merge in HARDWARE/frr from psuchy/fix_show_bgp_all to akamai/debian/frr-8.4.2

Squashed commit of the following:

commit 094f403d1c900e232ac009f3ac0047dfd652c58e
Author: Louis Scalbert <louis.scalbert@6wind.com>
Date:   Thu Dec 29 16:50:54 2022 +0100

    bgpd: fix show bgp all with evpn

    Fix crash on "show bgp all" when BGP EVPN is set.

    > #0  raise (sig=11) at ../sysdeps/unix/sysv/linux/raise.c:50
    > #1  0x00007fdfe03cf53c in core_handler (signo=11, siginfo=0x7ffdebbffe30, context=0x7ffdebbffd00) at lib/sigevent.c:261
    > #2  <signal handler called>
    > #3  0x00000000004d4fec in bgp_attr_get_community (attr=0x41) at bgpd/bgp_attr.h:553
    > #4  0x00000000004eee84 in bgp_show_table (vty=0x1a790d0, bgp=0x19d0a00, safi=SAFI_EVPN, table=0x19f6010, type=bgp_show_type_normal, output_arg=0x0, rd=0x0, is_last=1, output_cum=0x0,
    >     total_cum=0x0, json_header_depth=0x7ffdebc00bf8, show_flags=4, rpki_target_state=RPKI_NOT_BEING_USED) at bgpd/bgp_route.c:11329
    > #5  0x00000000004f7765 in bgp_show (vty=0x1a790d0, bgp=0x19d0a00, afi=AFI_L2VPN, safi=SAFI_EVPN, type=bgp_show_type_normal, output_arg=0x0, show_flags=4,
    >     rpki_target_state=RPKI_NOT_BEING_USED) at bgpd/bgp_route.c:11814
    > #6  0x00000000004fb53b in show_ip_bgp_magic (self=0x6752b0 <show_ip_bgp_cmd>, vty=0x1a790d0, argc=3, argv=0x19cb050, viewvrfname=0x0, all=0x1395390 "all", aa_nn=0x0, community_list=0,
    >     community_list_str=0x0, community_list_name=0x0, as_path_filter_name=0x0, prefix_list=0x0, accesslist_name=0x0, rmap_name=0x0, version=0, version_str=0x0, alias_name=0x0,
    >     orr_group_name=0x0, detail_routes=0x0, uj=0x0, detail_json=0x0, wide=0x0) at bgpd/bgp_route.c:13040
    > #7  0x00000000004fa322 in show_ip_bgp (self=0x6752b0 <show_ip_bgp_cmd>, vty=0x1a790d0, argc=3, argv=0x19cb050) at ./bgpd/bgp_route_clippy.c:519
    > #8  0x00007fdfe033ccc8 in cmd_execute_command_real (vline=0x19c9300, filter=FILTER_RELAXED, vty=0x1a790d0, cmd=0x0, up_level=0) at lib/command.c:996
    > #9  0x00007fdfe033c739 in cmd_execute_command (vline=0x19c9300, vty=0x1a790d0, cmd=0x0, vtysh=0) at lib/command.c:1056
    > #10 0x00007fdfe033cdf5 in cmd_execute (vty=0x1a790d0, cmd=0x19c9eb0 "show bgp all", matched=0x0, vtysh=0) at lib/command.c:1223
    > #11 0x00007fdfe03f65c6 in vty_command (vty=0x1a790d0, buf=0x19c9eb0 "show bgp all") at lib/vty.c:486
    > #12 0x00007fdfe03f603b in vty_execute (vty=0x1a790d0) at lib/vty.c:1249
    > #13 0x00007fdfe03f533b in vtysh_read (thread=0x7ffdebc03838) at lib/vty.c:2148
    > #14 0x00007fdfe03e815d in thread_call (thread=0x7ffdebc03838) at lib/thread.c:2006
    > #15 0x00007fdfe0379b54 in frr_run (master=0x1246880) at lib/libfrr.c:1198
    > #16 0x000000000042b2a8 in main (argc=7, argv=0x7ffdebc03af8) at bgpd/bgp_main.c:520

    Link: FRRouting#12576
    Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 11, 2024
After a recursive protocol NHG has been refreshed, a heap-use-after-free
happens on the NHG dependencies.

> READ of size 4 at 0x60e000074cc0 thread T0
>     #0 0x555ea629eef0 in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1904
>     #1 0x555ea62a2748 in zebra_nhg_proto_add zebra/zebra_nhg.c:3981
>     #2 0x555ea62ccf6c in process_subq_nhg zebra/zebra_rib.c:2737
>     #3 0x555ea62ccf6c in process_subq zebra/zebra_rib.c:3342
>     #4 0x555ea62ccf6c in meta_queue_process zebra/zebra_rib.c:3395
>     #5 0x7fd799f1207f in work_queue_run lib/workqueue.c:282
>     #6 0x7fd799ef3d3b in event_call lib/event.c:2011
>     #7 0x7fd799e1b897 in frr_run lib/libfrr.c:1212
>     #8 0x555ea61860b6 in main zebra/main.c:533
>     #9 0x7fd799829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #10 0x7fd799829e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #11 0x555ea6188ed4 in _start (/usr/lib/frr/zebra+0x1b4ed4)
>
> 0x60e000074cc0 is located 96 bytes inside of 160-byte region [0x60e000074c60,0x60e000074d00)
> freed by thread T0 here:
>     #0 0x7fd79a2b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x555ea629ef69 in nhg_connected_tree_decrement_ref zebra/zebra_nhg.c:187
>     #2 0x555ea629eec7 in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1920
>     #3 0x555ea62bc110 in route_entry_update_nhe zebra/zebra_rib.c:454
>     #4 0x555ea62bc3fb in rib_handle_nhg_replace zebra/zebra_rib.c:478
>     #5 0x555ea62a22f8 in zebra_nhg_proto_add zebra/zebra_nhg.c:3966

With 'debug zebra nexthop detail' enabled, zebra tries to display an
nhg_depends list whose NHEs have already been flushed.
Fix this by removing the nhg_depends list itself before passing it to
zebra_nhg_free().

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
When a failover happens on ECMP paths that use the same
nexthop which is recursively resolved, ZEBRA replaces the
old NHG with a new one, and updates the pointer of all
routes using that nexthop.

Actually, if only the recursive nexthop changed, there is
no need to replace the old NHG.
Modify the zebra_nhg_proto_add() function, by updating
the recursive nexthop on the original NHG.

This change replaces the old method, which consisted of
allocating a new NHE. It triggers an ASAN error in the
bgp_nhg_zapi_scalability test, in the function
test_bgp_ipv4_simulate_r5_machine_going_down().

> ==1195107==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x55b6b7d55d8e bp 0x7fffd81977a0 sp 0x7fffd8197790
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x55b6b7d55d8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x55b6b7d55fee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x55b6b7d55fee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x55b6b7d59215 in zebra_nhg_proto_add zebra/zebra_nhg.c:3883
>     #4 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #6 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fe57a8f863b in event_call lib/event.c:1996
>     #9 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #10 0x55b6b7c40c74 in main zebra/main.c:526
>     #11 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fe57a229e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x55b6b7c43b84 in _start (/usr/lib/frr/zebra+0x1adb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fe57acb4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x55b6b7d59628 in zebra_nhg_proto_add zebra/zebra_nhg.c:3876
>     #2 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #4 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fe57a8f863b in event_call lib/event.c:1996
>     #7 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #8 0x55b6b7c40c74 in main zebra/main.c:526
>     #9 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fe57acb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fe57a83e98e in qcalloc lib/memory.c:106
>     #2 0x55b6b7d5149e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x55b6b7d5149e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x55b6b7d5181f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fe57a7fbf0d in hash_get lib/hash.c:147
>     #6 0x55b6b7d542ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x55b6b7d5495f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x55b6b7d54dcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x55b6b7d535e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x55b6b7d535e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x55b6b7d535e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x55b6b7d59884 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3738
>     #13 0x55b6b7d59884 in zebra_nhg_proto_add zebra/zebra_nhg.c:3844
>     #14 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #16 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fe57a8f863b in event_call lib/event.c:1996
>     #19 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #20 0x55b6b7c40c74 in main zebra/main.c:526
>     #21 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
> ==1195107==ABORTING
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
A general flush is done on the nhg_depends of the protocol nexthop group.
However, the NHG should not be removed if there are routes attached to
it. At the same time, it seems the route count does not propagate to
the nhg_depends.

The drawback of this method is that the ASAN error is still present and,
compared with the refcount values of the old way (allocation), the count
is lower than expected for nexthop groups with route count only:

Allocation method in proto_add():

> 2024/10/14 10:57:24.915401 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:57:24.915510 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 1
> 2024/10/14 10:57:24.915513 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2012
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:57:24.915516 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915517 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:57:24.915517 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2007
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915520 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:57:24.915521 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:57:24.915522 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Method using general flush, but keep old pointer:

> 2024/10/14 10:51:17.229799 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:51:17.229909 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 2002
> 2024/10/14 10:51:17.229912 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2011
> 2024/10/14 10:51:17.229914 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229916 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:51:17.229916 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2006
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229919 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:51:17.229920 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:51:17.229921 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Resulting ASAN error when running bgp_nhg_zapi_notification, on the
test_bgp_ipv4_simulate_r5_machine_going_down() function:

> r1: zebra triggered an exception by AddressSanitizer
> AddressSanitizer error in topotest `test_bgp_nhg_zapi_scalability.py`, test `teardown_module`, router `r1`
>
> ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x558a7d98cd8e bp 0x7fff4915a6e0 sp 0x7fff4915a6d0
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x558a7d98cd8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x558a7d98cfee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x558a7d98cfee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x558a7d9901ff in zebra_nhg_proto_add zebra/zebra_nhg.c:3861
>     #4 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #6 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fa262ef863b in event_call lib/event.c:1996
>     #9 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #10 0x558a7d877c74 in main zebra/main.c:526
>     #11 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fa262829e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x558a7d87ab84 in _start (/usr/lib/frr/zebra+0x1acb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fa2632b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x558a7d9908a1 in zebra_nhg_proto_add zebra/zebra_nhg.c:3854
>     #2 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #4 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fa262ef863b in event_call lib/event.c:1996
>     #7 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #8 0x558a7d877c74 in main zebra/main.c:526
>     #9 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fa2632b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa262e3e98e in qcalloc lib/memory.c:106
>     #2 0x558a7d98849e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x558a7d98849e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x558a7d98881f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fa262dfbf0d in hash_get lib/hash.c:147
>     #6 0x558a7d98b2ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x558a7d98b95f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x558a7d98bdcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x558a7d98a5e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x558a7d98a5e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x558a7d98a5e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x558a7d990378 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3739
>     #13 0x558a7d990378 in zebra_nhg_proto_add zebra/zebra_nhg.c:3822
>     #14 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #16 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fa262ef863b in event_call lib/event.c:1996
>     #19 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #20 0x558a7d877c74 in main zebra/main.c:526
>     #21 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
When a failover happens on ECMP paths that share the same
recursively resolved nexthop, zebra replaces the old NHG
with a new one and updates the pointer of every route
using that nexthop.

If only the recursive nexthop changed, there is no need to
replace the old NHG. Modify the zebra_nhg_proto_add()
function so that it updates the recursive nexthop on the
original NHG.
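
As a rough illustration of the in-place approach, here is a minimal sketch that swaps the member list of a group while leaving the group object, and therefore every route pointer to it, untouched. It is not zebra code: `struct toy_nhe`, `toy_ref()`, `toy_unref()` and `toy_nhe_update_depends()` are hypothetical names invented for this example, and the fixed-size array stands in for whatever container the real implementation uses.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPENDS 8

/* Hypothetical, simplified stand-in for a nexthop group entry. */
struct toy_nhe {
	unsigned int id;
	int refcnt;      /* routes and parent groups pointing at this entry */
	size_t ndepends; /* number of valid slots in depends[] */
	struct toy_nhe *depends[TOY_MAX_DEPENDS];
};

static void toy_ref(struct toy_nhe *nhe)
{
	nhe->refcnt++;
}

static void toy_unref(struct toy_nhe *nhe)
{
	assert(nhe->refcnt > 0);
	nhe->refcnt--;
}

/*
 * Replace the member list of 'nhe' in place. The new members are
 * referenced before the old ones are released, so a member present in
 * both lists never transiently drops to zero. The refcount of 'nhe'
 * itself is left alone because the object is reused, not reallocated.
 */
static int toy_nhe_update_depends(struct toy_nhe *nhe,
				  struct toy_nhe **new_deps, size_t n)
{
	struct toy_nhe *old[TOY_MAX_DEPENDS];
	size_t nold = nhe->ndepends;
	size_t i;

	if (n > TOY_MAX_DEPENDS)
		return -1;

	for (i = 0; i < n; i++)
		toy_ref(new_deps[i]);
	for (i = 0; i < nold; i++)
		old[i] = nhe->depends[i];
	for (i = 0; i < n; i++)
		nhe->depends[i] = new_deps[i];
	nhe->ndepends = n;
	for (i = 0; i < nold; i++)
		toy_unref(old[i]);

	return 0;
}
```

With a group whose members go from 39/49/59 to 49/59/65, the group keeps its own count while only the counts of the members that left or joined change.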

This change replaces the old method, which allocated a new nhe.
It triggers an ASAN error in the bgp_nhg_zapi_scalability test,
in the test_bgp_ipv4_simulate_r5_machine_going_down() function.

> ==1195107==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x55b6b7d55d8e bp 0x7fffd81977a0 sp 0x7fffd8197790
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x55b6b7d55d8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x55b6b7d55fee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x55b6b7d55fee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x55b6b7d59215 in zebra_nhg_proto_add zebra/zebra_nhg.c:3883
>     #4 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #6 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fe57a8f863b in event_call lib/event.c:1996
>     #9 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #10 0x55b6b7c40c74 in main zebra/main.c:526
>     #11 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fe57a229e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x55b6b7c43b84 in _start (/usr/lib/frr/zebra+0x1adb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fe57acb4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x55b6b7d59628 in zebra_nhg_proto_add zebra/zebra_nhg.c:3876
>     #2 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #4 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fe57a8f863b in event_call lib/event.c:1996
>     #7 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #8 0x55b6b7c40c74 in main zebra/main.c:526
>     #9 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fe57acb4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fe57a83e98e in qcalloc lib/memory.c:106
>     #2 0x55b6b7d5149e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x55b6b7d5149e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x55b6b7d5181f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fe57a7fbf0d in hash_get lib/hash.c:147
>     #6 0x55b6b7d542ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x55b6b7d5495f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x55b6b7d54dcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x55b6b7d535e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x55b6b7d535e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x55b6b7d535e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x55b6b7d59884 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3738
>     #13 0x55b6b7d59884 in zebra_nhg_proto_add zebra/zebra_nhg.c:3844
>     #14 0x55b6b7d83615 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x55b6b7d83615 in process_subq zebra/zebra_rib.c:3344
>     #16 0x55b6b7d83615 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fe57a916fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fe57a8f863b in event_call lib/event.c:1996
>     #19 0x7fe57a81e527 in frr_run lib/libfrr.c:1237
>     #20 0x55b6b7c40c74 in main zebra/main.c:526
>     #21 0x7fe57a229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
> ==1195107==ABORTING
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 14, 2024
A general flush is done on the nhg depends of the protocol nexthop group.
The NHG should not be removed if there are routes attached to it.
At the same time, the route count does not seem to propagate to
the nhg_depends.
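
To make the constraint concrete, here is a minimal sketch of a flush that drops one reference per dependency and frees an entry only when nothing else refers to it. It is an illustration under stated assumptions, not the zebra implementation: `struct toy_dep`, `toy_dep_new()` and `toy_flush_depends()` are hypothetical names, and a single integer stands in for the combined route and parent reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified dependency entry with a reference count. */
struct toy_dep {
	unsigned int id;
	int refcnt;
};

static struct toy_dep *toy_dep_new(unsigned int id, int refcnt)
{
	struct toy_dep *dep = calloc(1, sizeof(*dep));

	if (dep) {
		dep->id = id;
		dep->refcnt = refcnt;
	}
	return dep;
}

/*
 * Drop the parent's reference on every dependency in 'deps'. An entry
 * is freed only when its count reaches zero. Every slot is cleared so
 * the parent keeps no stale pointer. Returns how many entries were freed.
 */
static size_t toy_flush_depends(struct toy_dep **deps, size_t n)
{
	size_t freed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_dep *dep = deps[i];

		if (!dep)
			continue;
		deps[i] = NULL; /* the parent no longer points at it */
		assert(dep->refcnt > 0);
		if (--dep->refcnt == 0) {
			free(dep);
			freed++;
		}
	}
	return freed;
}
```

An entry that is still counted elsewhere survives the flush with its count reduced by one, which is the behavior the paragraph above asks for when routes are attached.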

The drawback of this method is that the ASAN error is still present and,
compared with the refcount values of the old way (allocation), the count
is lower than expected for nexthop groups with a route count only:

Allocation method in proto_add():

> 2024/10/14 10:57:24.915401 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:57:24.915510 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 1
> 2024/10/14 10:57:24.915513 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2012
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:57:24.915516 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915517 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:57:24.915517 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2007
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915520 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:57:24.915521 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:57:24.915522 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Method using a general flush while keeping the old pointer:

> 2024/10/14 10:51:17.229799 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:51:17.229909 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 2002
> 2024/10/14 10:51:17.229912 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2011
> 2024/10/14 10:51:17.229914 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229916 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:51:17.229916 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2006
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229919 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:51:17.229920 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:51:17.229921 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Resulting ASAN error when running bgp_nhg_zapi_notification, on the
test_bgp_ipv4_simulate_r5_machine_going_down() function:

> r1: zebra triggered an exception by AddressSanitizer
> AddressSanitizer error in topotest `test_bgp_nhg_zapi_scalability.py`, test `teardown_module`, router `r1`
>
> ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x558a7d98cd8e bp 0x7fff4915a6e0 sp 0x7fff4915a6d0
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x558a7d98cd8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     #1 0x558a7d98cfee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     #2 0x558a7d98cfee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     #3 0x558a7d9901ff in zebra_nhg_proto_add zebra/zebra_nhg.c:3861
>     #4 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #5 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #6 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #7 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #8 0x7fa262ef863b in event_call lib/event.c:1996
>     #9 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #10 0x558a7d877c74 in main zebra/main.c:526
>     #11 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #12 0x7fa262829e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #13 0x558a7d87ab84 in _start (/usr/lib/frr/zebra+0x1acb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fa2632b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x558a7d9908a1 in zebra_nhg_proto_add zebra/zebra_nhg.c:3854
>     #2 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #3 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #4 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #5 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #6 0x7fa262ef863b in event_call lib/event.c:1996
>     #7 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #8 0x558a7d877c74 in main zebra/main.c:526
>     #9 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fa2632b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa262e3e98e in qcalloc lib/memory.c:106
>     #2 0x558a7d98849e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x558a7d98849e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x558a7d98881f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7fa262dfbf0d in hash_get lib/hash.c:147
>     #6 0x558a7d98b2ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     #7 0x558a7d98b95f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     #8 0x558a7d98bdcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     #9 0x558a7d98a5e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     #10 0x558a7d98a5e8 in depends_find zebra/zebra_nhg.c:1563
>     #11 0x558a7d98a5e8 in depends_find_add zebra/zebra_nhg.c:1602
>     #12 0x558a7d990378 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3739
>     #13 0x558a7d990378 in zebra_nhg_proto_add zebra/zebra_nhg.c:3822
>     #14 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     #15 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     #16 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     #17 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     #18 0x7fa262ef863b in event_call lib/event.c:1996
>     #19 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     #20 0x558a7d877c74 in main zebra/main.c:526
>     #21 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
louis-6wind added a commit to louis-6wind/frr that referenced this issue Oct 15, 2024
Fix a heap-use-after-free that causes zebra to crash even without
AddressSanitizer. To reproduce:

> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l del red
> ip route del blackhole default table 100

Zebra manages routing tables for all existing Linux RT tables,
regardless of whether they are assigned to a VRF interface. When a table
is not assigned to any VRF, zebra arbitrarily assigns it to the default
VRF, even though this is not strictly accurate (the code expects this
behavior).

When an RT table is created after a VRF, zebra correctly assigns the
table to the VRF. However, if a VRF interface is assigned to an existing
RT table, zebra does not update the table owner, which remains as the
default VRF. As a result, existing routing entries remain under the
default VRF, while new entries are correctly assigned to the VRF. The
VRF mismatch is unexpected in the code and causes crashes and
memory-related issues.

Furthermore, Linux does not automatically delete RT tables when they are
unassigned from a VRF. It is incorrect to delete these tables from zebra.

Instead, when a VRF is disabled, do not release the table but reassign
it to the default VRF. When a VRF is enabled, change the table owner
back to the appropriate VRF.
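
The ownership rule described above can be sketched as follows. This is a simplified model, not zebra's data structures: `struct toy_table`, `struct toy_router`, `toy_table_get()`, `toy_vrf_enable()` and `toy_vrf_disable()` are hypothetical names, VRF id 0 is assumed to mean the default VRF, and a small fixed array replaces the real table container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VRF_DEFAULT 0u /* assumption: id 0 is the default VRF */
#define TOY_MAX_TABLES 16

/* Hypothetical routing table record: a kernel table id plus its owner. */
struct toy_table {
	uint32_t table_id;
	uint32_t owner_vrf;
	int in_use;
};

struct toy_router {
	struct toy_table tables[TOY_MAX_TABLES];
};

static struct toy_table *toy_table_find(struct toy_router *r, uint32_t table_id)
{
	size_t i;

	for (i = 0; i < TOY_MAX_TABLES; i++)
		if (r->tables[i].in_use && r->tables[i].table_id == table_id)
			return &r->tables[i];
	return NULL;
}

/* Return the table for 'table_id', creating it with owner 'vrf' if needed. */
static struct toy_table *toy_table_get(struct toy_router *r, uint32_t table_id,
				       uint32_t vrf)
{
	struct toy_table *t = toy_table_find(r, table_id);
	size_t i;

	if (t)
		return t;
	for (i = 0; i < TOY_MAX_TABLES; i++) {
		if (!r->tables[i].in_use) {
			r->tables[i].table_id = table_id;
			r->tables[i].owner_vrf = vrf;
			r->tables[i].in_use = 1;
			return &r->tables[i];
		}
	}
	return NULL; /* no free slot */
}

/* Enabling a VRF takes ownership of its table, even if it already existed. */
static struct toy_table *toy_vrf_enable(struct toy_router *r, uint32_t vrf,
					uint32_t table_id)
{
	struct toy_table *t = toy_table_get(r, table_id, vrf);

	if (t)
		t->owner_vrf = vrf;
	return t;
}

/* Disabling a VRF hands its tables back to the default VRF; nothing is freed. */
static void toy_vrf_disable(struct toy_router *r, uint32_t vrf)
{
	size_t i;

	assert(vrf != TOY_VRF_DEFAULT);
	for (i = 0; i < TOY_MAX_TABLES; i++)
		if (r->tables[i].in_use && r->tables[i].owner_vrf == vrf)
			r->tables[i].owner_vrf = TOY_VRF_DEFAULT;
}
```

Because the table object survives the disable step, a late result that still refers to table 100 finds valid memory instead of a freed region.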

> ==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
> READ of size 1 at 0x606000154f54 thread T0
>     #0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
>     #1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
>     #2 0x7fa32474d783 in route_node_get lib/table.c:283
>     #3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
>     #4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
>     #5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
>     #6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
>     #7 0x7fa32476689c in event_call lib/event.c:1996
>     #8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     #9 0x55b0e4e6c32a in main zebra/main.c:526
>     #10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>     #11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)
>
> 0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
> freed by thread T0 here:
>     #0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
>     #1 0x7fa324668d8f in qfree lib/memory.c:130
>     #2 0x7fa32474c421 in route_table_free lib/table.c:126
>     #3 0x7fa32474bf96 in route_table_finish lib/table.c:46
>     #4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
>     #5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
>     #6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
>     #7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
>     #8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
>     #9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
>     #10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
>     #11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     #12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     #13 0x7fa32476689c in event_call lib/event.c:1996
>     #14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     #15 0x55b0e4e6c32a in main zebra/main.c:526
>     #16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
>     #0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa324668c4d in qcalloc lib/memory.c:105
>     #2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
>     #3 0x7fa32474e73c in route_table_init lib/table.c:512
>     #4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
>     #5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
>     #6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
>     #7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
>     #8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
>     #9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
>     #10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     #11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     #12 0x7fa32476689c in event_call lib/event.c:1996
>     #13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     #14 0x55b0e4e6c32a in main zebra/main.c:526
>     #15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
louis-6wind pushed a commit to louis-6wind/frr that referenced this issue Oct 15, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

It happens when FRR uses the kernel. During shutdown, zebra tries to
obtain the namespace identifier while preparing zebra dataplane
nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
louis-6wind added a commit to louis-6wind/frr that referenced this issue Oct 16, 2024
louis-6wind pushed a commit to louis-6wind/frr that referenced this issue Oct 16, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

This happens when FRR uses the kernel. During shutdown, zebra
tries to obtain the namespace identifier in order to prepare
zebra dataplane nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
mergify bot pushed a commit that referenced this issue Oct 16, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

This happens when FRR uses the kernel. During shutdown, zebra
tries to obtain the namespace identifier in order to prepare
zebra dataplane nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
(cherry picked from commit 7ae70eb)
mergify bot pushed a commit that referenced this issue Oct 16, 2024
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>     #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

This happens when FRR uses the kernel. During shutdown, zebra
tries to obtain the namespace identifier in order to prepare
zebra dataplane nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
(cherry picked from commit 7ae70eb)

# Conflicts:
#	zebra/main.c
#	zebra/zebra_ns.h
louis-6wind added a commit to louis-6wind/frr that referenced this issue Oct 16, 2024
Fix a heap-use-after-free that causes zebra to crash even without
AddressSanitizer. To reproduce:

> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l del red
> ip route del blackhole default table 100

Zebra manages routing tables for all existing Linux RT tables,
regardless of whether they are assigned to a VRF interface. When a table
is not assigned to any VRF, zebra arbitrarily assigns it to the default
VRF, even though this is not strictly accurate (the code expects this
behavior).

When an RT table is created after a VRF, zebra correctly assigns the
table to the VRF. However, if a VRF interface is assigned to an existing
RT table, zebra does not update the table owner, which remains as the
default VRF. As a result, existing routing entries remain under the
default VRF, while new entries are correctly assigned to the VRF. The
code does not expect this VRF mismatch, which causes crashes and
memory-related issues.

Furthermore, Linux does not automatically delete RT tables when they are
unassigned from a VRF. It is incorrect to delete these tables from zebra.

Instead, when a VRF is disabled, do not release the table but reassign
it to the default VRF. When the VRF is enabled, change the table owner
back to the appropriate VRF.
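
The ownership handover described above can be sketched as follows. This is only an illustration under stated assumptions: `struct rt_table`, `table_on_vrf_disable()`, `table_on_vrf_enable()` and `VRF_DEFAULT_ID` are hypothetical names invented for the example and are not zebra's actual structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRF_DEFAULT_ID 0u /* assumed identifier of the default VRF */

/* Hypothetical, simplified stand-in for zebra's per-table bookkeeping. */
struct rt_table {
	uint32_t table_id;        /* Linux RT table number, e.g. 100 */
	uint32_t owner_vrf;       /* VRF currently considered the owner */
	unsigned int route_count; /* entries still stored in the table */
};

/* VRF disabled: keep the table and its routes, hand ownership to default. */
static void table_on_vrf_disable(struct rt_table *t)
{
	assert(t != NULL);
	t->owner_vrf = VRF_DEFAULT_ID;
}

/* VRF enabled on an existing table: move ownership to that VRF. */
static void table_on_vrf_enable(struct rt_table *t, uint32_t vrf_id)
{
	assert(t != NULL);
	t->owner_vrf = vrf_id;
}
```

Because the table is never freed on disable, a later dataplane result that looks up a route in it still finds valid memory; only the recorded owner changes.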

> ==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
> READ of size 1 at 0x606000154f54 thread T0
>     #0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
>     #1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
>     #2 0x7fa32474d783 in route_node_get lib/table.c:283
>     #3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
>     #4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
>     #5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
>     #6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
>     #7 0x7fa32476689c in event_call lib/event.c:1996
>     #8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     #9 0x55b0e4e6c32a in main zebra/main.c:526
>     #10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>     #11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)
>
> 0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
> freed by thread T0 here:
>     #0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
>     #1 0x7fa324668d8f in qfree lib/memory.c:130
>     #2 0x7fa32474c421 in route_table_free lib/table.c:126
>     #3 0x7fa32474bf96 in route_table_finish lib/table.c:46
>     #4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
>     #5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
>     #6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
>     #7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
>     #8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
>     #9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
>     #10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
>     #11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     #12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     #13 0x7fa32476689c in event_call lib/event.c:1996
>     #14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     #15 0x55b0e4e6c32a in main zebra/main.c:526
>     #16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
>     #0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa324668c4d in qcalloc lib/memory.c:105
>     #2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
>     #3 0x7fa32474e73c in route_table_init lib/table.c:512
>     #4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
>     #5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
>     #6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
>     #7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
>     #8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
>     #9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
>     #10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     #11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     #12 0x7fa32476689c in event_call lib/event.c:1996
>     #13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     #14 0x55b0e4e6c32a in main zebra/main.c:526
>     #15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 21, 2024
The nexthop group entry returned when looking up PIC contexts
is not checked. The PIC context can in fact resolve over itself,
which may lead to a stack overflow.

The trace below can be reproduced by generalizing the PIC nhe
search to all nexthops instead of only SRv6 contexts.

> root@ubuntu2204hwe:~/frr# AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==247856==ERROR: AddressSanitizer: stack-overflow on address 0x7ffe4e6dcff8 (pc 0x561e05bb5653 bp 0x7ffe4e6dd020 sp 0x7ffe4e6dd000 T0)
>     #0 0x561e05bb5653 in zebra_nhg_install_kernel zebra/zebra_nhg.c:3310
>     #1 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #2 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #3 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #4 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #5 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #6 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #7 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #8 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #9 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #10 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #11 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #12 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #13 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329
>     #14 0x561e05bb572d in zebra_nhg_install_kernel zebra/zebra_nhg.c:3329

Fix this by not returning a nexthop group entry when one has to be
created for the PIC context.
Also add a check for the case where PIC creation is not needed and
the returned nhe has the same identifier as the requested nhe.
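
The identifier check can be illustrated with a small sketch. It assumes a reduced, hypothetical `struct nhe` with a `pic` pointer; `pic_dependency()` and `install()` are made-up helpers for this example, not zebra's `zebra_nhg_install_kernel()`. The guard only covers an entry that resolves directly over itself, which is the case shown in the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced nexthop group entry; not zebra's real structure. */
struct nhe {
	uint32_t id;
	struct nhe *pic; /* PIC context entry this one resolves over, or NULL */
};

/* Return the PIC entry to install first, or NULL when there is none or
 * when the lookup handed back the requesting entry itself. */
static struct nhe *pic_dependency(struct nhe *requested)
{
	struct nhe *found;

	assert(requested != NULL);
	found = requested->pic;
	if (found == NULL || found->id == requested->id)
		return NULL;
	return found;
}

/* Install dependencies first, then the entry itself.
 * Returns the number of entries installed. */
static int install(struct nhe *e)
{
	int count = 0;
	struct nhe *dep = pic_dependency(e);

	if (dep != NULL)
		count += install(dep);
	return count + 1;
}
```

Without the `found->id == requested->id` comparison, an entry whose PIC context is itself would make `install()` call itself with the same argument until the stack is exhausted.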

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 21, 2024
When a failover happens on ECMP paths that use the same
nexthop which is recursively resolved, ZEBRA replaces the
old NHG with a new one, and updates the pointer of all
routes using that nexthop.

However, if only the recursive nexthop changed, there is
no need to replace the old NHG.
Modify the zebra_nhg_proto_add() function to update
the recursive nexthop on the original NHG instead.
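
A minimal sketch of the in-place idea, assuming hypothetical reduced types: `struct nhg`, `struct route` and `nhg_update_resolution()` are invented for illustration, and `resolved_via` merely stands in for the recursively resolved nexthop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced types; zebra's real structures carry far more state. */
struct nhg {
	uint32_t id;           /* identifier chosen by the protocol daemon */
	uint32_t resolved_via; /* stand-in for the recursively resolved nexthop */
};

struct route {
	struct nhg *nhg; /* routes share one NHG by pointer */
};

/* Only the recursive resolution changed: update the existing NHG in place,
 * so no route pointer has to be rewritten and nothing is freed. */
static void nhg_update_resolution(struct nhg *g, uint32_t new_via)
{
	assert(g != NULL);
	g->resolved_via = new_via;
}
```

Every route that already points at the NHG sees the new resolution through the same pointer, so the per-route pointer update needed when the NHG is replaced does not occur in this sketch.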

This change replaces the old method, which consisted of
allocating a new nhe. This change triggers an ASAN error in the
bgp_nhg_zapi_scalability test, function
test_bgp_ipv4_simulate_r5_machine_going_down().

> r1: zebra triggered an exception by AddressSanitizer
> AddressSanitizer error in topotest `test_bgp_nhg_zapi_scalability.py`, test `teardown_module`, router `r1`
>
> ERROR: AddressSanitizer: heap-use-after-free on address 0x60e00230afa0 at pc 0x55bfebc9681e bp 0x7ffd657ceb40 sp 0x7ffd657ceb30
> READ of size 4 at 0x60e00230afa0 thread T0
>     #0 0x55bfebc9681d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1855
>     #1 0x55bfebc967f7 in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1868
>     #2 0x55bfebcb32f6 in route_entry_update_nhe zebra/zebra_rib.c:460
>     #3 0x55bfebcb352f in rib_handle_nhg_replace zebra/zebra_rib.c:486
>     #4 0x55bfebc99c14 in zebra_nhg_proto_add zebra/zebra_nhg.c:3836
>     #5 0x55bfebcc4035 in process_subq_nhg zebra/zebra_rib.c:2763
>     #6 0x55bfebcc4035 in process_subq zebra/zebra_rib.c:3369
>     #7 0x55bfebcc4035 in meta_queue_process zebra/zebra_rib.c:3422
>     #8 0x7f458a518bff in work_queue_run lib/workqueue.c:282
>     #9 0x7f458a4fa24b in event_call lib/event.c:2019
>     #10 0x7f458a41f717 in frr_run lib/libfrr.c:1238
>     #11 0x55bfebb82cb4 in main zebra/main.c:528
>     #12 0x7f4589e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #13 0x7f4589e29e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #14 0x55bfebb85c34 in _start (/usr/lib/frr/zebra+0x1abc34)
>
> 0x60e00230afa0 is located 96 bytes inside of 160-byte region [0x60e00230af40,0x60e00230afe0)
> freed by thread T0 here:
>     #0 0x7f458a8b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x55bfebc967f7 in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1868
>     #2 0x55bfebcb32f6 in route_entry_update_nhe zebra/zebra_rib.c:460
>     #3 0x55bfebcb352f in rib_handle_nhg_replace zebra/zebra_rib.c:486
>     #4 0x55bfebc99c14 in zebra_nhg_proto_add zebra/zebra_nhg.c:3836
>     #5 0x55bfebcc4035 in process_subq_nhg zebra/zebra_rib.c:2763
>     #6 0x55bfebcc4035 in process_subq zebra/zebra_rib.c:3369
>     #7 0x55bfebcc4035 in meta_queue_process zebra/zebra_rib.c:3422
>     #8 0x7f458a518bff in work_queue_run lib/workqueue.c:282
>     #9 0x7f458a4fa24b in event_call lib/event.c:2019
>     #10 0x7f458a41f717 in frr_run lib/libfrr.c:1238
>     #11 0x55bfebb82cb4 in main zebra/main.c:528
>     #12 0x7f4589e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7f458a8b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7f458a43fb7e in qcalloc lib/memory.c:106
>     #2 0x55bfebc91f2e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     #3 0x55bfebc91f2e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     #4 0x55bfebc922af in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     #5 0x7f458a3fd0bd in hash_get lib/hash.c:147
>     #6 0x55bfebc94d7a in zebra_nhe_find zebra/zebra_nhg.c:831
>     #7 0x55bfebc953ef in zebra_nhg_find zebra/zebra_nhg.c:1013
>     #8 0x55bfebc9585d in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1030
>     #9 0x55bfebc94078 in depends_find_recursive zebra/zebra_nhg.c:1511
>     #10 0x55bfebc94078 in depends_find zebra/zebra_nhg.c:1560
>     #11 0x55bfebc94078 in depends_find_add zebra/zebra_nhg.c:1599
>     #12 0x55bfebc99e40 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3732
>     #13 0x55bfebc99e40 in zebra_nhg_proto_add zebra/zebra_nhg.c:3819
>     #14 0x55bfebcc4035 in process_subq_nhg zebra/zebra_rib.c:2763
>     #15 0x55bfebcc4035 in process_subq zebra/zebra_rib.c:3369
>     #16 0x55bfebcc4035 in meta_queue_process zebra/zebra_rib.c:3422
>     #17 0x7f458a518bff in work_queue_run lib/workqueue.c:282
>     #18 0x7f458a4fa24b in event_call lib/event.c:2019
>     #19 0x7f458a41f717 in frr_run lib/libfrr.c:1238
>     #20 0x55bfebb82cb4 in main zebra/main.c:528
>     #21 0x7f4589e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1855 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c804595a0: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
>   0x0c1c804595b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c804595c0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c804595d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c804595e0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
> =>0x0c1c804595f0: fd fd fd fd[fd]fd fd fd fd fd fd fd fa fa fa fa
>   0x0c1c80459600: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80459610: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
>   0x0c1c80459620: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80459630: fd fd fd fa fa fa fa fa fa fa fa fa 00 00 00 00
>   0x0c1c80459640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
>

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 30, 2024
The following ASAN error can be seen.

> ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x608000036c20
>     #0 0x7f3d7a4b5425 in __interceptor_malloc_usable_size ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:198
>     #1 0x7f3d7a426a16 in __sanitizer::BufferedStackTrace::Unwind(unsigned long, unsigned long, void*, bool, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_stacktrace.h:122
>     #2 0x7f3d7a426a16 in __asan::asan_malloc_usable_size(void const*, unsigned long, unsigned long) ../../../../src/libsanitizer/asan/asan_allocator.cpp:1074
>     #3 0x7f3d7a03f330 in mt_count_free lib/memory.c:78
>     #4 0x7f3d7a03f330 in qfree lib/memory.c:130
>     #5 0x7f3d76ccf89b in bmp_peer_status_changed bgpd/bgp_bmp.c:982
>     #6 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     #7 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     #8 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     #9 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     #10 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     #11 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     #12 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     #13 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     #14 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     #15 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     #16 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     #17 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #18 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #19 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #20 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f3d79a29e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x560ae29e4ef4 in _start (/usr/lib/frr/bgpd+0x2eeef4)
>
> 0x608000036c20 is located 0 bytes inside of 81-byte region [0x608000036c20,0x608000036c71)
> freed by thread T0 here:
>     #0 0x7f3d7a4b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x7f3d76ccf85f in bmp_peer_status_changed bgpd/bgp_bmp.c:981
>     #2 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     #3 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     #4 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     #5 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     #6 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     #7 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     #8 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     #9 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     #10 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     #11 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     #12 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     #13 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #14 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #15 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #16 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7f3d7a4b4887 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
>     #1 0x7f3d7a03f0e9 in qmalloc lib/memory.c:101
>     #2 0x7f3d76cd0166 in bmp_bgp_peer_vrf bgpd/bgp_bmp.c:2194
>     #3 0x7f3d76cd0166 in bmp_bgp_update_vrf_status bgpd/bgp_bmp.c:2236
>     #4 0x7f3d76cd29b8 in bmp_vrf_state_changed bgpd/bgp_bmp.c:3479
>     #5 0x560ae2c45b34 in hook_call_bgp_instance_state bgpd/bgpd.c:88
>     #6 0x560ae2c4d158 in bgp_instance_up bgpd/bgpd.c:3936
>     #7 0x560ae29e5ed1 in bgp_vrf_enable bgpd/bgp_main.c:299
>     #8 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:286
>     #9 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:275
>     #10 0x7f3d7a12ab66 in zclient_vrf_add lib/zclient.c:2561
>     #11 0x7f3d7a12eb43 in zclient_read lib/zclient.c:4624
>     #12 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #13 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #14 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #15 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
pguibert6WIND added a commit to pguibert6WIND/frr that referenced this issue Oct 30, 2024
The following ASAN error can be seen.

> ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x608000036c20
>     #0 0x7f3d7a4b5425 in __interceptor_malloc_usable_size ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:198
>     #1 0x7f3d7a426a16 in __sanitizer::BufferedStackTrace::Unwind(unsigned long, unsigned long, void*, bool, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_stacktrace.h:122
>     #2 0x7f3d7a426a16 in __asan::asan_malloc_usable_size(void const*, unsigned long, unsigned long) ../../../../src/libsanitizer/asan/asan_allocator.cpp:1074
>     #3 0x7f3d7a03f330 in mt_count_free lib/memory.c:78
>     #4 0x7f3d7a03f330 in qfree lib/memory.c:130
>     #5 0x7f3d76ccf89b in bmp_peer_status_changed bgpd/bgp_bmp.c:982
>     #6 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     #7 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     #8 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     #9 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     #10 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     #11 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     #12 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     #13 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     #14 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     #15 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     #16 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     #17 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #18 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #19 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #20 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f3d79a29e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x560ae29e4ef4 in _start (/usr/lib/frr/bgpd+0x2eeef4)
>
> 0x608000036c20 is located 0 bytes inside of 81-byte region [0x608000036c20,0x608000036c71)
> freed by thread T0 here:
>     #0 0x7f3d7a4b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     #1 0x7f3d76ccf85f in bmp_peer_status_changed bgpd/bgp_bmp.c:981
>     #2 0x560ae2aa6a94 in hook_call_peer_status_changed bgpd/bgp_fsm.c:47
>     #3 0x560ae2aa6a94 in bgp_fsm_change_status bgpd/bgp_fsm.c:1287
>     #4 0x560ae2c4f2e5 in peer_delete bgpd/bgpd.c:2777
>     #5 0x560ae2c58d24 in bgp_delete bgpd/bgpd.c:4140
>     #6 0x560ae2bbb47e in no_router_bgp bgpd/bgp_vty.c:1764
>     #7 0x7f3d79fb74ed in cmd_execute_command_real lib/command.c:1003
>     #8 0x7f3d79fb78a3 in cmd_execute_command lib/command.c:1062
>     #9 0x7f3d79fb7e03 in cmd_execute lib/command.c:1228
>     #10 0x7f3d7a107b53 in vty_command lib/vty.c:625
>     #11 0x7f3d7a109902 in vty_execute lib/vty.c:1388
>     #12 0x7f3d7a10cc32 in vtysh_read lib/vty.c:2400
>     #13 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #14 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #15 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #16 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7f3d7a4b4887 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
>     #1 0x7f3d7a03f0e9 in qmalloc lib/memory.c:101
>     #2 0x7f3d76cd0166 in bmp_bgp_peer_vrf bgpd/bgp_bmp.c:2194
>     #3 0x7f3d76cd0166 in bmp_bgp_update_vrf_status bgpd/bgp_bmp.c:2236
>     #4 0x7f3d76cd29b8 in bmp_vrf_state_changed bgpd/bgp_bmp.c:3479
>     #5 0x560ae2c45b34 in hook_call_bgp_instance_state bgpd/bgpd.c:88
>     #6 0x560ae2c4d158 in bgp_instance_up bgpd/bgpd.c:3936
>     #7 0x560ae29e5ed1 in bgp_vrf_enable bgpd/bgp_main.c:299
>     #8 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:286
>     #9 0x7f3d7a0ff8b1 in vrf_enable lib/vrf.c:275
>     #10 0x7f3d7a12ab66 in zclient_vrf_add lib/zclient.c:2561
>     #11 0x7f3d7a12eb43 in zclient_read lib/zclient.c:4624
>     #12 0x7f3d7a0f848b in event_call lib/event.c:2019
>     #13 0x7f3d7a01e627 in frr_run lib/libfrr.c:1232
>     #14 0x560ae29e0037 in main bgpd/bgp_main.c:555
>     #15 0x7f3d79a29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
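
Reading the report above: the same address (0x608000036c20) is released at bgp_bmp.c:981 and then handed to qfree again at bgp_bmp.c:982, which is what a double free of a single allocation looks like. One way this can happen is two pointers aliasing one buffer. The sketch below is not FRR code and does not claim to be the actual fix; `struct peer_msgs`, `peer_msgs_set`, `peer_msgs_clear` and `free_and_null` are hypothetical names, used only to illustrate giving each pointer its own copy and clearing it after release, assuming aliasing is the cause.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for per-peer state; not FRR's real structure. */
struct peer_msgs {
	unsigned char *rx_copy;
	unsigned char *tx_copy;
	size_t len;
};

/* Free *p and clear it, so calling this twice on the same slot is a no-op. */
void free_and_null(unsigned char **p)
{
	free(*p); /* free(NULL) is defined to do nothing */
	*p = NULL;
}

/*
 * Store two independent copies of buf so the two fields never alias the
 * same allocation. Returns 0 on success, -1 on bad length or allocation
 * failure (in which case the previous contents are left untouched).
 */
int peer_msgs_set(struct peer_msgs *m, const unsigned char *buf, size_t len)
{
	unsigned char *rx, *tx;

	assert(m != NULL && buf != NULL);
	if (len == 0)
		return -1;

	rx = malloc(len);
	tx = malloc(len);
	if (rx == NULL || tx == NULL) {
		free(rx);
		free(tx);
		return -1;
	}
	memcpy(rx, buf, len);
	memcpy(tx, buf, len);

	/* Release whatever was stored before, then take the new copies. */
	free_and_null(&m->rx_copy);
	free_and_null(&m->tx_copy);
	m->rx_copy = rx;
	m->tx_copy = tx;
	m->len = len;
	return 0;
}

/* Release both copies; safe to call repeatedly. */
void peer_msgs_clear(struct peer_msgs *m)
{
	free_and_null(&m->rx_copy);
	free_and_null(&m->tx_copy);
	m->len = 0;
}
```

With this layout, calling `peer_msgs_clear` twice in a row (for example once from a peer-status hook and once from instance teardown) only ever frees each allocation once.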