bgpd: avoid clearing routes for peers that were never established #16271

lsang6WIND · 2024-06-24T09:10:17Z

Under heavy system load, with many peers in passive mode and a large number of routes, bgpd can enter an infinite loop while processing timed-out BGP_OPEN messages, which prevents it from accepting new connections. The following log entries illustrate the issue:

bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0
bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224
bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224
... repeating

The issue occurs when bgpd handles a massive number of routes in the RIB while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it fails to process these packets promptly, leading the remote peer to close the connection and resend BGP_OPEN packets.

When bgpd eventually processes these timed-out BGP_OPEN packets, it finds that the remote peer has already closed the TCP connection, so "bgp_stop()" is called. For each timed-out peer, bgpd then iterates through the routing table. This is time-consuming and causes newly arriving BGP_OPEN packets to time out as well, perpetuating the loop.
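To make the feedback loop concrete, here is a toy discrete-time queueing model. It is not bgpd code: every parameter (ticks, per-tick work budget, hold time, teardown cost) is a hypothetical illustration, and `stop_cost` simply stands in for the per-peer routing-table walk done when a timed-out peer is torn down. OPENs wait in a FIFO; one that waits longer than `hold_ticks` is treated as already closed by the remote, and handling it costs `stop_cost` instead of 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 4096 /* toy queue capacity */

/* Toy model parameters; all values are illustrative assumptions. */
struct sim_params {
	unsigned int ticks;      /* length of the simulation */
	unsigned int burst;      /* OPENs already waiting at tick 0 */
	unsigned int arrivals;   /* new OPENs (fresh peers + retries) per tick */
	unsigned int budget;     /* work units bgpd can spend per tick */
	unsigned int hold_ticks; /* the remote gives up after this many ticks */
	unsigned int stop_cost;  /* cost of tearing down a timed-out peer */
};

/* Runs the model. Returns the backlog left at the end and stores in
 * *accepted the number of OPENs handled before the remote gave up. */
static size_t simulate(const struct sim_params *p, unsigned int *accepted)
{
	unsigned int queue[MAX_PENDING]; /* arrival tick of each pending OPEN */
	size_t head = 0, tail = 0;       /* FIFO occupies [head, tail) */
	long carry = 0;                  /* work overrun from the previous tick */

	assert(p != NULL && accepted != NULL);
	*accepted = 0;

	for (unsigned int now = 0; now < p->ticks; now++) {
		unsigned int incoming = p->arrivals + (now == 0 ? p->burst : 0);

		for (unsigned int i = 0; i < incoming && tail < MAX_PENDING; i++)
			queue[tail++] = now;

		long work = (long)p->budget + carry;
		while (head < tail && work > 0) {
			/* The remote already closed the connection: the OPEN
			 * fails and the peer is torn down at stop_cost. */
			bool timed_out = now - queue[head] > p->hold_ticks;

			work -= timed_out ? (long)p->stop_cost : 1;
			if (!timed_out)
				(*accepted)++;
			head++;
		}
		/* An expensive teardown spills over into later ticks. */
		carry = work < 0 ? work : 0;
	}
	return tail - head;
}
```

With a burst of 100 OPENs, 10 arrivals per tick, a budget of 20, a hold time of 3 and 50 ticks, a teardown cost of 50 accepts only the 80 OPENs handled in the first four ticks and the backlog keeps growing; with a teardown cost of 1 the burst drains within ten ticks and every later OPEN is handled in time.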

To address this issue, the code is modified to check if the peer has been established at least once before calling "bgp_clear_route_all()". This ensures that routes are only cleared for peers that had a successful session, preventing unnecessary iterations over the routing table for peers that never established a connection.
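A minimal model of the guard described above, assuming a per-peer counter of how many times the session reached Established (represented here by a hypothetical `established` field). `struct peer_model`, `clear_route_all_model()` and `stop_model()` are illustrative stand-ins, not FRR's `struct peer`, `bgp_clear_route_all()` or `bgp_stop()`; the route-clearing stand-in only counts how many full table walks would have happened.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a bgpd peer. */
struct peer_model {
	const char *host;
	unsigned int established; /* times this peer reached Established */
};

/* Number of full routing-table walks triggered so far. */
static unsigned long rib_walks;

/* Stand-in for bgp_clear_route_all(): the real function walks the tables
 * looking for the peer's paths; this model only counts the call. */
static void clear_route_all_model(struct peer_model *peer)
{
	assert(peer != NULL);
	rib_walks++;
}

/* Teardown path with the guard: a peer that never reached Established
 * has installed no routes in this model, so the walk is skipped. */
static void stop_model(struct peer_model *peer)
{
	assert(peer != NULL);
	if (peer->established > 0)
		clear_route_all_model(peer);
}
```

Stopping a peer whose OPEN timed out before any session came up (`established == 0`) leaves `rib_walks` unchanged, while stopping a peer that was established at least once triggers exactly one walk.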

With this change, BGP_OPEN timeouts may still occur, but in the worst case bgpd will stabilize. Before this patch, bgpd could enter a loop in which it was unable to accept any new connections.

ton31337

Makes sense.

riw777

looks good

fdumontet6WIND · 2024-06-25T08:57:44Z

improve performance

ton31337 · 2024-06-25T09:21:47Z

@Mergifyio backport dev/10.1

mergify · 2024-06-25T09:21:56Z

backport dev/10.1

✅ Backports have been created

#16302 bgpd: avoid clearing routes for peers that were never established (backport #16271) has been created for branch dev/10.1

riw777 · 2024-06-26T12:31:42Z

Lint errors need to be fixed... still trying to get CI to pass (it's failing in OSPF).

Signed-off-by: Loïc Sang <loic.sang@6wind.com>

lsang6WIND · 2024-06-26T15:20:08Z

Previous checks are all okay except for the linter.

The following topotest failures are not related to this PR:
bgp_gr_functionality_topo1.test_bgp_gr_functionality_topo1-3 test_BGP_GR_TC_31_1_p1
bgp_peer_type_multipath_relax.test_bgp_peer-type_multipath-relax test_bgp_peer_type_multipath_relax_test10

bgpd: avoid clearing routes for peers that were never established (backport #16271)

github-actions bot added master size/XS labels Jun 24, 2024

ton31337 approved these changes Jun 24, 2024

riw777 approved these changes Jun 24, 2024

github-actions bot added the backport label Jun 25, 2024

frrbot bot added the bgp label Jun 25, 2024

lsang6WIND force-pushed the avoid-loop branch from fb19324 to 88d7eb5 on June 26, 2024 09:21

lsang6WIND force-pushed the avoid-loop branch from 88d7eb5 to e0ae285 on June 26, 2024 14:11

riw777 merged commit 40f7926 into FRRouting:master Jun 26, 2024
11 checks passed

mergify bot mentioned this pull request Jun 26, 2024

bgpd: avoid clearing routes for peers that were never established (backport #16271) #16302

Merged

donaldsharp added a commit that referenced this pull request Jun 27, 2024

Merge pull request #16302 from FRRouting/mergify/bp/dev/10.1/pr-16271

6d24756

bgpd: avoid clearing routes for peers that were never established (backport #16271)