bgpd: avoid clearing routes for peers that were never established (backport #16271) #16302

Merged 1 commit on Jun 27, 2024

@mergify mergify bot commented Jun 26, 2024

Under heavy system load with many peers in passive mode and a large number of routes, bgpd can enter an infinite loop while processing timed-out BGP_OPEN messages, which prevents it from accepting new connections. The following log entries illustrate the issue:

bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0
bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224
bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224
... repeating

The issue occurs when bgpd handles a massive number of routes in the RIB while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it fails to process these packets promptly, leading the remote peer to close the connection and resend BGP_OPEN packets.

When bgpd eventually starts processing these timed-out BGP_OPEN packets, it finds the TCP connection already closed by the remote peer, so "bgp_stop()" is called. For each timed-out peer, bgpd must iterate through the routing table, which is time-consuming and causes new incoming BGP_OPEN packets to time out in turn, perpetuating the infinite loop.
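The feedback loop described above can be made concrete with a back-of-the-envelope check. This is a toy model, not bgpd code: it assumes every stale BGP_OPEN costs one full table walk of fixed duration and that all remote peers retry after the same interval. The function name and all three parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not FRR code: each stale OPEN is assumed to cost one full
 * routing-table walk of walk_ms milliseconds, and every remote peer is
 * assumed to give up and retry after retry_ms milliseconds.
 */
static bool backlog_drains(uint64_t stale_peers, uint64_t walk_ms,
			   uint64_t retry_ms)
{
	/* Time needed to tear down every stale session once. */
	uint64_t teardown_ms = stale_peers * walk_ms;

	/* If peers retry sooner than that, fresh OPENs go stale in the queue. */
	return teardown_ms < retry_ms;
}
```

Under these assumptions, the backlog can only drain when the total teardown time stays below the retry interval. Removing the table walk for never-established peers drives the per-peer cost toward zero, which is why the condition becomes easy to satisfy.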

To address this issue, the code is modified to check if the peer has been established at least once before calling "bgp_clear_route_all()". This ensures that routes are only cleared for peers that had a successful session, preventing unnecessary iterations over the routing table for peers that never established a connection.
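The shape of that guard can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: `toy_peer`, `toy_clear_route_all()` and `toy_stop()` are hypothetical stand-ins for bgpd's peer structure, "bgp_clear_route_all()" and "bgp_stop()", and the real code tracks session state differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration; not FRR's real peer structure. */
struct toy_peer {
	unsigned int established_count; /* sessions that reached Established */
	size_t table_walks;             /* full table walks triggered so far */
};

/* Stand-in for the expensive "clear all routes" step: records one walk. */
static void toy_clear_route_all(struct toy_peer *peer)
{
	peer->table_walks++;
}

/* Stand-in for session teardown; returns true if a table walk was done. */
static bool toy_stop(struct toy_peer *peer)
{
	/* A peer that never reached Established has no routes to clear. */
	if (peer->established_count == 0)
		return false;

	toy_clear_route_all(peer);
	return true;
}
```

Calling `toy_stop()` on a peer whose `established_count` is zero returns without touching the table, while a peer that was established at least once still gets its routes cleared exactly as before.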

With this change, BGP_OPEN timeouts may still occur, but even in the worst case bgpd will stabilize. Before this patch, bgpd could enter a loop in which it was unable to accept any new connections.


This is an automatic backport of pull request #16271 done by Mergify.

Signed-off-by: Loïc Sang <loic.sang@6wind.com>
(cherry picked from commit e0ae285)
@frrbot frrbot bot added the bgp label Jun 26, 2024
@donaldsharp donaldsharp merged commit 6d24756 into dev/10.1 Jun 27, 2024
14 checks passed
@mergify mergify bot deleted the mergify/bp/dev/10.1/pr-16271 branch June 27, 2024 16:17