FIB Suppress Announcements of Routes Not Installed in HW #1103

stepanblyschak · 2022-10-25T16:24:22Z

This document describes a feedback mechanism that allows BGP not to advertise routes that haven't been programmed yet.

PR title	state	context
[fpm] Fix FpmLink to read all netlink messages from FPM message
[NotificationProducer] add pipeline support
[ResponsePublisher] add pipeline support
[RouteOrch] Record ROUTE_TABLE entry programming status to APPL_STATE_DB
[config/show] Add command to control pending FIB suppression
[FRR] Switch to dplane_fpm_nl plugin instead of fpm
[BGP] support BGP pending FIB suppression
[fpmsyncd] Implement pending route suppression feature
[route_check] implement a check for FRR routes not marked offloaded
Add BGP Suppress FIB Pending test plan
BGP suppress fib pending script

The implementation plan for stress and performance tests is still under discussion. The following item tracks this requirement:

Issue	state
[fpm] Fix FpmLink to read all netlink messages from FPM message

Signed-off-by: stepanblyschak <stepanb@nvidia.com>

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

stepanblyschak · 2022-11-28T13:57:35Z

Measured with FRR only (no SONiC involved), using a simple FPM application that sends a reply as soon as a route message is received.
The performance plot comparison:

liat-grozovik · 2022-12-05T08:57:53Z

@StormLiangMS could you please help to review and approve?

venkatmahalingam · 2022-12-09T16:20:41Z

doc/BGP/BGP-supress-fib-pending.md

+```abnf
+; DEVICE_METADATA table
+key                      = DEVICE_METADATA|localhost ; Device metadata configuration table
+suppress-fib-pending     = "enabled"/"disabled"      ; Globally enable/disable routes announcement suppression feature,


If this controls only BGP routes, please be explicit in the field name.

It is implemented in fpmsyncd/zebra, which do not care whether they receive BGP routes or routes from another protocol. So, if SONiC supports it, the same feature could be applied to other routing protocols.

Since the HLD talks only about BGP routes, I thought it would be better to use a name like "suppress-bgp-routes-fib-pending" and handle only the BGP case. If that's not the case and you'll enable support for all route types (BGP, Static, OSPF, etc.), we can keep the generic field name (suppress-fib-pending).
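For illustration, here is a minimal sketch of how a consumer might read this flag from the `DEVICE_METADATA|localhost` entry quoted in the hunk above. The dict layout mirrors a CONFIG_DB JSON dump, the helper name `suppress_fib_pending_enabled` is hypothetical, and treating a missing field as "disabled" is an assumption rather than something the HLD states.

```python
# Hypothetical helper; not part of SONiC. It only illustrates the schema above.
VALID_VALUES = {"enabled", "disabled"}


def suppress_fib_pending_enabled(config_db: dict) -> bool:
    """Return True when DEVICE_METADATA|localhost enables suppression."""
    metadata = config_db.get("DEVICE_METADATA", {}).get("localhost", {})
    # Assumption: an absent field means the feature is disabled.
    value = metadata.get("suppress-fib-pending", "disabled")
    if value not in VALID_VALUES:
        raise ValueError(f"unexpected suppress-fib-pending value: {value!r}")
    return value == "enabled"


config = {"DEVICE_METADATA": {"localhost": {"suppress-fib-pending": "enabled"}}}
```

Because the field name is protocol-agnostic, the same lookup would apply unchanged if the feature were later extended beyond BGP.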

venkatmahalingam · 2022-12-09T16:44:12Z

doc/BGP/BGP-supress-fib-pending.md

+
+Example data:
+```
+127.0.0.1:6379[14]> hgetall "ROUTE_TABLE:193.5.96.0/25"


Why don't we have 'vrf' as the key in the ROUTE_TABLE? We may be supporting only the 'default' VRF now, but it's important to have the VRF in the key for future enhancements with data VRFs.

Any plan for bulking the routes in the APPL_STATE_DB ROUTE_TABLE?

> Why don't we have 'vrf' as the key in the ROUTE_TABLE? We may be supporting only the 'default' VRF now, but it's important to have the VRF in the key for future enhancements with data VRFs.

This example is for default VRF. The key in APPL_STATE_DB is the same as in APPL_DB, with VRF included (if not default), so VRF is supported.

> Any plan for bulking the routes in the APPL_STATE_DB ROUTE_TABLE?

The bulking is done by using redis pipeline for ResponsePublisher.

ROUTE_TABLE in APPL_DB is legacy: it originally had no VRF support, and when VRF support was added we did not want to disrupt the existing schema, so the VRF key was added only for user VRFs. The APPL_STATE_DB ROUTE_TABLE, however, is a new schema, so I'm not sure why it needs to follow the APPL_DB ROUTE_TABLE approach rather than carrying the VRF in the key by default, even for the 'default' VRF. IMO, having the VRF as part of the key by default is cleaner.

Why is ROUTE_TABLE in APPL_DB legacy? Is there a new table with a new schema?

Here's the latest master and ROUTE_TABLE entry in APPL_DB:

127.0.0.1:6379> keys "ROUTE_TABLE:193.8.224.0/25"
1) "ROUTE_TABLE:193.8.224.0/25"

If it were a user created VRF route the key would be in a format :.
The idea of APPL_STATE_DB is to reflect the same schema that APPL_DB has.

> The idea of APPL_STATE_DB is to reflect the same schema that APPL_DB has.

APPL_DB:
Default VRF: ROUTE_TABLE:<prefix>
Non-default VRF: ROUTE_TABLE:<vrf>:<prefix>

IMO, the schema below makes it possible to read the VRF (e.g. default) directly from every route key, instead of assuming that a key without a VRF belongs to the default VRF. New tables don't need to follow the APPL_DB ROUTE_TABLE schema, but I leave the final schema for you to decide.

APPL_STATE_DB:
All VRFs including default VRF: ROUTE_TABLE:<vrf>:<prefix>
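To make the trade-off concrete, here is a small sketch of what a reader of the APPL_DB-style keys has to do to recover the VRF. The function name `parse_route_key` is hypothetical, and the sketch assumes that user VRF names start with `Vrf`, which is what lets it tell a VRF segment apart from the first group of an IPv6 prefix.

```python
from typing import Tuple

TABLE = "ROUTE_TABLE"
DEFAULT_VRF = "default"


def parse_route_key(key: str) -> Tuple[str, str]:
    """Split an APPL_DB-style ROUTE_TABLE key into (vrf, prefix)."""
    table, sep, rest = key.partition(":")
    if table != TABLE or not sep or not rest:
        raise ValueError(f"not a {TABLE} key: {key!r}")
    # Assumption: user VRF names begin with "Vrf"; anything else is a prefix.
    if rest.startswith("Vrf"):
        vrf, sep, prefix = rest.partition(":")
        if not sep or not prefix:
            raise ValueError(f"missing prefix in key: {key!r}")
        return vrf, prefix
    # No VRF segment in the key: the route belongs to the default VRF.
    return DEFAULT_VRF, rest
```

With the proposed `ROUTE_TABLE:<vrf>:<prefix>` schema the final fallback branch would disappear, because every key would carry its VRF explicitly.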

venkatmahalingam · 2022-12-09T16:52:17Z

doc/BGP/BGP-supress-fib-pending.md

+
+### 3. Overview
+
+As of today, SONiC BGP advertises learnt prefixes regardless of whether these prefixes were successfully programmed into ASIC. While route programming failure is followed by orchagent crash and all services restart, even for successfully created routes there is a short period of time when the peer will be black holing traffic. Also, in the following scenario, a credit loop occurs:


When there are multiple updates for a given route (e.g. next-hop changes) from fpmsyncd to route-orch, will route-orch send a response for every route change after ASIC programming, or only for the latest route update? Please clarify.
Also, what happens if we get a route delete from zebra before fpmsyncd has sent back the RTM_NEWROUTE message with the RTM_F_OFFLOAD flag?

Orchagent will send a response for every route update it gets from APPL_DB. As far as I know, there is logic inside ProducerStateTable/ConsumerStateTable that consolidates multiple SET/DEL operations for a single key, so orchagent gets the most up-to-date requested field values for a route entry. FRR, on the other hand, is currently only interested in the status of the first route creation. It does not care about the installation status after the route's next hop or other attributes have changed.

If zebra sends RTM_DELROUTE before fpmsyncd sends RTM_NEWROUTE with RTM_F_OFFLOAD, zebra will ignore it, considering it as a stale update.
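The consolidation behaviour mentioned above can be pictured with a simplified model. This is only a sketch of the idea (later operations on the same key supersede or merge into earlier ones before the consumer sees them); it is not the swss-common implementation, and the `consolidate` function and tuple layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (operation, key, fields); operation is "SET" or "DEL".
Op = Tuple[str, str, Dict[str, str]]


def consolidate(ops: Iterable[Op]) -> Dict[str, Tuple[str, Dict[str, str]]]:
    """Collapse queued operations so each key has one pending operation."""
    pending: Dict[str, Tuple[str, Dict[str, str]]] = {}
    for op, key, fields in ops:
        if op == "DEL":
            # A delete discards whatever was queued for this key.
            pending[key] = ("DEL", {})
        elif op == "SET":
            prev_op, prev_fields = pending.get(key, ("SET", {}))
            # Merge into an earlier SET; start fresh after a DEL.
            merged = dict(prev_fields) if prev_op == "SET" else {}
            merged.update(fields)
            pending[key] = ("SET", merged)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return pending
```

Under this model, two quick next-hop changes for one prefix reach the consumer as a single SET carrying the latest values, which is why the number of responses can be lower than the number of updates written by fpmsyncd.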

Good to know, thanks for the clarification.

venkatmahalingam · 2022-12-09T17:02:10Z

doc/BGP/BGP-supress-fib-pending.md

+     2. Withdraw 10 prefix to DUT through *exabgp* from T0 Arista VM
+  3. Once reached the required number of cycles the loop breaks after first step
+  4. Consistency check is applied, there are no withdrawn routes and announced routes were successfully installed and correctly marked as offloaded in zebra (In case a race condition happens or a notification is missed for some reason this test will try to catch it)
+


Don't we have any test case that simulates an orchagent crash and delays the response from route-orch to fpmsyncd?

If orchagent crashes, there will be no response from routeorch. Which test cases do you propose?

What will happen if fpmsyncd does not send RTM_F_OFFLOAD to zebra because orchagent crashed? Is there any timeout in zebra to declare FIB programming failed? What happens to the routes in zebra that are waiting for FIB programming once orchagent is back up?

There is no logic in zebra that considers a route failed in HW if it did not receive the RTM_F_OFFLOAD flag within some time interval. These routes will stay in the queued state.
An orchagent crash itself triggers a restart of the BGP container (zebra + bgpd). So in that case, zebra starts fresh with no routes.

venkatmahalingam · 2022-12-09T17:10:23Z

doc/BGP/BGP-supress-fib-pending.md

+
+Routing is a crucial part of a network switch. This feature adds additional flows in existing route processing pipeline and so along the way there might be unexpected failures leading to routes being not marked as offloaded in zebra - missed notification, race conditions leading to in-consistency, etc. It is required to monitor the consistency of routes state periodically and mitigate problems.
+
+At the moment *route_check.py* verifies routes between APPL_DB & ASIC_DB and makes sure they are in sync. In addition to that it is required to extend this script to check for consistency APPL_DB, APPL_STATE_DB and zebra by calling ```show ip route json``` and verify every route installed in ASIC_DB is marked as offloaded. The script will retry the check for 3 times every 15 sec and if zebra FIB is not in sync with ASIC_DB a mitigation action is performed. The mitigation action publishes required notifications to trigger fpmsyncd flows to send RTM_NEWROUTE message to zebra and log alerts in the syslog.
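
As a rough illustration of the retry logic described in the hunk above (check, retry up to 3 times at 15-second intervals, then mitigate), a sketch follows. The function names are hypothetical, and the sketch assumes the parsed `show ip route json` output maps each prefix to a list of route entries that carry an `offloaded` boolean; it is not the actual route_check.py code.

```python
import time
from typing import Callable, Dict, Iterable, List, Set


def not_offloaded(asic_prefixes: Iterable[str],
                  zebra_routes: Dict[str, List[dict]]) -> Set[str]:
    """Return prefixes present in ASIC_DB that zebra does not mark offloaded."""
    missing = set()
    for prefix in asic_prefixes:
        entries = zebra_routes.get(prefix, [])
        # Assumed JSON shape: {"<prefix>": [{"offloaded": true, ...}, ...]}
        if not any(entry.get("offloaded") for entry in entries):
            missing.add(prefix)
    return missing


def check_with_retries(snapshot: Callable[[], Set[str]],
                       retries: int = 3,
                       interval: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> Set[str]:
    """Re-run the check so that routes still in flight are not reported."""
    missing: Set[str] = set()
    for attempt in range(retries):
        missing = snapshot()
        if not missing:
            return missing
        if attempt < retries - 1:
            sleep(interval)
    # Still inconsistent after all retries: the caller would mitigate and log.
    return missing
```

Passing `sleep` as a parameter is a design choice for testability only; a caller would supply a `snapshot` function that reads ASIC_DB and queries zebra on each attempt.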


There can be Static/OSPF/IS-IS route updates in addition to BGP routes, so I hope you'll allow both offloaded and not-offloaded routes to be present in the DB.

Could you please clarify your comment? What do you mean by allowing offloaded/not-offloaded routes to be present in the DB?

I think this HLD talks about adding "suppress-fib-pending" support only for BGP routes. What would the behavior be if we have routes from other protocols (e.g. OSPF) on the switch?

I would assume OSPF would implement similar logic. We dropped the word "bgp" from the configuration flag because, from our point of view, nothing in these new flows is specific to BGP. They operate at the level of zebra/fpmsyncd/orchagent, which do not care which protocol originated the route they are working with.

venkatmahalingam · 2022-12-09T17:12:40Z

doc/BGP/BGP-supress-fib-pending.md

+
+During fast reboot BGP session is closed by SONiC device without the notification. BGP session is preserved in graceful restart mode. BGP routes on the peer are still active, because nexthop interfaces are up. Once interfaces go down, received BGP routes on the peer are removed from the routing table. Nothing is sent to SONiC device since then. After interfaces go up and BGP sessions re-establish the peer's BGP re-learns advertised routes.
+
+Due to additional response publishing in orchagent there might be a slight delay in fast reboot reconfiguration.


What's the fast boot time with this change?

It must stay under 30 seconds of downtime.

liat-grozovik · 2023-01-29T09:39:40Z

@StormLiangMS @prsunny please provide your inputs. If no additional feedback please approve.

liat-grozovik · 2023-01-29T09:39:48Z

The feature with its current content is expected to land in 202305; whether to also take it into 202211, as originally planned, is under discussion.

prsunny · 2023-02-09T03:24:46Z

@venkatmahalingam , could you please sign off?

prsunny · 2023-02-16T17:30:37Z

@stepanblyschak, can you please add the sonic-mgmt PR to the description?

prsunny · 2023-02-17T01:51:29Z

@abdosi , @arlakshm , @judyjoseph for viz

zhangyanzhao · 2023-05-08T04:50:56Z

Two test PRs are still open and need to be merged. @echuawu

echuawu · 2023-05-09T02:38:34Z

@zhangyanzhao, we are handling the comments from @StormLiangMS and investigating the possibility of implementing a performance test in sonic-mgmt.

echuawu · 2023-05-24T08:06:08Z

The implementation plan for stress and performance tests is still under discussion. The following item tracks this requirement: sonic-net/sonic-mgmt#8409

StormLiangMS · 2023-05-25T02:29:57Z

> The implementation plan for stress and performance tests is still under discussion. The following item tracks this requirement: sonic-net/sonic-mgmt#8409

@echuawu, could you add this to the description of this PR?

echuawu · 2023-05-25T06:48:09Z

@stepanblyschak, please help with that, as @StormLiangMS suggested.

stepanblyschak · 2023-05-25T14:49:10Z

@echuawu @StormLiangMS Added

stepanblyschak and others added 30 commits August 12, 2022 13:25

[BGP] Add BGP suppress FIB pending

7002b69

Signed-off-by: stepanblyschak <stepanb@nvidia.com>

Merge branch 'sonic-net:master' into bgp-suppress-fib-pending

6df6f41

Update BGP-supress-fib-pending.md

ee90f16

Update BGP-supress-fib-pending.md

3aae79f

Update BGP-supress-fib-pending.md

3ca8282

Update BGP-supress-fib-pending.md

5707bd3

Update BGP-supress-fib-pending.md

ff9a9f4

Update BGP-supress-fib-pending.md

bee0a9e

Update BGP-supress-fib-pending.md

576335c

Update BGP-supress-fib-pending.md

1b1d8c1

Update BGP-supress-fib-pending.md

52d683a

Update BGP-supress-fib-pending.md

7591fb2

Add files via upload

9195060

Update BGP-supress-fib-pending.md

8fd23f8

Update BGP-supress-fib-pending.md

eb83fdf

Update BGP-supress-fib-pending.md

e667886

Update BGP-supress-fib-pending.md

88315ca

Update BGP-supress-fib-pending.md

708ea68

Update BGP-supress-fib-pending.md

e43bf41

Update BGP-supress-fib-pending.md

a529a55

Update BGP-supress-fib-pending.md

b97ba7a

Update BGP-supress-fib-pending.md

ab3d6fe

Update BGP-supress-fib-pending.md

6f38f57

Update BGP-supress-fib-pending.md

1d07b68

Update BGP-supress-fib-pending.md

a52c409

Update BGP-supress-fib-pending.md

5f33539

Update BGP-supress-fib-pending.md

61d159c

Update BGP-supress-fib-pending.md

d7e8900

Update BGP-supress-fib-pending.md

2aaf0bb

Update BGP-supress-fib-pending.md

f4aa91f

stepanblyschak added 3 commits November 25, 2022 15:12

convert line endings

1abee24

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

rename flag to suppress-fib-pending

830f953

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

add performance section

3ae161c

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

venkatmahalingam reviewed Dec 9, 2022

liat-grozovik added the feature request label Jan 2, 2023

Merge branch 'master' into bgp-suppress-fib-pending

f7689f0

prsunny approved these changes Feb 9, 2023

Merge branch 'master' into bgp-suppress-fib-pending

489f726

liat-grozovik approved these changes Feb 23, 2023

Merge branch 'master' into bgp-suppress-fib-pending

146d7f0

liat-grozovik merged commit ab996b4 into sonic-net:master Feb 23, 2023

StormLiangMS mentioned this pull request May 24, 2023

Add BGP Suppress FIB Pending test plan sonic-net/sonic-mgmt#7475

Merged



