[staticroutebfd]fix an issue on deleting a non-bfd static route #15269
self.remove_from_local_db(LOCAL_BFD_TABLE, bfd_key)
self.del_bfd_session_from_appl_db(bfd_key)
# do not delete it from appl_db if the route is not bfd enabled
if bfd_enabled:
Why aren't we removing the static route if BFD is not enabled? What was the previous behavior? Does this mean that if the user removes a static route via the CLI, it won't be removed from APP_DB?
If BFD is not enabled, staticroutebfd does not write the static route entry to appl_db, so it should not delete it from appl_db either. In general, such a prefix in config_db should not be in appl_db at all. But to be safe, it is better that staticroutebfd does not touch a route in appl_db that it did not create.
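The ownership rule described in this reply can be sketched as follows. This is a minimal illustration, not the daemon's actual code: `StaticRouteStore`, its dictionaries, and its method names are hypothetical stand-ins for the local cache and the appl_db table.

```python
# Hypothetical stand-in for the delete handling discussed above; none of
# these names come from the real staticroutebfd source.
class StaticRouteStore:
    def __init__(self):
        self.local_db = {}  # prefix -> {"bfd": bool}, the daemon's local cache
        self.appl_db = {}   # stand-in for the appl_db static route table

    def set_route(self, prefix, bfd_enabled):
        self.local_db[prefix] = {"bfd": bfd_enabled}
        if bfd_enabled:
            # Only BFD-enabled routes are written to appl_db by this daemon.
            self.appl_db[prefix] = {"owner": "staticroutebfd"}

    def del_route(self, prefix):
        entry = self.local_db.pop(prefix, None)
        if entry is None:
            return False  # unknown prefix, nothing to do
        if entry["bfd"]:
            # Delete from appl_db only what this daemon created there.
            self.appl_db.pop(prefix, None)
        return True
```

With this guard, deleting a non-BFD route clears the local cache but leaves any appl_db entry written by another component untouched.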
set_del_test(dut, "srt",
"DEL",
("3.3.3.0/24", {
"nexthop": "192.168.1.2 , 192.168.2.2",
The fv-pairs for a DEL notification will be empty.
I will remove them, so that the DUT cannot end up using these values.
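The point about empty fv-pairs can be made explicit in a test helper. The function below is a hypothetical sketch (it is not part of the existing test file) that builds a notification tuple and always drops the field-value pairs for a DEL.

```python
def build_notification(op, key, fvs=None):
    """Return an (op, key, fvs) tuple; a DEL always carries empty fv-pairs."""
    if op not in ("SET", "DEL"):
        raise ValueError(f"unsupported op: {op}")
    if op == "DEL":
        # Mirror the review comment: a delete carries no field-value pairs,
        # so any values passed in are discarded.
        return (op, key, {})
    return (op, key, dict(fvs or {}))
```

Building the DEL case this way keeps the test input close to what the review comment says a delete notification looks like.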
@@ -169,6 +168,28 @@ def test_set_del():
{'del_default:2.2.2.0/24': {}}
)

# test add a non-bfd static route
set_del_test(dut, "srt",
I also suggest adding a test case that has the nexthop-vrf parameter.
New test cases have been added for the nexthop-vrf list. Thanks.
@@ -9,5 +9,4 @@ program:pimd
program:frrcfgd
{%- else %}
program:bgpcfgd
program:staticroutebfd
But isn't this a critical process? #Resolved
staticroutebfd itself supports recovery after a restart or crash, so it does not need to be a critical process. When a critical process crashes, the whole bgp container restarts, which has a larger impact. To reduce the impact on the bgp container, I removed staticroutebfd from the critical process list and let supervisord restart only staticroutebfd (see the change in supervisord.conf.j2 below).
@prsunny Can we merge this?
@gechiang needed for msft repo 202205 branch
Double commit the PRs from sonic-net/sonic-buildimage. Original PR for staticroutebfd: sonic-net/sonic-buildimage#13789 And a fix for staticroutebfd: sonic-net/sonic-buildimage#15269
…c-net#15269) * [static_route][staticroutebfd]fix an issue on deleting a non-bfd static route Fix an issue for deleting a non-bfd static route also remove the staticroutebfd from critical_processes list and make it auto restart in the case of crash.
What I did
Fixed an issue with deleting a non-BFD static route.
Also removed staticroutebfd from the critical_processes list and made it restart automatically after a crash.
fixes #15267
Why I did it
The current design accesses a None object (nh_vrf_list) when deleting a non-BFD static route.
For supervisord: staticroutebfd supports restart and recovery, so there is no need to restart the bgp container when staticroutebfd crashes. supervisord will restart it after a crash.
Work item tracking
Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/17793093
How I did it
Check whether the route is BFD protected when deleting the static route, and check whether nh_vrf_list is None.
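The two checks named above might look roughly like this. It is a sketch under stated assumptions: the route is modelled as a plain dict, the `bfd`, `nexthop`, and `nexthop-vrf` field names and the `"default"` VRF fallback are assumptions for illustration, and both helper functions are hypothetical rather than taken from staticroutebfd.

```python
def split_list(value):
    """Split a comma-separated field into trimmed items; None or '' gives []."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def bfd_keys_for_route(route):
    """Return the (vrf, nexthop) pairs whose BFD sessions a delete must clean up.

    `route` is a hypothetical dict of static route fields.
    """
    # Check 1: a route that is not BFD protected owns no BFD sessions.
    if route.get("bfd") != "true":
        return []
    nexthops = split_list(route.get("nexthop"))
    # Check 2: nexthop-vrf may be missing (None); treat it as an empty list
    # instead of indexing into None.
    vrfs = split_list(route.get("nexthop-vrf"))
    keys = []
    for i, nexthop in enumerate(nexthops):
        vrf = vrfs[i] if i < len(vrfs) else "default"  # assumed fallback VRF
        keys.append((vrf, nexthop))
    return keys
```

A non-BFD route such as `{"nexthop": "192.168.1.2 , 192.168.2.2"}` yields an empty list, so the delete path never reaches the missing nexthop-vrf field.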
How to verify it
Add a UT that deletes a non-BFD static route to reproduce the issue:
Fix the issue and run the UT:
For supervisord, after changing critical_processes.j2 and supervisord.conf.j2, restart bgp (sudo systemctl restart bgp), then
manually kill the staticroutebfd process inside the bgp container. It is restarted without restarting the whole bgp container.