-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. #2744
[202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. #2744
Conversation
@arfeigin to remove draft indication once all testing is done |
@prsunny , @vaibhavhd , Can you please review ? |
|
||
def generate_default_route_entries(): | ||
db = ConfigDBConnector() | ||
db.db_connect(db.APPL_DB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would fail on multi asic env as namesapce handling isn't done. Not a requirement to handle that in IMO (as fast-reboot may not be supported on multi ASIC), but just FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads-up.
connected_routes.append(route) | ||
except Exception: | ||
ctx = click.get_current_context() | ||
ctx.fail("Unable to get connected routes from bgp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What in intended behavior when we hit this exception? If this returns non-zero code to bash, fast-reboot script will still continue as the call to filer-routes is after set +e
in bash script. Do you still want the fast-reboot to proceed when routes are not filtered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fast-reboot requires BGP is enabled, this will return non-zero code only in case BGP is not used and hence is not relevant for fast-reboot if I understand correctly.
However this won't fast-reboot to fail but it might make it exceed the dataplane downtime if filtering failed and switch has rather large amount of routes stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave it to you. To me it sounds like this will impact dataplane downtime as the PR is fixing that when the connected routes aren't preserved. So if you wish you could set this as a required criteria - by moving the call the preserve before set +e. Just FYI.
output, ret = clicommon.run_command(cmd, return_cmd=True) | ||
if ret != 0: | ||
click.echo(output.rstrip('\n')) | ||
sys.exit(ret) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What in intended behavior when we hit this sys.exit? this returns non-zero code to bash, fast-reboot script will still continue as the call to filer-routes is after set +e in bash script. Do you still want the fast-reboot to proceed when routes are not filtered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fast-reboot requires BGP is enabled, this will return non-zero code only in case BGP is not used and hence is not relevant for fast-reboot if I understand correctly.
However this won't fast-reboot to fail but it might make it exceed the dataplane downtime if filtering failed and switch has rather large amount of routes stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I think the bash script should check for the return code of the python script. And accordingly add a log to denote pass/fail status of route filtering.
Update sonic-utilities submodule pointer to include the following: * 32938665 [202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. ([sonic-net#2744](sonic-net/sonic-utilities#2744)) Signed-off-by: dprital <drorp@nvidia.com>
Update sonic-utilities submodule pointer to include the following: * 32938665 [202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. ([#2744](sonic-net/sonic-utilities#2744))
…2760) Part of sonic-net/sonic-buildimage#14583 Similar to #2744 - What I did Added a script to filter routes: preserve default routes (was already done as part of fast-reboot script) and connected routes. Set teamd timer to minimal allowed value (1 second) for fast-reboot. Both made in order to shorten dataplane downtime. - How I did it fast-reboot-filter-routes.py was added to preserve connected and default routes and is being called from fast-reboot script. teamd-timer is set when setting fast-reboot.
…2760) Part of sonic-net/sonic-buildimage#14583 Similar to #2744 - What I did Added a script to filter routes: preserve default routes (was already done as part of fast-reboot script) and connected routes. Set teamd timer to minimal allowed value (1 second) for fast-reboot. Both made in order to shorten dataplane downtime. - How I did it fast-reboot-filter-routes.py was added to preserve connected and default routes and is being called from fast-reboot script. teamd-timer is set when setting fast-reboot.
What I did
Added a script to filter routes: preserve default routes (was already done as part of fast-reboot script) and connected routes.
Set teamd timer to minimal allowed value (1 second) for fast-reboot.
Both made in order to shorten dataplane downtime.
How I did it
fast-reboot-filter-routes.py was added to preserve connected and default routes and is being called from fast-reboot script.
teamd-timer is set when setting fast-reboot.
How to verify it
Community fast-reboot test.
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)