Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. #2744

Merged

Conversation

stepanblyschak
Copy link
Contributor

What I did

Added a script to filter routes: preserve default routes (was already done as part of fast-reboot script) and connected routes.
Set teamd timer to minimal allowed value (1 second) for fast-reboot.
Both made in order to shorten dataplane downtime.

How I did it

fast-reboot-filter-routes.py was added to preserve connected and default routes and is being called from fast-reboot script.
teamd-timer is set when setting fast-reboot.

How to verify it

Community fast-reboot test.

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@stepanblyschak stepanblyschak changed the title fast-reboot: set teamd timer to minimum, preserve connected routes [202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. Mar 17, 2023
@stepanblyschak stepanblyschak marked this pull request as draft March 17, 2023 16:57
@stepanblyschak
Copy link
Contributor Author

@arfeigin to remove draft indication once all testing is done

@dprital dprital requested a review from vaibhavhd March 19, 2023 12:05
@stepanblyschak stepanblyschak marked this pull request as ready for review March 20, 2023 12:56
@dprital dprital requested a review from prsunny March 20, 2023 18:54
@dprital
Copy link
Collaborator

dprital commented Mar 20, 2023

@prsunny , @vaibhavhd , Can you please review ?

scripts/fast-reboot-filter-routes.py Outdated Show resolved Hide resolved

def generate_default_route_entries():
db = ConfigDBConnector()
db.db_connect(db.APPL_DB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fail on multi asic env as namesapce handling isn't done. Not a requirement to handle that in IMO (as fast-reboot may not be supported on multi ASIC), but just FYI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads-up.

scripts/fast-reboot-filter-routes.py Show resolved Hide resolved
connected_routes.append(route)
except Exception:
ctx = click.get_current_context()
ctx.fail("Unable to get connected routes from bgp")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What in intended behavior when we hit this exception? If this returns non-zero code to bash, fast-reboot script will still continue as the call to filer-routes is after set +e in bash script. Do you still want the fast-reboot to proceed when routes are not filtered?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fast-reboot requires BGP is enabled, this will return non-zero code only in case BGP is not used and hence is not relevant for fast-reboot if I understand correctly.
However this won't fast-reboot to fail but it might make it exceed the dataplane downtime if filtering failed and switch has rather large amount of routes stored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave it to you. To me it sounds like this will impact dataplane downtime as the PR is fixing that when the connected routes aren't preserved. So if you wish you could set this as a required criteria - by moving the call the preserve before set +e. Just FYI.

output, ret = clicommon.run_command(cmd, return_cmd=True)
if ret != 0:
click.echo(output.rstrip('\n'))
sys.exit(ret)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What in intended behavior when we hit this sys.exit? this returns non-zero code to bash, fast-reboot script will still continue as the call to filer-routes is after set +e in bash script. Do you still want the fast-reboot to proceed when routes are not filtered?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fast-reboot requires BGP is enabled, this will return non-zero code only in case BGP is not used and hence is not relevant for fast-reboot if I understand correctly.
However this won't fast-reboot to fail but it might make it exceed the dataplane downtime if filtering failed and switch has rather large amount of routes stored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I think the bash script should check for the return code of the python script. And accordingly add a log to denote pass/fail status of route filtering.

scripts/fast-reboot-filter-routes.py Outdated Show resolved Hide resolved
@dprital dprital requested a review from vaibhavhd March 21, 2023 17:38
@vaibhavhd vaibhavhd merged commit 3293866 into sonic-net:202205 Mar 21, 2023
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Mar 21, 2023
Update sonic-utilities submodule pointer to include the following:
* 32938665 [202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. ([sonic-net#2744](sonic-net/sonic-utilities#2744))

Signed-off-by: dprital <drorp@nvidia.com>
prsunny pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Mar 22, 2023
Update sonic-utilities submodule pointer to include the following:
* 32938665 [202205] Fast-reboot: preserve connected routes and set teamd timer to minimum. ([#2744](sonic-net/sonic-utilities#2744))
liat-grozovik pushed a commit that referenced this pull request Apr 18, 2023
…2760)

Part of sonic-net/sonic-buildimage#14583
Similar to #2744

- What I did
Added a script to filter routes: preserve default routes (was already done as part of fast-reboot script) and connected routes.
Set teamd timer to minimal allowed value (1 second) for fast-reboot.
Both made in order to shorten dataplane downtime.

- How I did it
fast-reboot-filter-routes.py was added to preserve connected and default routes and is being called from fast-reboot script.
teamd-timer is set when setting fast-reboot.
StormLiangMS pushed a commit that referenced this pull request Apr 20, 2023
…2760)

Part of sonic-net/sonic-buildimage#14583
Similar to #2744

- What I did
Added a script to filter routes: preserve default routes (was already done as part of fast-reboot script) and connected routes.
Set teamd timer to minimal allowed value (1 second) for fast-reboot.
Both made in order to shorten dataplane downtime.

- How I did it
fast-reboot-filter-routes.py was added to preserve connected and default routes and is being called from fast-reboot script.
teamd-timer is set when setting fast-reboot.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants