Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reconciliation after warm restart #76

Merged
merged 2 commits into from
Jul 18, 2022

Conversation

zjswhhh
Copy link
Contributor

@zjswhhh zjswhhh commented May 16, 2022

Description of PR

Summary:
Fixes # (issue)

This PR is to add support for linkmgrd process reconciliation after warm restart.

sign-off: Jing Zhang zhangjing@microsoft.com

Type of change

  • Bug fix
  • New feature
  • Doc/Design
  • Unit test

Approach

What is the motivation for this PR?

One step of warm reboot procedure for dual ToR is to config the switch into manual mode. Before warm reboot finalizer executes config save, we want to config the switch back into auto mode, so config_db.json will be consistent before and after the reboot.

How did you do it?

  1. When linkmgrd is initializing, get the systemwide warm reboot flag from WARM_RESTART_ENABLE_TABLE. If flag == true, start a reconciliation timer.
  2. Maintenance a mux port count based on MUX_CABLE|PORTNAME count. When one port completes reconciliation, if warm restart flag == true, config it back into auto mode, reduce reconciliation port count by 1.
  3. If reconciliation timer expires or port count == 0, set state to reconciled in WARM_RESTART_TABLE|linkmgrd.

How did you verify/test it?

  1. Unit tests
  2. Tested on dual ToR testbed. Ports were auto mode after warm restart completed. Entry WARM_RESTART_TABLE|linkmgrd was added as expected.

Any platform specific information?

Documentation

@zjswhhh zjswhhh force-pushed the warmreboot branch 2 times, most recently from b88c275 to 4a5aa6a Compare May 18, 2022 22:36
Copy link

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments posted to clear my doubts as I I am not well versed with this part of code.

Other comments:
Please define what reconciliation means in this context.
Will there be separate PR for warmboot_finalizer where it should also wait for linkmgrd to reconcile?

src/DbInterface.h Outdated Show resolved Hide resolved
src/common/MuxConfig.h Outdated Show resolved Hide resolved
src/DbInterface.cpp Show resolved Hide resolved
src/MuxManager.cpp Show resolved Hide resolved
src/link_manager/LinkManagerStateMachineActiveStandby.cpp Outdated Show resolved Hide resolved
src/MuxManager.cpp Show resolved Hide resolved
@vaibhavhd
Copy link

Please address the PR merge conflicts.

@zjswhhh
Copy link
Contributor Author

zjswhhh commented May 27, 2022

Please address the PR merge conflicts.

Updated accordingly.

@zjswhhh zjswhhh requested a review from vaibhavhd May 31, 2022 14:53
Copy link

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, some non-blocking comments.

Will you raise a separate PR for sonic-utilities where FINALIZER would wait for linkmgrd to reconcile? Or is that not needed now since 10s timeout will trigger sooner than FINALIZER anyway?

src/DbInterface.cpp Show resolved Hide resolved
src/MuxManager.cpp Show resolved Hide resolved
src/common/MuxConfig.h Show resolved Hide resolved
src/DbInterface.cpp Show resolved Hide resolved
@zjswhhh
Copy link
Contributor Author

zjswhhh commented Jul 18, 2022

Mostly LGTM, some non-blocking comments.

Will you raise a separate PR for sonic-utilities where FINALIZER would wait for linkmgrd to reconcile? Or is that not needed now since 10s timeout will trigger sooner than FINALIZER anyway?

Thanks for the reminder! I think we need raise a separate PR for it, will do!

@qiluo-msft
Copy link

This commit could not be cleanly cherry-picked to 202012. Please submit another PR.

zjswhhh added a commit to zjswhhh/sonic-linkmgrd that referenced this pull request Jul 21, 2022
Description of PR
Summary:
Fixes # (issue)

This PR is to add support for linkmgrd process reconciliation after warm restart.

sign-off: Jing Zhang zhangjing@microsoft.com

Type of change
 Bug fix
 New feature
 Doc/Design
 Unit test
Approach
What is the motivation for this PR?
One step of warm reboot procedure for dual ToR is to config the switch into manual mode. Before warm reboot finalizer executes config save, we want to config the switch back into auto mode, so config_db.json will be consistent before and after the reboot.

How did you do it?
When linkmgrd is initializing, get the systemwide warm reboot flag from WARM_RESTART_ENABLE_TABLE. If flag == true, start a reconciliation timer.
Maintenance a mux port count based on MUX_CABLE|PORTNAME count. When one port completes reconciliation, if warm restart flag == true, config it back into auto mode, reduce reconciliation port count by 1.
If reconciliation timer expires or port count == 0, set state to reconciled in WARM_RESTART_TABLE|linkmgrd.
How did you verify/test it?
Unit tests
Tested on dual ToR testbed. Ports were auto mode after warm restart completed. Entry WARM_RESTART_TABLE|linkmgrd was added as expected.
yxieca pushed a commit that referenced this pull request Jul 22, 2022
Description of PR
Summary:
Fixes # (issue)

This PR is to add support for linkmgrd process reconciliation after warm restart.

sign-off: Jing Zhang zhangjing@microsoft.com

Type of change
 Bug fix
 New feature
 Doc/Design
 Unit test
Approach
What is the motivation for this PR?
One step of warm reboot procedure for dual ToR is to config the switch into manual mode. Before warm reboot finalizer executes config save, we want to config the switch back into auto mode, so config_db.json will be consistent before and after the reboot.

How did you do it?
When linkmgrd is initializing, get the systemwide warm reboot flag from WARM_RESTART_ENABLE_TABLE. If flag == true, start a reconciliation timer.
Maintenance a mux port count based on MUX_CABLE|PORTNAME count. When one port completes reconciliation, if warm restart flag == true, config it back into auto mode, reduce reconciliation port count by 1.
If reconciliation timer expires or port count == 0, set state to reconciled in WARM_RESTART_TABLE|linkmgrd.
How did you verify/test it?
Unit tests
Tested on dual ToR testbed. Ports were auto mode after warm restart completed. Entry WARM_RESTART_TABLE|linkmgrd was added as expected.
zjswhhh added a commit to sonic-net/sonic-buildimage that referenced this pull request Jul 28, 2022
Spanning from sonic-net/sonic-linkmgrd#76, this PR is to update warm restart finalizer to wait for linkmgrd to be reconciled.

sign-off: Jing Zhang zhangjing@microsoft.com

Why I did it
To make sure finalizer save config after linkmgrd's reconciliation.

How I did it
Add linkmgrd to the reconciliation wait list of warmboot finalizer.

How to verify it
Verified on lab device, linkmgrd reconciled as expected.
yxieca pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jul 28, 2022
Spanning from sonic-net/sonic-linkmgrd#76, this PR is to update warm restart finalizer to wait for linkmgrd to be reconciled.

sign-off: Jing Zhang zhangjing@microsoft.com

Why I did it
To make sure finalizer save config after linkmgrd's reconciliation.

How I did it
Add linkmgrd to the reconciliation wait list of warmboot finalizer.

How to verify it
Verified on lab device, linkmgrd reconciled as expected.
zjswhhh added a commit to zjswhhh/sonic-linkmgrd that referenced this pull request Aug 3, 2022
…ic-net#97)

Picking up commit below from master branch:
9044962 Jing Zhang Mon Jul 18 15:38:04 2022 -0700 Add support for reconciliation after warm restart (sonic-net#76)

Description of PR
Summary:
Fixes # (issue)

This PR is to add support for linkmgrd process reconciliation after warm restart.

sign-off: Jing Zhang zhangjing@microsoft.com

What is the motivation for this PR?
One step of warm reboot procedure for dual ToR is to config the switch into manual mode. Before warm reboot finalizer executes config save, we want to config the switch back into auto mode, so config_db.json will be consistent before and after the reboot.

How did you do it?
When linkmgrd is initializing, get the systemwide warm reboot flag from WARM_RESTART_ENABLE_TABLE. If flag == true, start a reconciliation timer.
Maintenance a mux port count based on MUX_CABLE|PORTNAME count. When one port completes reconciliation, if warm restart flag == true, config it back into auto mode, reduce reconciliation port count by 1.
If reconciliation timer expires or port count == 0, set state to reconciled in WARM_RESTART_TABLE|linkmgrd.
How did you verify/test it?
Unit tests
Tested on dual ToR testbed. Ports were auto mode after warm restart completed. Entry WARM_RESTART_TABLE|linkmgrd was added as expected.
qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Aug 9, 2022
Spanning from sonic-net/sonic-linkmgrd#76, this PR is to update warm restart finalizer to wait for linkmgrd to be reconciled.

sign-off: Jing Zhang zhangjing@microsoft.com

Why I did it
To make sure finalizer save config after linkmgrd's reconciliation.

How I did it
Add linkmgrd to the reconciliation wait list of warmboot finalizer.

How to verify it
Verified on lab device, linkmgrd reconciled as expected.
skbarista pushed a commit to skbarista/sonic-buildimage that referenced this pull request Aug 17, 2022
…net#11477)

Spanning from sonic-net/sonic-linkmgrd#76, this PR is to update warm restart finalizer to wait for linkmgrd to be reconciled.

sign-off: Jing Zhang zhangjing@microsoft.com

Why I did it
To make sure finalizer save config after linkmgrd's reconciliation.

How I did it
Add linkmgrd to the reconciliation wait list of warmboot finalizer.

How to verify it
Verified on lab device, linkmgrd reconciled as expected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants