[warm boot finalizer] only wait for enabled components to reconcile #6454

Merged Jan 15, 2021 (7 commits)

@yxieca yxieca commented Jan 14, 2021

- Why I did it
Fix issue #6383

- How I did it

Define each component together with its associated service. During warm reboot, wait only for components whose associated service is enabled to reconcile.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
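
A minimal sketch of the approach described above, not the script from this PR. The names RECONCILE_COMPONENTS and COMPONENT_LIST follow the snippets in this conversation, but the map contents, the enabled_components helper, and the fake_db_cli stand-in (used only when sonic-db-cli is not installed) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: map contents and the stand-in below are illustrative assumptions.

# Component -> service that owns it (assumed values).
declare -A RECONCILE_COMPONENTS=(
    [orchagent]="swss"
    [neighsyncd]="swss"
    [bgp]="bgp"
    [natsyncd]="nat"
)

# Hypothetical off-device stand-in so the sketch runs without a switch.
# Called as: fake_db_cli CONFIG_DB HGET "FEATURE|<name>" state
fake_db_cli() {
    case "$3" in
        "FEATURE|nat") echo "disabled" ;;
        *)             echo "enabled" ;;
    esac
}

# Use the real CLI when present, otherwise fall back to the stand-in.
DB_CLI="sonic-db-cli"
if ! command -v "${DB_CLI}" >/dev/null 2>&1; then
    DB_CLI="fake_db_cli"
fi

# Print the components whose service is enabled, sorted by name.
enabled_components() {
    local cp service status list=""
    for cp in $(printf '%s\n' "${!RECONCILE_COMPONENTS[@]}" | LC_ALL=C sort); do
        service=${RECONCILE_COMPONENTS[${cp}]}
        status=$("${DB_CLI}" CONFIG_DB HGET "FEATURE|${service}" state)
        if [[ "${status}" == "enabled" ]]; then
            list="${list} ${cp}"
        fi
    done
    echo "${list# }"
}

COMPONENT_LIST=$(enabled_components)
echo "Waiting for components: '${COMPONENT_LIST}' to reconcile ..."
```

With the stand-in reporting nat as disabled, natsyncd drops out of the list and the finalizer would wait only for the remaining three components.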

- How to verify it
Run warm reboot with the change (note that 'natsyncd' is no longer in the wait list):
/var/log/syslog.1:Jan 14 18:28:41.239710 str-7260cx3-acs-1 NOTICE root: WARMBOOT_FINALIZER : Wait for database to become ready...
/var/log/syslog.1:Jan 14 18:28:41.730254 str-7260cx3-acs-1 NOTICE root: WARMBOOT_FINALIZER : Database is ready...
/var/log/syslog.1:Jan 14 18:28:42.104362 str-7260cx3-acs-1 NOTICE root: WARMBOOT_FINALIZER : Restoring counters folder after warmboot...
/var/log/syslog.1:Jan 14 18:28:46.043684 str-7260cx3-acs-1 NOTICE root: WARMBOOT_FINALIZER : Waiting for components: ' neighsyncd bgp orchagent' to reconcile ...
/var/log/syslog:Jan 14 18:30:55.913965 str-7260cx3-acs-1 NOTICE root: WARMBOOT_FINALIZER : Tearing down control plane assistant ...
/var/log/syslog:Jan 14 18:30:59.123504 str-7260cx3-acs-1 NOTICE root: WARMBOOT_FINALIZER : Save in-memory database after warm reboot ...
/var/log/syslog:Jan 14 18:30:59.727347 str-7260cx3-acs-1 NOTICE root: WARMBOOT_FINALIZER : Finalizing warmboot...

  • 201811
  • 201911
  • 202006
  • 202012

COMPONENT_LIST=""
for cp in ${CP_LIST}; do
service=${RECONCILE_COMPONENTS[${cp}]}
status=$(show feature status | grep "^${service}" | awk '{ print $2 }')
Contributor

This logic depends on the raw output format of show feature status. If that output ever changes, awk might return a different column's value. Could the state be read directly from the DB instead?

$ redis-cli -n 4 HMGET "FEATURE|nat" state
1) "disabled"

$ redis-cli -n 4 HMGET "FEATURE|bgp" state
1) "enabled"

...
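
To make the concern concrete, here is a small sketch of how a positional column pick can silently change meaning. Both row layouts are hypothetical and were not captured from a device; second_column is a helper invented for this illustration.

```bash
#!/usr/bin/env bash
# Two hypothetical row layouts; neither is real CLI output.
row_today='nat         disabled    enabled'
row_after_change='nat         local       disabled    enabled'

# Positional pick, in the same style as awk '{ print $2 }' in the diff.
second_column() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

second_column "${row_today}"          # prints: disabled
second_column "${row_after_change}"   # prints: local
```

A lookup keyed by field name, such as reading the state field of the FEATURE entry, does not depend on column order, which is the advantage of going to the DB.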

Contributor Author

good idea! changed. Thanks!

COMPONENT_LIST=""
for cp in ${CP_LIST}; do
service=${RECONCILE_COMPONENTS[${cp}]}
status=$(sonic-db-cli CONFIG_DB HGET "FEATURE|${service}" state)
Contributor

The indentation looks a little off.

Contributor Author

Yeah, I was editing on the target DUT, and the vim settings are wrong there.

vaibhavhd previously approved these changes Jan 14, 2021

yxieca commented Jan 15, 2021

retest mellanox please

@yxieca yxieca merged commit 054f5b7 into sonic-net:master Jan 15, 2021
@yxieca yxieca deleted the finalizer branch January 15, 2021 15:48
lguohan pushed a commit that referenced this pull request Jan 15, 2021
…6454)

@vaibhavhd
Contributor

This is needed for the 201911 branch too. Without this fix, the finalizer takes more than 5 minutes to finish warm boot.

Letting the finalizer wait that long can cause:

  1. Delayed config save after reboot.
  2. Delayed disabling of the system-wide warm_restart flag.

Checked on a 2700 device running 201911 with this fix added:
The finalizer doesn't wait for natsyncd and completes processing in around 120s (vs 300s without the fix).

Mar 23 20:51:32.923964 str-msn2700-04 NOTICE root: SONiC version 20191130.82 starting up...
Mar 23 20:51:45.785061 str-msn2700-04 NOTICE root: WARMBOOT_FINALIZER : Wait for database to become ready...
Mar 23 20:51:46.723882 str-msn2700-04 NOTICE root: WARMBOOT_FINALIZER : Database is ready...
Mar 23 20:51:49.807271 str-msn2700-04 NOTICE root: WARMBOOT_FINALIZER : Waiting for components: ' bgp orchagent neighsyncd' to reconcile ...
Mar 23 20:54:07.489718 str-msn2700-04 NOTICE root: WARMBOOT_FINALIZER : Tearing down control plane assistant ...
Mar 23 20:54:10.853318 str-msn2700-04 NOTICE root: WARMBOOT_FINALIZER : Save in-memory database after warm reboot ...
Mar 23 20:54:12.070074 str-msn2700-04 NOTICE root: WARMBOOT_FINALIZER : Finalizing warmboot...
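
For anyone comparing runs, the wait time can be read off the timestamps in the log above. This is a rough sketch; to_seconds and elapsed are helper names made up here, and the arithmetic assumes both timestamps fall on the same day and ignores fractional seconds.

```bash
#!/usr/bin/env bash
# Convert HH:MM:SS[.ffffff] to whole seconds since midnight.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "${1%%.*}"
    # 10# forces base 10 so values like "08" are not parsed as octal.
    echo $(( 10#${h} * 3600 + 10#${m} * 60 + 10#${s} ))
}

# Elapsed whole seconds between two same-day timestamps.
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# From "Waiting for components" to "Tearing down control plane assistant".
elapsed 20:51:49.807271 20:54:07.489718   # prints: 138
```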

abdosi pushed a commit that referenced this pull request Mar 31, 2022
…6454)


Successfully merging this pull request may close these issues.

warm reboot finalizer service could wait for not enabled component for 5 minutes