[hostcfgd] Initialize Restart= in feature's systemd config by the value of auto_restart in CONFIG_DB #10915

Merged

@yozhao101 yozhao101 commented May 24, 2022

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

Why I did it

Recently, the nightly testing pipeline found that the autorestart test case failed when run against the master image. The cause is that the Restart= field in each container's systemd configuration file was set to Restart=no even though the auto_restart field in the FEATURE table of CONFIG_DB was enabled.

This issue, introduced by #10168, can be reproduced with the following steps:

  1. Issue the config command to disable the auto-restart feature of a container.
  2. Run config reload or config reload minigraph to enable auto-restart of the container again.
  3. Check the Restart= field in the systemd config file of the container from step 1 by running the command
    sudo systemctl cat <container_name>.service
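
The check in step 3 can also be scripted. Below is a minimal sketch, assuming the output of `systemctl cat` has been captured as text and that a later `Restart=` assignment in a drop-in overrides an earlier one. The function name `effective_restart`, the sample unit text, and the file paths in it are made up for illustration and are not taken from hostcfgd.

```python
from typing import Optional


def effective_restart(unit_text: str) -> Optional[str]:
    """Return the last Restart= value found in a [Service] section, or None."""
    value = None
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (systemctl cat prints "# /path" headers).
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        # "RestartSec=" does not match because the "=" is part of the prefix.
        if section == "Service" and line.startswith("Restart="):
            value = line.split("=", 1)[1].strip()
    return value


# Hypothetical output of `sudo systemctl cat swss.service` (paths are illustrative).
SAMPLE = """\
# /lib/systemd/system/swss.service
[Unit]
Description=switch state service

[Service]
Restart=always
RestartSec=30

# /etc/systemd/system/swss.service.d/auto_restart.conf
[Service]
Restart=no
"""

print(effective_restart(SAMPLE))
```

With the regression described here, a check like this would report `no` for a container whose auto_restart field in CONFIG_DB is enabled.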

PR #10168 was originally intended to revert the changes proposed in #8861. However, it did not fully revert them.

The full story of how this regression happened is as follows:

Step 1: Initially, Restart=always was set in each container's systemd configuration file. The Nvidia team then submitted a
PR ([hostcfgd] Configure service auto-restart in hostcfgd. by stepanblyschak · Pull Request #5744 · Azure/sonic-buildimage (github.com)) to change this field dynamically according to the value of the auto_restart field in CONFIG_DB. I agreed with this proposal.

In that PR, the Restart= field in each container's systemd configuration file was set either when the hostcfgd service was restarted
(https://github.com/stepanblyschak/sonic-buildimage/blob/32df167af7e5c494b4a8585abebbcd65f05ef0a3/src/sonic-host-services/scripts/hostcfgd#L150) or when a user issued a config command to change the auto_restart field in CONFIG_DB.

If the hostcfgd service was started or restarted because the device was rebooted or for other reasons, the value of the Restart= field in the systemd
configuration file was reset according to the value of the auto_restart field in CONFIG_DB. After that, the systemd daemon had to be
reloaded because its configuration files had changed.

Step 2: However, reloading the systemd daemon takes around 10 seconds, as described in this issue:
[hostcfgd] hoscfgd doesn't honor CFG DB updates if they arrive in a specific time interval · Issue #8619 · Azure/sonic-buildimage (github.com).

Since the hostcfgd service started listening for notifications from CONFIG_DB only after the systemd daemon had been reloaded, any change to CONFIG_DB tables made during the reload was lost. Another PR was therefore submitted to address this issue:
[hostcfgd] Fixed the brief blackout in hostcfgd using SubscriberStateTable by vivekreddynv · Pull Request #8861 · Azure/sonic-buildimage (github.com).
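
The blackout can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyConfigDB` is a hypothetical in-memory stand-in whose notifications are fire-and-forget, and it does not use the real ConfigDBConnector or Redis APIs.

```python
class ToyConfigDB:
    """Hypothetical in-memory stand-in for CONFIG_DB (not the real API)."""

    def __init__(self):
        self.tables = {}
        self.listeners = []

    def set(self, table, key, value):
        # Store the row, then notify whoever is listening right now.
        self.tables.setdefault(table, {})[key] = value
        for callback in self.listeners:
            callback(table, key, value)

    def listen(self, callback):
        self.listeners.append(callback)


db = ToyConfigDB()
seen = []

# The daemon is still busy (for example, waiting for the systemd daemon
# reload), so nobody is listening when this update arrives.
db.set("FEATURE", "swss", {"auto_restart": "disabled"})

# Only now does the daemon start listening.
db.listen(lambda table, key, value: seen.append((table, key, value)))
db.set("FEATURE", "bgp", {"auto_restart": "enabled"})

print(seen)                    # only the "bgp" update was delivered
print(db.tables["FEATURE"])    # yet both rows are stored in the database
```

The `swss` row is present in the database, but its notification was never delivered, so a daemon that relies only on notifications never acts on it.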

In that PR, SubscriberStateTable and Selector were used to receive and handle notifications from CONFIG_DB instead of config_db.subscribe() and config_db.listen(). This change has two benefits: both existing data and new changes in CONFIG_DB tables are processed, and the Restart= field in each container's systemd configuration file no longer needs to be initialized explicitly.

Step 3: However, SubscriberStateTable creates multiple file descriptors against the Redis DB, which is inefficient compared to ConfigDBConnector, which opens only a single file descriptor.

As discussed in Step 2, the disadvantage of config_db.subscribe() and config_db.listen() is that any change to CONFIG_DB tables made
before config_db.listen() is called is lost. The Nvidia team then submitted a PR to fix this issue:
Add API endpoints to ConfigDBConnector to support pre-loading data without blackout by alexrallen · Pull Request #587 · Azure/sonic-swss-common (github.com). At the same time, a PR was submitted to revert the change proposed in Step 2: [hostcfgd] Move hostcfgd back to ConfigDBConnector for subscribing to updates by alexrallen · Pull Request #10168 · Azure/sonic-buildimage (github.com). However, the change was not fully reverted.
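
The general idea of pre-loading data without a blackout can be sketched as: subscribe first and buffer notifications, then process a snapshot of the existing data, then drain the buffer. This is an illustration of that idea only, not the implementation in sonic-swss-common. `ToyConfigDB` and `start_without_blackout` are hypothetical names, and the real ConfigDBConnector API is not used.

```python
from collections import deque


class ToyConfigDB:
    """Hypothetical in-memory stand-in for CONFIG_DB (not the real API)."""

    def __init__(self):
        self.tables = {}
        self.listeners = []

    def set(self, table, key, value):
        self.tables.setdefault(table, {})[key] = value
        for callback in self.listeners:
            callback(table, key, value)

    def listen(self, callback):
        self.listeners.append(callback)


def start_without_blackout(db, handler, startup_work):
    pending = deque()
    # 1. Subscribe first, so that every later change is at least buffered.
    db.listen(lambda *event: pending.append(event))
    # 2. Process a snapshot of the data that already exists.
    snapshot = {table: dict(rows) for table, rows in db.tables.items()}
    for table, rows in snapshot.items():
        for key, value in rows.items():
            handler(table, key, value)
    # 3. Do slow startup work; notifications keep queueing meanwhile.
    startup_work()
    # 4. Drain everything that arrived in the meantime, in order.
    while pending:
        handler(*pending.popleft())


db = ToyConfigDB()
db.set("FEATURE", "swss", {"auto_restart": "enabled"})

state = {}


def handler(table, key, value):
    state[(table, key)] = value


def slow_startup():
    # A change that arrives while the daemon is still starting up.
    db.set("FEATURE", "swss", {"auto_restart": "disabled"})


start_without_blackout(db, handler, slow_startup)
print(state)
```

In this toy run the handler first sees the pre-existing `enabled` row from the snapshot and then the buffered `disabled` update, so the final state matches the database.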

Specifically, in that PR the Restart= field in each container's systemd configuration file only needed to be initialized according to the value of the auto_restart field in CONFIG_DB. However, the change (lines 269 ~ 282) proposed in Step 2 was not removed.

How I did it

When hostcfgd starts or is restarted, the Restart= field in each container's systemd configuration file should be initialized according to the value of the auto_restart field in the FEATURE table of CONFIG_DB.
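
A minimal sketch of this initialization, assuming the FEATURE table is available as a plain dict and that `enabled` maps to Restart=always while anything else maps to Restart=no. The helper names `restart_dropin` and `initial_dropins`, and the fallback to `disabled` when the field is missing, are assumptions for illustration. This is not the code changed by this PR.

```python
def restart_dropin(auto_restart: str) -> str:
    """Render systemd drop-in text for a given auto_restart value."""
    restart = "always" if auto_restart == "enabled" else "no"
    return "[Service]\nRestart={}\n".format(restart)


def initial_dropins(feature_table: dict) -> dict:
    """Map each feature name to the drop-in text it should start with."""
    return {
        # Assumption: a missing auto_restart field is treated as disabled.
        name: restart_dropin(row.get("auto_restart", "disabled"))
        for name, row in feature_table.items()
    }


# Hypothetical snapshot of the FEATURE table read at startup.
feature_table = {
    "swss": {"auto_restart": "enabled"},
    "bgp": {"auto_restart": "disabled"},
}

for name, text in initial_dropins(feature_table).items():
    print(name, repr(text))
```

A real daemon would still have to write each drop-in to disk and reload the systemd daemon afterwards, since its configuration files would have changed.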

How to verify it

I verified this change by running the auto-restart test case against a newly built master image and by running the unit tests:

    tests/determine-reboot-cause_test.py .........      [ 20%]
    tests/procdockerstatsd_test.py .                    [ 22%]
    tests/caclmgrd/caclmgrd_bfd_test.py .               [ 25%]
    tests/caclmgrd/caclmgrd_dhcp_test.py ............   [ 52%]
    tests/hostcfgd/hostcfgd_radius_test.py ..           [ 56%]
    tests/hostcfgd/hostcfgd_tacacs_test.py .            [ 59%]
    tests/hostcfgd/hostcfgd_test.py ..................  [100%]

Which release branch to backport (provide reason below if selected)

N/A

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

… `hostcfgd` was

started/restarted.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 requested a review from yxieca May 24, 2022 23:59
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 marked this pull request as ready for review May 30, 2022 07:38
@yozhao101 yozhao101 requested a review from lguohan as a code owner May 30, 2022 07:38
@yozhao101
Contributor Author

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yozhao101
Contributor Author

@stepanblyschak Can you please help me review this PR?

@yozhao101
Contributor Author

@alexrallen Can you please help me review this PR?

@yozhao101 yozhao101 requested review from a1exwang and removed request for a1exwang May 31, 2022 14:51
@yozhao101
Contributor Author

@yxieca Can you please help me review this PR?

different namespace.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 merged commit 4ef8b38 into sonic-net:master Jun 2, 2022
yozhao101 added a commit that referenced this pull request Jun 2, 2022
…by the value of `auto_restart` in `CONFIG_DB` (#10915)"

This reverts commit 4ef8b38.
@yozhao101 yozhao101 deleted the initialize_restart_field_systemd branch June 2, 2022 22:59
yxieca pushed a commit that referenced this pull request Jun 17, 2022
…alue of `auto_restart` in `CONFIG_DB` (#10915)
