[hostcfgd] Initialize `Restart=` in feature's systemd config by the value of `auto_restart` in `CONFIG_DB` #10915
Merged
yozhao101 merged 5 commits into sonic-net:master from yozhao101:initialize_restart_field_systemd Jun 2, 2022
… `hostcfgd` was started/restarted. Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
/AzurePipelines run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
stepanblyschak Can you please help me review this PR?
alexrallen Can you please help me review this PR?
@yxieca Can you please help me review this PR?
different namespace. Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
yxieca approved these changes Jun 1, 2022
yxieca pushed a commit that referenced this pull request Jun 17, 2022
…alue of `auto_restart` in `CONFIG_DB` (#10915)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Why I did it
Recently the nightly testing pipeline found that the `autorestart` test case failed when run against the master image. The reason is that the `Restart=` field in each container's systemd configuration file was set to `Restart=no` even though the value of the `auto_restart` field in the `FEATURE` table of `CONFIG_DB` is `enabled`.

This issue, introduced by #10168, can be reproduced by the following steps:

1. Issue the `config` command to disable the `auto-restart` feature of a container.
2. Run the command `config reload` or `config reload minigraph` to enable `auto-restart` of the container.
3. Check the `Restart=` field in the systemd config file of the container mentioned in step 1 by running the command `sudo systemctl cat <container_name>.service`.
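
To make the last step less error-prone, the check can be scripted. The following is a minimal sketch, not part of this PR: `effective_restart` and `SAMPLE_OUTPUT` are hypothetical names, and the file paths in the sample text are illustrative assumptions only. It relies on `systemctl cat` printing the main unit file followed by its drop-ins, and on a later `Restart=` assignment overriding an earlier one.

```python
from typing import Optional

# Hypothetical example of `sudo systemctl cat <container_name>.service` output.
# The paths below are assumptions for illustration, not taken from the PR.
SAMPLE_OUTPUT = """\
# /lib/systemd/system/example.service
[Service]
Restart=always

# /etc/systemd/system/example.service.d/auto_restart.conf
[Service]
Restart=no
"""


def effective_restart(unit_text: str) -> Optional[str]:
    """Return the value of the last Restart= line, or None if there is none."""
    value = None
    for line in unit_text.splitlines():
        line = line.strip()
        if line.startswith("Restart="):
            # Keep overwriting so that the last assignment wins.
            value = line.split("=", 1)[1]
    return value
```

With the sample text above, the function reports the drop-in's value rather than the value from the main unit file, which is the symptom described in this PR when `auto_restart` is `enabled` in `CONFIG_DB`.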
PR #10168 was originally meant to revert the changes proposed by #8861. However, it did not fully revert all of them.
The following is the full story of how this regression happened:

Step 1: Initially, `Restart=always` was set in each container's systemd configuration file. The Nvidia team then submitted a PR ([hostcfgd] Configure service auto-restart in hostcfgd. by stepanblyschak · Pull Request #5744 · Azure/sonic-buildimage (github.com)) to change this field dynamically according to the value of the `auto_restart` field in `CONFIG_DB`. I agreed with this proposal.

In this PR, the `Restart=` field in each container's systemd configuration file was set either when the `hostcfgd` service was restarted (https://github.com/stepanblyschak/sonic-buildimage/blob/32df167af7e5c494b4a8585abebbcd65f05ef0a3/src/sonic-host-services/scripts/hostcfgd#L150) or when a user issued a `config` command to change the `auto_restart` field in `CONFIG_DB`.

If the `hostcfgd` service was started or restarted because the device was rebooted or for other reasons, the value of the `Restart=` field in the systemd configuration file was reset according to the value of the `auto_restart` field in `CONFIG_DB`. After this, the systemd daemon had to be reloaded because its configuration files had changed.
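
The behaviour described in Step 1 can be sketched roughly as follows. This is an illustration only, not the actual `hostcfgd` code: the function names, the drop-in file name `auto_restart.conf`, the `<feature>.service.d` directory layout, and the choice to treat a missing `auto_restart` field as `disabled` are all assumptions made for the example.

```python
import os
import tempfile


def restart_value(auto_restart: str) -> str:
    """Map the CONFIG_DB auto_restart value to a systemd Restart= value."""
    return "always" if auto_restart == "enabled" else "no"


def render_dropin(auto_restart: str) -> str:
    """Render the text of a systemd drop-in that only sets Restart=."""
    return "[Service]\nRestart={}\n".format(restart_value(auto_restart))


def init_restart_fields(feature_table: dict, conf_root: str) -> list:
    """Write one drop-in per feature and return the paths that were written.

    feature_table stands in for the FEATURE table of CONFIG_DB, e.g.
    {"swss": {"auto_restart": "enabled"}}.
    """
    written = []
    for name, attrs in feature_table.items():
        dropin_dir = os.path.join(conf_root, name + ".service.d")
        os.makedirs(dropin_dir, exist_ok=True)
        path = os.path.join(dropin_dir, "auto_restart.conf")
        with open(path, "w") as f:
            # Assumption: a missing auto_restart field is treated as disabled.
            f.write(render_dropin(attrs.get("auto_restart", "disabled")))
        written.append(path)
    # A real daemon would reload systemd here so the new files take effect.
    return written


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for p in init_restart_fields({"swss": {"auto_restart": "enabled"}}, root):
        print(p)
```

The sketch writes into a caller-supplied directory so it can be tried in a temporary location without touching the real systemd configuration.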
Step 2: However, reloading the systemd daemon takes around 10 seconds, as stated in this issue: [hostcfgd] hoscfgd doesn't honor CFG DB updates if they arrive in a specific time interval · Issue #8619 · Azure/sonic-buildimage (github.com).

Since the `hostcfgd` service listens for notifications from `CONFIG_DB` only after the systemd daemon has been reloaded, any change to the tables of `CONFIG_DB` made during the systemd daemon reload is lost. As such, another PR was submitted to address this issue: [hostcfgd] Fixed the brief blackout in hostcfgd using SubscriberStateTable by vivekreddynv · Pull Request #8861 · Azure/sonic-buildimage (github.com).

In this PR, `SubscriberStateTable` and `Selector` were used to receive and handle notifications from `CONFIG_DB` instead of `config_db.subscribe()` and `config_db.listen()`. The benefits of this change are that any existing data and any new change in the tables of `CONFIG_DB` are processed, and that the `Restart=` field in each container's systemd configuration file does not need to be initialized explicitly.

Step 3: However, `SubscriberStateTable` creates multiple file descriptors against the Redis DB, which is inefficient compared to `ConfigDBConnector`, which opens only a single file descriptor.

As discussed in Step 2, the disadvantage of `config_db.subscribe()` and `config_db.listen()` is that any change to the tables of `CONFIG_DB` made before `config_db.listen()` is called is lost. The Nvidia team then submitted a PR to fix this issue: Add API endpoints to ConfigDBConnector to support pre-loading data without blackout by alexrallen · Pull Request #587 · Azure/sonic-swss-common (github.com). At the same time, a PR was submitted to revert the change proposed in Step 2: [hostcfgd] Move hostcfgd back to ConfigDBConnector for subscribing to updates by alexrallen · Pull Request #10168 · Azure/sonic-buildimage (github.com). However, the change was not fully reverted.
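
The "blackout" discussed in Steps 2 and 3 can be illustrated with a toy model. Everything in this sketch is hypothetical: `ToyConfigDb`, `late_listener`, and `preloading_listener` are invented for the example and do not reflect the real `ConfigDBConnector` or `SubscriberStateTable` APIs. It only shows why subscribing after a slow operation loses updates, and how subscribing first and then reading the existing data avoids that.

```python
from collections import deque


class ToyConfigDb:
    """In-memory stand-in for CONFIG_DB (hypothetical, for illustration)."""

    def __init__(self):
        self.tables = {}
        self.subscribers = []

    def set(self, table, key, value):
        """Store a value and notify every queue that is already subscribed."""
        self.tables.setdefault(table, {})[key] = value
        for queue in self.subscribers:
            queue.append((table, key, value))

    def subscribe(self):
        """Register a new notification queue and return it."""
        queue = deque()
        self.subscribers.append(queue)
        return queue


def late_listener(db, slow_init):
    """Subscribe only after the slow step: updates made during it are missed."""
    slow_init()  # stands in for a long systemd daemon reload
    queue = db.subscribe()
    return list(queue)


def preloading_listener(db, slow_init):
    """Subscribe first, then read existing data, so nothing is missed."""
    queue = db.subscribe()
    slow_init()
    snapshot = {table: dict(rows) for table, rows in db.tables.items()}
    return snapshot, list(queue)
```

In the first function the update that arrives during the slow step is stored in the toy database but never delivered to the listener. In the second it shows up both in the snapshot and in the queued notifications, so a handler has to tolerate seeing the same value twice.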
Specifically, in this PR the `Restart=` field in each container's systemd configuration file only needs to be initialized according to the value of the `auto_restart` field in `CONFIG_DB`. But the change (line 269 ~ 282) proposed in Step 2 was not removed.
How I did it
When `hostcfgd` starts or is restarted, the `Restart=` field in each container's systemd configuration file is initialized according to the value of the `auto_restart` field in the `FEATURE` table of `CONFIG_DB`.
How to verify it
I verified this change by running the `auto-restart` test case against a newly built `master` image and by running the unit tests.
Which release branch to backport (provide reason below if selected)
N/A
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)