Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hostcfgd] Fixed the brief blackout in hostcfgd using SubscriberStateTable #8861

Merged
merged 22 commits into from
Oct 28, 2021

Conversation

vivekrnv
Copy link
Contributor

@vivekrnv vivekrnv commented Sep 28, 2021

Why I did it

Fixes #8619

How I did it

  1. Listening to CFG_DB notifications was migrated from ConfigDBConnector to SubscriberStateTable & Select
  2. This change in design helped me to remove update_all_features_config which was roughly taking a 5-10 sec time to execute and thus the reason for blackout
  3. Edited FeatureHandler, Feature & NtpCfgd classes to suit this design
  4. Added corresponding mocks and UT's

Changes made to classes other than HostConfigDaemon:
With the previous design, the initially read data from the config db was applied by using hardcoded methods even before the config_db.listen() was called. For Eg: update_all_features_config for FeatureHandler and load() named methods for NtpCfgd etc

But with this design, since the existing data is read and given out as a notification by SubscriberStateTable, i've pretty much removed these hardcoded methods. Thus changes made to these class will be around adapting them to the new design and no change in the actual functionality .

How to verify it

UT's:

tests/determine-reboot-cause_test.py .........                                                                                                                                                                                        [ 29%]
tests/procdockerstatsd_test.py .                                                                                                                                                                                                      [ 32%]
tests/caclmgrd/caclmgrd_dhcp_test.py ......                                                                                                                                                                                           [ 51%]
tests/hostcfgd/hostcfgd_radius_test.py ..                                                                                                                                                                                             [ 58%]
tests/hostcfgd/hostcfgd_test.py .............                                                                                                                                                                                         [100%]

Verified manually,

Sep 10 22:53:25.662621 sonic INFO systemd[1]: hostcfgd.service: Succeeded.
Sep 10 22:55:04.127719 sonic INFO /hostcfgd: ConfigDB connect success
Sep 10 22:55:04.128108 sonic INFO /hostcfgd: KdumpCfg init ...
Sep 10 22:55:04.148819 sonic INFO /hostcfgd: Waiting for systemctl to finish initialization
Sep 10 22:55:04.163452 sonic INFO /hostcfgd: systemctl has finished initialization -- proceeding ...
Sep 10 22:55:04.163834 sonic INFO /hostcfgd: Kdump handler...
Sep 10 22:55:04.164019 sonic INFO /hostcfgd: Kdump global configuration update
Sep 10 22:55:04.758784 sonic INFO hostcfgd[184471]: kdump is already disabled
Sep 10 22:55:04.758876 sonic INFO hostcfgd[184471]: Kdump is already disabled
Sep 10 22:55:05.182021 sonic INFO hostcfgd[184511]: Kdump configuration has been updated in the startup configuration
Sep 10 22:55:05.596919 sonic INFO hostcfgd[184528]: Kdump configuration has been updated in the startup configuration
Sep 10 22:55:06.140627 sonic INFO /hostcfgd: Feature nat is stopped and disabled
Sep 10 22:55:06.642629 sonic INFO /hostcfgd: Feature telemetry is enabled and started
Sep 10 22:55:07.101297 sonic INFO /hostcfgd: Feature pmon is enabled and started
Sep 10 22:55:07.554366 sonic INFO /hostcfgd: Feature database is enabled and started
Sep 10 22:55:08.009329 sonic INFO /hostcfgd: Feature mgmt-framework is enabled and started
Sep 10 22:55:08.394952 sonic INFO /hostcfgd: Feature macsec is stopped and disabled
Sep 10 22:55:08.782853 sonic INFO /hostcfgd: Feature snmp is enabled and started
Sep 10 22:55:09.205381 sonic INFO /hostcfgd: Feature teamd is enabled and started
Sep 10 22:55:09.224877 sonic INFO /hostcfgd: Feature what-just-happened is enabled and started
Sep 10 22:55:09.627929 sonic INFO /hostcfgd: Feature lldp is enabled and started
Sep 10 22:55:10.086993 sonic INFO /hostcfgd: Feature swss is enabled and started
Sep 10 22:55:10.170312 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:11.012236 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:12.225946 sonic INFO /hostcfgd: Feature bgp is enabled and started
Sep 10 22:55:12.712792 sonic INFO /hostcfgd: Feature dhcp_relay is enabled and started
Sep 10 22:55:13.166656 sonic INFO /hostcfgd: Feature sflow is stopped and disabled
Sep 10 22:55:13.593639 sonic INFO /hostcfgd: Feature radv is enabled and started
Sep 10 22:55:14.034106 sonic INFO /hostcfgd: Feature syncd is enabled and started
Sep 10 22:55:14.113064 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:14.863601 sonic INFO /hostcfgd: RADIUS_SERVER update: key: 10.10.10.1, op: SET, data: {'auth_type': 'pap', 'passkey': 'p*****', 'retransmit': '1', 'timeout': '1'}
Sep 10 22:55:14.938605 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:15.667545 sonic INFO /hostcfgd: RADIUS_SERVER update: key: 10.10.10.3, op: SET, data: {'auth_type': 'chap', 'passkey': 'p*****', 'retransmit': '2', 'timeout': '2'}
Sep 10 22:55:15.667801 sonic INFO /hostcfgd: RADIUS (NAS) IP change - key:eth0, current global info {}
Sep 10 22:55:15.746531 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 23:04:47.435340 sonic INFO /hostcfgd: ntp server update key 0.debian.pool.ntp.org
Sep 10 23:04:47.435661 sonic INFO /hostcfgd: ntp server update, restarting ntp-config, ntp servers configured {'0.debian.pool.ntp.org'}
Sep 10 23:04:47.866394 sonic INFO /hostcfgd: NTP GLOBAL Update
Sep 10 23:04:47.866557 sonic INFO /hostcfgd: ntp global update for source intf old {''} new {'eth0', 'Loopback0'}, restarting ntp-config
Sep 10 23:16:25.157600 sonic INFO /hostcfgd: Running cmd: 'sudo systemctl unmask sflow.service'
Sep 10 23:16:25.178472 sonic INFO hostcfgd[192106]: Removed /etc/systemd/system/sflow.service.
Sep 10 23:16:25.582018 sonic INFO /hostcfgd: Running cmd: 'sudo systemctl enable sflow.service'
Sep 10 23:16:25.604534 sonic INFO hostcfgd[192123]: Created symlink /etc/systemd/system/sonic.target.wants/sflow.service → /lib/systemd/system/sflow.service.
Sep 10 23:16:26.029416 sonic INFO /hostcfgd: Running cmd: 'sudo systemctl start sflow.service'
Sep 10 23:16:26.691927 sonic INFO /hostcfgd: Feature sflow is enabled and started

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Required for 202012 & 202106, but a clean cherry-pick to these branches is not possible. Will raise separate PR's, once this is approved

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

vivekrnv and others added 18 commits September 3, 2021 22:19
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@lgtm-com
Copy link

lgtm-com bot commented Sep 28, 2021

This pull request introduces 3 alerts when merging c17c8f4 into 6cbdf11 - view on LGTM.com

new alerts:

  • 1 for Too few arguments in formatting call
  • 1 for Suspicious unused loop iteration variable
  • 1 for Variable defined multiple times

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Copy link
Collaborator

@qiluo-msft could you please help to review?

@liat-grozovik
Copy link
Collaborator

This pull request introduces 3 alerts when merging c17c8f4 into 6cbdf11 - view on LGTM.com

new alerts:

  • 1 for Too few arguments in formatting call
  • 1 for Suspicious unused loop iteration variable
  • 1 for Variable defined multiple times

@vivekreddynv please fix LGTM issues

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@vivekrnv
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@vivekrnv
Copy link
Contributor Author

@qiluo-msft, Can you help resolve the build failures happening because of sonic-device-data_1.0-1_all.deb

It should be resolved now by #9078

Thanks @qiluo-msft, Can you restart the pipeline? /azpw run doesn't seen to start them

@vivekrnv
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@liat-grozovik
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@vivekrnv
Copy link
Contributor Author

@qiluo-msft Can you sign this off and also #9031,

No change in the hostcfgd script and only had to raise a separate PR because of tests

@qiluo-msft qiluo-msft merged commit 3788294 into sonic-net:master Oct 28, 2021
@vivekrnv vivekrnv mentioned this pull request Oct 29, 2021
5 tasks
qiluo-msft pushed a commit that referenced this pull request Oct 29, 2021
…iberStateTable (#9031)

#### Why I did it
Ported #8861 to 202106 as it couldn't be cherry-pick directly
lguohan pushed a commit that referenced this pull request Nov 2, 2021
Missing comment change which is supposed to arrive with #8861 is added here

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
qiluo-msft pushed a commit that referenced this pull request Nov 23, 2021
…iberStateTable (#9228)

#### Why I did it
Backporting #8861 to 202012
yozhao101 added a commit that referenced this pull request Jun 2, 2022
…alue of `auto_restart` in `CONFIG_DB` (#10915)

Why I did it
Recently the nightly testing pipeline found that the autorestart test case was failed when it was run against master image. The reason is Restart= field in each container's systemd configuration file was set to Restart=no even the value of auto_restart field in FEATURE table of CONFIG_DB is enabled.

This issue introduced by #10168 can be reproduced by the following steps:

Issues the config command to disable the auto-restart feature of a container
Runs command config reload or config reload minigraph to enable auto-restart of the container
Checks Restart= field in the container's systemd config file mentioned in step 1 by running the command
sudo systemctl cat <container_name>.service
Initially this PR (#10168) wants to revert the changes proposed by this: #8861. However, it did not fully revert all the changes.

How I did it
When hostcfgd started or was restarted, the Restart= field in each container's systemd configuration file should be initialized according to the value of auto_restart field in FEATURE table of CONFIG_DB.

How to verify it
I verified this change by running auto-restart test case against newly built master image and also ran the unittest:
yxieca pushed a commit that referenced this pull request Jun 17, 2022
…alue of `auto_restart` in `CONFIG_DB` (#10915)

Why I did it
Recently the nightly testing pipeline found that the autorestart test case was failed when it was run against master image. The reason is Restart= field in each container's systemd configuration file was set to Restart=no even the value of auto_restart field in FEATURE table of CONFIG_DB is enabled.

This issue introduced by #10168 can be reproduced by the following steps:

Issues the config command to disable the auto-restart feature of a container
Runs command config reload or config reload minigraph to enable auto-restart of the container
Checks Restart= field in the container's systemd config file mentioned in step 1 by running the command
sudo systemctl cat <container_name>.service
Initially this PR (#10168) wants to revert the changes proposed by this: #8861. However, it did not fully revert all the changes.

How I did it
When hostcfgd started or was restarted, the Restart= field in each container's systemd configuration file should be initialized according to the value of auto_restart field in FEATURE table of CONFIG_DB.

How to verify it
I verified this change by running auto-restart test case against newly built master image and also ran the unittest:
@vivekrnv vivekrnv deleted the hostcfgd_fix branch June 5, 2023 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[hostcfgd] hoscfgd doesn't honor CFG DB updates if they arrive in a specific time interval