Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[services] kill container on stop in warm/fast mode #10510

Merged
merged 12 commits into from
Sep 19, 2022

Conversation

stepanblyschak
Copy link
Collaborator

@stepanblyschak stepanblyschak commented Apr 8, 2022

Signed-off-by: Stepan Blyschak stepanb@nvidia.com

Depends on sonic-net/sonic-utilities#2184.

Why I did it

To optimize stop on warm boot.

How I did it

Added kill for containers

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@vaibhavhd
Copy link
Contributor

The change LGTM, but can you look at the PR test errors? I see that warmboot test failed with:
"FAILED:dut:Longest downtime period must be less then 200 seconds. It was 0:03:50.634783",

Downtime being exceeded by 30s is unexpected and may not be termed flaky. Please look at the failures.

@stepanblyschak
Copy link
Collaborator Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

23e93984 [scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net#2133)
576c9efc [scripts/fast-reboot] stop timers in advance (sonic-net#2131)
4dad79c4 bugfix: incorrect command for portchannel creation (sonic-net#2134)
c17b1f49 [show][muxcable] Decrease the timeout for show mux status/hwmode (sonic-net#2130)
49d61f84 [scripts/fast-reboot] cleanup (sonic-net#2132)
52ca3245 [config/config_mgmt.py]: Fix dpb issue with upper case mac in (sonic-net#2066)
9e2fbf40 Update db_migrator to support `pfcwd_sw_enable` (sonic-net#2087)
4010bd09 FGNHG CLI changes (sonic-net#1588)
6bd54d04 Fix 'show mac' output when FDB entry for default vlan is None instead of 1 (sonic-net#2126)
f70dc27 [techsupport] Handle minor fixes of TS Lock and update auto-TS (sonic-net#2114)
51d3550 Fix issues in clear_qos (sonic-net#2122)
6d3aa1e [GCU] Optimizing moves by adding generators for keys/tables (sonic-net#2120)
65a5a6b Fixing get port speed when oper status is down (sonic-net#2123)
c752457 [PBH] Implement Edit Flows (sonic-net#2093)
827358f [debug dump] dump interface module added (sonic-net#2070)

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
```
23e93984 [scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net#2133)
576c9efc [scripts/fast-reboot] stop timers in advance (sonic-net#2131)
4dad79c4 bugfix: incorrect command for portchannel creation (sonic-net#2134)
c17b1f49 [show][muxcable] Decrease the timeout for show mux status/hwmode (sonic-net#2130)
49d61f84 [scripts/fast-reboot] cleanup (sonic-net#2132)
52ca3245 [config/config_mgmt.py]: Fix dpb issue with upper case mac in (sonic-net#2066)
9e2fbf40 Update db_migrator to support `pfcwd_sw_enable` (sonic-net#2087)
4010bd09 FGNHG CLI changes (sonic-net#1588)
6bd54d04 Fix 'show mac' output when FDB entry for default vlan is None instead of 1 (sonic-net#2126)
```

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@stepanblyschak
Copy link
Collaborator Author

@vaibhavhd looks like this PR won't pass tests without sonic-utilities update as well as sonic-utilities update PR won't pass tests without this PR. Updated sonic-utilities here and checking results.

@vaibhavhd
Copy link
Contributor

The tests failed with:
"FAILED:dut:Longest downtime period must be less then 200 seconds. It was 0:03:43.843506",

@vaibhavhd
Copy link
Contributor

vaibhavhd commented Apr 14, 2022

I am curious now why the same PR on 202012 (#10511) consistently passed and this one on master consistently failed.
Particularly when this PR is meant to improve the warmboot flow.

Update: I just realized that the PR on 202012 did not run tests, and hence it showed up as passing. So basically we don't know if this test would have passed/failed on 202012.

@stepanblyschak
Copy link
Collaborator Author

stepanblyschak commented Apr 15, 2022

There is a problem that hostcfgd blocks forever and does not react to SIGTERM causing a delay in warm boot.
When hitting this line https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-host-services/scripts/hostcfgd#L1241 .

Example:

>>> from swsscommon import swsscommon
>>> c=swsscommon.ConfigDBConnector()
>>> c.subscribe('A', lambda a: None)
>>> c.connect()
>>> c.listen()


^C^C^C^C

@qiluo-msft This looks like a swss-common issue.

This wasn't seen for some reason during 500 warm boot tests on 202012 since warm reboot is executed early enough before hostcfgd stucks in a listen().

@yxieca
Copy link
Contributor

yxieca commented Apr 25, 2022

@stepanblyschak you are also updating sonic-utilities submodule head with this PR? Please update PR comments accordingly and list the changes included in this PR by submodule change.

@liat-grozovik
Copy link
Collaborator

liat-grozovik commented Apr 25, 2022 via email

@stepanblyschak
Copy link
Collaborator Author

@liat-grozovik @yxieca It is for test, currently this PR is blocked due to sonic-net/sonic-swss-common#603

liat-grozovik pushed a commit to sonic-net/sonic-utilities that referenced this pull request May 11, 2022
…ystemd (#2133)" (#2161)

This reverts commit 23e9398.

- What I did
Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)"

This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future.

This PR must come together with sonic-net/sonic-buildimage#10510.
However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603
And a fix by MSFT is in review sonic-net/sonic-swss-common#606

I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
liat-grozovik pushed a commit to sonic-net/sonic-utilities that referenced this pull request May 11, 2022
…ystemd (#2133)" (#2166)

- What I did
This reverts commit a5f55aa.

Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)"

This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future.

This PR must come together with sonic-net/sonic-buildimage#10510.
However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603
And a fix by MSFT is in review sonic-net/sonic-swss-common#606

I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.

- How I did it
git revert a5f55aa

- How to verify it
Run tests
@stepanblyschak
Copy link
Collaborator Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Collaborator Author

Please note, that this PR only works togather with sonic-net/sonic-utilities#2184 thus we will need to merge sonic-net/sonic-utilities#2184 first and I will need to updated utilities submodule in this PR to make tests pass

vaibhavhd pushed a commit to sonic-net/sonic-utilities that referenced this pull request Jul 25, 2022
…hrough systemd (#2133)" (#2161)" (#2184)

Reverts #2161
Revert a revert. This must be merged together with sonic-net/sonic-buildimage#10510
@vaibhavhd
Copy link
Contributor

Please note, that this PR only works togather with Azure/sonic-utilities#2184 thus we will need to merge Azure/sonic-utilities#2184 first and I will need to updated utilities submodule in this PR to make tests pass

@stepanblyschak utilities PR is merged now. Please update the submodule here and let's trigger the new PR tests on this PR.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@stepanblyschak
Copy link
Collaborator Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Collaborator Author

kvmtest-t1-lag tests fail because they exceed the timeout of 300m. Looks like the timeout needs to be increased

@stepanblyschak
Copy link
Collaborator Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Collaborator Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stepanblyschak
Copy link
Collaborator Author

/easycla

@liat-grozovik liat-grozovik merged commit e662008 into sonic-net:master Sep 19, 2022
yxieca pushed a commit that referenced this pull request Dec 8, 2022
- Why I did it
To optimize stop on warm boot.

- How I did it
Added kill for containers
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
…ystemd (#2133)" (#2161)

This reverts commit 23e9398.

- What I did
Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)"

This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future.

This PR must come together with sonic-net/sonic-buildimage#10510.
However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603
And a fix by MSFT is in review sonic-net/sonic-swss-common#606

I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
…hrough systemd (#2133)" (#2161)" (#2184)

Reverts #2161
Revert a revert. This must be merged together with sonic-net/sonic-buildimage#10510
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants