-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[services] kill container on stop in warm/fast mode #10510
[services] kill container on stop in warm/fast mode #10510
Conversation
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
The change LGTM, but can you look at the PR test errors? I see that warmboot test failed with: Downtime being exceeded by 30s is unexpected and may not be termed flaky. Please look at the failures. |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
23e93984 [scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net#2133) 576c9efc [scripts/fast-reboot] stop timers in advance (sonic-net#2131) 4dad79c4 bugfix: incorrect command for portchannel creation (sonic-net#2134) c17b1f49 [show][muxcable] Decrease the timeout for show mux status/hwmode (sonic-net#2130) 49d61f84 [scripts/fast-reboot] cleanup (sonic-net#2132) 52ca3245 [config/config_mgmt.py]: Fix dpb issue with upper case mac in (sonic-net#2066) 9e2fbf40 Update db_migrator to support `pfcwd_sw_enable` (sonic-net#2087) 4010bd09 FGNHG CLI changes (sonic-net#1588) 6bd54d04 Fix 'show mac' output when FDB entry for default vlan is None instead of 1 (sonic-net#2126) f70dc27 [techsupport] Handle minor fixes of TS Lock and update auto-TS (sonic-net#2114) 51d3550 Fix issues in clear_qos (sonic-net#2122) 6d3aa1e [GCU] Optimizing moves by adding generators for keys/tables (sonic-net#2120) 65a5a6b Fixing get port speed when oper status is down (sonic-net#2123) c752457 [PBH] Implement Edit Flows (sonic-net#2093) 827358f [debug dump] dump interface module added (sonic-net#2070) Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
This reverts commit e84d5ea.
…action-kill-on-wb
``` 23e93984 [scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net#2133) 576c9efc [scripts/fast-reboot] stop timers in advance (sonic-net#2131) 4dad79c4 bugfix: incorrect command for portchannel creation (sonic-net#2134) c17b1f49 [show][muxcable] Decrease the timeout for show mux status/hwmode (sonic-net#2130) 49d61f84 [scripts/fast-reboot] cleanup (sonic-net#2132) 52ca3245 [config/config_mgmt.py]: Fix dpb issue with upper case mac in (sonic-net#2066) 9e2fbf40 Update db_migrator to support `pfcwd_sw_enable` (sonic-net#2087) 4010bd09 FGNHG CLI changes (sonic-net#1588) 6bd54d04 Fix 'show mac' output when FDB entry for default vlan is None instead of 1 (sonic-net#2126) ``` Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@vaibhavhd looks like this PR won't pass tests without sonic-utilities update as well as sonic-utilities update PR won't pass tests without this PR. Updated sonic-utilities here and checking results. |
The tests failed with: |
I am curious now why the same PR on 202012 (#10511) consistently passed and this one on master consistently failed. Update: I just realized that the PR on 202012 did not run tests, and hence it showed up as passing. So basically we don't know if this test would have passed/failed on 202012. |
There is a problem that hostcfgd blocks forever and does not react to SIGTERM causing a delay in warm boot. Example:
@qiluo-msft This looks like a swss-common issue. This wasn't seen for some reason during 500 warm boot tests on 202012 since warm reboot is executed early enough before hostcfgd stucks in a listen(). |
@stepanblyschak you are also updating sonic-utilities submodule head with this PR? Please update PR comments accordingly and list the changes included in this PR by submodule change. |
@liat-grozovik @yxieca It is for test, currently this PR is blocked due to sonic-net/sonic-swss-common#603 |
…ystemd (#2133)" (#2161) This reverts commit 23e9398. - What I did Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)" This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future. This PR must come together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603 And a fix by MSFT is in review sonic-net/sonic-swss-common#606 I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
…ystemd (#2133)" (#2166) - What I did This reverts commit a5f55aa. Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)" This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future. This PR must come together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603 And a fix by MSFT is in review sonic-net/sonic-swss-common#606 I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed. - How I did it git revert a5f55aa - How to verify it Run tests
This reverts commit a006f07.
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
Please note, that this PR only works togather with sonic-net/sonic-utilities#2184 thus we will need to merge sonic-net/sonic-utilities#2184 first and I will need to updated utilities submodule in this PR to make tests pass |
…hrough systemd (#2133)" (#2161)" (#2184) Reverts #2161 Revert a revert. This must be merged together with sonic-net/sonic-buildimage#10510
@stepanblyschak utilities PR is merged now. Please update the submodule here and let's trigger the new PR tests on this PR. |
…action-kill-on-wb
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
This reverts commit e38fb4c.
…action-kill-on-wb
kvmtest-t1-lag tests fail because they exceed the timeout of 300m. Looks like the timeout needs to be increased |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/easycla |
- Why I did it To optimize stop on warm boot. - How I did it Added kill for containers
…ystemd (#2133)" (#2161) This reverts commit 23e9398. - What I did Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)" This reverted PR is part of a story that refactors warm/fast shutdown sequence to gracefully stop services instead of killing them without any ordering and dependency requirements which creates several issues and is error prone for the future. This PR must come together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked due to an issue in swss-common sonic-net/sonic-swss-common#603 And a fix by MSFT is in review sonic-net/sonic-swss-common#606 I am reverting it because its dependency is still blocked and we cannot update submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
…hrough systemd (#2133)" (#2161)" (#2184) Reverts #2161 Revert a revert. This must be merged together with sonic-net/sonic-buildimage#10510
Signed-off-by: Stepan Blyschak stepanb@nvidia.com
Depends on sonic-net/sonic-utilities#2184.
Why I did it
To optimize stop on warm boot.
How I did it
Added kill for containers
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)