Features were not disabled after config load_minigraph #13293
Comments
#13064 will help. Please retry with this fix. |
sonic-net/sonic-host-services#8 could help, but it is not yet merged.
The PR may help to narrow the window of the race, but it may not be able to eliminate the race.
Hi @rlhui, I was still observing the issue with a recent 202205 image:
admin@cmp227:/$ sonic-db-cli CONFIG_DB hgetall "FEATURE|bgp"
admin@cmp227:/$ container_checker
As for the idea that starting the dependent service (swss) also starts bgp, I don't quite see it that way.
sonic-net/sonic-host-services#8 (the hostcfgd change) didn't fix the issue in my experiment. Moreover, it leaves the dhcp_relay/mux/bgp/macsec/teamd feature status uncertain, because it takes its logic directly from init_cfg.json.j2.
@arlakshm has a local fix for this and will raise a PR.
…#15734) Fixes #15667 and #13293 Work item tracking Microsoft ADO 24472854: How I did it On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled. How to verify it Tests on chassis supervisor and LC
…#15734) (#16099) Fixes #15667 and #13293 Work item tracking Microsoft ADO 24472854: How I did it On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled. How to verify it Tests on chassis supervisor and LC Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
…#15734) (#16135) Fixes #15667 and #13293 Work item tracking Microsoft ADO 24472854: How I did it On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled. How to verify it Tests on chassis supervisor and LC Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
#15734 merged. Closing this issue |
Description
We encountered this issue when loading the sonic-mgmt T2 minigraph with config load_minigraph on a VOQ chassis. Before the minigraph was loaded, feature bgp was enabled on the supervisor (so the bgp containers of all fabric ASICs were running). After loading the sonic-mgmt T2 minigraph, which disables feature bgp on the supervisor, container_checker failed because some bgp containers were still running:
$ sonic-db-cli CONFIG_DB hgetall "FEATURE|bgp"
{'has_timer': 'False', 'check_up_status': 'false', 'state': 'disabled', 'auto_restart': 'enabled', 'has_global_scope': 'False', 'has_per_asic_scope': 'True', 'high_mem_alert': 'disabled'}
$ container_checker
Unexpected running containers: bgp7, bgp9, bgp6, bgp10, bgp11, bgp5, bgp8, bgp4
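The mismatch reported above can be sketched as a small check. This is only an illustration, not container_checker's actual implementation: the function name, the input shapes, and the assumption that per-ASIC containers are named as the feature name followed by an ASIC index (e.g. bgp7) are all hypothetical.

```python
import re


def unexpected_running(feature_table, running):
    """Return running containers that belong to a disabled feature.

    feature_table: dict mapping feature name -> attribute dict (hypothetical shape,
                   modeled on the FEATURE|bgp entry in CONFIG_DB).
    running:       list of running container names (hypothetical input).
    """
    disabled = {
        name for name, attrs in feature_table.items()
        if attrs.get("state") == "disabled"
    }
    unexpected = []
    for container in running:
        # Assumed naming: strip a trailing ASIC index, e.g. "bgp7" -> "bgp".
        base = re.sub(r"\d+$", "", container)
        if base in disabled:
            unexpected.append(container)
    return unexpected


feature_table = {"bgp": {"state": "disabled", "has_per_asic_scope": "True"}}
running = ["bgp7", "bgp9", "swss4", "database"]
print(unexpected_running(feature_table, running))
```

With the sample inputs, only the bgp instances are reported, since bgp is the only feature marked disabled.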
According to syslog, hostcfgd failed to stop bgp4 because systemctl stop was canceled.
Jan 6 05:35:11.848413 INFO hostcfgd: Running cmd: 'sudo systemctl stop bgp@4.service'
Jan 6 05:35:21.417044 INFO hostcfgd[705707]: Job for bgp@4.service canceled.
Jan 6 05:35:21.424134 ERR hostcfgd: sudo systemctl stop bgp@4.service - failed: return code - 1, output:#012None
systemctl stop was likely canceled because another systemctl action was issued on bgp4 at the same time, e.g., starting swss4 would systemctl start bgp4 as well. In other words, this may be a race condition.
Furthermore, with today's implementation, hostcfgd bails out immediately if it encounters an error when stopping a feature. This explains why the remaining bgp containers (bgp5-11) were not stopped.
This issue was seen with a recent 202205 image.
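One way to avoid leaving the remaining instances running is to keep iterating after a failed stop and to retry a canceled job a few times. The sketch below is not hostcfgd's code: stop_units, its parameters, and the injectable run callable are hypothetical, and retrying only narrows the race window rather than removing the race.

```python
import subprocess
import time


def stop_units(units, run=subprocess.run, retries=3, delay=1.0):
    """Try to stop every unit; return the list of units that still failed.

    run is injectable (hypothetical design choice) so the loop can be
    exercised without calling systemctl.
    """
    failed = []
    for unit in units:
        for attempt in range(retries):
            result = run(["sudo", "systemctl", "stop", unit])
            if result.returncode == 0:
                break
            if attempt < retries - 1:
                # Give a competing systemctl job time to finish before retrying.
                time.sleep(delay)
        else:
            # All attempts failed: record it and continue with the next unit
            # instead of bailing out.
            failed.append(unit)
    return failed
```

A caller could then log the returned list once, e.g. after stop_units(["bgp@4.service", "bgp@5.service"]), instead of aborting at the first error.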