fix for mocked T0 DToR TC failures due to config push delta #8363
@kevinskwang - Could you review this?
…ttps://github.com/AnantKishorSharma/sonic-mgmt into test_orchagent_standby_tor_downstream_failure_fix
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Semgrep
No pipelines are associated with this pull request.
@kevinskwang - could you please help to merge?
@kevinskwang Could you please help unblock the "expected check"? Can we do anything from our side to re-trigger it other than pushing a new commit?
@kevinskwang, could you please merge this PR?
…tream_failure_fix
@kevinskwang, all the checks have passed after updating the branch today. Could you please merge?
…t#8363) * fix for failures in orchagent_standby_tor_downstream script * Update test_orchagent_standby_tor_downstream.py * fix for mocked T0 DToR TC failures due to config push delta
Cherry-pick PR to 202305: #11213
* fix for failures in orchagent_standby_tor_downstream script * Update test_orchagent_standby_tor_downstream.py * fix for mocked T0 DToR TC failures due to config push delta
@AnantKishorSharma PR conflicts with 202205 branch
Created #11550 manually.
…ush delta (#11550) Original PR: #8363 (A) The test fails because executing the mux toggle JSON file fails when the swss container is not running. (B) The test fails because the trigger happens before the mux toggle config has been pushed from orchagent for all 36 ports and has taken effect in sairedis. Ports are selected randomly, so the issue is intermittent: a run passes only if the config for the selected ports has already taken effect in sairedis by the time the trigger happens. Across ~10 runs, the config took 18-21s to finish in sairedis for all 36 ports, measured from the time the ansible command for the JSON file was executed. On a mocked T0 DToR we cannot check the mux status, so we rely on a sleep to let the config finish.
@kevinskwang, @mssonicbld, @wangxin, could you please approve this PR for the 202311 branch? This fix is missing from the 202311 branch, which is causing the same failures.
@kevinwangsk, could you please add the "Approved for 202311 branch" label to this PR so it can be cherry-picked? Please let us know if we need to create the cherry-pick manually.
…t#8363) * fix for failures in orchagent_standby_tor_downstream script * Update test_orchagent_standby_tor_downstream.py * fix for mocked T0 DToR TC failures due to config push delta
Cherry-pick PR to 202311: #13051
@AnantKishorSharma Please report back if the issue is still there after being merged to 202311
* fix for failures in orchagent_standby_tor_downstream script * Update test_orchagent_standby_tor_downstream.py * fix for mocked T0 DToR TC failures due to config push delta
Hi @wsycqyz, after merging this PR we still saw the failure and had to increase the delay. Please review #13625
Description of PR
Summary:
Fixes # (issue)
(A) The test fails because executing the mux toggle JSON file fails when the swss container is not running.
(B) The test fails because the trigger happens before the mux toggle config has been pushed from orchagent for all 36 ports and has taken effect in sairedis. Ports are selected randomly, so the issue is intermittent: a run passes only if the config for the selected ports has already taken effect in sairedis by the time the trigger happens. Across ~10 runs, the config took 18-21s to finish in sairedis for all 36 ports, measured from the time the ansible command for the JSON file was executed. On a mocked T0 DToR we cannot check the mux status, so we rely on a sleep to let the config finish.
Type of change
Back port request
Approach
What is the motivation for this PR?
(A) The test fails because executing the mux toggle JSON file fails.
The JSON file execution fails because swss is not running.
swss is not running because it is allowed to restart only 3 times in a 20-minute interval, and the test hits that limit.
The limit is hit because, for ASIC type "gb", this test restarts swss 4 times (twice each for v4 and v6).
reset-failed is called for swss before the restart, but it does not seem to flush the restart rate counter for swss.
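The chain of causes above can be illustrated with a toy model of an "N starts per interval" limit. This is only a sketch under stated assumptions: `RestartLimiter` and `simulate` are hypothetical names, the one-minute spacing between restarts is invented for illustration, and the model is not a reproduction of how systemd itself does the accounting.

```python
from collections import deque


class RestartLimiter:
    """Toy sliding-window model of a start-rate limit (not systemd itself)."""

    def __init__(self, burst=3, interval_s=20 * 60):
        self.burst = burst            # starts allowed per window (3 in the PR)
        self.interval_s = interval_s  # window length in seconds (20 min in the PR)
        self._starts = deque()        # timestamps of accepted starts

    def try_start(self, now_s):
        # Forget starts that have aged out of the window.
        while self._starts and now_s - self._starts[0] >= self.interval_s:
            self._starts.popleft()
        if len(self._starts) >= self.burst:
            return False  # limit hit: the service is left stopped
        self._starts.append(now_s)
        return True


def simulate(restart_times_s, burst=3, interval_s=20 * 60):
    """Return, for each requested restart time, whether it was accepted."""
    limiter = RestartLimiter(burst, interval_s)
    return [limiter.try_start(t) for t in restart_times_s]


# Four restarts one minute apart (hypothetical spacing): the fourth is refused.
four_restarts = simulate([0, 60, 120, 180])
# Dropping one restart keeps the run under the limit.
three_restarts = simulate([0, 60, 120])
print(four_restarts, three_restarts)
```

In this model, any restart refused by the limiter corresponds to swss staying down, which is the state in which the mux toggle JSON file can no longer be applied.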
Log excerpts for issue (A):
(B) In the NOK (failing) run, event 3 happens before event 2:
1) when the ansible command was executed (syslog)
2) when the config took effect in sairedis (sairedis.rec)
3) when the trigger happened (test log)
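The ordering condition on these three events can be written as a small check. This is a minimal sketch: the function name and the timestamps are hypothetical and are not taken from the actual syslog, sairedis.rec, or test log of any run.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"  # assumed format, for illustration only


def config_settled_before_trigger(cmd_time, sairedis_time, trigger_time):
    """True when the events occur in the order 1 (command), 2 (sairedis), 3 (trigger)."""
    cmd, settled, trigger = (
        datetime.strptime(t, TIME_FORMAT)
        for t in (cmd_time, sairedis_time, trigger_time)
    )
    return cmd <= settled <= trigger


# Hypothetical OK run: config settles 20s after the command, trigger comes later.
ok_run = config_settled_before_trigger("10:00:00", "10:00:20", "10:00:30")
# Hypothetical NOK run: the trigger (event 3) fires before sairedis settles (event 2).
nok_run = config_settled_before_trigger("10:00:00", "10:00:20", "10:00:05")
print(ok_run, nok_run)
```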
How did you do it?
(A) config.bcm generation is not required for the Cisco gb platform, so one restart is skipped to avoid hitting the restart limit.
(B) Introduced a 30s delay between the mux toggle on the DUT and sending packets from T1 (PTF).
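The shape of fix (B) is roughly "toggle, wait a fixed time, then send". The sketch below is not the code from test_orchagent_standby_tor_downstream.py: `toggle_then_send`, `toggle_mux`, `send_packets`, and the constant name are hypothetical, and the sleep function is injectable only so the example runs instantly.

```python
import time

# 30s comes from the PR; it leaves headroom over the observed 18-21s settle time.
MUX_CONFIG_SETTLE_DELAY_S = 30


def toggle_then_send(toggle_mux, send_packets,
                     delay_s=MUX_CONFIG_SETTLE_DELAY_S, sleep=time.sleep):
    """Toggle the mux, wait a fixed time for the config to settle, then send traffic."""
    toggle_mux()
    # On a mocked T0 DToR the mux status cannot be polled, so a fixed wait is used.
    sleep(delay_s)
    return send_packets()


# Demo with a recording fake sleep so nothing actually blocks for 30s.
demo_events = []
toggle_then_send(
    toggle_mux=lambda: demo_events.append("toggle"),
    send_packets=lambda: demo_events.append("send"),
    sleep=lambda seconds: demo_events.append(("sleep", seconds)),
)
print(demo_events)
```

A fixed sleep trades test runtime for simplicity. Where the mux status can be queried, polling with a timeout would usually be preferable, but the PR states that this is not possible on the mocked T0 DToR setup.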
How did you verify/test it?
Verified that the mux config JSON is executed successfully, that packets are sent to the DUT only after the config has finished, and that the test case passes.
Any platform specific information?
While applying the DToR mock config to the DUT, Cisco platforms do not need 2 swss restarts, because one of the restarts is only for generating config.bcm, which is Broadcom (Bcm) specific.
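The effect on the restart count can be sketched as follows. The helper names, the step labels, and the "broadcom" string are hypothetical; the only facts taken from the PR are that the mock config is applied for both v4 and v6, that one of the two restarts exists to generate config.bcm, and that the skip applies to ASIC type "gb".

```python
IP_VERSIONS = ("v4", "v6")  # the test applies the mock config once per IP version


def swss_restarts_for_mock_config(asic_type):
    """Restart steps for one application of the DToR mock config (hypothetical helper)."""
    restarts = ["apply_mock_config"]
    if asic_type != "gb":
        # Assumption: only "gb" skips this step, as in the PR.
        restarts.append("generate_config_bcm")
    return restarts


def total_swss_restarts(asic_type):
    """Total swss restarts across both IP versions."""
    return len(IP_VERSIONS) * len(swss_restarts_for_mock_config(asic_type))


print(total_swss_restarts("gb"))        # restarts for Cisco gb with the skip applied
print(total_swss_restarts("broadcom"))  # restarts when config.bcm is still generated
```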
Supported testbed topology if it's a new test case?
Documentation