Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

increase wait time for mocked T0 DToR TC failures due to config push delta #13625

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

AnantKishorSharma
Copy link
Contributor

Description of PR

Test fails because trigger happens before the mux toggle config is pushed from orchagent for all the ports and took effect from sairedis. ports are selected randomly hence the issue is intermittent(if the ports selected for that run has the config taken effect at sairedis by the time trigger happens). In case of T0 mocked DToR we can not check the mux status so we're relying on sleep to finish config.

This was observed with sonic mgmt 202311 after merging #8363 which was missing in 202311

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

#3 is happening before #2 in NOK run
1)when ansible command was executed(syslog)

May 31 05:29:05.525247 mth-t0-64 INFO python[630593]: ansible-command Invoked with _raw_params=docker exec swss sh -c "swssconfig /muxactive.json" _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
May 31 05:29:49.005483 mth-t0-64 INFO python[631130]: ansible-command Invoked with _raw_params=docker exec swss sh -c "swssconfig /muxactive.json" _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None 

2)when it took effect from sairedis(sairedis.rec)

2024-05-31.05:29:40.878793|c|SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"192.168.0.17","rif":"oid:0x600000000099c","switch_id":"oid:0x21000000000000"}|SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS=3C:FD:FE:B0:8D:1B
2024-05-31.05:29:40.879545|c|SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000ad9|SAI_NEXT_HOP_ATTR_TYPE=SAI

3)when did trigger happen(test log)

31/05/2024 05:29:34 testutils.verify_packet                  L3245 DEBUG  | Checking for pkt on device 0, port 27

How did you do it?

Introduced an additional delay of 30s between mux toggle on DUT and send packet from T1(PTF)

How did you verify/test it?

Verified that test packets are sent to DUT after config is finished and test case passes.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@wsycqyz
Copy link
Contributor

wsycqyz commented Jul 11, 2024

@AnantKishorSharma Should we merge this PR to 2023 or 2024 branch?
Also please fix the confilct.

@AnantKishorSharma
Copy link
Contributor Author

@AnantKishorSharma Should we merge this PR to 2023 or 2024 branch? Also please fix the confilct.

We observed this in 202311 runs. Will check with 202405 and update here. Resolved the conflict. Thanks!

@AnantKishorSharma
Copy link
Contributor Author

/azp run

Copy link

Commenter does not have sufficient privileges for PR 13625 in repo sonic-net/sonic-mgmt

@AnantKishorSharma
Copy link
Contributor Author

@AnantKishorSharma Should we merge this PR to 2023 or 2024 branch? Also please fix the confilct.

We observed this in 202311 runs. Will check with 202405 and update here. Resolved the conflict. Thanks!

Hi @wsycqyz @kevinskwang , we observe this in 202405 too. Could you please review the changes?

@@ -115,7 +115,7 @@ def stop_garp(ptfhost):

if is_t0_mocked_dualtor(tbinfo): # noqa F405
request.getfixturevalue('apply_active_state_to_orchagent')
time.sleep(30)
wait(60, 'wait for config push/mux state change on mocked dualtor')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason to change sleep to wait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kevinskwang , changed it just for consistency(and a msg) after observing that we are using wait instead of sleep at other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kevinskwang , do we want to change it back to sleep?

@AnantKishorSharma
Copy link
Contributor Author

Hi @kevinskwang , could you please review this PR?

@AnantKishorSharma
Copy link
Contributor Author

Hi @kevinskwang , please help review this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants