
[teamd][warmreboot] LAG flap seen with ioctl SIOCADDMULTI and SIOCDELMULTI failures #5761

Open
vaibhavhd opened this issue Oct 30, 2020 · 5 comments

Comments

@vaibhavhd
Contributor

Description

Steps to reproduce the issue:

  1. Run continuous warm-reboot cycles.
  2. One of the warm-reboot attempts will fail with a LAG flap reported.
  3. syslog and teamd.log will report LAG IOCTL errors.
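
A minimal sketch of how the repro loop above could be driven, assuming a hypothetical `run_cycle` callable that triggers one warm reboot and returns `True` when no LAG flap was reported. Neither the function name nor the callable comes from the actual test suite; they are placeholders for whatever harness is in use.

```python
def run_until_failure(run_cycle, max_cycles):
    """Run up to max_cycles warm-reboot cycles.

    run_cycle is a hypothetical callable: it takes the 1-based cycle
    number and returns True if the cycle passed (no LAG flap).
    Returns the number of the first failing cycle, or None if all pass.
    """
    for cycle in range(1, max_cycles + 1):
        if not run_cycle(cycle):
            return cycle
    return None


if __name__ == "__main__":
    # Stand-in for a real warm-reboot cycle: pretend cycle 7 flaps.
    first_bad = run_until_failure(lambda cycle: cycle != 7, max_cycles=10)
    print("first failing cycle:", first_bad)
```

Stopping at the first failing cycle keeps the syslog and teamd.log from that cycle easy to locate before later cycles rotate them.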

A very similar issue was reported earlier: #3649.
Testing with that patch on a 201811 image still failed with similar errors, so the submitted patch may or may not have fixed this issue. Further analysis is needed.

Describe the results you received:

One of the LAGs flapped, which caused the PTF test that runs as part of the warm-reboot test to fail. The following errors were seen in the logs:

From syslog:

Oct 23 13:05:41.210163 sonic-device INFO kernel: [   44.087262] PortChannel0001: Port device Ethernet0 added
Oct 23 13:05:41.251496 sonic-device NOTICE teamd#teammgrd: :- addLagMember: Add Ethernet0 to port channel PortChannel0001
Oct 23 13:05:41.286226 sonic-device INFO kernel: [   44.163405] PortChannel0001: Port device Ethernet0 removed

From TEAMD log:

Oct 23 13:05:41.251457 sonic-device ERR teamd#teamd_PortChannel0001[23]: ioctl SIOCADDMULTI failed.
Oct 23 13:05:41.263724 sonic-device ERR teamd#teamd_PortChannel0001[23]: Failed to init port priv.
Oct 23 13:05:41.263724 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: : Removing port (found ifindex "63").
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: function lacp_port_removed(): 
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Callback named "lacp_timeout" not found.
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Callback named "lacp_periodic" not found.
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Callback named "lacp_socket" not found.
Oct 23 13:05:41.313277 sonic-device ERR teamd#teamd_PortChannel0001[23]: ioctl SIOCDELMULTI failed.
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Callback named "lw_ethtool_delay" not found.
Oct 23 13:05:41.313277 sonic-device WARNING teamd#teamd_PortChannel0001[23]: Loop callback failed with: No such device
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Failed loop callback: libteam_events, 0x12783c0
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Removed loop callback: usock_acc_conn, 0x128ec90
Oct 23 13:05:41.313277 sonic-device ERR teamd#teamd_PortChannel0001[23]: Port with interface index "63" is not part of this device.
Oct 23 13:05:41.313277 sonic-device WARNING teamd#teamd_PortChannel0001[23]: Loop callback failed with: Invalid argument
Oct 23 13:05:41.313277 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Failed loop callback: libteam_events, 0x12783c0
...
Oct 23 13:06:09.552804 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: Ethernet1: Disabling port
Oct 23 13:06:09.553190 sonic-device DEBUG teamd#teamd_PortChannel0001[23]: WR-mode. function lacp_update_carrier()
Oct 23 13:06:09.553433 sonic-device ERR teamd#teamd_PortChannel0001[23]: WR-mode. Timeout occured. Can't start in WR mode in 3 seconds
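
For anyone triaging similar logs, here is a minimal sketch that picks out the two ioctl failure messages quoted above. The regular expression is an assumption based only on the line format shown in this issue, and `find_ioctl_failures` is a hypothetical helper, not part of teamd or SONiC.

```python
import re

# Pattern derived from the teamd log lines quoted in this issue (assumed format).
IOCTL_ERR = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+\S+\s+ERR\s+"
    r"teamd#teamd_(?P<lag>[^\[\s]+)\[\d+\]:\s+"
    r"ioctl (?P<ioctl>SIOC(?:ADD|DEL)MULTI) failed\."
)


def find_ioctl_failures(lines):
    """Return (timestamp, lag_name, ioctl_name) for each matching log line."""
    hits = []
    for line in lines:
        match = IOCTL_ERR.match(line.strip())
        if match:
            hits.append((match.group("ts"), match.group("lag"), match.group("ioctl")))
    return hits


# Lines copied from the teamd log above.
SAMPLE = [
    "Oct 23 13:05:41.251457 sonic-device ERR teamd#teamd_PortChannel0001[23]: ioctl SIOCADDMULTI failed.",
    "Oct 23 13:05:41.263724 sonic-device ERR teamd#teamd_PortChannel0001[23]: Failed to init port priv.",
    "Oct 23 13:05:41.313277 sonic-device ERR teamd#teamd_PortChannel0001[23]: ioctl SIOCDELMULTI failed.",
]

if __name__ == "__main__":
    for hit in find_ioctl_failures(SAMPLE):
        print(hit)
```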

Describe the results you expected:

The warm-reboot test should have passed with no LAG flaps seen.

Additional information you deem important (e.g. issue happens only occasionally):

Errors have been consistently seen in 201811 images. Attempting to reproduce on 201911 and master.

@lguohan
Collaborator

lguohan commented Nov 4, 2020

@shi-su , do you see such issue on kvm image? @vaibhavhd, is this issue for 201811 image only?

@shi-su
Contributor

shi-su commented Nov 4, 2020

> @shi-su , do you see such issue on kvm image? @vaibhavhd, is this issue for 201811 image only?

I think I did see LAG flap reported on master kvm image. I will check if it is the same issue.

@vaibhavhd
Contributor Author

So far this seems to be a 201811-only issue. I am running tests to reproduce it on a 201911 image, and 500+ iterations have passed there so far.
I am monitoring this closely and trying to find an easy way to reproduce it locally.

@vaibhavhd
Contributor Author

> > @shi-su , do you see such issue on kvm image? @vaibhavhd, is this issue for 201811 image only?
>
> I think I did see LAG flap reported on master kvm image. I will check if it is the same issue.

@shi-su can you please point me to the LAG flap failures on the master kvm image? Did you see one LAG or multiple LAGs flapping at the same time? We have seen LAG flap failures for a few different reasons in the past. I can check whether your test failures are similar to this issue.

@shi-su
Contributor

shi-su commented Nov 4, 2020

> > > @shi-su , do you see such issue on kvm image? @vaibhavhd, is this issue for 201811 image only?
> >
> > I think I did see LAG flap reported on master kvm image. I will check if it is the same issue.
>
> @shi-su can you please point me to the LAG flap failures on master kvm image? Did you see one or multiple LAGs flapping at the same time? We have seen LAG flap failures for a few different reasons in the past. I can find if your test failures have similarity to this issue.

@vaibhavhd I just checked on my side. I can see LAG flaps if I run the warm-reboot test for real switches on the latest kvm image. The failure looks like this. I also checked the syslog and teamd log but did not find anything abnormal, so it seems unrelated to this issue.

"FAILED:10.250.0.54:LAG flapped 2 times on 10.250.0.54 after warm boot",
"FAILED:10.250.0.51:LAG flapped 2 times on 10.250.0.51 after warm boot",
"FAILED:10.250.0.52:LAG flapped 2 times on 10.250.0.52 after warm boot",
"FAILED:10.250.0.53:LAG flapped 2 times on 10.250.0.53 after warm boot",

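To compare results like these across runs, a small sketch that totals the reported flaps per host might help. The message format is assumed from the four lines quoted above, and `count_flaps` is a hypothetical helper rather than anything from the PTF test code.

```python
import re

# Format assumed from the failure lines quoted in this comment.
FLAP_RE = re.compile(r"FAILED:(?P<host>[\d.]+):LAG flapped (?P<count>\d+) times on")


def count_flaps(lines):
    """Return a dict mapping host address to total reported LAG flaps."""
    flaps = {}
    for line in lines:
        match = FLAP_RE.search(line)
        if match:
            host = match.group("host")
            flaps[host] = flaps.get(host, 0) + int(match.group("count"))
    return flaps


# Lines copied from the failure output above.
FAILURES = [
    '"FAILED:10.250.0.54:LAG flapped 2 times on 10.250.0.54 after warm boot",',
    '"FAILED:10.250.0.51:LAG flapped 2 times on 10.250.0.51 after warm boot",',
    '"FAILED:10.250.0.52:LAG flapped 2 times on 10.250.0.52 after warm boot",',
    '"FAILED:10.250.0.53:LAG flapped 2 times on 10.250.0.53 after warm boot",',
]

if __name__ == "__main__":
    for host, count in sorted(count_flaps(FAILURES).items()):
        print(host, count)
```

Every host flapping the same number of times, as here, looks different from the single-LAG flap in the original report, which fits the conclusion that the two failures are unrelated.
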
4 participants