-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add teamd patches to solve traffic loss issue when removing port from LAG #14002
Conversation
@vaibhavhd fyi, appreciate your review comments |
@stepanblyschak , @vaibhavhd appreciate if you can review. |
@liorghub can you confirm this can be cleanly cherry picked to 202205? |
Yes, this PR can be cleanly cherry picked to 202205. |
@judyjoseph , @saiarcot895 , Can you please review ? |
This PR fixes issue #14381 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a .patch
suffix to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
src/libteam/patch/series
Outdated
@@ -10,3 +10,5 @@ | |||
0010-When-read-of-timerfd-returned-0-don-t-consider-this-.patch | |||
0011-Remove-extensive-debug-output.patch | |||
0012-Increase-min_ports-upper-limit-to-1024.patch | |||
0013-set-port-to-disabled-state-during-removal.patch | |||
0014-dont-move-the-port-state-from-disabled-when-admin-state-is-down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, done.
I tried to repro the issue described here on a t0-64-32 topo with sonic neighbors (with config_db.json modified to remove the Is this happening only on physical hardware, or is it on a virtual setup as well? |
It happens only on physical hardware when both sides are switches (when one side is host and we create bonding, issue is not reproduced). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that this is sort-of happening on virtual switch as well. The ICMP response packet can arrive on the interface that was just removed from the port channel (and so outside of the port channel), if it happens to be the primary interface on the peer virtual switch. It doesn't manifest as a traffic drop because rp_filter
checks are disabled, so the kernel accepts it as a valid packet, even though it's outside of the port channel.
… LAG (sonic-net#14002) #### Why I did it When removing port from LAG while traffic is running thorough LAG there is traffic disruption of 60 seconds. Fix issue sonic-net#14381 #### How I did it The patch I added introduces "port_removing" op and call it right before Kernel is asked to remove the port. Implement the op in LACP runner to disable the port which leads to proper LACPDU send. #### How to verify it Set LAG between 2 switches. Set LAGs to be router port and set ip address. In switch A send ping to ip address of LAG in switch B. In switch B, while ping is running remove port from LAG. Verify ping is not stopping.
Cherry-pick PR to 202211: #14684 |
… LAG (sonic-net#14002) #### Why I did it When removing port from LAG while traffic is running thorough LAG there is traffic disruption of 60 seconds. Fix issue sonic-net#14381 #### How I did it The patch I added introduces "port_removing" op and call it right before Kernel is asked to remove the port. Implement the op in LACP runner to disable the port which leads to proper LACPDU send. #### How to verify it Set LAG between 2 switches. Set LAGs to be router port and set ip address. In switch A send ping to ip address of LAG in switch B. In switch B, while ping is running remove port from LAG. Verify ping is not stopping.
Cherry-pick PR to 202205: #14716 |
… LAG (#14002) #### Why I did it When removing port from LAG while traffic is running thorough LAG there is traffic disruption of 60 seconds. Fix issue #14381 #### How I did it The patch I added introduces "port_removing" op and call it right before Kernel is asked to remove the port. Implement the op in LACP runner to disable the port which leads to proper LACPDU send. #### How to verify it Set LAG between 2 switches. Set LAGs to be router port and set ip address. In switch A send ping to ip address of LAG in switch B. In switch B, while ping is running remove port from LAG. Verify ping is not stopping.
Why I did it
When removing port from LAG while traffic is running thorough LAG there is traffic disruption of 60 seconds.
Fix issue #14381
How I did it
The patch I added introduces "port_removing" op and call it right before Kernel is asked to remove the port.
Implement the op in LACP runner to disable the port which leads to proper LACPDU send.
How to verify it
Set LAG between 2 switches.
Set LAGs to be router port and set ip address.
In switch A send ping to ip address of LAG in switch B.
In switch B, while ping is running remove port from LAG.
Verify ping is not stopping.
Which release branch to backport (provide reason below if selected)
Description for the changelog
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)