-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libteam] LACP should not work after shutdown port-channel manually #2011
base: master
Are you sure you want to change the base?
Conversation
@klhaung I'll check and get back to you. |
@klhaung, this looks like a bug in teamd, do you think you can get this bug fix upstreamed? |
The fix we did previously is to have a patch, named 0005-when-port-status-is-disabled-it-could-not-handle-lacpdu-frame.patch, in src/libteam and modify src/libteam/Makefile to apply this patch. Is it not the method to provide the fix for libteam ? Please help to provide the correct steps for providing the fix for libteam. Thanks for the review. |
Hi, @lguohan, The fix we did previously is to have a patch, named 0005-when-port-status-is-disabled-it-could-not-handle-lacpdu-frame.patch, in src/libteam and modify src/libteam/Makefile to apply this patch. Is it not the correct method to fix the bug in src/libteam ? Please help to show us the correct method and we will follow the right one to do PR. Thanks for the review. |
Hi, @lguohan,
The fix we did previously is to have a patch, named 0005-when-port-status-is-disabled-it-could-not-handle-lacpdu-frame.patch, in src/libteam and modify src/libteam/Makefile to apply this patch. Is it not the correct method to fix the bug in src/libteam ? Please help to show us the correct method and we will follow the right one to do PR. Thanks for the review.
Best Regards,
From: lguohan [mailto:notifications@github.com]
Sent: Friday, October 19, 2018 6:45 PM
To: Azure/sonic-buildimage
Cc: Huang.Blake 黃國連 TAO; Mention
Subject: Re: [Azure/sonic-buildimage] [libteam] LACP should not work after shutdown port-channel manually (#2011)
@klhaung<https://github.com/klhaung>, this looks like a bug in teamd, do you think you can get this bug fix upstreamed?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#2011 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkfunLs8xaywhZyK67fmOecoUhMeGqnNks5uma02gaJpZM4WW8W1>.
|
Hello @klhaung, I think what Guohan suggests is to send the pull request directly to https://github.com/jpirko/libteam. However, I have some questions regarding to what you have observed. I tried to bring down the port channel interface by setting the admin status to down. It won't cause the member ports' admin status to be set to down. Thus, without your fix, the member ports will still be in the selected mode, e.g. However, since the port channel interface is down, the neighbor and next hop associated with this port channel will be removed because the ARP entry is no longer valid. Thus traffic shall not be able to sent via the port channel. Could you describe more about the scenario you have set up? Why did you observe the traffic being forwarded afterwards? |
Hi, @stcheng,
Here are the steps we found the issue
Step1: Config PortChannel01 in config_db.json and check PortChannel01 is up via 'show interface portchannel'
Step2: Shutdown PortChannel01 via 'config portchannel shutdown PortChannel01'
Step3: Check PortChannel01 status again to make sure it is down
Step4: Send traffic to make sure it is down => STILL can forward the traffic
What you mentioned
57 PortChannel57 LACP(A)(Dw) Ethernet0(S) Ethernet1(S)
is Step3, it did show the status is down.
But, we used traffic generator to send traffic, it STILL can forward the traffic.
We checked the issue via bcm command and found that the port is removed for a while and then is added back by SWSS.
The reason why SWSS added the port back is because the the status of port is changed to CURRENT even the status of port is DISABLED.
It looks like a bug in LACP Fallback feature or the glue (teamsyncd,portsyncd,syncd) done by SONiC team, not a bug in libteamd, so that have a patch for integrating into SONiC.
Best Regards,
Blake
From: Shuotian Cheng [mailto:notifications@github.com]
Sent: Thursday, October 25, 2018 6:22 AM
To: Azure/sonic-buildimage
Cc: Huang.Blake 黃國連 TAO; Mention
Subject: Re: [Azure/sonic-buildimage] [libteam] LACP should not work after shutdown port-channel manually (#2011)
Hello @klhaung<https://github.com/klhaung>, I think what Guohan suggests is to send the pull request directly to https://github.com/jpirko/libteam. However, I have some questions regarding to what you have observed.
I tried to bring down the port channel interface by setting the admin status to down. It won't cause the member ports' admin status to be set to down. Thus, without your fix, the member ports will still be in the selected mode, e.g.
57 PortChannel57 LACP(A)(Dw) Ethernet0(S) Ethernet1(S)
However, since the port channel interface is down, the neighbor and next hop associated with this port channel will be removed because the ARP entry is no longer valid. Thus traffic shall not be able to sent via the port channel. Could you describe more about the scenario you have set up? Why did you observe the traffic being forwarded afterwards?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#2011 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkfunE0hLC_JcYgDKwBPwppgqwU4HU1oks5uoOgegaJpZM4WW8W1>.
|
bb0733a [aclorch] Add ACL_TABLE_TYPE configuration (sonic-net#1982) 59cab5d Support for setting switch level DSCP to TC QoS map (sonic-net#2023) da21172 [aclorch] add generic AclOrch::updateAclRule() method (sonic-net#1993) 4f6cb05 [Reclaiming buffer] Support reclaiming buffer in traditional model (sonic-net#2011) 32d7a69 [Reclaiming buffer] Common code update (sonic-net#1996) b91d8ba [swss] L2 Forwarding Enhancements (sonic-net#1716) 797dab4 [muxorch] Bind all ports to drop ACL table (sonic-net#2027) 99929cd [lgtm.yml] add libgmock-dev (sonic-net#2035) 8727ae5 [flex counter] Flex counter threads consume too much CPU resources sonic-net#9202 (sonic-net#2031) 103fdf0 Remove redundant calls to get child scheduler group during initialization (sonic-net#1965) 18ea840 [macsec]: MACsec statistics support (sonic-net#1867) 0c46242 [orchagent] Flush pipeline every 1 second, not only when select will timeout (sonic-net#2003) 339101c [cbf] Add class-based forwarding support (sonic-net#1963) 24a615b Fix issue: accumulative headroom can exceed limit in rare scenario (sonic-net#2020) 708e232 Test divide by zero processing path (sonic-net#2028) 8f1d035 [macsecmgr]: Wait for port up before enabling macsec (sonic-net#2032) 4912a77 Remove buffer drop counter when port is removed (sonic-net#1860) f9462c4 [Dynamic buffer] [Mellanox] Calculate the peer response time according to the speed (sonic-net#1930) 8b5a401 Routed subinterface enhancements (sonic-net#2017) cdea5e9 Fix next hop compilation (sonic-net#2025) 37c197d [SRV6] Sonic-swss changes for SRV6 (sonic-net#1964) f502c32 [vnetorch] Add ECMP support for vnet tunnel routes (sonic-net#1960) Signed-off-by: Stephen Sun <stephens@nvidia.com>
bb0733a [aclorch] Add ACL_TABLE_TYPE configuration (#1982) 59cab5d Support for setting switch level DSCP to TC QoS map (#2023) da21172 [aclorch] add generic AclOrch::updateAclRule() method (#1993) 4f6cb05 [Reclaiming buffer] Support reclaiming buffer in traditional model (#2011) 32d7a69 [Reclaiming buffer] Common code update (#1996) b91d8ba [swss] L2 Forwarding Enhancements (#1716) 797dab4 [muxorch] Bind all ports to drop ACL table (#2027) 99929cd [lgtm.yml] add libgmock-dev (#2035) 8727ae5 [flex counter] Flex counter threads consume too much CPU resources #9202 (#2031) 103fdf0 Remove redundant calls to get child scheduler group during initialization (#1965) 18ea840 [macsec]: MACsec statistics support (#1867) 0c46242 [orchagent] Flush pipeline every 1 second, not only when select will timeout (#2003) 339101c [cbf] Add class-based forwarding support (#1963) 24a615b Fix issue: accumulative headroom can exceed limit in rare scenario (#2020) 708e232 Test divide by zero processing path (#2028) 8f1d035 [macsecmgr]: Wait for port up before enabling macsec (#2032) 4912a77 Remove buffer drop counter when port is removed (#1860) f9462c4 [Dynamic buffer] [Mellanox] Calculate the peer response time according to the speed (#1930) 8b5a401 Routed subinterface enhancements (#2017) cdea5e9 Fix next hop compilation (#2025) 37c197d [SRV6] Sonic-swss changes for SRV6 (#1964) f502c32 [vnetorch] Add ECMP support for vnet tunnel routes (#1960) Signed-off-by: Stephen Sun <stephens@nvidia.com>
dd71848 [GCU] Show default option for '--format' (#2003) f296e76 [GCU] Disallowing DeleteInsteadOfReplaceMoveExtender from generating delete whole config move (#2006) 731d643 [flow counter] Fix issue: should not compare str with int (#2001) e628f01 Support CLI for buffer queue configuration (#1965) 585fd40 Fix show ip bgp nei command rw required issue (#2011)
…onic-net#2011) - What I did It's to port sonic-net#1837 to master to reclaim reserved buffer. As the way to do it differs among vendors, buffermgrd will: 1. Handle port admin down on Mellanox platform. - Not apply lossless buffer PG to an admin-down port - Remove lossless buffer PG (3-4) from a port when it is shut down. 2. Read lossless buffer PG (3-4) to a port when a port is started up. - Why I did it To support reclaiming reserved buffer when a port is shut down on Mellanox platform in traditional buffer model. - How I verified it sonic-mgmt test and vs test. Signed-off-by: Stephen Sun <stephens@nvidia.com>
Fix the ro command rw permission required issue
- What I did
The sympthon of this issue is.
Step1: Config PortChannel01 in config_db.json and check PortChannel01 is up via 'show interface portchannel'
Step2: Shutdown PortChannel01 via 'config portchannel shutdown PortChannel01'
Step3: Check PortChannel01 status again to make sure it is down
Step4: Send traffic to make sure it is down => STILL can forward the traffic
- How I did it
According to the LACP state machine process, the status of port is changed to CURRENT only when the status of the port is EXPIRED or DEFAULTED. We should check the status of the port when receive LACPDU to see if it is EXPIRED or DEFAULTED before changing the status to CURRENT.
- How to verify it
- Description for the changelog
[libteam] LACP should not work after shutdown port-channel manually
- A picture of a cute animal (not mandatory but encouraged)