[teammgr] Added LAG member check into addLagMember() #2464
Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
bc0bdb6 to 6bc2bda
When the syncd container auto-restarts, it also restarts swss and cleans up the STATE_DB. So in which scenario do we encounter this?
On syncd exit, the host interfaces (tun/tap netdevs) go to the DOWN state and are then removed. It looks like this triggers the notification to teammgr before STATE_DB gets cleaned up. That's why teammgr tries to add LAG members that no longer exist.
@judyjoseph, @prsunny, please review. Thanks
Taking a look @akokhan -- thanks
@judyjoseph, did you get a chance to review this? Thanks
@judyjoseph, @prsunny, please review. Thanks
@judyjoseph, did you get a chance to check this PR? Thanks
@prsunny, please approve and merge. Thank you.
What I did
Added a check to addLagMember() that verifies the new LAG member still exists in the kernel.
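For illustration only, a kernel-existence check of this kind could look like the sketch below. It is not necessarily how this PR implements the check: the helper name `netdevExists` is hypothetical, and the sketch assumes the POSIX `if_nametoindex()` call is an acceptable way to ask the kernel whether a netdev with a given name is present.

```cpp
#include <net/if.h>
#include <string>

// Hypothetical helper (not taken from sonic-swss): reports whether a
// network device with the given name currently exists in the kernel.
bool netdevExists(const std::string &ifname)
{
    // if_nametoindex() returns 0 when no interface has this name.
    return if_nametoindex(ifname.c_str()) != 0;
}
```

A caller could return early from the add path when the helper reports that the port's netdev is gone, instead of attempting to add it to the LAG.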
Why I did it
In the syncd container auto-restart scenario, when syncd exits, the host interfaces (tun/tap netdevs) go to the DOWN state and are then removed.
Because of the validation referenced below, teammgr receives the notification about the port state change (the information is updated in the state DB and a pubsub message is sent), but the port state record is not removed from the state DB when the port is deleted:
sonic-swss/portsyncd/linksync.cpp
Line 210 in 7cc035f
As a result, on the port state change notification, isPortStateOk() succeeds and TeamMgr::addLagMember() is executed even though the host interface has actually been removed.
The operation is expected to be ignored if the port is already enslaved:
sonic-swss/cfgmgr/teammgr.cpp
Line 721 in 7cc035f
That check fails because the port has already been removed:
sonic-swss/cfgmgr/teammgr.cpp
Line 412 in 7cc035f
As a result, the TeamMgr::addLagMember() logic is executed and fails.
The issue became reproducible after #2233.
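The sequence described above can be modelled with a small self-contained sketch. This is a toy model under stated assumptions, not sonic-swss code: `ToySystem`, `AddResult`, and `tryAddLagMember` are hypothetical names, and the state DB and the kernel are each reduced to a set of port names.

```cpp
#include <set>
#include <string>

// Toy model only; every name here is hypothetical.
struct ToySystem {
    std::set<std::string> stateDbPorts;   // ports that still have a state DB record
    std::set<std::string> kernelNetdevs;  // netdevs that actually exist in the kernel
};

enum class AddResult { Added, Failed, SkippedNoState, SkippedNoNetdev };

// Models the decision made when a port state change notification arrives.
// checkKernel == false mimics the behaviour before the fix,
// checkKernel == true mimics the added kernel-existence check.
AddResult tryAddLagMember(const ToySystem &sys, const std::string &port, bool checkKernel)
{
    // Stand-in for the state DB validation: passes while the record exists.
    if (sys.stateDbPorts.count(port) == 0)
        return AddResult::SkippedNoState;

    // The added guard: do nothing if the netdev is already gone.
    if (checkKernel && sys.kernelNetdevs.count(port) == 0)
        return AddResult::SkippedNoNetdev;

    // Adding a member can only succeed if the netdev exists.
    return sys.kernelNetdevs.count(port) != 0 ? AddResult::Added : AddResult::Failed;
}
```

With a stale state DB record and no matching netdev, the unguarded path attempts the add and fails, while the guarded path skips it.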
How I verified it