-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EVPN]Handling race condition when remote VNI arrives before tunnel map entry #2642
Conversation
@srj102 Can you please review? This is to fix a bug we are hitting around 20% of time during testing. |
|
||
if (!gPortsOrch->getVlanByVlanId(vlan_id, vlanPort)) | ||
{ | ||
SWSS_LOG_WARN("Vxlan tunnel map vlan id doesn't exist: %d", vlan_id); | ||
return false; | ||
} | ||
|
||
/* Remote end point can be added only after local VLAN to VNI map gets created */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we need this change.
Is there any constraint imposed by the SAI API for this ?
What happens without this change ?
The errors listed seem specific to the implementation. I think this should be handled within the SAI driver implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the ordering is correct logically. You cannot add the remote end point without the respective maps being created. We following the same approach as linux and expect the same ordering. In linux we first add the vlan to vni map and them advertise it after which we receive the remote end point which we program. The same should be the behavior in orchagent. I don't think there is anything wrong in adding this check.
Update sonic-swss submodule pointer to include the following: * 7d223d3 Remove TODO comment which is no longer relevant ([sonic-net#2645](sonic-net/sonic-swss#2645)) * 02c2267 [test_mux] add sleep in test_NH ([sonic-net#2648](sonic-net/sonic-swss#2648)) * 8de52bf [EVPN]Handling race condition when remote VNI arrives before tunnel map entry ([sonic-net#2642](sonic-net/sonic-swss#2642)) * e99e2e4 [voq][chassis] Remove created ports from the default vlan. ([sonic-net#2607](sonic-net/sonic-swss#2607)) Signed-off-by: dprital <drorp@nvidia.com>
Update sonic-swss submodule pointer to include the following: * 7d223d3 Remove TODO comment which is no longer relevant ([#2645](sonic-net/sonic-swss#2645)) * 02c2267 [test_mux] add sleep in test_NH ([#2648](sonic-net/sonic-swss#2648)) * 8de52bf [EVPN]Handling race condition when remote VNI arrives before tunnel map entry ([#2642](sonic-net/sonic-swss#2642)) * e99e2e4 [voq][chassis] Remove created ports from the default vlan. ([#2607](sonic-net/sonic-swss#2607)) Signed-off-by: dprital <drorp@nvidia.com>
…ap entry (#2642) *Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
*Merge remote-tracking branch 'upstream/master' into dash (#2663) * Modify coppmgr mergeConfig to support preserving copp tables through reboot. (#2548) * Avoid aborting orchagent when setting TUNNEL attributes (#2591) * Handle Mac address 'none' (#2593) * Increase diff coverage to 80% (#2599) * Add missing parameter to on_switch_shutdown_request method. (#2567) * Add ZMQ based ProducerStateTable and CustomerStateTable. * Revert "[voq][chassis]Add show fabric counters port/queue commands (#2522)" (#2611) * Added new attributes for Vnet and Vxlan ecmp configurations. (#2584) * added support for monitoring, primary and adv_prefix and overlay_dmac. * [routesync] Fix for stale dynamic neighbor (#2553) * [MuxOrch] Enabling neighbor when adding in active state (#2601) * Changed the BFD default detect multiplier to 10x (#2614) * Remove TODO comments that are no longer relevant (#2622) * Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (#2619) * [refactor]Refactoring sai handle status (#2621) * Vxlan tunnel endpoint custom monitoring APPL DB table. (#2589) * added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac * Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints. * [bfdorch] add local discriminator to state DB (#2629) * [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME (#2617) * [voq][chassis] Remove created ports from the default vlan. (#2607) * [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (#2642) *Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point. * [test_mux] add sleep in test_NH (#2648) * [autoneg]Fixing adv interface types to be set when AN is disabled (#2638) * [hash]: Add UT infra. (#2660) *Added UT infra for Generic Hash feature *Aligned PBH tests with Generic Hash UT infra * [sai_failure_dump]Invoking dump during SAI failure (#2644) * [ResponsePublisher] add pipeline support (#2511) * [dash] Fix compilation issue caused by missing include.
…ap entry (sonic-net#2642) *Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
…ap entry (sonic-net#2642) *Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
) *Merge remote-tracking branch 'upstream/master' into dash (sonic-net#2663) * Modify coppmgr mergeConfig to support preserving copp tables through reboot. (sonic-net#2548) * Avoid aborting orchagent when setting TUNNEL attributes (sonic-net#2591) * Handle Mac address 'none' (sonic-net#2593) * Increase diff coverage to 80% (sonic-net#2599) * Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567) * Add ZMQ based ProducerStateTable and CustomerStateTable. * Revert "[voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522)" (sonic-net#2611) * Added new attributes for Vnet and Vxlan ecmp configurations. (sonic-net#2584) * added support for monitoring, primary and adv_prefix and overlay_dmac. * [routesync] Fix for stale dynamic neighbor (sonic-net#2553) * [MuxOrch] Enabling neighbor when adding in active state (sonic-net#2601) * Changed the BFD default detect multiplier to 10x (sonic-net#2614) * Remove TODO comments that are no longer relevant (sonic-net#2622) * Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (sonic-net#2619) * [refactor]Refactoring sai handle status (sonic-net#2621) * Vxlan tunnel endpoint custom monitoring APPL DB table. (sonic-net#2589) * added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac * Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints. * [bfdorch] add local discriminator to state DB (sonic-net#2629) * [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME (sonic-net#2617) * [voq][chassis] Remove created ports from the default vlan. (sonic-net#2607) * [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (sonic-net#2642) *Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point. * [test_mux] add sleep in test_NH (sonic-net#2648) * [autoneg]Fixing adv interface types to be set when AN is disabled (sonic-net#2638) * [hash]: Add UT infra. (sonic-net#2660) *Added UT infra for Generic Hash feature *Aligned PBH tests with Generic Hash UT infra * [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2644) * [ResponsePublisher] add pipeline support (sonic-net#2511) * [dash] Fix compilation issue caused by missing include.
) *Merge remote-tracking branch 'upstream/master' into dash (sonic-net#2663) * Modify coppmgr mergeConfig to support preserving copp tables through reboot. (sonic-net#2548) * Avoid aborting orchagent when setting TUNNEL attributes (sonic-net#2591) * Handle Mac address 'none' (sonic-net#2593) * Increase diff coverage to 80% (sonic-net#2599) * Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567) * Add ZMQ based ProducerStateTable and CustomerStateTable. * Revert "[voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522)" (sonic-net#2611) * Added new attributes for Vnet and Vxlan ecmp configurations. (sonic-net#2584) * added support for monitoring, primary and adv_prefix and overlay_dmac. * [routesync] Fix for stale dynamic neighbor (sonic-net#2553) * [MuxOrch] Enabling neighbor when adding in active state (sonic-net#2601) * Changed the BFD default detect multiplier to 10x (sonic-net#2614) * Remove TODO comments that are no longer relevant (sonic-net#2622) * Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (sonic-net#2619) * [refactor]Refactoring sai handle status (sonic-net#2621) * Vxlan tunnel endpoint custom monitoring APPL DB table. (sonic-net#2589) * added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac * Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints. * [bfdorch] add local discriminator to state DB (sonic-net#2629) * [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME (sonic-net#2617) * [voq][chassis] Remove created ports from the default vlan. (sonic-net#2607) * [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (sonic-net#2642) *Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point. * [test_mux] add sleep in test_NH (sonic-net#2648) * [autoneg]Fixing adv interface types to be set when AN is disabled (sonic-net#2638) * [hash]: Add UT infra. (sonic-net#2660) *Added UT infra for Generic Hash feature *Aligned PBH tests with Generic Hash UT infra * [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2644) * [ResponsePublisher] add pipeline support (sonic-net#2511) * [dash] Fix compilation issue caused by missing include.
) *Merge remote-tracking branch 'upstream/master' into dash (sonic-net#2663) * Modify coppmgr mergeConfig to support preserving copp tables through reboot. (sonic-net#2548) * Avoid aborting orchagent when setting TUNNEL attributes (sonic-net#2591) * Handle Mac address 'none' (sonic-net#2593) * Increase diff coverage to 80% (sonic-net#2599) * Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567) * Add ZMQ based ProducerStateTable and CustomerStateTable. * Revert "[voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522)" (sonic-net#2611) * Added new attributes for Vnet and Vxlan ecmp configurations. (sonic-net#2584) * added support for monitoring, primary and adv_prefix and overlay_dmac. * [routesync] Fix for stale dynamic neighbor (sonic-net#2553) * [MuxOrch] Enabling neighbor when adding in active state (sonic-net#2601) * Changed the BFD default detect multiplier to 10x (sonic-net#2614) * Remove TODO comments that are no longer relevant (sonic-net#2622) * Fix issue #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL (sonic-net#2619) * [refactor]Refactoring sai handle status (sonic-net#2621) * Vxlan tunnel endpoint custom monitoring APPL DB table. (sonic-net#2589) * added support for monitoring, primary and adv_prefix. changed filter_mac to overlay_dmac * Data Structures and code to write APP_DB VNET_MONITOR table entries for custom monitoring of Vxlan tunnel endpoints. * [bfdorch] add local discriminator to state DB (sonic-net#2629) * [acl] Add new ACL key BTH_OPCODE and AETH_SYNDROME (sonic-net#2617) * [voq][chassis] Remove created ports from the default vlan. (sonic-net#2607) * [EVPN]Handling race condition when remote VNI arrives before tunnel map entry (sonic-net#2642) *Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point. * [test_mux] add sleep in test_NH (sonic-net#2648) * [autoneg]Fixing adv interface types to be set when AN is disabled (sonic-net#2638) * [hash]: Add UT infra. (sonic-net#2660) *Added UT infra for Generic Hash feature *Aligned PBH tests with Generic Hash UT infra * [sai_failure_dump]Invoking dump during SAI failure (sonic-net#2644) * [ResponsePublisher] add pipeline support (sonic-net#2511) * [dash] Fix compilation issue caused by missing include.
What I did
Added check in remote VNI add to ensure vxlan tunnel map is created before adding the remote end point.
Why I did it
In some race conditions, the VXLAN_REMOTE_VNI_TABLE arrives before VXLAN_TUNNEL_MAP_TABLE. The below log shows race condition as for VNI 500101 is set seen before vxlan tunnel map entry.
2023-01-18.09:35:21.223861|VXLAN_TUNNEL_MAP_TABLE:vtep101032:map_500100_Vlan100|SET|vlan:Vlan100|vni:500100
2023-01-18.09:35:21.326760|VXLAN_FDB_TABLE:Vlan101:aa:e9:21:5a:48:3b|SET|remote_vtep:30.0.0.2|type:dynamic|vni:500101
2023-01-18.09:35:21.327053|VXLAN_FDB_TABLE:Vlan100:5e:3d:f7:1e:93:f6|SET|remote_vtep:30.0.0.2|type:dynamic|vni:500100
2023-01-18.09:35:21.327079|VXLAN_FDB_TABLE:Vlan100:2e:a1:55:f6:59:2e|SET|remote_vtep:40.0.0.3|type:dynamic|vni:500100
2023-01-18.09:35:21.328590|VXLAN_REMOTE_VNI_TABLE:Vlan100:30.0.0.2|SET|vni:500100
2023-01-18.09:35:21.328643|VXLAN_REMOTE_VNI_TABLE:Vlan100:40.0.0.3|SET|vni:500100
2023-01-18.09:35:21.328768|VXLAN_REMOTE_VNI_TABLE:Vlan101:30.0.0.2|SET|vni:500101
This results in SAI failure as below
Jan 18 11:35:21.450681 r-alligator-04 ERR syncd#SDK: [TUNNEL.ERR] Could not find the entry in the tunnel entries map
Jan 18 11:35:21.450801 r-alligator-04 ERR syncd#SDK: [TUNNEL.ERR] Failed to get from tunnel[0x08c00000] entry with FID 101 , err = Entry Not Found
Jan 18 11:35:21.450873 r-alligator-04 ERR syncd#SDK: [FDB_FLOOD.ERR] Failed to get tunnel mapping from fid (101)
Jan 18 11:35:21.450943 r-alligator-04 ERR syncd#SDK: [FDB_FLOOD.ERR] Failed to set flood_vector to fid (101)
Jan 18 11:35:21.451108 r-alligator-04 ERR syncd#SDK: [CORE_API.ERR] Failed to set fdb flood params, FDB module, return message: [Entry Not Found]
Jan 18 11:35:21.451182 r-alligator-04 ERR syncd#SDK: [SAI_L2MC_GROUP.ERR] mlnx_sai_l2mcgroup.c[853]- mlnx_l2mc_group_fid_uc_bc_flood_ctrl_update: Failed to SET ports to fid 101 flood set - Entry Not Found.
Jan 18 11:35:21.451367 r-alligator-04 ERR swss#orchagent: :- create: create status: SAI_STATUS_ITEM_NOT_FOUND
Jan 18 11:35:21.451501 r-alligator-04 ERR swss#orchagent: :- addVlanFloodGroups: Failed to create l2mc group member for adding tunnel 30.0.0.2 to vlan 101
Jan 18 11:35:21.451656 r-alligator-04 ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_L2MC_GROUP, status: SAI_STATUS_ITEM_NOT_FOUND
Jan 18 11:35:21.451863 r-alligator-04 ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_ITEM_NOT_FOUND
How I verified it
Added UT to verify it.
Details if related