-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EVPN VxLAN update for platforms using P2MP tunnel based L2 forwarding #806
Conversation
@srj102, @RajeshPukhrajJain could you please review. |
Adding show vxlan counters command
@@ -698,6 +714,7 @@ It is proposed to handle these variances in the SAI implementation. | |||
|
|||
### 4.3.6 IMET route handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any changes required for MAC route and MAC Move handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of P2MP tunnel there will be only one bridge port and thus there are some minor changes in code where instead of checking bridgeport alone fur duplicate, endpoint IP is also checked.
@@ -656,6 +669,9 @@ The following will be added as part of tunnel deletion. | |||
- sai_tunnel_remove_map, sai_tunnel_remove_tunnel_termination, sai_tunnel_remove_tunnel when the tunnel is to be removed on account of the last entry being removed. | |||
- VxlanTunnel object will be deleted. | |||
|
|||
#### 4.3.2.2 P2MP Tunnel Deletion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
The tunnel object can be deleted only after all the IMR, MAC, IP Prefix routes are removed by BGP.
The bridge port object can be deleted only after the fdb count referencing the bridgeport goes down to 0. -
Once the map count becomes 0, the vxlanmgr holds off handling further Map creations till the number of P2P
tunnels goes down to 0. This is to handle quick delete and readd of mapping entries.
VxlanMgr gets to know of P2P tunnel count by looking at the number of state table entries.
However for P2MP the state table entries are not populated..
How is this scenario handled ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no changes to P2MP tunnel logic currently except int this case it will always have 0 dip tunnels. Rest of the logic is the same as in existing code. I did have a question on FDB reference. In current orchagent code i don't not see tunnel delete skipped based on fdb reference count. Only bridge port removal is skipped while the actual tunnel might be deleted even when FDB reference count is non zero. Is that by design? If that's the case isn't it an issue with bridge port still referencing the tunnel port? https://github.com/Azure/sonic-swss/blob/4f1d726d4cbf8a283b22cd5f612cf03ca21a27b3/orchagent/vxlanorch.cpp#L1499
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more questions and clarifications.
- I don't see TUNNEL_USER_MAC being used anywhere. Is it same as fdb count?
- I believe IP prefix routes use the P2MP tunnel rather than P2P tunnel but their references are stored in P2P tunnels. In this scenario since there are no P2P tunnels, I believe i should have route references on P2MP tunnel itself. Am I right here?
- When IMR is received the tunnel's bridge port will be added to L2MC group member. So maintaining a reference here and deleting the bridge port and tunnel only when IMR count reaches zero. This will be identical to DIP tunnel reference used. Is this fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- m_fdb_count is not the same as the evpn mac count. However currently the evpn mac ref cnt is not being used.
- The refcnts were maintained for P2P tunnels to create and delete the P2P tunnels. The DIP tunnel count was used for deciding when to delete the P2MP tunnel.
- For the P2MP only case the refcnts could be reused to decide when to delete the P2MP tunnel object. Please check for del_tnl_hw_pending and getDipTunnelCnt and deletePendingSIPTunnel. On the last map entry deletion the P2MP tunnel object will be deleted once the total of refcnts across all the remote VTEP IP becomes zero.
- For bridgeport deletion.. the Port object and Associated bridgeport object (P2P) is deleted only when the total fdb count goes down to 0. It is after the bridge port deletion do we delete the Tunnel Object as well.
- My comment was that these are applicable to P2MP scenarios also and needs to be mentioned in the HLD. Code wise we may have to do some more adjustments to reuse the existing implementation for P2MP as well.
- Regarding the L2MC group member yes using the existing IMR refcnt to decide when to add/remove the L2MC group members would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding your query https://github.com/Azure/SONiC/pull/806/files#r669059056 , No the bridgeport is deleted first and then the tunnel object which is being referenced by the bridgeport.
As part of deleteDynamicDIPTunnel there is a call to getTunnelPort. If this exists then the Tunnel Object is not deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgsudharsan @srj102 Currently when tunnel nexthops are created by orchagent, it doesn't check for reachability of remote destination. Are we following the same approach or add the reachability condition while creating the the nexthops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we are enhancing that here. The existing approach still holds true.
doc/vxlan/EVPN/EVPN_VXLAN_HLD.md
Outdated
@@ -1082,6 +1107,7 @@ Linux kernel version 4.9.x used in SONiC requires backport of a few patches to s | |||
4. show vxlan tunnel | |||
- lists all the discovered tunnels. | |||
- SIP, DIP, Creation Source, OperStatus are the columns. | |||
- This command is not supported when the platform supports only P2MP tunnels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this command not supported for P2MP case? this is important command to show the tunnel status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@venkatmahalingam I don't think the command's output signifies the actual tunnel status, meaning we don't perform any kind of tunnel status /monitoring or checking reachability at both ends. These tunnels are logically created within the device and status signifies just receiving the remote end point. I believe With P2MP model we have only one tunnel. What i can do is consider introducing an enhancement using which end points are printed when displaying the P2MP tunnel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, maybe, you can remove the lines you have added for P2MP, as nothing is going to be changed in the output for P2MP case.
Do we have to display P2P/P2MP in any cmd output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Venkat. I believe the HLD is not inline with the actual implementation. This command's output maps to the implementation of "show vxlan remotevtep" https://github.com/Azure/sonic-utilities/blob/c2ac2d2327acc90a35b33ddead070995034d457a/show/vxlan.py#L176
This indication of remote tunnels will not be support in P2MP based scenarios since internally we create only one tunnel and not multiple tunnels. I will remove the line here and will try to update the command "show vxlan remotevtep" for P2MP scenario.
doc/vxlan/EVPN/EVPN_VXLAN_HLD.md
Outdated
@@ -1147,7 +1173,14 @@ Linux kernel version 4.9.x used in SONiC requires backport of a few patches to s | |||
+---------+--------------+-------+ | |||
Total count : 1 | |||
|
|||
7. show vxlan counters | |||
+--------+------------+----------+------------+----------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add more details on how counters are designed for P2P (in the future ?) and P2MP case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
doc/vxlan/EVPN/EVPN_VXLAN_HLD.md
Outdated
@@ -623,6 +628,9 @@ In the current implementation, Tunnel Creation handling in the VxlanMgr and Vxla | |||
The VTEP is represented by a VxlanTunnel Object created as above with the DIP as 0.0.0.0 and | |||
SAI object type as TUNNEL. This SAI object is P2MP. | |||
|
|||
Some vendors support P2P Tunnels to handle Layer2 extension and fdb learning while some vendors support using existing P2MP for Layer2 scenarios. In order to differentiate the different requirements evpn_remote_vni orch which currently handles remote VNI is split into two types - evpn_remote_vni_p2p to handle the flow involving the P2P tunnel creation and evpn_remote_vni_p2mp to handle the flow for using the existing P2MP tunnel. The decision to chose which orch to use is dependent on the SAI enum query capability for the attribute SAI_TUNNEL_ATTR_PEER_MODE. If the vendors have SAI_TUNNEL_PEER_MODE_P2P listed, then evpn_remote_vni_p2p orch will be used, else evpn_remote_vni_p2mp will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some details on P2P vs P2MP here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Addressing review comments on abbreviation.
Updating show vxlan counters command for P2P Tunnel
Updating show commands
Updated the VxLAN EVPN HLD with the scenario to handle P2MP tunnel based L2 Forwarding. This PR doesn't change the existing Head end replication behavior and doesn't add multicast replication. This PR is to adapt VXLAN EVPN to platforms which leverage using single tunnel(P2MP tunnel) and use L2MC groups for replication.