Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVPN VxLAN update for platforms using P2MP tunnel based L2 forwarding #806

Merged
merged 18 commits into from
Nov 10, 2021

Conversation

dgsudharsan
Copy link
Contributor

@dgsudharsan dgsudharsan commented Jun 19, 2021

Updated the VxLAN EVPN HLD with the scenario to handle P2MP tunnel based L2 Forwarding. This PR doesn't change the existing Head end replication behavior and doesn't add multicast replication. This PR is to adapt VXLAN EVPN to platforms which leverage using single tunnel(P2MP tunnel) and use L2MC groups for replication.

Repo PR title State
sonic-swss EVPN VxLAN enhancement to support P2MP tunnel based programming for Layer2 extension GitHub issue/pull request detail
sonic-buildimage Enhancing vs support to mock based on platform GitHub issue/pull request detail
sonic-sairedis Added query Enum value capability for vxlan EVPN feature GitHub issue/pull request detail
sonic-swss VxLAN Tunnel Counters and Rates implementation GitHub issue/pull request detail
sonic-sairedis Added Flex Counters support for tunnel counters GitHub issue/pull request detail
sonic-swss-common VxLAN Tunnel Counters and Rates implementation GitHub issue/pull request detail
sonic-utilities VxLAN Tunnel Counters and Rates implementation GitHub issue/pull request detail
sonic-buildimage VxLAN Tunnel Counters and Rates implementation GitHub issue/pull request detail

@prsunny
Copy link
Contributor

prsunny commented Jul 1, 2021

@srj102, @RajeshPukhrajJain could you please review.

Adding show vxlan counters command
@@ -698,6 +714,7 @@ It is proposed to handle these variances in the SAI implementation.

### 4.3.6 IMET route handling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any changes required for MAC route and MAC Move handling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of P2MP tunnel there will be only one bridge port and thus there are some minor changes in code where instead of checking bridgeport alone fur duplicate, endpoint IP is also checked.

@@ -656,6 +669,9 @@ The following will be added as part of tunnel deletion.
- sai_tunnel_remove_map, sai_tunnel_remove_tunnel_termination, sai_tunnel_remove_tunnel when the tunnel is to be removed on account of the last entry being removed.
- VxlanTunnel object will be deleted.

#### 4.3.2.2 P2MP Tunnel Deletion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The tunnel object can be deleted only after all the IMR, MAC, IP Prefix routes are removed by BGP.
    The bridge port object can be deleted only after the fdb count referencing the bridgeport goes down to 0.

  2. Once the map count becomes 0, the vxlanmgr holds off handling further Map creations till the number of P2P
    tunnels goes down to 0. This is to handle quick delete and readd of mapping entries.
    VxlanMgr gets to know of P2P tunnel count by looking at the number of state table entries.
    However for P2MP the state table entries are not populated..
    How is this scenario handled ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no changes to P2MP tunnel logic currently except int this case it will always have 0 dip tunnels. Rest of the logic is the same as in existing code. I did have a question on FDB reference. In current orchagent code i don't not see tunnel delete skipped based on fdb reference count. Only bridge port removal is skipped while the actual tunnel might be deleted even when FDB reference count is non zero. Is that by design? If that's the case isn't it an issue with bridge port still referencing the tunnel port? https://github.com/Azure/sonic-swss/blob/4f1d726d4cbf8a283b22cd5f612cf03ca21a27b3/orchagent/vxlanorch.cpp#L1499

Copy link
Contributor Author

@dgsudharsan dgsudharsan Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more questions and clarifications.

  1. I don't see TUNNEL_USER_MAC being used anywhere. Is it same as fdb count?
  2. I believe IP prefix routes use the P2MP tunnel rather than P2P tunnel but their references are stored in P2P tunnels. In this scenario since there are no P2P tunnels, I believe i should have route references on P2MP tunnel itself. Am I right here?
  3. When IMR is received the tunnel's bridge port will be added to L2MC group member. So maintaining a reference here and deleting the bridge port and tunnel only when IMR count reaches zero. This will be identical to DIP tunnel reference used. Is this fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. m_fdb_count is not the same as the evpn mac count. However currently the evpn mac ref cnt is not being used.
  2. The refcnts were maintained for P2P tunnels to create and delete the P2P tunnels. The DIP tunnel count was used for deciding when to delete the P2MP tunnel.
  3. For the P2MP only case the refcnts could be reused to decide when to delete the P2MP tunnel object. Please check for del_tnl_hw_pending and getDipTunnelCnt and deletePendingSIPTunnel. On the last map entry deletion the P2MP tunnel object will be deleted once the total of refcnts across all the remote VTEP IP becomes zero.
  4. For bridgeport deletion.. the Port object and Associated bridgeport object (P2P) is deleted only when the total fdb count goes down to 0. It is after the bridge port deletion do we delete the Tunnel Object as well.
  5. My comment was that these are applicable to P2MP scenarios also and needs to be mentioned in the HLD. Code wise we may have to do some more adjustments to reuse the existing implementation for P2MP as well.
  6. Regarding the L2MC group member yes using the existing IMR refcnt to decide when to add/remove the L2MC group members would be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your query https://github.com/Azure/SONiC/pull/806/files#r669059056 , No the bridgeport is deleted first and then the tunnel object which is being referenced by the bridgeport.
As part of deleteDynamicDIPTunnel there is a call to getTunnelPort. If this exists then the Tunnel Object is not deleted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgsudharsan @srj102 Currently when tunnel nexthops are created by orchagent, it doesn't check for reachability of remote destination. Are we following the same approach or add the reachability condition while creating the the nexthops?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we are enhancing that here. The existing approach still holds true.

@@ -1082,6 +1107,7 @@ Linux kernel version 4.9.x used in SONiC requires backport of a few patches to s
4. show vxlan tunnel
- lists all the discovered tunnels.
- SIP, DIP, Creation Source, OperStatus are the columns.
- This command is not supported when the platform supports only P2MP tunnels.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this command not supported for P2MP case? this is important command to show the tunnel status.

Copy link
Contributor Author

@dgsudharsan dgsudharsan Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@venkatmahalingam I don't think the command's output signifies the actual tunnel status, meaning we don't perform any kind of tunnel status /monitoring or checking reachability at both ends. These tunnels are logically created within the device and status signifies just receiving the remote end point. I believe With P2MP model we have only one tunnel. What i can do is consider introducing an enhancement using which end points are printed when displaying the P2MP tunnel.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, maybe, you can remove the lines you have added for P2MP, as nothing is going to be changed in the output for P2MP case.

Do we have to display P2P/P2MP in any cmd output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Venkat. I believe the HLD is not inline with the actual implementation. This command's output maps to the implementation of "show vxlan remotevtep" https://github.com/Azure/sonic-utilities/blob/c2ac2d2327acc90a35b33ddead070995034d457a/show/vxlan.py#L176
This indication of remote tunnels will not be support in P2MP based scenarios since internally we create only one tunnel and not multiple tunnels. I will remove the line here and will try to update the command "show vxlan remotevtep" for P2MP scenario.

@@ -1147,7 +1173,14 @@ Linux kernel version 4.9.x used in SONiC requires backport of a few patches to s
+---------+--------------+-------+
Total count : 1

7. show vxlan counters
+--------+------------+----------+------------+----------+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add more details on how counters are designed for P2P (in the future ?) and P2MP case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -623,6 +628,9 @@ In the current implementation, Tunnel Creation handling in the VxlanMgr and Vxla
The VTEP is represented by a VxlanTunnel Object created as above with the DIP as 0.0.0.0 and
SAI object type as TUNNEL. This SAI object is P2MP.

Some vendors support P2P Tunnels to handle Layer2 extension and fdb learning while some vendors support using existing P2MP for Layer2 scenarios. In order to differentiate the different requirements evpn_remote_vni orch which currently handles remote VNI is split into two types - evpn_remote_vni_p2p to handle the flow involving the P2P tunnel creation and evpn_remote_vni_p2mp to handle the flow for using the existing P2MP tunnel. The decision to chose which orch to use is dependent on the SAI enum query capability for the attribute SAI_TUNNEL_ATTR_PEER_MODE. If the vendors have SAI_TUNNEL_PEER_MODE_P2P listed, then evpn_remote_vni_p2p orch will be used, else evpn_remote_vni_p2mp will be used.
Copy link
Collaborator

@venkatmahalingam venkatmahalingam Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some details on P2P vs P2MP here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Addressing review comments on abbreviation.
Updating show vxlan counters command for P2P Tunnel
Updating show commands
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants