Rev | Date | Author | Change Description |
---|---|---|---|
0.1 | | Prince Sunny | Initial version |
1.0 | | Prince Sunny | Review comments/feedback |
1.1 | | Prince Sunny | Review comments |
1.2 | | Prince Sunny | Design change for VNET Table flow |
1.3 | | Prince Sunny | VNet and Route Delete flow |
This document provides general information about the Vxlan feature implementation in SONiC.
This document describes the high level design of the Vxlan feature. Kernel VRF (L3mdev) programming for VNET peering is beyond the scope of this document.
Term | Meaning |
---|---|
VNI | Vxlan Network Identifier |
VTEP | Vxlan Tunnel End Point |
VM | Virtual Machine |
VRF | Virtual Routing and Forwarding |
VNet | Virtual Network |
This section describes the SONiC requirements for the Vxlan feature, primarily in the context of VNet.
At a high level the following should be supported:
Phase #1
- Should be able to perform the role of Vxlan Tunnel End Point (VTEP)
- VNet peering between customer VMs and Baremetal servers, as defined in the VNet requirements.
- Distributed Vxlan routing with Symmetric IRB model (RIOT)
Phase #2
- Integration with BGP EVPN
- Should support untagged or tagged traffic (Overlay layer 2 networks over layer 3 underlay)
- Should be able to do Head End Replication (HER) for BUM (broadcast, unknown unicast and multicast) traffic based on the configured flood list
- CLI commands to configure Vxlan
Vxlan orchagent:
- Should be able to create VRF/BRIDGE/VLAN to VNI mappings.
- Should be able to create NH Tunnel and Tunnel termination tables.
- Should be able to create tunnels and encap/decap mappers.
VNet orchagent:
- Should be able to create a VRF per VNET table entry.
- Should be able to track peering configurations.
- Should be VNet/VRF aware
Route orchagent:
- Should be able to handle routes within a VNet
- Should be able to create NH tunnels for the endpoints
- Should be VNet/VRF aware
- Should be VTEP aware
FdbOrch:
- Should support static configuration of FDB entries learnt on a remote VTEP
Intfs orchagent:
- Should be VRF aware
- Should be able to create router interfaces in a specific VRF
CLI:
- User should be able to get the FDB entries learnt per VNI
- User should be able to configure Vxlan tunnels and VTEPs (Overlay)
In summary:
- config vxlan <vxlan_name> vlan <vlan_id> vni <vni_id>
- config vxlan <vxlan_name> src_if <interface>
- config vxlan <vxlan_name> vlan <vlan_id> flood vtep <ip1, ip2, ip3>
- show mac vxlan <vxlan_name> <vni_id>
- show vxlan <vxlan_name>
Configuring VNet peering via CLI is beyond the scope of this document.
Vxlan component | Expected value |
---|---|
VNI | 8k |
Tunnel encaps | 128k |
VMs | 512k |
VRFs | 128 |
Routes | 512k |
Phase #1 shall not include warm restart capabilities, because SAI virtual router (VR) objects are currently not compliant with warm restart. This shall be revisited in Phase #2.
The following new tables will be added to Config DB. Unless otherwise stated, the attributes are mandatory.
```
VXLAN_TUNNEL|{{tunnel_name}}
    "src_ip": {{ip_address}}
    "dst_ip": {{ip_address}} (OPTIONAL)

VXLAN_TUNNEL_MAP|{{tunnel_name}}|{{tunnel_map}}
    "vni": {{vni_id}}
    "vlan": {{vlan_id}}

VNET|{{vnet_name}}
    "vxlan_tunnel": {{tunnel_name}}
    "vni": {{vni}}
    "scope": {{"default"}} (OPTIONAL)
    "peer_list": {{vnet_name_list}} (OPTIONAL)

INTERFACE|{{intf_name}}
    "vnet_name": {{vnet_name}}

INTERFACE|{{intf_name}}|{{prefix}}
    { }

VLAN_INTERFACE|{{intf_name}}
    "vnet_name": {{vnet_name}}

VLAN_INTERFACE|{{intf_name}}|{{prefix}}
    { }

NEIGH_TABLE|{{intf_name}}|{{ip_address}}
    "family": "IPv4"
```
```
; Defines schema for VXLAN Tunnel configuration attributes
key           = VXLAN_TUNNEL:name       ; Vxlan tunnel configuration
; field       = value
SRC_IP        = ipv4                    ; IPv4 source address, loopback address for tunnel termination
DST_IP        = ipv4                    ; IPv4 destination address, for P2P (Optional)

;value annotations
ipv4          = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet     = DIGIT                   ; 0-9
              / %x31-39 DIGIT           ; 10-99
              / "1" 2DIGIT              ; 100-199
              / "2" %x30-34 DIGIT       ; 200-249
              / "25" %x30-35            ; 250-255

; Defines schema for VXLAN Tunnel map configuration attributes
key           = VXLAN_TUNNEL_MAP:tunnel_name:name ; Vxlan tunnel map configuration
; field       = value
VNI           = DIGITS                  ; 1 to 16777215 (24-bit VNI)
VLAN          = 1*4DIGIT                ; 1 to 4094 Vlan id

; Defines schema for VNet configuration attributes
key           = VNET:name               ; Vnet name
; field       = value
VXLAN_TUNNEL  = tunnel_name             ; refers to the Vxlan tunnel name
VNI           = DIGITS                  ; 1 to 16777215 (24-bit VNI)
SCOPE         = Vnet Scope              ; Whether to use default or non-default VRF (Optional)
PEER_LIST     = *vnet_name              ; vnet names separated by ","
                                        ; (empty indicates no peering) (Optional)

; Defines schema for VNet Interface configuration attributes
key           = INTERFACE:name          ; Vnet interface name. This can be a port, vlan
                                        ; or port-channel interface
; field       = value
VNET_NAME     = vnet_name               ; vnet to which the interface belongs

; Defines schema for VNet Interface IP prefix configuration attributes
key           = INTERFACE:name:prefix   ; Vnet interface name with IP prefix. No change to
                                        ; existing schema.
; field       = value

; Defines schema for VNet Neighbor configuration attributes
key           = NEIGH_TABLE:name:ip_address ; Vnet neighbor with IP address. Swss shall resolve
                                        ; the mac address for this configuration
; field       = value
family        = IPv4/IPv6               ; Address family
```
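The rule that attributes are mandatory unless marked OPTIONAL can be checked mechanically. Below is a minimal Python sketch of such a check. The `SCHEMA` dictionary and the helpers `config_db_key` and `check_entry` are hypothetical illustrations, not SONiC code. Only the table and field names come from the Config DB tables above.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical: table -> (mandatory fields, optional fields), transcribed from
# the Config DB tables in this section. Not taken from SONiC source code.
SCHEMA: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "VXLAN_TUNNEL": (frozenset({"src_ip"}), frozenset({"dst_ip"})),
    "VXLAN_TUNNEL_MAP": (frozenset({"vni", "vlan"}), frozenset()),
    "VNET": (frozenset({"vxlan_tunnel", "vni"}), frozenset({"scope", "peer_list"})),
    "NEIGH_TABLE": (frozenset({"family"}), frozenset()),
}


def config_db_key(table: str, *parts: str) -> str:
    """Config DB keys join the table name and key components with '|'."""
    return "|".join((table,) + parts)


def check_entry(table: str, fields: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the entry passes."""
    mandatory, optional = SCHEMA[table]
    present = set(fields)
    problems = [f"missing mandatory field '{f}'" for f in sorted(mandatory - present)]
    problems += [f"unknown field '{f}'" for f in sorted(present - mandatory - optional)]
    return problems


if __name__ == "__main__":
    key = config_db_key("VNET", "Vnet_2000")
    print(key, check_entry("VNET", {"vxlan_tunnel": "tunnel1", "vni": "2000"}))
```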
Please refer to the schema document for details on value annotations.
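The value annotations above can also be validated in code. The Python sketch below is an illustration only, and its function names are hypothetical. The regular expression transcribes the ABNF `dec-octet` rule directly, including the 250-255 alternative. The VNI bound assumes the standard 24-bit Vxlan VNI field.

```python
import re
from typing import List

# One alternative per line of the ABNF dec-octet rule, longest first:
# 250-255, 200-249, 100-199, 10-99, 0-9 (no leading zeros).
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4_RE = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")
DIGITS_RE = re.compile(r"[0-9]+")

MAX_VNI = 0xFFFFFF  # 24-bit VNI field -> 16777215


def is_ipv4(value: str) -> bool:
    return IPV4_RE.fullmatch(value) is not None


def is_vni(value: str) -> bool:
    return DIGITS_RE.fullmatch(value) is not None and 1 <= int(value) <= MAX_VNI


def is_vlan(value: str) -> bool:
    # VLAN = 1*4DIGIT, restricted to the usable range 1..4094
    return (DIGITS_RE.fullmatch(value) is not None and len(value) <= 4
            and 1 <= int(value) <= 4094)


def parse_peer_list(value: str) -> List[str]:
    # Comma-separated VNet names; an empty string means no peering.
    return [name.strip() for name in value.split(",") if name.strip()]
```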
Two new APP_DB tables will be introduced to specify routes and tunnel endpoints in the VNet domain.
```
VNET_ROUTE_TABLE:{{vnet_name}}:{{prefix}}
    "nexthop": {{ip_address}} (OPTIONAL)
    "ifname": {{intf_name}}

VNET_ROUTE_TUNNEL_TABLE:{{vnet_name}}:{{prefix}}
    "endpoint": {{ip_address}}
    "mac_address": {{mac_address}} (OPTIONAL)
    "vni": {{vni}} (OPTIONAL)

VXLAN_FDB_TABLE:{{tunnel_name}}:{{vni_id}}:{{mac_address}}
    "remote_vtep": {{ip_address}}
```
VrfMgrD creates the following VNET_TABLE entry in APP_DB:
```
VNET_TABLE:{{vnet_name}}
    "vxlan_tunnel": {{tunnel_name}}
    "vni": {{vni}}
    "scope": {{"default"}}
    "peer_list": {{vnet_name_list}}
```
```
; Defines schema for VNet Route table attributes
key           = VNET_ROUTE_TABLE:vnet_name:prefix ; Vnet route table with prefix
; field       = value
NEXTHOP       = ipv4                    ; Nexthop IP address (Optional)
IFNAME        = ifname                  ; Interface name

; Defines schema for VNet Route tunnel table attributes
key           = VNET_ROUTE_TUNNEL_TABLE:vnet_name:prefix ; Vnet route tunnel table with prefix
; field       = value
ENDPOINT      = ipv4                    ; Host VM physical address (PA), i.e. the tunnel endpoint
MAC_ADDRESS   = 12HEXDIG                ; Inner dest mac in encapsulated packet (Optional)
VNI           = DIGITS                  ; VNI value in encapsulated packet (Optional)

; Defines FDB entries for remote VTEP
key           = VXLAN_FDB_TABLE:tunnel_name:vni_id:mac_address ; Remotely learnt mac-address
; field       = value
REMOTE_VTEP   = ipv4                    ; Remote VTEP where the host resides

; Defines schema for VXLAN VRF Tunnel map attributes
key           = VXLAN_TUNNEL:tunnel_name:name ; Vxlan tunnel map
; field       = value
VNI           = DIGITS                  ; 1 to 16777215 (24-bit VNI)
VRF           = vrf_name                ; VRF name

; Defines schema for VNET Table attributes
key           = VNET_TABLE:name         ; VNet table name
; field       = value
VXLAN_TUNNEL  = tunnel_name             ; refers to the Vxlan tunnel name
VNI           = DIGITS                  ; 1 to 16777215 (24-bit VNI)
SCOPE         = Vnet Scope              ; Whether to use default or non-default VRF
PEER_LIST     = *vnet_name              ; vnet names separated by ","
                                        ; (empty indicates no peering)
```
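APP_DB keys use `:` as the separator. A MAC address always contains `:`, and an IPv6 prefix would too. A consumer should therefore split only as many times as the key has leading components. The sketch below shows this with the `maxsplit` argument of Python's `str.split`. The function names are hypothetical, and the IPv6 VNet prefix is an assumption added for illustration.

```python
import ipaddress
import re
from typing import NamedTuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")
ROUTE_TABLES = ("VNET_ROUTE_TABLE", "VNET_ROUTE_TUNNEL_TABLE")


class RouteKey(NamedTuple):
    table: str
    vnet: str
    prefix: Network


class FdbKey(NamedTuple):
    tunnel: str
    vni: int
    mac: str


def parse_route_key(key: str) -> RouteKey:
    # maxsplit=2: everything after the second ':' is the prefix (IPv6 keeps its colons)
    table, vnet, prefix = key.split(":", 2)
    if table not in ROUTE_TABLES:
        raise ValueError(f"not a VNet route key: {key}")
    return RouteKey(table, vnet, ipaddress.ip_network(prefix))


def parse_fdb_key(key: str) -> FdbKey:
    # maxsplit=3: the trailing MAC address contains five ':' characters
    table, tunnel, vni, mac = key.split(":", 3)
    if table != "VXLAN_FDB_TABLE" or MAC_RE.fullmatch(mac) is None:
        raise ValueError(f"not a VXLAN FDB key: {key}")
    return FdbKey(tunnel, int(vni), mac)
```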
The following orchagents shall be modified. Flow diagrams are captured in a later section.
VxlanOrch is the major Vxlan subsystem and handles configuration requests. VxlanOrch creates the tunnel and attaches the encap and decap mappers. Separate tunnels are created for L2 Vxlan and L3 Vxlan, and VLAN/VNI or VRF/VNI mappings are attached to the respective tunnel.
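The mapper bookkeeping described above can be sketched as follows. This is a hypothetical Python model, not VxlanOrch's actual code. Each tunnel is either L2 (VLAN/VNI) or L3 (VRF/VNI). Every mapping is recorded in both directions, so the encap mapper (object to VNI) and the decap mapper (VNI to object) stay consistent.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class VxlanTunnelModel:
    name: str
    src_ip: str
    kind: str                                              # "L2" or "L3"
    encap: Dict[str, int] = field(default_factory=dict)    # VLAN/VRF -> VNI
    decap: Dict[int, str] = field(default_factory=dict)    # VNI -> VLAN/VRF

    def add_mapping(self, obj: str, vni: int) -> None:
        # A VNI may decapsulate into only one VLAN/VRF, and vice versa.
        if self.decap.get(vni, obj) != obj:
            raise ValueError(f"VNI {vni} already mapped to {self.decap[vni]}")
        if self.encap.get(obj, vni) != vni:
            raise ValueError(f"{obj} already mapped to VNI {self.encap[obj]}")
        self.encap[obj] = vni
        self.decap[vni] = obj


class VxlanOrchModel:
    """Keeps L2 and L3 tunnels separate, keyed by (tunnel name, kind)."""

    def __init__(self) -> None:
        self.tunnels: Dict[Tuple[str, str], VxlanTunnelModel] = {}

    def tunnel(self, name: str, src_ip: str, kind: str) -> VxlanTunnelModel:
        if kind not in ("L2", "L3"):
            raise ValueError(f"unknown tunnel kind {kind}")
        key = (name, kind)
        if key not in self.tunnels:
            self.tunnels[key] = VxlanTunnelModel(name, src_ip, kind)
        return self.tunnels[key]
```

Re-adding an identical mapping is a no-op, so replaying the same configuration is harmless.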
VrfMgrD reads the VNET table configuration from Config DB and creates the L3mdev interface in the kernel. VrfMgrD then updates APP_DB with the VNET_TABLE entry, which VnetOrch later consumes. VrfMgrD also updates STATE_DB with the status of the created VRF.
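The VrfMgrD sequence can be sketched in Python under stated assumptions. On the kernel side, the sketch uses the standard iproute2 `ip link add <name> type vrf table <id>` command to create the L3mdev device. The routing-table ID, the STATE_DB key `VRF_TABLE|<name>` and its `state` field are hypothetical placeholders. Commands go to a caller-supplied `run` function, so the sketch runs without root privileges.

```python
from typing import Callable, Dict, List

Runner = Callable[[List[str]], None]
Db = Dict[str, Dict[str, str]]


def handle_vnet_config(name: str, fields: Dict[str, str], table_id: int,
                       run: Runner, app_db: Db, state_db: Db) -> None:
    # 1. Create the L3mdev (VRF) device in the kernel and bring it up.
    run(["ip", "link", "add", name, "type", "vrf", "table", str(table_id)])
    run(["ip", "link", "set", "dev", name, "up"])
    # 2. Publish the VNet to APP_DB for VnetOrch (reached only if run() did not raise).
    app_db[f"VNET_TABLE:{name}"] = dict(fields)
    # 3. Signal readiness to dependants such as IntfMgrD (hypothetical key and field).
    state_db[f"VRF_TABLE|{name}"] = {"state": "ok"}
```

In a real daemon, `run` could be `lambda cmd: subprocess.run(cmd, check=True)`. A failing kernel call would then raise before either database is written.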
For regular VRF configurations, VrfOrch creates the VRF in SAI based on APP_DB updates from VrfMgrD. RouteOrch fetches this information to program routes in the VRF.
VnetOrch is another major component introduced for the VNet use case. VnetOrch creates the ingress/egress (based on context) VRF or BRIDGE in SAI for a VNet and maintains the peering list. VnetOrch calls the VxlanOrch API to create the encap/decap mappers for the VNet. VnetRouteOrch fetches the VRF and peering information to replicate routes, if applicable. When the APP_DB VNet route tables receive new updates for the VNet, VnetRouteOrch gets the VNet objects (VRF or BRIDGE) from VnetOrch and programs SAI:
- VNET_ROUTE_TABLE is translated to create subnet/local route entries
- VNET_ROUTE_TUNNEL_TABLE is translated to create routes with tunnel nexthop
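The two translations, together with peering-based replication, can be sketched in Python. Two behaviours are assumptions made for illustration. First, a route in VNet A is also programmed into every VNet whose `peer_list` names A. Second, a tunnel route without a `vni` field uses the owning VNet's VNI. `ProgrammedRoute`, `target_vnets` and `translate_route` are hypothetical names, not VnetRouteOrch APIs.

```python
from typing import Dict, List, NamedTuple, Optional

VnetTable = Dict[str, Dict[str, str]]


class ProgrammedRoute(NamedTuple):
    vrf: str             # VNet whose VRF receives the route
    prefix: str
    kind: str            # "local" (interface) or "tunnel" (Vxlan encap)
    nexthop: str         # interface name or tunnel endpoint IP
    vni: Optional[int]   # VNI used for encapsulation; None for local routes


def target_vnets(vnet: str, vnet_table: VnetTable) -> List[str]:
    # ASSUMPTION: a VNet that lists `vnet` in its peer_list also receives its routes.
    targets = [vnet]
    for other, fields in sorted(vnet_table.items()):
        peers = [p.strip() for p in fields.get("peer_list", "").split(",") if p.strip()]
        if other != vnet and vnet in peers:
            targets.append(other)
    return targets


def translate_route(table: str, vnet: str, prefix: str, fields: Dict[str, str],
                    vnet_table: VnetTable) -> List[ProgrammedRoute]:
    if table == "VNET_ROUTE_TABLE":
        # Subnet/local route: the prefix is reached directly through the interface.
        kind, nexthop, vni = "local", fields["ifname"], None
    elif table == "VNET_ROUTE_TUNNEL_TABLE":
        # Tunnel route. ASSUMPTION: without a "vni" field, the owning VNet's VNI is used.
        kind, nexthop = "tunnel", fields["endpoint"]
        vni = int(fields.get("vni") or vnet_table[vnet]["vni"])
    else:
        raise ValueError(f"unexpected table {table}")
    return [ProgrammedRoute(target, prefix, kind, nexthop, vni)
            for target in target_vnets(vnet, vnet_table)]
```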
IntfMgrD creates the kernel routing interface and enslaves it to the VRF L3mdev. IntfMgrD waits for the VRF creation update in STATE_DB and then updates the APP_DB INTF_TABLE with the VRF/VNet name.
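The dependency between IntfMgrD and VrfMgrD is a "hold until ready" pattern. The Python sketch below is hypothetical. The STATE_DB key `VRF_TABLE|<vnet>` is a placeholder, and the kernel enslave step is only noted in a comment. In iproute2 that step is `ip link set dev <ifname> master <vrf>`.

```python
from typing import Dict, List

Db = Dict[str, Dict[str, str]]


class IntfMgrSketch:
    """Publish VNet-bound interfaces to APP_DB only once their VRF exists."""

    def __init__(self, state_db: Db, app_db: Db) -> None:
        self.state_db = state_db
        self.app_db = app_db
        self.pending: Dict[str, str] = {}   # interface name -> VNet name

    def on_config(self, ifname: str, vnet: str) -> List[str]:
        self.pending[ifname] = vnet
        return self.retry()

    def retry(self) -> List[str]:
        """Call on every STATE_DB change; returns the interfaces published now."""
        published = []
        for ifname, vnet in list(self.pending.items()):
            if f"VRF_TABLE|{vnet}" not in self.state_db:
                continue  # VRF not created yet, keep waiting
            # A real daemon would first enslave the netdev:
            #   ip link set dev <ifname> master <vnet>
            self.app_db[f"INTF_TABLE:{ifname}"] = {"vnet_name": vnet}
            del self.pending[ifname]
            published.append(ifname)
        return published
```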
Add VrfOrch as a member of IntfsOrch. IntfsOrch creates router interfaces based on the interface table (INTF_TABLE) and the VRF information. For the VNet use case, IntfsOrch calls the VnetOrch API to handle router interface creation.
Add VxlanOrch as a member of FdbOrch. For FDB entries learnt on a remote VTEP, the APP_DB FDB table shall be updated. The entries shall be programmed to SAI using the BridgeIf/RemoteVTEP mapping obtained from VxlanOrch. (TBD)
The overall data flow diagram is captured below for all TABLE updates.
The table below lists the main SAI attributes that shall be used for Vxlan:
Vxlan component | SAI attribute |
---|---|
Vxlan Tunnel type | SAI_TUNNEL_TYPE_VXLAN |
Encap mapper | SAI_TUNNEL_MAP_TYPE_VIRTUAL_ROUTER_ID_TO_VNI |
Decap mapper | SAI_TUNNEL_MAP_TYPE_VNI_TO_VIRTUAL_ROUTER_ID |
Nexthop tunnel | SAI_NEXT_HOP_TYPE_TUNNEL_ENCAP |
Tunnel term type | SAI_TUNNEL_TERM_TABLE_ENTRY_TYPE_P2MP |
Vxlan MAC | SAI_SWITCH_ATTR_VXLAN_DEFAULT_ROUTER_MAC |
Vxlan port | SAI_SWITCH_ATTR_VXLAN_DEFAULT_PORT |
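The attributes above imply a creation order. A tunnel references its mappers, and tunnel terminations and nexthops reference the tunnel. The Python sketch below encodes that dependency graph with the standard-library `graphlib.TopologicalSorter` (Python 3.9+) and derives one valid order. The step names and the exact edges are illustrative assumptions. The object types are standard SAI object types, and the attribute values come from the table. The two switch attributes (Vxlan router MAC and UDP port) are set on the switch object rather than created, so they are left out.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# step -> (SAI object type, key value from the table above, prerequisite steps)
# Step names and edges are illustrative, not taken from orchagent.
STEPS: Dict[str, Tuple[str, str, List[str]]] = {
    "vnet_vrf": ("SAI_OBJECT_TYPE_VIRTUAL_ROUTER", "VRF for the VNet", []),
    "encap_map": ("SAI_OBJECT_TYPE_TUNNEL_MAP",
                  "SAI_TUNNEL_MAP_TYPE_VIRTUAL_ROUTER_ID_TO_VNI", []),
    "decap_map": ("SAI_OBJECT_TYPE_TUNNEL_MAP",
                  "SAI_TUNNEL_MAP_TYPE_VNI_TO_VIRTUAL_ROUTER_ID", []),
    "tunnel": ("SAI_OBJECT_TYPE_TUNNEL", "SAI_TUNNEL_TYPE_VXLAN",
               ["encap_map", "decap_map"]),
    "term_entry": ("SAI_OBJECT_TYPE_TUNNEL_TERM_TABLE_ENTRY",
                   "SAI_TUNNEL_TERM_TABLE_ENTRY_TYPE_P2MP", ["tunnel"]),
    "encap_entry": ("SAI_OBJECT_TYPE_TUNNEL_MAP_ENTRY", "VRF -> VNI",
                    ["encap_map", "vnet_vrf"]),
    "decap_entry": ("SAI_OBJECT_TYPE_TUNNEL_MAP_ENTRY", "VNI -> VRF",
                    ["decap_map", "vnet_vrf"]),
    "tunnel_nexthop": ("SAI_OBJECT_TYPE_NEXT_HOP", "SAI_NEXT_HOP_TYPE_TUNNEL_ENCAP",
                       ["tunnel"]),
}


def creation_order() -> List[str]:
    """One order in which every object is created after its prerequisites."""
    graph = {step: deps for step, (_, _, deps) in STEPS.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in creation_order():
        object_type, value, _ = STEPS[step]
        print(f"{step:15} {object_type:40} {value}")
```

A delete flow would naturally walk the same order in reverse, removing dependants before the objects they reference.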
Commands summary (Phase #2):
- config vxlan <vxlan_name> vlan <vlan_id> vni <vni_id>
- config vxlan <vxlan_name> src_if <interface>
- config vxlan <vxlan_name> vlan <vlan_id> flood vtep <ip1, ip2, ip3>
- show mac vxlan <vxlan_name> <vni_id>
- show vxlan <vxlan_name>
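SONiC's CLI is built with Python's click library. To keep this sketch dependency-free, it uses the standard-library `argparse` as a stand-in. It parses the first command above into the Config DB VXLAN_TUNNEL_MAP entry defined earlier. The map-name format `map_<vni>_Vlan<vlan>` is a hypothetical naming choice that this design does not specify.

```python
import argparse
from typing import Dict, List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Mirrors: config vxlan <vxlan_name> vlan <vlan_id> vni <vni_id>
    parser = argparse.ArgumentParser(prog="config vxlan")
    parser.add_argument("vxlan_name")
    sub = parser.add_subparsers(dest="target", required=True)
    vlan = sub.add_parser("vlan")
    vlan.add_argument("vlan_id", type=int)
    vlan.add_argument("vni_keyword", choices=["vni"])
    vlan.add_argument("vni_id", type=int)
    return parser


def to_config_db(argv: List[str]) -> Tuple[str, Dict[str, str]]:
    args = build_parser().parse_args(argv)
    if not 1 <= args.vlan_id <= 4094:
        raise ValueError(f"VLAN {args.vlan_id} out of range 1-4094")
    if not 1 <= args.vni_id <= 0xFFFFFF:
        raise ValueError(f"VNI {args.vni_id} out of 24-bit range")
    map_name = f"map_{args.vni_id}_Vlan{args.vlan_id}"   # hypothetical naming scheme
    key = f"VXLAN_TUNNEL_MAP|{args.vxlan_name}|{map_name}"
    return key, {"vni": str(args.vni_id), "vlan": str(args.vlan_id)}
```

For example, `to_config_db(["tunnel1", "vlan", "2000", "vni", "2000"])` yields the key `VXLAN_TUNNEL_MAP|tunnel1|map_2000_Vlan2000`.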
```
vxlan
Usage: vxlan [OPTIONS] COMMAND [ARGS]...

  Utility to operate with Vxlan configuration.

Options:
  --help  Show this message and exit.

Commands:
  config  Set Vxlan configuration.
  show    Show Vxlan information.
```
The config command should be extended to add a "vxlan" alias.
```
Usage: config [OPTIONS] COMMAND [ARGS]...

  SONiC command line - 'config' command

Options:
  --help  Show this message and exit.

Commands:
  ...
  vxlan   vxlan related configuration.
```
The show command should be extended to add a "vxlan" alias.
```
show
Usage: show [OPTIONS] COMMAND [ARGS]...

  SONiC command line - 'show' command

Options:
  -?, -h, --help  Show this message and exit.

Commands:
  ...
  vxlan           Show vxlan related information
```
TBD
TBD
- Vnet 1
  - VNI: 2000
  - VMs
    - VM1: CA 100.100.1.1/32, PA 10.10.10.1, MAC 00:00:00:00:01:02
  - BM1
    - Connected on Ethernet1
    - IP: 100.100.3.2/24
    - MAC: 00:00:AA:AA:AA:01
- Vnet 2
  - VNI: 3000
  - VMs
    - VM2: CA 100.100.2.1/32, PA 10.10.10.2, MAC 00:00:00:00:03:04
  - BM2
    - Connected on Ethernet2 in Vlan2000
    - IP: 100.100.4.2/24
    - MAC: 00:00:AA:AA:AA:02
```json
{
    "VXLAN_TUNNEL": {
        "tunnel1": {
            "src_ip": "10.10.10.10"
        }
    },
    "VNET": {
        "Vnet_2000": {
            "vxlan_tunnel": "tunnel1",
            "vni": "2000",
            "peer_list": ""
        },
        "Vnet_3000": {
            "vxlan_tunnel": "tunnel1",
            "vni": "3000",
            "peer_list": "Vnet_2000"
        }
    },
    "INTERFACE": {
        "Ethernet1": {
            "vnet_name": "Vnet_2000"
        },
        "Ethernet1|100.100.3.1/24": {}
    },
    "VLAN": {
        "Vlan2000": {
            "vlanid": "2000"
        }
    },
    "VLAN_MEMBER": {
        "Vlan2000|Ethernet2": {
            "tagging_mode": "tagged"
        }
    },
    "VLAN_INTERFACE": {
        "Vlan2000": {
            "vnet_name": "Vnet_3000"
        },
        "Vlan2000|100.100.4.1/24": {}
    },
    "NEIGH": {
        "Ethernet1|100.100.3.2": {
            "family": "IPv4"
        },
        "Vlan2000|100.100.4.2": {
            "family": "IPv4"
        }
    }
}
```
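Python's `json` module accepts a JSON object that repeats a key, for example two separate `VNET` tables, and silently keeps only the last value. Entries can therefore disappear without any error. The Python sketch below is a hypothetical helper, not a SONiC tool. It loads a configuration while rejecting duplicate keys through `json.loads(..., object_pairs_hook=...)`. It then checks the cross-table references used in this example.

```python
import json
from typing import Any, Dict, List, Tuple


def reject_duplicates(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    obj: Dict[str, Any] = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key {key!r}")
        obj[key] = value
    return obj


def load_config(text: str) -> Dict[str, Any]:
    return json.loads(text, object_pairs_hook=reject_duplicates)


def check_references(config: Dict[str, Any]) -> List[str]:
    errors = []
    tunnels = config.get("VXLAN_TUNNEL", {})
    vnets = config.get("VNET", {})
    for name, vnet in vnets.items():
        if vnet.get("vxlan_tunnel") not in tunnels:
            errors.append(f"VNET {name}: unknown tunnel {vnet.get('vxlan_tunnel')}")
        for peer in (p.strip() for p in vnet.get("peer_list", "").split(",")):
            if peer and peer not in vnets:
                errors.append(f"VNET {name}: unknown peer {peer}")
    for table in ("INTERFACE", "VLAN_INTERFACE"):
        entries = config.get(table, {})
        for key, fields in entries.items():
            # An "<intf>|<prefix>" entry needs a matching "<intf>" entry.
            base = key.split("|", 1)[0]
            if base not in entries:
                errors.append(f"{table} {key}: no {base} entry")
            vnet = fields.get("vnet_name")
            if vnet is not None and vnet not in vnets:
                errors.append(f"{table} {key}: unknown VNet {vnet}")
    return errors
```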
```json
{
    "VNET_ROUTE_TABLE:Vnet_2000:100.100.3.0/24": {
        "ifname": "Ethernet1"
    },
    "VNET_ROUTE_TABLE:Vnet_3000:100.100.4.0/24": {
        "ifname": "Vlan2000"
    },
    "VNET_ROUTE_TUNNEL_TABLE:Vnet_2000:100.100.1.1/32": {
        "endpoint": "10.10.10.1"
    },
    "VNET_ROUTE_TUNNEL_TABLE:Vnet_3000:100.100.2.1/32": {
        "endpoint": "10.10.10.2",
        "mac_address": "00:00:00:00:03:04"
    }
}
```