[FDB] L2 traffic from PTF container couldn't reach DUT #585

Open
lvphuc133 opened this issue Apr 24, 2018 · 11 comments
@lvphuc133

Description
FDB test failed because L2 traffic from PTF container couldn't reach DUT.

Steps to reproduce the issue:

  1. Run the test script and pause it before sending the L2 traffic

  2. Send the traffic using PTF script manually
    ptf --test-dir ptftests fdb_test.FdbTest --platform-dir ptftests --platform remote -t "testbed_type='t0';router_mac='a4:8c:db:b9:b3:00';fdb_info='/root/fdb_info.txt';vlan_ip='192.168.0.1'" --relax --debug info --log-file /tmp/fdb_test.FdbTest.2018-04-23-18:28:36.log --disable-vxlan --disable-geneve --disable-erspan --disable-mpls --disable-nvgre 2>&1

  3. Capture traffic on the testbed server trunk port connected to the root fanout switch
    #sudo tcpdump -ne -i ens1f0
    18:23:07.978961 24:8a:07:92:7a:00 > a4:8c:db:b9:b3:00, ethertype 802.1Q (0x8100), length 64: vlan 1708, p 0, ethertype 0x1234,
    18:23:07.978975 24:8a:07:92:7a:01 > a4:8c:db:b9:b3:00, ethertype 802.1Q (0x8100), length 64: vlan 1683, p 0, ethertype 0x1234,
    18:23:07.978978 24:8a:07:92:7a:02 > a4:8c:db:b9:b3:00, ethertype 802.1Q (0x8100), length 64: vlan 1707, p 0, ethertype 0x1234,
    18:23:07.978982 24:8a:07:92:7a:03 > a4:8c:db:b9:b3:00, ethertype 802.1Q (0x8100), length 64: vlan 1684, p 0, ethertype 0x1234,

a4:8c:db:b9:b3:00 is the DUT MAC
24:8a:07:92:7a:xx are the server MACs (PTF MACs)

The L2 traffic sent out to test the L2 forwarding
18:23:09.190015 24:8a:07:92:7a:00 > 24:8a:07:92:7a:01, ethertype 802.1Q (0x8100), length 64: vlan 1681, p 0, ethertype 0x1234,
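
The two kinds of frames in the capture above (FDB-learning frames addressed to the DUT MAC, and the forwarding-test frame addressed to another PTF MAC) can be told apart mechanically. The following is a minimal sketch, assuming `tcpdump -ne` lines in exactly the format shown above; the `classify` helper and its labels are hypothetical and not part of the test suite.

```python
import re

# Matches the "src > dst, ... vlan N" part of a `tcpdump -ne` line.
LINE_RE = re.compile(
    r"(?P<src>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) > "
    r"(?P<dst>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}),.*?vlan (?P<vlan>\d+)"
)

DUT_MAC = "a4:8c:db:b9:b3:00"      # DUT MAC from this issue
PTF_PREFIX = "24:8a:07:92:7a:"     # PTF MACs from this issue


def classify(line):
    """Return (kind, src, dst, vlan), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    src, dst, vlan = m["src"], m["dst"], int(m["vlan"])
    if dst == DUT_MAC:
        kind = "learning"      # populates the FDB on the DUT
    elif dst.startswith(PTF_PREFIX):
        kind = "forwarding"    # should be switched to another PTF port
    else:
        kind = "other"
    return kind, src, dst, vlan


capture = [
    "18:23:07.978961 24:8a:07:92:7a:00 > a4:8c:db:b9:b3:00, ethertype 802.1Q (0x8100), length 64: vlan 1708, p 0, ethertype 0x1234,",
    "18:23:09.190015 24:8a:07:92:7a:00 > 24:8a:07:92:7a:01, ethertype 802.1Q (0x8100), length 64: vlan 1681, p 0, ethertype 0x1234,",
]
for line in capture:
    print(classify(line))
```

Running the same classification on captures taken at each hop shows which kind of frame disappears and where.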

  4. Capture traffic on the DUT for port Ethernet0
    => No packets are captured. So the problem should be here: traffic sent from PTF should reach the DUT at this port, but nothing shows up.

  5. Try to send other traffic
    I used a Python script to send an ICMP packet to the Vlan1000 interface (IP 192.168.0.1, MAC = DUT MAC) from the PTF container.
    Capturing the traffic with tcpdump as in step 4, I can see the ICMP packet sent from the PTF container.

Describe the results you received:
The L2 packet to DMAC 24:8a:07:92:7a:01 is not received on port eth1 (Vlan = 1682) of the PTF.

Describe the results you expected:
The L2 packet to DMAC 24:8a:07:92:7a:01 should arrive at port eth1 of the PTF (Vlan = 1682).

Can someone tell me what could cause this problem?
Did I miss anything in the testbed or in the script?

Thank you.

Additional information you deem important:

**Output of `show version`:**

```
(paste your output here)
```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
@stcheng
Contributor

stcheng commented Apr 24, 2018

These packets will not be captured on the DUT. Since this is data plane traffic, it is not trapped to the CPU. ICMP packets, on the contrary, are trapped and sent up to the CPU, which is why you could see them. Could you check the counters of the DUT and make sure that the DUT receives the packets? You might need to find out at which specific step the packets get lost. Meanwhile, you will also need to check whether the NIC on your server supports our test scenario.
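
One way to find the step where packets get lost is to snapshot a packet counter at every hop before and after sending a known number of frames, then look for the first hop whose counter did not grow enough. This is only a sketch: the hop names and counter values below are hypothetical placeholders, and the real numbers would have to be read from the server NIC, the fanout switches and the DUT.

```python
def first_lossy_hop(hops, before, after, sent):
    """Return (hop, delta) for the first hop that saw fewer than `sent` packets.

    `hops` is ordered along the path from the PTF container to the DUT.
    Returns None if every hop counted at least `sent` packets.
    """
    for hop in hops:
        delta = after[hop] - before[hop]
        if delta < sent:
            return hop, delta
    return None


# Hypothetical path and counter snapshots, for illustration only.
hops = ["server_nic_tx", "fanout_trunk_rx", "fanout_dut_port_tx", "dut_Ethernet0_rx"]
before = {"server_nic_tx": 1000, "fanout_trunk_rx": 5000,
          "fanout_dut_port_tx": 300, "dut_Ethernet0_rx": 300}
after = {"server_nic_tx": 1100, "fanout_trunk_rx": 5000,
         "fanout_dut_port_tx": 300, "dut_Ethernet0_rx": 300}

print(first_lossy_hop(hops, before, after, sent=100))
```

Background traffic can inflate counters, so sending a distinctive number of frames (for example 100) on an otherwise quiet link makes the deltas easier to trust.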

@lvphuc133
Author

Thank you stcheng,

That was what I thought too. At least I can make sure the PTF container has the correct port mapping and Vlan tagging with the ICMP packets.

I modified the PTF script to send 10-100 L2 packets but didn't see matching counter increases on either the fanout switches or the SONiC DUT. So it looks like the testbed server failed to send the L2 traffic out.

My server NIC is Mellanox CX-4.

Do you have any suggestions for my case?

@keboliu
Contributor

keboliu commented Apr 27, 2018

@lvphuc133 are you able to capture those packets on the server port which is connected to the fanout? If yes, and at the same time you cannot see the counter change on the fanout trunk port, maybe you need to check the server, e.g. upgrade the NIC firmware and MLNX_OFED.

@lvphuc133
Author

@keboliu

I did more debugging and found the following:

  • At first the script sends traffic with DMAC = DUT's MAC to populate the FDB table on the DUT. I can capture these packets on the wire, and the counters on both the server and the fanout show correct numbers.
    The DUT learns all MACs correctly.

FDB table on the DUT
$ show mac
No.  Vlan  MacAddress         Port         Type
1    1000  24:8A:07:92:7A:0C  Ethernet48   Dynamic
2    1000  24:8A:07:92:7A:01  Ethernet4    Dynamic
3    1000  24:8A:07:92:7A:16  Ethernet88   Dynamic
4    1000  24:8A:07:92:7A:04  Ethernet16   Dynamic
5    1000  24:8A:07:92:7A:0D  Ethernet52   Dynamic
6    1000  24:8A:07:92:7A:00  Ethernet0    Dynamic
7    1000  24:8A:07:92:7A:0E  Ethernet56   Dynamic
8    1000  24:8A:07:92:7A:1A  Ethernet104  Dynamic
9    1000  24:8A:07:92:7A:15  Ethernet84   Dynamic
10   1000  24:8A:07:92:7A:08  Ethernet32   Dynamic
11   1000  24:8A:07:92:7A:09  Ethernet36   Dynamic
12   1000  24:8A:07:92:7A:0B  Ethernet44   Dynamic
13   1000  24:8A:07:92:7A:07  Ethernet28   Dynamic
14   1000  24:8A:07:92:7A:1B  Ethernet108  Dynamic
15   1000  24:8A:07:92:7A:18  Ethernet96   Dynamic
16   1000  24:8A:07:92:7A:11  Ethernet68   Dynamic
17   1000  24:8A:07:92:7A:17  Ethernet92   Dynamic
18   1000  24:8A:07:92:7A:06  Ethernet24   Dynamic
19   1000  24:8A:07:92:7A:13  Ethernet76   Dynamic
20   1000  24:8A:07:92:7A:3A  Ethernet108  Dynamic
21   1000  24:8A:07:92:7A:14  Ethernet80   Dynamic
22   1000  24:8A:07:92:7A:10  Ethernet64   Dynamic
23   1000  24:8A:07:92:7A:03  Ethernet12   Dynamic
24   1000  24:8A:07:92:7A:19  Ethernet100  Dynamic
25   1000  24:8A:07:92:7A:12  Ethernet72   Dynamic
26   1000  24:8A:07:92:7A:0F  Ethernet60   Dynamic
27   1000  24:8A:07:92:7A:05  Ethernet20   Dynamic
28   1000  24:8A:07:92:7A:0A  Ethernet40   Dynamic
29   1000  24:8A:07:92:7A:02  Ethernet8    Dynamic

The script also changed the PTF ports' MAC addresses as follows:

root@79fe7233672f:~# ifconfig | grep eth
eth0 Link encap:Ethernet HWaddr 24:8a:07:92:7a:00
eth1 Link encap:Ethernet HWaddr 24:8a:07:92:7a:01
eth2 Link encap:Ethernet HWaddr 24:8a:07:92:7a:02
eth3 Link encap:Ethernet HWaddr 24:8a:07:92:7a:03
eth4 Link encap:Ethernet HWaddr 24:8a:07:92:7a:04
eth5 Link encap:Ethernet HWaddr 24:8a:07:92:7a:05
eth6 Link encap:Ethernet HWaddr 24:8a:07:92:7a:06
eth7 Link encap:Ethernet HWaddr 24:8a:07:92:7a:07
eth8 Link encap:Ethernet HWaddr 24:8a:07:92:7a:08
eth9 Link encap:Ethernet HWaddr 24:8a:07:92:7a:09
eth10 Link encap:Ethernet HWaddr 24:8a:07:92:7a:0a
eth11 Link encap:Ethernet HWaddr 24:8a:07:92:7a:0b
eth12 Link encap:Ethernet HWaddr 24:8a:07:92:7a:0c
eth13 Link encap:Ethernet HWaddr 24:8a:07:92:7a:0d
eth14 Link encap:Ethernet HWaddr 24:8a:07:92:7a:0e
eth15 Link encap:Ethernet HWaddr 24:8a:07:92:7a:0f
eth16 Link encap:Ethernet HWaddr 24:8a:07:92:7a:10
eth17 Link encap:Ethernet HWaddr 24:8a:07:92:7a:11
eth18 Link encap:Ethernet HWaddr 24:8a:07:92:7a:12
eth19 Link encap:Ethernet HWaddr 24:8a:07:92:7a:13
eth20 Link encap:Ethernet HWaddr 24:8a:07:92:7a:14
eth21 Link encap:Ethernet HWaddr 24:8a:07:92:7a:15
eth22 Link encap:Ethernet HWaddr 24:8a:07:92:7a:16
eth23 Link encap:Ethernet HWaddr 24:8a:07:92:7a:17
eth24 Link encap:Ethernet HWaddr 24:8a:07:92:7a:18
eth25 Link encap:Ethernet HWaddr 24:8a:07:92:7a:19
eth26 Link encap:Ethernet HWaddr 24:8a:07:92:7a:1a
eth27 Link encap:Ethernet HWaddr 24:8a:07:92:7a:1b
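
The FDB table and the PTF interface list can be cross-checked automatically. The sketch below assumes the mapping visible in the two listings (PTF `ethN` has MAC suffix `N` in hex and is learned on DUT port `Ethernet<4*N>`); that mapping is inferred from this testbed's output and may differ elsewhere. The function names are hypothetical, and only a three-row excerpt of the `show mac` output is embedded.

```python
def expected_fdb(num_ports, prefix="24:8a:07:92:7a:", lanes=4):
    """Expected MAC -> DUT port mapping, assuming ethN <-> Ethernet(N * lanes)."""
    return {f"{prefix}{i:02x}": f"Ethernet{i * lanes}" for i in range(num_ports)}


def parse_show_mac(text):
    """Parse rows of `show mac` output into {mac_lowercase: port}."""
    learned = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 columns and start with a numeric index.
        if len(fields) == 5 and fields[0].isdigit():
            _no, _vlan, mac, port, _type = fields
            learned[mac.lower()] = port
    return learned


def compare(learned, expected):
    """Return (missing, wrong_port, unexpected) lists of MAC addresses."""
    missing = sorted(m for m in expected if m not in learned)
    wrong_port = sorted(m for m in expected
                        if m in learned and learned[m] != expected[m])
    unexpected = sorted(m for m in learned if m not in expected)
    return missing, wrong_port, unexpected


# Excerpt of the table above (rows 2, 6 and 20).
SHOW_MAC = """\
No.  Vlan  MacAddress         Port         Type
2    1000  24:8A:07:92:7A:01  Ethernet4    Dynamic
6    1000  24:8A:07:92:7A:00  Ethernet0    Dynamic
20   1000  24:8A:07:92:7A:3A  Ethernet108  Dynamic
"""

print(compare(parse_show_mac(SHOW_MAC), expected_fdb(2)))
```

With the full table and `expected_fdb(28)`, entry 20 (24:8A:07:92:7A:3A on Ethernet108) would be reported as unexpected, since no PTF interface in the `ifconfig` listing carries that MAC.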

  • Next the script sends L2 traffic with SMAC = PTF eth0's MAC and DMAC = PTF eth1's MAC.
    It expects this packet to be received on port Ethernet0 of the DUT, forwarded to port Ethernet4, and delivered to port eth1 on the PTF.

But for a reason I don't know yet, this packet is never sent out of my server NIC. I couldn't capture it on the wire.

I modified the DMAC of the packet in the PTF script. If the DMAC is not one of the PTF NIC MACs listed above, the packet is sent out of the server NIC and hits the DUT.
But if the DMAC is any of the PTF MACs, the packet is dropped somewhere.

I checked the OVS FDB table for all bridges and don't see any PTF MAC addresses

$sudo ovs-vsctl list-br | xargs -n1 sudo ovs-appctl fdb/show
port VLAN MAC Age
port VLAN MAC Age
LOCAL 0 46:4a:07:7e:73:4f 293
2 0 fa:d3:5e:f0:31:4b 293
2 0 7a:f3:97:ba:0b:4b 293
port VLAN MAC Age
LOCAL 0 fa:d3:5e:f0:31:4b 293
2 0 46:4a:07:7e:73:4f 293
2 0 7a:f3:97:ba:0b:4b 293
port VLAN MAC Age
LOCAL 0 7a:f3:97:ba:0b:4b 293
2 0 46:4a:07:7e:73:4f 293
2 0 fa:d3:5e:f0:31:4b 293
port VLAN MAC Age
LOCAL 0 2e:f2:76:3d:4a:4b 293
21 0 22:c0:2b:5a:d3:4c 293
21 0 26:6a:15:b4:f8:47 293
21 0 ae:99:c7:27:5d:42 293
2 0 52:54:00:e0:98:f2 22
21 0 52:54:00:f8:f7:48 22
21 0 52:54:00:db:a6:dc 8
21 0 52:54:00:33:76:bf 6
port VLAN MAC Age
port VLAN MAC Age
....

So far I have no idea where my L2 traffic gets dropped.

Am I doing something wrong here?

@keboliu
Contributor

keboliu commented Apr 28, 2018

@lvphuc133 I encountered a similar issue on one of my testbeds with the FDB test case: the DUT had already learned the MACs of the PTF container interfaces, but the subsequent verification packets could not be sent out to the fanout and looped back from the src port, resulting in the FDB case failure. It was finally resolved by upgrading the NIC firmware and MLNX_OFED.

@lvphuc133
Author

@keboliu
You are absolutely right. A firmware upgrade did help PTF send the L2 traffic out to the DUT, and the test passed.
Thanks a lot. You saved me days of debugging.

@mooncat2

@keboliu
@lvphuc133
I also encountered this issue. I upgraded my CX4 firmware to fw-ConnectX4-rel-12_22_1002-MCX415A-CCA_Ax-UEFI-14.15.19-FlexBoot-3.5.403.bin but the issue still exists. Could you share the Mellanox CX-4 firmware version that works for you?

@mooncat2

My Mellanox CX4 firmware information is below:
Device #2:

  Device Type:      ConnectX4
  Part Number:      MCX415A-CCA_Ax
  Description:      ConnectX-4 EN network interface card; 100GbE single-port QSFP28; PCIe3.0 x16; ROHS R6
  PSID:             MT_2140110033
  PCI Device Name:  /dev/mst/mt4115_pciconf0
  Base GUID:        ec0d9a03006fa26a
  Base MAC:         ec0d9a6fa26a
  Versions:         Current        Available
     FW             12.22.1002     N/A
     PXE            3.5.0403       N/A
     UEFI           14.15.0019     N/A

@keboliu
Contributor

keboliu commented Jun 25, 2018

@mooncat2
What I am using is ‘12.22.1002’

root@dev-r730-01:~# mlxfwmanager --query
Querying Mellanox devices firmware ...

Device #1:
----------
  Device Type:      ConnectX4
  Part Number:      MCX456A-ECA_Ax
  Description:      ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe3.0 x16; ROHS R6
  PSID:             MT_2190110032
  PCI Device Name:  0000:04:00.0
  Base MAC:         0000248a07aba1ec
  Versions:         Current        Available     
     FW             12.22.1002     N/A           

  Status:           No matching image found
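
When comparing several servers, the FW version can be pulled out of the query output with a few lines of Python. This is a rough sketch that assumes the text layout shown above; `parse_fw_versions` and `version_tuple` are hypothetical helpers, and the embedded sample is a trimmed copy of that output.

```python
import re


def parse_fw_versions(text):
    """Map each 'Device #N' block to its current FW version string."""
    versions = {}
    device = None
    for line in text.splitlines():
        m = re.match(r"\s*Device #(\d+):", line)
        if m:
            device = int(m.group(1))
            continue
        # The FW row lists the current version first, then the available one.
        m = re.match(r"\s*FW\s+(\S+)", line)
        if m and device is not None:
            versions[device] = m.group(1)
    return versions


def version_tuple(version):
    """Turn '12.22.1002' into (12, 22, 1002) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


SAMPLE = """\
Device #1:
----------
  Device Type:      ConnectX4
  Part Number:      MCX456A-ECA_Ax
  Versions:         Current        Available
     FW             12.22.1002     N/A
"""

versions = parse_fw_versions(SAMPLE)
print(versions)
print(version_tuple(versions[1]) >= (12, 22, 1002))
```

A matching firmware version is not sufficient on its own; it only rules out one difference between two servers.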

@mooncat2

@keboliu
Thanks for your information.
We passed the FDB test. The MLNX_OFED_LINUX driver also needed to be upgraded to the latest version for the test case to pass.

@DhinakaranDayalan

The test case has the following steps:

• Step 1: Create VLAN 1000 and add ports Ethernet1-Ethernet24 as untagged VLAN members. The L3 VLAN interface is created with the IP 192.162.0.1/24.
• Step 2: PTF (traffic generator) sends 10 packets to the VLAN member ports with an incrementing SA to check FDB learning. (Expectation: all 24 ports have 10 learned FDB entries.)
    o DA = L3 VLAN interface MAC
    o SA = incrementing MAC with count 10
    o EtherType = 0x1234
• Step 3: PTF (traffic generator) sends the following packet to one of the VLAN member ports, so that the DUT learns the PTF interface MAC.
    o DA = L3 VLAN interface MAC
    o SA = PTF interface MAC address
    o EtherType = 0x1234
• Step 4: PTF (traffic generator) sends the following packet to the DUT to check FDB switching based on the learned entries.
    o DA = PTF interface MAC address which is learned by the DUT
    o SA = PTF interface MAC address (this is wrongly populated by the test script)
    o EtherType = 0x1234
• Step 5: The test case checks whether all the MAC addresses are learned properly and reports the result.

Where does the test case fail?

The test case fails in Step 4.
PTF tries to generate the packet with DA = PTF interface MAC address learned by the DUT (learning is done in Step 3) and SA = PTF interface MAC address.
The test case expects the packet to be switched back to the PTF, since the DA of the PTF interface is already known to the DUT, but the packet is not switched back because DA MAC = SA MAC, causing the test case to fail.
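
The failure mode can be illustrated with a toy model of a learning switch. This is a simplified sketch under stated assumptions, not SONiC's or the ASIC's actual behavior: a single VLAN, SA learning before DA lookup, and frames addressed to the router MAC treated as not L2-switched. The class and MAC names are hypothetical.

```python
class LearningSwitch:
    """Toy single-VLAN L2 switch: learn the SA, then forward on the DA."""

    def __init__(self, ports, router_mac):
        self.ports = list(ports)
        self.router_mac = router_mac
        self.fdb = {}  # MAC -> port

    def receive(self, in_port, dst, src):
        """Process one frame and return the list of egress ports."""
        self.fdb[src] = in_port          # SA learning (may move an entry)
        if dst == self.router_mac:
            return []                    # addressed to the DUT, not L2-switched
        out_port = self.fdb.get(dst)
        if out_port is None:
            return [p for p in self.ports if p != in_port]  # unknown DA: flood
        if out_port == in_port:
            return []                    # source port suppression
        return [out_port]


ROUTER = "aa:aa:aa:aa:aa:aa"   # hypothetical L3 VLAN interface MAC
MAC0 = "00:00:00:00:00:01"     # hypothetical PTF eth0 MAC
MAC1 = "00:00:00:00:00:02"     # hypothetical PTF eth1 MAC

sw = LearningSwitch(["Ethernet0", "Ethernet4"], ROUTER)
sw.receive("Ethernet0", ROUTER, MAC0)         # Step 3: learn MAC0 on Ethernet0
sw.receive("Ethernet4", ROUTER, MAC1)         # Step 3: learn MAC1 on Ethernet4

good = sw.receive("Ethernet0", MAC1, MAC0)    # DA != SA: switched to Ethernet4
bad = sw.receive("Ethernet0", MAC1, MAC1)     # DA == SA: dropped
print(good, bad)
```

In the second frame, SA learning moves `MAC1` to the ingress port, so the DA lookup resolves to the port the frame arrived on and nothing is forwarded.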

Deep Analysis of the Issue

In Step 4, the PTF expects the packet as shown Below.

    "========== EXPECTED ==========",
    "dst        : DestMACField         = 'ac:1f:6b:7e:2d:bb' (None)",
    "src        : SourceMACField       = 'ac:1f:6b:7e:2d:bb' (None)",
    "type       : XShortEnumField      = 4660            (0)",
    "--",
    "load       : StrField             = '0000000000000000000000000000000000000000000000' ('')",
    "--",
    "0000   AC 1F 6B 7E 2D BB AC 1F  6B 7E 2D BB 12 34 30 30   ..k~-...k~-..400",
    "0010   30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30   0000000000000000",
    "0020   30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30   0000000000000000",
    "0030   30 30 30 30 30 30 30 30  30 30 30 30               000000000000",
    "========== RECEIVED ==========",

If we closely examine the packet, we see that it has DA MAC = SA MAC, which is wrong. These MAC entries are read by the test script from the ARP table of the PTF server and filled into the DA and SA fields.

The test script's expectation is that a packet sent from one interface, say eth0, will be switched by the DUT to eth1 and received on the eth1 interface of the PTF, since the MAC address of eth1 has already been learned by the DUT (in Step 3).

Since DA and SA are wrongly filled with the same MAC, the packet is dropped: after SA learning of the malformed packet, DA and SA point to the same port (source port suppression).
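
The DA = SA condition can be confirmed directly from the bytes in the hex dump above. The sketch below rebuilds that 60-byte frame (two MACs, EtherType 0x1234, 46 bytes of ASCII '0') and decodes the Ethernet header; `parse_ethernet_header` is a hypothetical helper using only the standard library.

```python
def parse_ethernet_header(frame: bytes):
    """Return (dst_mac, src_mac, ethertype) from an untagged Ethernet frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")

    def fmt(raw):
        return ":".join(f"{b:02x}" for b in raw)

    dst, src = frame[0:6], frame[6:12]
    ethertype = int.from_bytes(frame[12:14], "big")
    return fmt(dst), fmt(src), ethertype


# Bytes taken from the EXPECTED dump: DA, SA, EtherType, then '0' padding.
frame = bytes.fromhex("AC1F6B7E2DBB" "AC1F6B7E2DBB" "1234") + b"0" * 46

dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype), "DA == SA:", dst == src)
```

A check like `dst == src` before sending would catch the wrongly populated packet early instead of relying on the DUT dropping it.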
