fast-reboot fails when LAG has VLAN membership #4793

Closed
nazariig opened this issue Jun 17, 2020 · 1 comment · Fixed by sonic-net/sonic-utilities#1393
Labels
Bug 🐛, Triaged


nazariig commented Jun 17, 2020

Description

fast-reboot fails while calling /usr/bin/fast-reboot-dump.py:

Jun 17 18:48:23.632648 sonic ERR fast-reboot-dump: Got an exception '24:8a:07:7e:41:80': Traceback: Traceback (most recent call last):
  File "/usr/bin/fast-reboot-dump.py", line 301, in <module>
    res = main()
  File "/usr/bin/fast-reboot-dump.py", line 294, in main
    garp_send(arp_entries, map_mac_ip_per_vlan)
  File "/usr/bin/fast-reboot-dump.py", line 229, in garp_send
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
  File "/usr/bin/fast-reboot-dump.py", line 229, in <setcomp>
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
KeyError: '24:8a:07:7e:41:80'

The issue happens when /usr/bin/fast-reboot-dump.py tries to process VLAN LAG members:

def get_fdb(db, vlan_name, vlan_id, bridge_id_2_iface):
    fdb_types = {
      'SAI_FDB_ENTRY_TYPE_DYNAMIC': 'dynamic',
      'SAI_FDB_ENTRY_TYPE_STATIC' : 'static'
    }

    bvid = get_vlan_oid_by_vlan_id(db, vlan_id)
    available_macs = set()
    map_mac_ip = {}
    fdb_entries = []
    keys = db.keys(db.ASIC_DB, 'ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:{*\"bvid\":\"%s\"*}' % bvid)
    keys = [] if keys is None else keys
    for key in keys:
        key_obj = json.loads(key.replace('ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:', ''))
        mac = str(key_obj['mac'])
        if not is_mac_unicast(mac):
            continue
        available_macs.add((vlan_name, mac.lower()))
        fdb_mac = mac.replace(':', '-')
        # get attributes
        value = db.get_all(db.ASIC_DB, key)
        fdb_type = fdb_types[value['SAI_FDB_ENTRY_ATTR_TYPE']]
        if value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID'] not in bridge_id_2_iface:
            continue
        fdb_port = bridge_id_2_iface[value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID']]

        obj = {
          'FDB_TABLE:Vlan%d:%s' % (vlan_id, fdb_mac) : {
            'type': fdb_type,
            'port': fdb_port,
          },
          'OP': 'SET'
        }

        fdb_entries.append(obj)
        map_mac_ip[mac.lower()] = fdb_port

    return fdb_entries, available_macs, map_mac_ip

vlan_id:
10

key_obj:
{u'mac': u'24:8A:07:7E:41:80', u'bvid': u'oid:0x260000000005bb', u'switch_id': u'oid:0x21000000000000'}

value:
{'SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID': 'oid:0x3a0000000005bd', 'SAI_FDB_ENTRY_ATTR_PACKET_ACTION': 'SAI_PACKET_ACTION_FORWARD', 'SAI_FDB_ENTRY_ATTR_TYPE': 'SAI_FDB_ENTRY_TYPE_DYNAMIC'}

bridge_id_2_iface:
{'oid:0x3a00000000064c': 'Ethernet56'}

value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID']:
0x3a0000000005bd

def garp_send(arp_entries, map_mac_ip_per_vlan):
    ETH_P_ALL = 0x03

    # generate source ip addresses for arp packets
    src_ip_addrs = {vlan_name:get_iface_ip_addr(vlan_name) for vlan_name,_,_ in arp_entries}

    # generate source mac addresses for arp packets
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}

arp_entries:
[('Vlan10', '24:8a:07:7e:41:80', '10.0.1.1'), ('Vlan10', '24:8a:07:7e:41:80', '2000')]

map_mac_ip_per_vlan:
{'Vlan23': {}, 'Vlan10': {}}
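
The two values above are enough to reproduce the KeyError away from the switch. The following is a minimal sketch, not the fix that was eventually merged: `resolve_src_ifs` is a hypothetical helper that performs the same lookup as the set comprehension in garp_send but collects entries with no FDB mapping instead of raising.

```python
def resolve_src_ifs(arp_entries, map_mac_ip_per_vlan):
    """Hypothetical helper: same lookup as in garp_send, but tolerant of gaps.

    Returns (src_ifs, unresolved), where unresolved lists the ARP entries
    whose MAC has no interface mapping for its VLAN.
    """
    src_ifs = set()
    unresolved = []
    for vlan_name, dst_mac, ip_addr in arp_entries:
        iface = map_mac_ip_per_vlan.get(vlan_name, {}).get(dst_mac)
        if iface is None:
            unresolved.append((vlan_name, dst_mac, ip_addr))
        else:
            src_ifs.add(iface)
    return src_ifs, unresolved


# Values copied from the debug output in this issue.
arp_entries = [
    ('Vlan10', '24:8a:07:7e:41:80', '10.0.1.1'),
    ('Vlan10', '24:8a:07:7e:41:80', '2000'),
]
map_mac_ip_per_vlan = {'Vlan23': {}, 'Vlan10': {}}

# The original set comprehension raises because 'Vlan10' maps to an empty dict.
try:
    {map_mac_ip_per_vlan[v][m] for v, m, _ in arp_entries}
    raised = None
except KeyError as exc:
    raised = exc.args[0]

src_ifs, unresolved = resolve_src_ifs(arp_entries, map_mac_ip_per_vlan)
print("KeyError for:", raised)
print("resolved interfaces:", src_ifs)
print("unresolved entries:", unresolved)
```

Skipping unresolved entries only hides the symptom: no GARP would be sent for those neighbors, so the missing LAG mapping still has to be fixed at the source.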

SONiC DB info:

root@sonic:/home/admin# redis-cli -n 1 KEYS '*' | grep '0x3a0000000005bd'
ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd

root@sonic:/home/admin# redis-cli -n 1 HGETALL 'ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd'
1) "SAI_BRIDGE_PORT_ATTR_TYPE"
2) "SAI_BRIDGE_PORT_TYPE_PORT"
3) "SAI_BRIDGE_PORT_ATTR_PORT_ID"
4) "oid:0x20000000005b5"
5) "SAI_BRIDGE_PORT_ATTR_ADMIN_STATE"
6) "true"
7) "SAI_BRIDGE_PORT_ATTR_FDB_LEARNING_MODE"
8) "SAI_BRIDGE_PORT_FDB_LEARNING_MODE_HW"

root@sonic:/home/admin# redis-cli -n 1 KEYS '*' | grep '0x20000000005b5'
ASIC_STATE:SAI_OBJECT_TYPE_LAG:oid:0x20000000005b5

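The lookup fails because bridge_id_2_iface has no mapping for bridge ports that are backed by a LAG. Bridge port oid:0x3a0000000005bd points to a SAI_OBJECT_TYPE_LAG object, so get_fdb hits the `continue` before it fills map_mac_ip. As a result, map_mac_ip_per_vlan['Vlan10'] stays empty and garp_send raises KeyError for the MAC.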
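
The redis-cli walk above can be sketched in a few lines of Python. This is an illustration only: it works on plain lists and dicts copied from the output in this issue rather than on a live ASIC DB, and `object_type_of` and `explain_fdb_port` are hypothetical helper names, not functions from fast-reboot-dump.py.

```python
def object_type_of(oid, asic_keys):
    """Return the SAI object type of an OID, judged by its ASIC_STATE key."""
    for key in asic_keys:
        # Keys look like 'ASIC_STATE:<SAI_OBJECT_TYPE_...>:<oid:0x...>'.
        parts = key.split(":", 2)
        if len(parts) == 3 and parts[2] == oid:
            return parts[1]
    return None


def explain_fdb_port(bridge_port_oid, bridge_port_attrs, asic_keys, bridge_id_2_iface):
    """Summarize what a bridge port points at and whether it has a name mapping."""
    target = bridge_port_attrs["SAI_BRIDGE_PORT_ATTR_PORT_ID"]
    return {
        "bridge_port": bridge_port_oid,
        "target_oid": target,
        "target_type": object_type_of(target, asic_keys),
        "mapped": bridge_port_oid in bridge_id_2_iface,
    }


# Data copied from the redis-cli and debug output in this issue.
asic_keys = [
    "ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd",
    "ASIC_STATE:SAI_OBJECT_TYPE_LAG:oid:0x20000000005b5",
]
bridge_port_attrs = {
    "SAI_BRIDGE_PORT_ATTR_TYPE": "SAI_BRIDGE_PORT_TYPE_PORT",
    "SAI_BRIDGE_PORT_ATTR_PORT_ID": "oid:0x20000000005b5",
}
bridge_id_2_iface = {"oid:0x3a00000000064c": "Ethernet56"}

report = explain_fdb_port(
    "oid:0x3a0000000005bd", bridge_port_attrs, asic_keys, bridge_id_2_iface
)
print(report)
```

With this data the report shows a target of type SAI_OBJECT_TYPE_LAG and `mapped` set to False, which matches the manual diagnosis.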

Steps to reproduce the issue:

  1. Connect two DUTs with a LAG
  2. Add the LAG to a VLAN RIF as a tagged member
  3. Set up a BGP session over the VLAN RIF

Describe the results you received:
fast-reboot fails

Jun 17 18:48:23.632648 sonic ERR fast-reboot-dump: Got an exception '24:8a:07:7e:41:80': Traceback: Traceback (most recent call last):
  File "/usr/bin/fast-reboot-dump.py", line 301, in <module>
    res = main()
  File "/usr/bin/fast-reboot-dump.py", line 294, in main
    garp_send(arp_entries, map_mac_ip_per_vlan)
  File "/usr/bin/fast-reboot-dump.py", line 229, in garp_send
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
  File "/usr/bin/fast-reboot-dump.py", line 229, in <setcomp>
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
KeyError: '24:8a:07:7e:41:80'

Describe the results you expected:
fast-reboot shouldn't fail

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

SONiC Software Version: SONiC.201911.113-093d7731
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 093d7731
Build date: Sun Jun 14 03:45:40 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-mlnx_msn2100-r0
HwSKU: ACS-MSN2100
ASIC: mellanox
Uptime: 19:13:15 up 10:56,  3 users,  load average: 0.19, 0.32, 0.48

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-syncd-mlnx             201911.113-093d7731   d11fa0617162        386MB
docker-syncd-mlnx             latest                d11fa0617162        386MB
docker-router-advertiser      201911.113-093d7731   da9d108cabc9        285MB
docker-router-advertiser      latest                da9d108cabc9        285MB
docker-sonic-mgmt-framework   201911.113-093d7731   deba713cbabb        425MB
docker-sonic-mgmt-framework   latest                deba713cbabb        425MB
docker-platform-monitor       201911.113-093d7731   cedcaec571f9        647MB
docker-platform-monitor       latest                cedcaec571f9        647MB
docker-fpm-frr                201911.113-093d7731   578bdd07c4c0        330MB
docker-fpm-frr                latest                578bdd07c4c0        330MB
docker-sflow                  201911.113-093d7731   3c8863e5a96a        310MB
docker-sflow                  latest                3c8863e5a96a        310MB
docker-lldp-sv2               201911.113-093d7731   15d73e30c0e9        307MB
docker-lldp-sv2               latest                15d73e30c0e9        307MB
docker-dhcp-relay             201911.113-093d7731   65346705abce        295MB
docker-dhcp-relay             latest                65346705abce        295MB
docker-database               201911.113-093d7731   b98668b03299        285MB
docker-database               latest                b98668b03299        285MB
docker-teamd                  201911.113-093d7731   d983d6a99831        310MB
docker-teamd                  latest                d983d6a99831        310MB
docker-snmp-sv2               201911.113-093d7731   0c821c3e62ce        343MB
docker-snmp-sv2               latest                0c821c3e62ce        343MB
docker-orchagent              201911.113-093d7731   ea84da2dedc9        328MB
docker-orchagent              latest                ea84da2dedc9        328MB
docker-nat                    201911.113-093d7731   0ade55a3c7a3        311MB
docker-nat                    latest                0ade55a3c7a3        311MB
docker-sonic-telemetry        201911.113-093d7731   9f3fe08edde6        349MB
docker-sonic-telemetry        latest                9f3fe08edde6        349MB

Attach debug file sudo generate_dump:

root@sonic:/home/admin# show int status
      Interface        Lanes    Speed    MTU    Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  -----------  -------  -----  -------  ---------------  ------  -------  ---------------  ----------
      Ethernet0            0      25G   9100     etp1           routed    down     down  QSFP28 or later         N/A
      Ethernet4            4      25G   9100     etp2           routed    down     down   SFP/SFP+/SFP28         N/A
      Ethernet8            8      25G   9100     etp3           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet12           12      25G   9100     etp4           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet16           16      25G   9100     etp5           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet20           20      25G   9100     etp6           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet24           24      25G   9100     etp7           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet28           28      25G   9100     etp8           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet32           32      25G   9100     etp9  PortChannel0001      up       up   SFP/SFP+/SFP28         N/A
     Ethernet36           36      25G   9100    etp10  PortChannel0001      up       up   SFP/SFP+/SFP28         N/A
     Ethernet40           40      25G   9100   etp11a  PortChannel0002    down     down  QSFP28 or later         N/A
     Ethernet41           41      25G   9100   etp11b  PortChannel0002    down     down  QSFP28 or later         N/A
     Ethernet42           42      25G   9100   etp11c           routed    down     down  QSFP28 or later         N/A
     Ethernet43           43      25G   9100   etp11d           routed    down     down  QSFP28 or later         N/A
     Ethernet44           44      25G   9100   etp12a           routed    down     down  QSFP28 or later         N/A
     Ethernet45           45      25G   9100   etp12b           routed    down     down  QSFP28 or later         N/A
     Ethernet46           46      25G   9100   etp12c           routed    down     down  QSFP28 or later         N/A
     Ethernet47           47      25G   9100   etp12d           routed    down     down  QSFP28 or later         N/A
     Ethernet48  48,49,50,51     100G   9100    etp13           routed    down     down  QSFP28 or later         N/A
     Ethernet52  52,53,54,55     100G   9100    etp14           routed    down     down  QSFP28 or later         N/A
     Ethernet56  56,57,58,59     100G   9100    etp15            trunk    down     down  QSFP28 or later         N/A
     Ethernet60  60,61,62,63     100G   9100    etp16           routed    down     down  QSFP28 or later         N/A
PortChannel0001          N/A      50G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0002          N/A      50G   9100      N/A           routed    down       up              N/A         N/A

root@sonic:/home/admin# show int po
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  ---------------------------
 0001  PortChannel0001  LACP(A)(Up)  Ethernet32(S) Ethernet36(S)
 0002  PortChannel0002  LACP(A)(Dw)  Ethernet41(D) Ethernet40(D)

root@sonic:/home/admin# show vlan brief
+-----------+----------------+-----------------+----------------+-----------------------+
|   VLAN ID | IP Address     | Ports           | Port Tagging   | DHCP Helper Address   |
+===========+================+=================+================+=======================+
|        10 | 10.0.1.2/24    | PortChannel0001 | tagged         |                       |
|           | 2000:1::2/64   |                 |                |                       |
+-----------+----------------+-----------------+----------------+-----------------------+
|        23 | 100.2.3.1/24   | Ethernet56      | tagged         |                       |
|           | 2000:2:3::1/64 |                 |                |                       |
+-----------+----------------+-----------------+----------------+-----------------------+

root@sonic:/home/admin# show ip int
Interface        Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
---------------  --------  -------------------  ------------  --------------  -------------
Ethernet60                 100.2.4.1/24         down/down     IXIA2.4         100.2.4.2
Loopback0                  1.1.1.2/32           up/up         N/A             N/A
PortChannel0002            10.0.2.2/24          up/down       Aux             10.0.2.1
Vlan10                     10.0.1.2/24          up/up         Aux             10.0.1.1
Vlan23                     100.2.3.1/24         up/up         IXIA2.3         100.2.3.2
docker0                    240.127.1.1/24       up/down       N/A             N/A
eth0                       10.210.25.44/22      up/up         N/A             N/A
lo                         127.0.0.1/8          up/up         N/A             N/A

root@sonic:/home/admin# show ip bgp su

IPv4 Unicast Summary:
BGP router identifier 1.1.1.2, local AS number 65200 vrf-id 0
BGP table version 9
RIB entries 9, using 1656 bytes of memory
Peers 4, using 82 KiB of memory
Peer groups 4, using 256 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   NeighborName
10.0.1.1        4      65100    9725    9726        0    0    0 02:42:00            3   Aux
10.0.2.1        4      65100    2513    2516        0    0    0 02:41:34       Active   Aux
100.2.3.2       4      65023       0       0        0    0    0    never       Active   IXIA2.3
100.2.4.2       4      65024       0       0        0    0    0    never       Active   IXIA2.4

Total number of neighbors 4

liat-grozovik commented

@anshuv-mfst For some unknown reason, I cannot assign the issue to @shlomibitton, and he cannot self-assign it. Could you please help?

lguohan pushed a commit to sonic-net/sonic-utilities that referenced this issue Apr 2, 2021
…#1393)

Add PortChannels to the list of interfaces (port_id_2_iface) to support FDB dump for PortChannel in a VLAN group.

Fix sonic-net/sonic-buildimage#4793

- How I did it

- Get LAG ID from the DB.
- Find the LAG name from APP DB.
- Add it to the list of 'port_id_2_iface' to be used.
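
The three steps above could look roughly like the sketch below. It is not the code from the referenced commit: it works on plain dicts instead of live DB connections, `add_lags_to_port_map` is a hypothetical helper, the `LAG_MEMBER_TABLE:<lag>:<port>` key layout for APP DB is an assumption, and the physical port OIDs are made up for illustration. Only the LAG OID and the interface names come from this issue.

```python
def add_lags_to_port_map(port_id_2_iface, lag_member_ports, lag_member_table_keys):
    """Return a copy of port_id_2_iface extended with LAG OID -> PortChannel name.

    lag_member_ports: LAG OID -> list of member port OIDs (assumed to be
        read from the ASIC DB beforehand).
    lag_member_table_keys: APP DB keys, assumed to look like
        'LAG_MEMBER_TABLE:<PortChannel name>:<member port name>'.
    """
    # Step 2: learn which PortChannel each member port belongs to.
    port_to_lag = {}
    for key in lag_member_table_keys:
        _, lag_name, port_name = key.split(":")
        port_to_lag[port_name] = lag_name

    # Steps 1 and 3: resolve each LAG OID through any member and record it.
    result = dict(port_id_2_iface)
    for lag_oid, member_oids in lag_member_ports.items():
        for member_oid in member_oids:
            lag_name = port_to_lag.get(port_id_2_iface.get(member_oid))
            if lag_name is not None:
                result[lag_oid] = lag_name
                break
    return result


# Port OIDs below are hypothetical; the LAG OID is the one seen in this issue.
port_id_2_iface = {
    "oid:0x1000000000020": "Ethernet32",
    "oid:0x1000000000024": "Ethernet36",
}
lag_member_ports = {
    "oid:0x20000000005b5": ["oid:0x1000000000020", "oid:0x1000000000024"],
}
lag_member_table_keys = [
    "LAG_MEMBER_TABLE:PortChannel0001:Ethernet32",
    "LAG_MEMBER_TABLE:PortChannel0001:Ethernet36",
]

extended = add_lags_to_port_map(port_id_2_iface, lag_member_ports, lag_member_table_keys)
print(extended["oid:0x20000000005b5"])
```

Once the LAG OID resolves to a name, a bridge port whose SAI_BRIDGE_PORT_ATTR_PORT_ID is that LAG can be mapped the same way as a bridge port on a physical port.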

- How to verify it
Reproduce the issue mentioned on this PR and try to run fast-reboot with this fix.

- Previous command output (if the output of a command-line utility has changed)
Traceback:
src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
KeyError: 'b8:59:9f:a8:e2:00'

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
qiluo-msft pushed a commit to sonic-net/sonic-utilities that referenced this issue Apr 6, 2021
… [201911] (#1547)

**- What I did**
Add PortChannels to the list of interfaces (port_id_2_iface) to support FDB dump for PortChannel in a VLAN group.
Fixes sonic-net/sonic-buildimage#4793
This PR is the 201911 version of the original PR: #1393

**- How I did it**
* Get LAG ID from the DB.
* Find the LAG name from APP DB.
* Add it to the list of 'port_id_2_iface' to be used.

**- How to verify it**
Reproduce the issue mentioned on this PR and try to run fast-reboot with this fix.

**- Previous command output (if the output of a command-line utility has changed)**
Traceback:
src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
KeyError: 'b8:59:9f:a8:e2:00'

**- New command output (if the output of a command-line utility has changed)**
Success.
yxieca pushed a commit to sonic-net/sonic-utilities that referenced this issue Apr 8, 2021
…#1393)

malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this issue Aug 3, 2023
… (#1393)
