Leftover PortChannels found in the kernel after switching topology from T0 to T1-LAG #2760

Open
keboliu opened this issue Apr 10, 2019 · 3 comments

@keboliu
Collaborator

keboliu commented Apr 10, 2019

Description
After switching the DUT topology from T0 to T1-LAG through `config reload`, some of the T0 PortChannels can still be found in the kernel (in the output of `ifconfig`). Only a reboot of the DUT clears these leftover PortChannels.

Switching from T1-LAG to T0 shows a similar issue.

Captures from the DUT:

  1. PortChannels configured for the T1-LAG topology:
```
root@arc-mtbc-1001:/tmp# show interface portchannel
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected
  No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  ---------------------------
 0002  PortChannel0002  LACP(A)(Up)  Ethernet0(S) Ethernet4(S)
 0005  PortChannel0005  LACP(A)(Up)  Ethernet8(S) Ethernet12(S)
 0008  PortChannel0008  LACP(A)(Up)  Ethernet20(S) Ethernet16(S)
 0011  PortChannel0011  LACP(A)(Up)  Ethernet28(S) Ethernet24(S)
 0014  PortChannel0014  LACP(A)(Up)  Ethernet32(S) Ethernet36(S)
 0017  PortChannel0017  LACP(A)(Up)  Ethernet44(S) Ethernet40(S)
 0020  PortChannel0020  LACP(A)(Up)  Ethernet52(S) Ethernet48(S)
 0023  PortChannel0023  LACP(A)(Up)  Ethernet60(S) Ethernet56(S)
```
  2. All PortChannels in the `ifconfig` output, including the leftovers:
```
root@arc-mtbc-1001:/tmp# ifconfig | grep PortChannel
PortChannel0001: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0003: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0004: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0005: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0008: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0011: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0014: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0017: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0020: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0023: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
```
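The leftovers can be spotted by diffing the PortChannel netdevices the kernel reports against the configured ones. The Python sketch below is a hypothetical helper (`kernel_portchannels` and `leftover_portchannels` are illustrative names, not part of SONiC). It assumes the configured names are already known, for example from the `show interface portchannel` output, and that each interface header line in the `ifconfig` output starts with the device name followed by a colon.

```python
def kernel_portchannels(ifconfig_output: str) -> set[str]:
    """Collect PortChannel netdevice names from `ifconfig` output."""
    names = set()
    for line in ifconfig_output.splitlines():
        # Interface headers look like "PortChannel0001: flags=...";
        # indented detail lines (inet, ether, ...) are skipped.
        if line.startswith("PortChannel"):
            names.add(line.split(":", 1)[0])
    return names


def leftover_portchannels(configured: set[str], ifconfig_output: str) -> list[str]:
    """Return kernel PortChannels that are not in the configured set, sorted."""
    return sorted(kernel_portchannels(ifconfig_output) - configured)


if __name__ == "__main__":
    # Configured T1-LAG PortChannels, taken from the capture in this issue.
    configured = {f"PortChannel{n:04d}" for n in (2, 5, 8, 11, 14, 17, 20, 23)}
    captured = """\
PortChannel0001: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0002: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0003: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0004: flags=4099<UP,BROADCAST,MULTICAST>  mtu 9100
PortChannel0005: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0008: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0011: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0014: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0017: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0020: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
PortChannel0023: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9100
"""
    print(leftover_portchannels(configured, captured))
    # prints ['PortChannel0001', 'PortChannel0003', 'PortChannel0004']
```

Applied to the capture above, only PortChannel0001, PortChannel0003 and PortChannel0004 show up as leftovers. They are also the only ones without the RUNNING flag.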

Steps to reproduce the issue:

  1. Deploy the DUT with the T0 topology.
  2. Prepare a config_db.json for the T1-LAG topology and replace the current one with it.
  3. Issue `config reload` to apply the new configuration. After it succeeds, the T0 PortChannels can still be seen in the kernel.

Describe the results you received:
Not all of the T0 PortChannels are cleared.

Describe the results you expected:
All of the T0 configuration should be cleared, and only the T1-LAG configuration should be applied on the DUT.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

```
root@arc-mtbc-1001:/tmp# show version
SONiC Software Version: SONiC.HEAD.43-6c1a0ce
Distribution: Debian 9.8
Kernel: 4.9.0-8-amd64
Build commit: 6c1a0ce
Build date: Tue Apr  9 10:44:13 UTC 2019
Built by: johnar@jenkins-worker-4

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-orchagent-mlnx      HEAD.43-6c1a0ce     1584c6f5403b        286MB
docker-orchagent-mlnx      latest              1584c6f5403b        286MB
docker-syncd-mlnx          HEAD.43-6c1a0ce     8504950d2a3c        331MB
docker-syncd-mlnx          latest              8504950d2a3c        331MB
docker-lldp-sv2            HEAD.43-6c1a0ce     689ccc124209        274MB
docker-lldp-sv2            latest              689ccc124209        274MB
docker-dhcp-relay          HEAD.43-6c1a0ce     9db41f909a46        256MB
docker-dhcp-relay          latest              9db41f909a46        256MB
docker-database            HEAD.43-6c1a0ce     5f01e8163e4a        255MB
docker-database            latest              5f01e8163e4a        255MB
docker-snmp-sv2            HEAD.43-6c1a0ce     4e99da8c3edb        294MB
docker-snmp-sv2            latest              4e99da8c3edb        294MB
docker-teamd               HEAD.43-6c1a0ce     5f2220585a30        274MB
docker-teamd               latest              5f2220585a30        274MB
docker-router-advertiser   HEAD.43-6c1a0ce     477c29d00dc8        254MB
docker-router-advertiser   latest              477c29d00dc8        254MB
docker-platform-monitor    HEAD.43-6c1a0ce     0c4056ed0b1d        286MB
docker-platform-monitor    latest              0c4056ed0b1d        286MB
docker-fpm-quagga          HEAD.43-6c1a0ce     51364e3fc200        281MB
docker-fpm-quagga          latest              51364e3fc200        281MB
```

@xinliu-seattle
Contributor

@keboliu Can you help with the fix?

@madhukar-kamarapu

madhukar-kamarapu commented Sep 14, 2019

When a user creates a PortChannel (`config portchannel add PortChannelXXX`), the corresponding netdevice is created in the kernel (teamd_init() calls team_create(), which creates the netdevice in the kernel).

These netdevices are deleted from the kernel when the user deletes the configuration (`config portchannel del PortChannelXXX`).

During config reload, however, the PortChannel netdevices are not explicitly deleted from the kernel.

Solution: when the teamd docker starts, delete all existing PortChannel netdevices in the kernel.
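A minimal Python sketch of the proposed startup cleanup, for illustration only. It assumes it runs as root before any teamd instance or configured PortChannel is brought up, so every `PortChannel*` netdevice still present in the kernel is treated as a leftover. The function names are hypothetical, and this is not the change that went into sonic-swss#1159. Deletion uses the standard iproute2 command `ip link delete dev <name>`, and netdevice names are read from `/sys/class/net`.

```python
import os
import subprocess
from typing import Callable, Iterable, List

SYS_CLASS_NET = "/sys/class/net"


def select_portchannels(names: Iterable[str]) -> List[str]:
    """Keep only netdevice names that look like SONiC PortChannels, sorted."""
    return sorted(n for n in names if n.startswith("PortChannel"))


def cleanup_portchannels(
    names: Iterable[str],
    run: Callable[[List[str]], None],
) -> List[str]:
    """Delete every PortChannel netdevice in `names`; return the removed names."""
    removed = []
    for name in select_portchannels(names):
        # iproute2: remove the team netdevice from the kernel.
        run(["ip", "link", "delete", "dev", name])
        removed.append(name)
    return removed


def run_checked(cmd: List[str]) -> None:
    """Execute a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumption: called at teamd container start, before configured LAGs are recreated.
    cleanup_portchannels(os.listdir(SYS_CLASS_NET), run_checked)
```

Passing the command runner in as a parameter keeps the selection logic testable without touching the kernel, for example by recording the commands instead of executing them.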

@prsunny
Contributor

prsunny commented Jan 16, 2020

Looks like this was fixed as part of sonic-net/sonic-swss#1159.

mihirpat1 pushed a commit to mihirpat1/sonic-buildimage that referenced this issue Jun 14, 2023
* Fix segmentation fault is observed during SWSS compilation:
mssonicbld added a commit that referenced this issue Sep 1, 2023
…lly (#16335)

#### Why I did it
src/sonic-swss
```
* 16817324 - (HEAD -> 202211, origin/202211) [mux]: Fix UTs segmentation fault (#2760) (12 hours ago) [Nazarii Hnydyn]
* 0fa5d880 - [orchagent]: Handle additional SAI error conditions gracefully (#2755) (2 days ago) [prabhataravind]
* 3726aebc - [mux]: Implement rollback for failed mux switchovers (#2714) (2 days ago) [Lawrence Lee]
* a8e50e7d - [portsorch]: Set default hostif TX queue (#2697) (2 days ago) [prabhataravind]
* 0689d656 - Add missing parameter to on_switch_shutdown_request method. (#2567) (2 days ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog