[teamd] Port-channel interface is down after system/config reload #4070

Closed
volodymyrsamotiy opened this issue Jan 27, 2020 · 0 comments · Fixed by #4109

Description
A port-channel goes down after a few system/config reloads: one member port is operationally down in the kernel although it is up in APP_DB and SDK/SAI.

  • The issue is not always reproducible, but it can be observed after all types of reboots/reloads: 1) config reload; 2) cold/fast/warm reboot; 3) minigraph config load.
  • It was observed on both topologies with port-channels: T1-LAG and T0.
  • The operational state of the affected LAG member port in the kernel appears to be out of sync with all other components.
  • The netdev event that updates the operational state of the member port to UP is always received by the kernel. All other callbacks/events are also called/received, and the operational state is correct in SDK/SAI/APP_DB.
  • It happens only with a port-channel and its member port, so it looks like a teamd-related issue. It appears to be timing-sensitive, since it is not always reproducible.
  • The easiest way to reproduce the issue is to deploy the T1-LAG topology and perform 5-10 config reloads.

Steps to reproduce the issue:

  1. Deploy the T1-LAG topology.
  2. Execute the config reload command a few times (usually 5-10).
  3. Observe that one port-channel is down.
  4. Check that one member port of the port-channel is down in the kernel:
    cat /sys/class/net/Ethernet<num>/operstate
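
Step 4 can be scripted when several ports need checking. The following is a minimal sketch, not part of the original report: it reads the `operstate` file of each named port from sysfs and returns the ports whose kernel state is not `up`. The function name `kernel_down_ports` and the `sysfs_root` parameter are hypothetical, introduced only for illustration.

```python
from pathlib import Path


def kernel_down_ports(ports, sysfs_root="/sys/class/net"):
    """Return {port: state} for ports whose kernel operstate is not 'up'.

    A port whose operstate file cannot be read is reported as 'missing'.
    `sysfs_root` is a hypothetical parameter that lets the function be
    pointed at a test directory instead of the real sysfs tree.
    """
    not_up = {}
    for port in ports:
        state_file = Path(sysfs_root) / port / "operstate"
        try:
            state = state_file.read_text().strip()
        except OSError:
            state = "missing"
        if state != "up":
            not_up[port] = state
    return not_up
```

On the device, it could be called with the member ports of the affected port-channel, for example `kernel_down_ports(["Ethernet48", "Ethernet52"])`.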

Describe the results you received:
Once the issue is reproduced, the following is observed:

  • The port-channel is permanently down:
show in po
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  ---------------------------
 0002  PortChannel0002  LACP(A)(Up)  Ethernet0(S) Ethernet4(S)
 0005  PortChannel0005  LACP(A)(Up)  Ethernet8(S) Ethernet12(S)
 0008  PortChannel0008  LACP(A)(Up)  Ethernet20(S) Ethernet16(S)
 0011  PortChannel0011  LACP(A)(Up)  Ethernet28(S) Ethernet24(S)
 0014  PortChannel0014  LACP(A)(Up)  Ethernet32(S) Ethernet36(S)
 0017  PortChannel0017  LACP(A)(Up)  Ethernet44(S) Ethernet40(S)
 0020  PortChannel0020  LACP(A)(Dw)  Ethernet48(S*)
 0023  PortChannel0023  LACP(A)(Up)  Ethernet60(S) Ethernet56(S)
  • The operational state of the LAG member is down in the kernel:
cat /sys/class/net/Ethernet52/operstate
down
  • All ports are operationally up in SONiC (and also up in SDK/SAI):
show int sta
      Interface            Lanes    Speed    MTU    Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  ---------------  -------  -----  -------  ---------------  ------  -------  ---------------  ----------
      Ethernet0          0,1,2,3     100G   9100     etp1  PortChannel0002      up       up  QSFP28 or later         off
      Ethernet4          4,5,6,7     100G   9100     etp2  PortChannel0002      up       up   QSFP+ or later         off
      Ethernet8        8,9,10,11     100G   9100     etp3  PortChannel0005      up       up  QSFP28 or later         off
     Ethernet12      12,13,14,15     100G   9100     etp4  PortChannel0005      up       up  QSFP28 or later         off
     Ethernet16      16,17,18,19     100G   9100     etp5  PortChannel0008      up       up   QSFP+ or later         off
     Ethernet20      20,21,22,23     100G   9100     etp6  PortChannel0008      up       up   QSFP+ or later         off
     Ethernet24      24,25,26,27     100G   9100     etp7  PortChannel0011      up       up  QSFP28 or later         off
     Ethernet28      28,29,30,31     100G   9100     etp8  PortChannel0011      up       up   QSFP+ or later         off
     Ethernet32      32,33,34,35     100G   9100     etp9  PortChannel0014      up       up   QSFP+ or later         off
     Ethernet36      36,37,38,39     100G   9100    etp10  PortChannel0014      up       up   QSFP+ or later         off
     Ethernet40      40,41,42,43     100G   9100    etp11  PortChannel0017      up       up   QSFP+ or later         off
     Ethernet44      44,45,46,47     100G   9100    etp12  PortChannel0017      up       up   QSFP+ or later         off
     Ethernet48      48,49,50,51     100G   9100    etp13  PortChannel0020      up       up   QSFP+ or later         off
     Ethernet52      52,53,54,55     100G   9100    etp14  PortChannel0020      up       up   QSFP+ or later         off
     Ethernet56      56,57,58,59     100G   9100    etp15  PortChannel0023      up       up   QSFP+ or later         off
     Ethernet60      60,61,62,63     100G   9100    etp16  PortChannel0023      up       up   QSFP+ or later         off
     Ethernet64      64,65,66,67     100G   9100    etp17           routed      up       up   QSFP+ or later         off
     Ethernet68      68,69,70,71     100G   9100    etp18           routed      up       up   QSFP+ or later         off
     Ethernet72      72,73,74,75     100G   9100    etp19           routed      up       up   QSFP+ or later         off
     Ethernet76      76,77,78,79     100G   9100    etp20           routed      up       up   QSFP+ or later         off
     Ethernet80      80,81,82,83     100G   9100    etp21           routed      up       up   QSFP+ or later         off
     Ethernet84      84,85,86,87     100G   9100    etp22           routed      up       up   QSFP+ or later         off
     Ethernet88      88,89,90,91     100G   9100    etp23           routed      up       up   QSFP+ or later         off
     Ethernet92      92,93,94,95     100G   9100    etp24           routed      up       up   QSFP+ or later         off
     Ethernet96      96,97,98,99     100G   9100    etp25           routed      up       up   QSFP+ or later         off
    Ethernet100  100,101,102,103     100G   9100    etp26           routed      up       up   QSFP+ or later         off
    Ethernet104  104,105,106,107     100G   9100    etp27           routed      up       up   QSFP+ or later         off
    Ethernet108  108,109,110,111     100G   9100    etp28           routed      up       up   QSFP+ or later         off
    Ethernet112  112,113,114,115     100G   9100    etp29           routed      up       up   QSFP+ or later         off
    Ethernet116  116,117,118,119     100G   9100    etp30           routed      up       up   QSFP+ or later         off
    Ethernet120  120,121,122,123      50G   9100    etp31           routed      up       up  QSFP28 or later         off
    Ethernet124  124,125,126,127      50G   9100    etp32           routed      up       up  QSFP28 or later         off
PortChannel0002              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0005              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0008              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0011              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0014              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0017              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0020              N/A     200G   9100      N/A           routed    down       up              N/A         N/A
PortChannel0023              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
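
The two outputs above disagree: the interface status table lists Ethernet52 as a member of PortChannel0020, while the port-channel summary shows only Ethernet48 for it. The sketch below, which is not part of the original report, shows one way such a mismatch could be detected automatically from the captured text of both commands. All function names are hypothetical, and the parsing assumes the column layout shown in this issue (the port-channel name in the sixth whitespace-separated field of the status table).

```python
import re
from collections import defaultdict

# Row of the port-channel summary, e.g.
# " 0020  PortChannel0020  LACP(A)(Dw)  Ethernet48(S*)"
PC_ROW = re.compile(r"^\s*\d+\s+(PortChannel\d+)\s+(\S+)\s*(.*)$")
# One member entry with its flags, e.g. "Ethernet48(S*)"
MEMBER = re.compile(r"(\S+?)\(([^)]*)\)")


def parse_portchannels(summary_text):
    """Map port-channel name -> (protocol/state string, {member: flags})."""
    result = {}
    for line in summary_text.splitlines():
        match = PC_ROW.match(line)
        if match:
            name, proto, ports = match.groups()
            result[name] = (proto, dict(MEMBER.findall(ports)))
    return result


def configured_members(status_text):
    """Map port-channel name -> set of ports that list it in the Vlan column."""
    members = defaultdict(set)
    for line in status_text.splitlines():
        fields = line.split()
        if (len(fields) > 5
                and fields[0].startswith("Ethernet")
                and fields[5].startswith("PortChannel")):
            members[fields[5]].add(fields[0])
    return members


def missing_members(status_text, summary_text):
    """Return {port-channel: ports in the status table but not in the summary}."""
    summary = parse_portchannels(summary_text)
    report = {}
    for name, ports in configured_members(status_text).items():
        listed = set(summary.get(name, ("", {}))[1])
        absent = sorted(ports - listed)
        if absent:
            report[name] = absent
    return report
```

Fed with the two outputs captured in this report, `missing_members` would flag Ethernet52 under PortChannel0020, which matches the port whose kernel `operstate` is `down`.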

Describe the results you expected:

  • The port-channel should be operationally up after a system/config reload.
  • The member port of the port-channel should be operationally up in the kernel after a system/config reload (in sync with the rest of the system).

Additional information you deem important (e.g. issue happens only occasionally):

  • The issue is reproducible on both master and `201911` images.
  • For example, it was observed on version SONiC.HEAD.17-e884e583.
lguohan pushed a commit that referenced this issue Feb 5, 2020
- What I did
Ported a fix from libteam master to our master.
Fixes #4070
Fixes #3649

- How I did it
Applied patch jpirko/libteam@c723737 from upstream.

- How to verify it
Build an image for your DUT and warm-reboot the DUT 10 times. Check that all PortChannels are up and that there are no error messages in teamd.log.
prsunny pushed a commit that referenced this issue Feb 11, 2020
abdosi pushed a commit that referenced this issue Feb 14, 2020
pphuchar pushed a commit to SONIC-DEV/sonic-buildimage that referenced this issue Mar 9, 2020
tiantianlv pushed a commit to SONIC-DEV/sonic-buildimage that referenced this issue Apr 24, 2020
yxieca pushed a commit that referenced this issue Oct 12, 2020