Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[restore_neigh] Check STATE_DB before sending ARP/ND pkts for neighbors associated with PortChannel #2444

Merged
merged 1 commit into from
Sep 12, 2022

Conversation

vivekrnv
Copy link
Contributor

@vivekrnv vivekrnv commented Sep 9, 2022

Signed-off-by: Vivek Reddy Karri vkarri@nvidia.com

What I did

As a part of intf check, wait for the LAG_MEMBER_TABLE to be populated for PortChannel ifaces

Why I did it

Currently, restore_neighbors just checks the /sys/class/net/{0}/carrier to determine of if the status is up, but during WR teamd
starts very early. It creates the netdev from the prev-life state and as i understand sets the carrier state to be based on what was found in the prev-life dump. Ref: https://github.com/sonic-net/sonic-buildimage/blob/master/src/libteam/patch/0008-libteam-Add-warm_reboot-mode.patch#L442

Checking the sysfs attribute is not a reliable option. Thus during WR because of this false-positive, restore_neighbor sends the ARP/NS very early i.e. before anything else is created, resulting in neighbor not being resolved in the end. This results in problems such as this https://github.com/sonic-net/sonic-buildimage/issues/12029

Aug 30 17:03:21.605577 sonic INFO swss#restore_neighbor: Add neighbor entries: family: IPv6, intf_idx: 5, ip: 2001:20::2, mac: 7c:fe:90:cd:86:00
Aug 30 17:03:21.606189 sonic INFO swss#restore_neighbor: Sending Neigh with family: IPv6, intf_idx: 5, ip: 2001:20::2, mac: 7c:fe:90:cd:86:00

Aug 30 17:03:30.321763 qa-eth-vt04-7-2410 NOTICE syncd#SDK: [SAI_SWITCH.NOTICE] mlnx_sai_switch.c[7638]- mlnx_create_switch: Created switch Switch ID

Routes/member/ifaces populated after this:
2022-08-30.17:03:32.643025|LAG_MEMBER_TABLE:PortChannel0001:Ethernet200|SET|status:enabled
2022-08-30.17:03:32.643338|VLAN_TABLE:Vlan1071|SET|admin_status:up|mtu:9100|mac:50:6b:4b:94:fd:00|host_ifname:
2022-08-30.17:03:32.643747|VLAN_MEMBER_TABLE:Vlan1071:Ethernet4|SET|tagging_mode:untagged
2022-08-30.17:03:32.647143|INTF_TABLE:Ethernet0|SET|NULL:NULL|mac_addr:00:00:00:00:00:00
2022-08-30.17:03:32.647182|INTF_TABLE:Ethernet194.12|SET|vlan:12|mtu:9100|:|admin_status:up|mac_addr:00:00:00:00:00:00
2022-08-30.17:03:32.647204|INTF_TABLE:Ethernet194.12:10.10.12.1/24|SET|scope:global|family:IPv4
2022-08-30.17:03:32.647225|INTF_TABLE:Ethernet192:2001:11::1/64|SET|scope:global|family:IPv6
2022-08-30.17:03:32.647245|INTF_TABLE:PortChannel0001:2001:20::1/64|SET|scope:global|family:IPv6
2022-08-30.17:03:32.647265|INTF_TABLE:Ethernet194.12:2001:12::1/64|SET|scope:global|family:IPv6
2022-08-30.17:03:32.647285|INTF_TABLE:Vlan1071:10.10.2.1/24|SET|scope:global|family:IPv4
2022-08-30.17:03:32.647305|INTF_TABLE:Vlan1071|SET|vrf_name:Vrf-blue|mac_addr:00:00:00:00:00:00
2022-08-30.17:03:32.647325|INTF_TABLE:Ethernet0:2001:3::1/64|SET|scope:global|family:IPv6
2022-08-30.17:03:32.647346|INTF_TABLE:Ethernet0:10.10.1.1/24|SET|scope:global|family:IPv4
2022-08-30.17:03:32.647367|INTF_TABLE:PortChannel0001|SET|vrf_name:Vrf-blue|mac_addr:00:00:00:00:00:00
2022-08-30.17:03:32.647387|INTF_TABLE:Ethernet192:10.10.11.1/24|SET|scope:global|family:IPv4
2022-08-30.17:03:32.647448|INTF_TABLE:Vlan1071:10.10.4.1/24|SET|scope:global|family:IPv4
2022-08-30.17:03:32.647474|INTF_TABLE:Ethernet192|SET|NULL:NULL|mac_addr:00:00:00:00:00:00
2022-08-30.17:03:32.647494|INTF_TABLE:Ethernet0:2001:1::1/64|SET|scope:global|family:IPv6
2022-08-30.17:03:32.647514|INTF_TABLE:Vlan1071:2001:4::1/64|SET|scope:global|family:IPv6
2022-08-30.17:03:32.647533|INTF_TABLE:Vlan1071:2001:2::1/64|SET|scope:global|family:IPv6
2022-08-30.17:03:32.647553|INTF_TABLE:Ethernet0:10.10.3.1/24|SET|scope:global|family:IPv4
2022-08-30.17:03:32.647572|INTF_TABLE:PortChannel0001:10.10.20.1/24|SET|scope:global|family:IPv4

How I verified it

On the live switch:
Verified logs and checked if the ND/ARP msgs are received on the other end:

00:52:40.053139 IP6 fe80::1e34:daff:fe1c:9f00 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2001:20::2, length 32
00:52:40.053140 ARP, Request who-has 10.10.20.2 tell 10.10.20.1, length 46
00:52:40.053197 ARP, Reply 10.10.20.2 is-at 7c:fe:90:12:22:ec (oui Unknown), length 28
00:52:40.053204 IP6 2001:20::2 > fe80::1e34:daff:fe1c:9f00: ICMP6, neighbor advertisement, tgt is 2001:20::2, length 32

Details if related

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
return True
key = db.keys(db.STATE_DB, table_name)
if key is None:
log_info ("members for {} are not yet created".format(intf))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the log for Vlan as before and add a new one for LAG.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, are there any log checkers matching for this string?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. but its been a very legacy code. So just thot of being safe. Approving from my side

return True
key = db.keys(db.STATE_DB, table_name)
if key is None:
log_info ("members for {} are not yet created".format(intf))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. but its been a very legacy code. So just thot of being safe. Approving from my side

@dgsudharsan dgsudharsan merged commit c1eb99a into sonic-net:master Sep 12, 2022
vivekrnv added a commit to vivekrnv/sonic-swss that referenced this pull request Sep 14, 2022
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Check STATE_DB before sending ARP/ND pkts for neighbors associated with PortChannel. As a part of intf check, wait for the LAG_MEMBER_TABLE to be populated for Portchannels ifaces
dgsudharsan pushed a commit that referenced this pull request Sep 16, 2022
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Check STATE_DB before sending ARP/ND pkts for neighbors associated with PortChannel. As a part of intf check, wait for the LAG_MEMBER_TABLE to be populated for Portchannels ifaces
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants