Fix potential blackholing/looping traffic when link-local was used and refresh ipv6 neighbor to avoid CPU hit #1904

zhenggen-xu · 2018-08-08T06:19:53Z

Fix potential blackholing/looping traffic and refresh ipv6 neighbor to avoid CPU hit

When global IPv6 addresses are configured on L3 interfaces and used for peering,
while the routing protocol uses link-local addresses on the same interfaces as preferred nexthops,
the link-local neighbor entries can age out after a while because no traffic is sent to the link-local
addresses themselves. When new routes with link-local nexthops are then received, SONiC won't insert
them into the HW, which causes looping or blackholed traffic.

Global IPv6 addresses on L3 interfaces between switches are refreshed by BGP keepalive and other messages.

On the server-facing side, traffic may hit the forwarding plane only, so the IPv6 neighbor entries are not refreshed regularly.
The Linux kernel IPv6 neighbor entries can then age out and the HW neighbor table entries can be removed,
so traffic going to those neighbors hits the CPU, causing traffic drops and temporarily high CPU load.

Also, if link-local addresses were never learned, we may not learn them at all later.

This PR is intended to fix all of the above issues.

Changes:
Add the ndisc6 package to the swss docker and use it for IPv6 NDP ping to update the neighbors' state on Vlan interfaces
Change the default IPv6 neighbor reachable timer to 30 minutes
Add a periodic IPv6 multicast ping to ff02::1 to get/refresh link-local neighbor info.
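
The NDP refresh on Vlan interfaces can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the script from this PR: the function name `ndisc6_cmds` is hypothetical, skipping `fe80::` entries is an assumption (the multicast ping is meant to cover link-local neighbors), and the `ndisc6` options are illustrative (`-q` quiet, `-w 0` wait 0 ms, `-1` stop at the first reply). It only prints the commands built from `ip -6 neigh show` output and does not run them.

```shell
#!/bin/bash
# Hypothetical helper (not taken from the PR): read `ip -6 neigh show` output
# on stdin and print one ndisc6 command per non-link-local neighbor that sits
# on the given interface. Skipping fe80:: entries is an assumption made here.
ndisc6_cmds() {
    local dev="$1"
    awk -v dev="$dev" '
        $2 == "dev" && $3 == dev && $1 !~ /^fe80:/ {
            print "ndisc6 -q -w 0 -1 " $1 " " dev
        }'
}

# Example input, using entries in the same format as `ip neighbor show`.
sample='2100::7 dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
fe80::662:73ff:fe8c:fd6a dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
fe80::2e0:ecff:fe3b:d6ac dev Ethernet122 lladdr 00:e0:ec:3b:d6:ac router REACHABLE'

# Prints: ndisc6 -q -w 0 -1 2100::7 Vlan100
printf '%s\n' "$sample" | ndisc6_cmds Vlan100
```

Printing the commands instead of executing them keeps the sketch safe to run on any machine; on a switch the output could be piped to a shell once the command lines look right.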

- What I did
Fix potential blackholing/looping traffic when ipv6 link-local was used, and refresh ipv6 neighbor to avoid CPU hit

- How I did it
Add the ndisc6 package to the swss docker and use it for IPv6 NDP ping to update the neighbors' state on Vlan interfaces
Change the default IPv6 neighbor reachable timer to 30 minutes
Add a periodic IPv6 multicast ping to ff02::1 to get/refresh link-local neighbor info.
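
For the timer change, the description does not show where the value is set. One common way to express a 30-minute reachable time on Linux is the `net.ipv6.neigh.default.base_reachable_time_ms` sysctl; the sketch below assumes that key and uses a hypothetical helper name, `reachable_time_setting`. Note that the kernel randomizes the effective reachable time around this base value, so entries do not all expire at exactly 30 minutes.

```shell
#!/bin/bash
# Hypothetical helper (not from the PR): print the sysctl assignment for a
# base reachable time given in minutes. The sysctl key is a real kernel
# setting; using it here is an assumption about how the timer could be set.
reachable_time_setting() {
    local minutes="$1"
    echo "net.ipv6.neigh.default.base_reachable_time_ms=$((minutes * 60 * 1000))"
}

# Prints: net.ipv6.neigh.default.base_reachable_time_ms=1800000
reachable_time_setting 30

# To apply on a live system (requires root):
#   sysctl -w "$(reachable_time_setting 30)"
```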

- How to verify it
-- IPv6 neighbors now stay in the REACHABLE state for 30 minutes instead of 30 seconds.

admin@lnos-x1-a-asw03:~$ clear arp
fe80::a83c:309a:5ca3:6593 dev Vlan100 lladdr 04:62:73:c4:eb:59 ref 1 used 2470/2470/39 probes 1 REACHABLE
2100::7 dev Vlan100 lladdr 04:62:73:8c:fd:6a ref 1 used 95/9/56 probes 4 REACHABLE
fe80::2a6f:7fff:feba:1cff dev eth0 lladdr 28:6f:7f:ba:1c:ff router used 30606/27426/22 probes 1 STALE
fe80::662:73ff:fe8c:fd6a dev Vlan100 lladdr 04:62:73:8c:fd:6a ref 1 used 85/85/5 probes 1 REACHABLE
fe80::2e0:ecff:fe3b:d6ac dev Ethernet122 lladdr 00:e0:ec:3b:d6:ac router ref 1 used 651/651/52 probes 1 REACHABLE
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff ref 1 used 857/0/852 probes 1 REACHABLE
172.18.1.7 dev Vlan100 lladdr 04:62:73:8c:fd:6a ref 1 used 39523/56/56 probes 6 REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a ref 1 used 2475/2472/20 probes 1 REACHABLE

Round 1, deleting 8 entries
Flush is complete after 1 round

admin@lnos-x1-a-asw03:~$ ip neighbor show | grep -v FAILED
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a REACHABLE

admin@lnos-x1-a-asw03:~$ docker exec -it swss bash -c "/usr/bin/arp_update"

admin@lnos-x1-a-asw03:~$ ip neighbor show | grep -v FAILED
fe80::662:73ff:fe8c:fd6a dev Vlan100 lladdr 04:62:73:8c:fd:6a DELAY
fe80::2e0:ecff:fe3b:d6ac dev Ethernet122 lladdr 00:e0:ec:3b:d6:ac router DELAY
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff REACHABLE
172.18.1.7 dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a REACHABLE

admin@lnos-x1-a-asw03:~$ ip neighbor show | grep -v FAILED
fe80::662:73ff:fe8c:fd6a dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
fe80::2e0:ecff:fe3b:d6ac dev Ethernet122 lladdr 00:e0:ec:3b:d6:ac router REACHABLE
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff REACHABLE
172.18.1.7 dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a REACHABLE

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)


prsunny

Just checking for information, did you analyze using neighbor discovery (ndisc6 ?) or any configuration to send NS to achieve this functionality?

prsunny · 2018-08-08T22:58:30Z

files/scripts/arp_update

+      ping6cmd="ping6 -I $intf -n -q -i 0 -c 1 -W 0 ff02::1 >/dev/null"
+      intf_up=$(ip link show $intf | grep "state UP")
+      if [[ -n "$intf_up" ]]; then
+	  eval $ping6cmd


This does not look space aligned. Can you check?

prsunny · 2018-08-08T23:34:15Z

files/scripts/arp_update


 while /bin/true; do
+  # find L3 interfaces which are UP, send ipv6 multicast pings
+  echo "{% for (name, prefix) in INTERFACE %} {{name}} {% endfor %}" > /tmp/intf_tmp.j2


We would need to handle PORTCHANNEL_INTERFACE as well

Thanks Zhen. Can we concatenate INTERFACE and PC_INTERFACE and do one for-loop. It will avoid the duplicate code.
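
The single-loop suggestion could look something like the sketch below. It is an illustration only: `multicast_ping_cmds` is a hypothetical helper, the interface names are made-up example values (in the script they come from the rendered config templates), and the commands are printed rather than executed. The `ping6` options are the ones from the diff above.

```shell
#!/bin/bash
# Hypothetical helper: print one all-nodes multicast ping command per
# interface name passed as an argument.
multicast_ping_cmds() {
    local intf
    for intf in "$@"; do
        echo "ping6 -I $intf -n -q -i 0 -c 1 -W 0 ff02::1"
    done
}

# Assumed example values standing in for the rendered interface lists.
INTERFACE="Ethernet122"
PORTCHANNEL_INTERFACE="PortChannel0001"

# One loop covers both lists; the variables are left unquoted on purpose so
# that several space-separated names split into separate arguments.
multicast_ping_cmds $INTERFACE $PORTCHANNEL_INTERFACE
```

Passing both lists to one function avoids duplicating the loop body for regular L3 interfaces and port channels.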

Add PORTCHANNEL_INTERFACE interface for ipv6 multicast ping format issue

zhenggen-xu · 2018-08-09T01:17:01Z

ndisc6 tool is not able to discover link-local addresses if you never learnt them for some reason (timing or network issue etc).

lguohan · 2018-08-10T07:24:06Z

files/scripts/arp_update

-# the neighbors state.
+# arp_update:
+# Send ipv6 multicast pings to all "UP" L3 interfaces including vlan interfaces to
+# refresh link-local addresses from neighbors.


Is there a concern if the vlan has many neighbors? Will one multicast ping get replies from all neighbors?

Yes, by standard all neighbors should reply to ff02::1, so it will get replies from all neighbors.
I also verified this and it worked as expected.

lguohan · 2018-08-10T07:48:13Z

dockers/docker-orchagent/Dockerfile.j2

@@ -10,6 +10,7 @@ RUN apt-get update

 RUN apt-get install -f -y ifupdown arping libdbus-1-3 libdaemon0 libjansson4

+RUN apt-get install -f -y ndisc6


can you add this into vs docker? we need to replicate same env in the vs.

lguohan · 2018-08-10T07:51:46Z

does it make sense to eventually move this function into neighbor mgrd? which is our own version of neighbor manager.

looks like we also have issue with linux kernel neighbor gc function. we cannot really trust it. at certain point, we need to develop our own neighbor manager.

prsunny · 2018-08-10T18:34:25Z

@lguohan, we would eventually need a neighbor mgr. At least I could think of the following cases that may need some user-space handling. We could prioritize based on use-case.

1. Issue with gc threshold and flush of ARP entries.
2. Periodic refresh of link-local entries.
3. Handle the oper-down issue, where the kernel does not delete the ARP entries but it might not be required to occupy HW resources for ARPs that belong to a down interface.
4. Handle the admin-down scenario. If a large number of ARP entries are learnt on an interface, we currently rely on the netlink notifications from the kernel, and this could be considerable depending on the interface. Instead, a neighbor manager could just act on the admin-down.
5. Handle ECMP neighbor refresh. If one or two ECMP neighbor members are resolved and programmed to the ASIC, I don't think there is a mechanism to resolve the remaining members of the ECMP. This could be handled by Nbr-mgr.

[Open]
6. FDB age-out in the kernel: if ARP is not refreshed, the kernel floods the IP packet to all member ports of the Vlan.
7. Mgmt-VRF ARP reservation in case data-plane ARP fills up the kernel threshold.

zhenggen-xu · 2018-08-10T21:49:02Z

It would be good to have a session in the community to discuss the things around "neighbor mgrd". The source of truth should be the Linux kernel IMO.

Anyway, it probably makes sense to defer that implementation to a later PR.

lguohan · 2018-08-12T09:32:22Z

retest this please

lguohan · 2018-08-12T10:14:41Z

@prsunny, good summary. add as an issue. #1917

```
ca728b8 [config] fix interface IPv6 address removal. (sonic-net#1819)
0665d6f VxLAN Tunnel Counters and Rates implementation (sonic-net#1748)
80a10dc Fix log_ssd_health hang issue (sonic-net#1904)
ea4a730 [config][cbf] Added config commands for CBF (sonic-net#1799)
02ce8d6 [sonic-package-manager] update FEATURE entries on upgrade (sonic-net#1803)
9f123c0 [generate_dump] remove secrets from dump files (sonic-net#1886)
3a8ab73 [fwutil] Add `fwutil update all` to support the automatic platform component fw updates (sonic-net#1242)
776fddf [sonic-package-manager] code style fixes and enhancements (sonic-net#1802)
f53baac [watermarkstat] Fix for error in processing empty array from couters db (sonic-net#1810)
0b2536b Generic_upater: Apply JSON change (sonic-net#1856)
```
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>


48035d75 [202012] [techsupport] Techsupport Error Reporting pending fixes (sonic-net#1854) 8b2ec09a Fix log_ssd_health hang issue (sonic-net#1904) ac9c4254 Fix the option missing in kernel config issue (sonic-net#1888) 5cc9417a disk_check: Script updated to run good in 201811 & 201911 (sonic-net#1747) Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>


- What I did
Fix sonic-net#9114. The log_ssd_health command hangs because timeout is used with docker exec -i, which also affects the warmboot flow.

- How I did it
Added the foreground option for timeout. This is recommended when timeout is not run directly from a shell prompt: https://man7.org/linux/man-pages/man1/timeout.1.html

- How to verify it
Run log_ssd_health and verify it does not hang.

Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com

prsunny reviewed Aug 9, 2018


Fix review comments:

ca0f9ef

Add PORTCHANNEL_INTERFACE interface for ipv6 multicast ping format issue

Combine regular L3 interface and portchannel interface for looping

6f25dff

prsunny approved these changes Aug 9, 2018


lguohan reviewed Aug 10, 2018


Add ndisc6 package to vs docker

d7baed2

stcheng added the Enhancement ➕ label Aug 10, 2018

lguohan merged commit d761630 into sonic-net:master Aug 12, 2018

prsunny mentioned this pull request Nov 16, 2018

Support ConfigDB neighbor configuration, introduce nbrmgr daemon sonic-net/sonic-swss#693

Merged

zhenggen-xu deleted the github-fork-ndp-update branch June 7, 2019 19:37

pavel-shirshov mentioned this pull request Oct 15, 2019

Optimize ipv4/ipv6 neighbor timers #2803

Closed

judyjoseph added a commit that referenced this pull request Nov 6, 2021

Update sonic-swss sonic-utilities

9972450

swss 73caba3 Allow interface type value none (#1991) utilities 32e530f Allow interface type value none (#1902) 53f066c Fix log_ssd_health hang issue (#1904)
