Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[arp_update]: Fix IPv6 neighbor race condition #15583

Merged
merged 3 commits into from
Jun 30, 2023
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 50 additions & 25 deletions files/scripts/arp_update
Original file line number Diff line number Diff line change
Expand Up @@ -89,32 +89,57 @@ while /bin/true; do
eval `eval $ip6cmd`

if [[ $SUBTYPE == "dualtor" ]]; then
# manually set any remaining FAILED/INCOMPLETE entries to permanently INCOMPLETE
# this prevents any remaining INCOMPLETE entries from automatically transitioning to FAILED
# once these entries are incomplete, any subsequent neighbor advertisement messages
# are able to resolve the entry

# generates the following command for each failed or incomplete IPv6 neighbor
# ip neigh replace <neighbor IPv6> dev <VLAN name> nud incomplete
neigh_replace_template="sed -e 's/^/ip neigh replace /' -e 's/,/ dev /' -e 's/$/ nud incomplete;/'"
ip_neigh_replace_cmd="ip -6 neigh show | grep -v fe80 | grep $vlan | grep -E 'FAILED|INCOMPLETE' | cut -d ' ' -f 1,3 --output-delimiter=',' | $neigh_replace_template"
eval `eval $ip_neigh_replace_cmd`

# on dual ToR devices, try to resolve failed neighbor entries since
# these entries will have tunnel routes installed, preventing normal
# neighbor resolution (SWSS PR #2137)

# since ndisc6 is a userland process, the above ndisc6 commands are
# insufficient to update the kernel neighbor table for failed entries

# we don't need to do this for ipv4 neighbors since arping is able to
# update the kernel neighbor table

# generates the following command for each failed or incomplete IPv6 neighbor
# capture all current failed/incomplete IPv6 neighbors in the kernel to avoid situations where new neighbors are learned
# in the middle of the below sequence of commands
unresolved_kernel_neighbors=$(ip -6 neigh show | grep -v fe80 | grep $vlan | grep -E 'FAILED|INCOMPLETE')
failed_kernel_neighbors=$(echo "$unresolved_kernel_neighbors" | grep FAILED | cut -d ' ' -f 1)

# it's possible for kernel neighbors to fall out of sync with the hardware
# this can result in failed neighbors entries that don't have corresponding zero MAC neighbor entries
# and therefore don't have tunnel routes installed in the hardware
# flush these neighbors from the kernel to force relearning and resync them to the hardware:
# 1. for every FAILED or INCOMPLETE neighbor in the kernel, check if there is a corresponding zero MAC neighbor in APPL_DB
# 2. if no zero MAC neighbor entry exists, flush the kernel neighbor entry
# - generates the command 'ip neigh flush <neighbor IPv6>' for all such neighbors
theasianpianist marked this conversation as resolved.
Show resolved Hide resolved
unsync_neighbors=$(echo "$unresolved_kernel_neighbors" | cut -d ' ' -f 1 | xargs -I{} bash -c "if [[ -z \"\$(sonic-db-cli APPL_DB hget NEIGH_TABLE:$vlan:{} neigh)\" ]]; then echo '{}'; fi")
if [[ ! -z "$unsync_neighbors" ]]; then
ip_neigh_flush_cmd="echo \"$unsync_neighbors\" | sed -e 's/^/ip neigh flush /' -e 's/$/;/'"
eval `eval "$ip_neigh_flush_cmd"`
fi

# generates the following command for each FAILED or INCOMPLETE IPv6 neighbor
# timeout 0.2 ping <neighbor IPv6> -n -q -i 0 -c 1 -W 1 -I <VLAN name> >/dev/null
ping6_template="sed -e 's/^/timeout 0.2 ping /' -e 's/,/ -n -q -i 0 -c 1 -W 1 -I /' -e 's/$/ >\/dev\/null;/'"
failed_ip6_neigh_cmd="ip -6 neigh show | grep -v fe80 | grep $vlan | grep -E 'FAILED|INCOMPLETE' | cut -d ' ' -f 1,3 --output-delimiter=',' | $ping6_template"
eval `eval $failed_ip6_neigh_cmd`
if [[ ! -z "$unresolved_kernel_neighbors" ]]; then
theasianpianist marked this conversation as resolved.
Show resolved Hide resolved
ping6_template="sed -e 's/^/timeout 0.2 ping /' -e 's/,/ -n -q -i 0 -c 1 -W 1 -I /' -e 's/$/ >\/dev\/null;/'"
failed_ip6_neigh_cmd="echo \"$unresolved_kernel_neighbors\" | cut -d ' ' -f 1,3 --output-delimiter=',' | $ping6_template"
eval `eval "$failed_ip6_neigh_cmd"`
fi

# wait for any transient INCOMPLETE neighbors to transition to FAILED
# once these transition to FAILED, we can guarantee that the kernel has generated a netlink message for the neighbor
# and proceed setting it to permanent INCOMPLETE
# this is necessary for:
# 1. unsyncronized neighbors that were flushed, then pinged
# 2. existing FAILED neighbors which will be set to transient INCOMPLETE by the above ping
wait_neighbors="$unsync_neighbors"$'\n'"$failed_kernel_neighbors"
if [[ ! -z "$wait_neighbors" ]]; then
trans_incomplete_check=$(echo "$wait_neighbors" | sed -e 's/^/ip neigh show /' -e 's/$/ | grep -q 'INCOMPLETE' \&\&/')
theasianpianist marked this conversation as resolved.
Show resolved Hide resolved
trans_incomplete_check=$(echo $trans_incomplete_check | sed -e 's/\&\&$//')
timeout 10 bash -c "while eval \"$trans_incomplete_check\"; do echo 'waiting'; sleep 1; done"
fi

# manually set any remaining FAILED entries to permanently INCOMPLETE
# once these entries are INCOMPLETE, any subsequent neighbor advertisement messages are able to resolve the entry
# ignore INCOMPLETE neighbors since if they are transiently incomplete (i.e. new kernel neighbors that we are attempting to resolve for the first time),
# setting them to permanently incomplete here means the kernel will never generate a netlink message for that neighbor
# generates the following command for each FAILED IPv6 neighbor
# ip neigh replace <neighbor IPv6> dev <VLAN name> nud incomplete
failed_kernel_neighbors=$(ip -6 neigh show | grep -v fe80 | grep $vlan | grep -E 'FAILED')
if [[ ! -z "$failed_kernel_neighbors" ]]; then
neigh_replace_template="sed -e 's/^/ip neigh replace /' -e 's/,/ dev /' -e 's/$/ nud incomplete;/'"
ip_neigh_replace_cmd="echo \"$failed_kernel_neighbors\" | cut -d ' ' -f 1,3 --output-delimiter=',' | $neigh_replace_template"
eval `eval "$ip_neigh_replace_cmd"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please test this with traffic which can resolve a neighbor during a ping and another where neighbor cannot be resolved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with one resolvable IP, one unresolvable IP, and one unresolvable IP where I manually deleted the APPL_DB entry. The script behaved as expected - the deleted IP was flushed, all three IPs were pinged, and the final state of the IPs was REACHABLE (for the resolvable IP) and INCOMPLETE for the other two.

fi
fi
done

Expand Down