antrea-gw0 doesn't respond to IPv6 Neighbor Solicitation messages with a link-local source address in IPv6 environments #5482
Labels
area/transit/ipv6
Issues or PRs related to IPv6.
kind/bug
Categorizes issue or PR as related to a bug.
Describe the bug
On an IPv6 cluster (this applies to both IPv6-only and dual-stack clusters), we observe that a Pod cannot access addresses on a different Node or in the external world after the agent has run for some time.
After some investigation inside the client Pod, we found that the IPv6 neighbor entry for antrea-gw0 is in state "FAILED". Capturing packets on both the Pod interface and antrea-gw0, we observed that the Pod's vNIC uses its link-local address as the source of its Neighbor Solicitation messages, and that antrea-gw0 receives them but does not respond. We then observed that the route for antrea-gw0's link-local address CIDR (fe80::/64 dev antrea-gw0) is missing on the Node. We suspect that the Linux kernel drops the Pod's NS packets because of the missing route (similar to an rp_filter check). As a test, I manually added the route back, and the traffic issue was resolved. We then found this log in antrea-agent,
Deleting such routes is clearly a bug in antrea-agent.
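To check a Node for the missing route described above, one option is to scan the output of `ip -6 route show` for a fe80::/64 entry on antrea-gw0. The following is only a minimal diagnostic sketch, not part of antrea-agent: the function name `has_link_local_route` and the sample route lines are hypothetical, and the parsing assumes the usual `PREFIX dev NAME ...` layout of iproute2 output.

```python
def has_link_local_route(route_output: str,
                         device: str = "antrea-gw0",
                         prefix: str = "fe80::/64") -> bool:
    """Return True if `prefix` is routed via `device` in `ip -6 route show` output."""
    for line in route_output.splitlines():
        fields = line.split()
        # Only consider lines whose destination is exactly the link-local prefix.
        if not fields or fields[0] != prefix:
            continue
        # The outgoing interface follows the "dev" keyword.
        if "dev" in fields:
            idx = fields.index("dev")
            if idx + 1 < len(fields) and fields[idx + 1] == device:
                return True
    return False


# Hypothetical sample outputs (addresses are made up for illustration).
HEALTHY = (
    "2001:db8:0:1::/64 dev antrea-gw0 proto kernel metric 256 pref medium\n"
    "fe80::/64 dev eth0 proto kernel metric 256 pref medium\n"
    "fe80::/64 dev antrea-gw0 proto kernel metric 256 pref medium\n"
)
BROKEN = (
    "2001:db8:0:1::/64 dev antrea-gw0 proto kernel metric 256 pref medium\n"
    "fe80::/64 dev eth0 proto kernel metric 256 pref medium\n"
)

if __name__ == "__main__":
    print("healthy node has route:", has_link_local_route(HEALTHY))
    print("broken node has route:", has_link_local_route(BROKEN))
```

On a real Node, the text passed to the function would come from running `ip -6 route show` there. The check matches the exact prefix and device, so a fe80::/64 route on another interface such as eth0 does not count.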
To Reproduce
Deploy an IPv6-only or dual-stack cluster, then try to access an external address from a Pod, e.g., nslookup google.com. After several attempts, the access fails. Then check the IPv6 neighbors inside the Pod: the entry for antrea-gw0's global IPv6 address is in state "FAILED".
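The neighbor check in the last step can be scripted by filtering the output of `ip -6 neigh show` inside the Pod for entries in state FAILED. This is a rough sketch under assumptions: the helper name `failed_neighbors` and the sample addresses are hypothetical, and it relies on the neighbor state being the last token of each line.

```python
from typing import List


def failed_neighbors(neigh_output: str) -> List[str]:
    """Return the addresses of neighbor entries whose state is FAILED."""
    failed = []
    for line in neigh_output.splitlines():
        fields = line.split()
        # The first token is the neighbor address, the last token is the state.
        if len(fields) >= 2 and fields[-1] == "FAILED":
            failed.append(fields[0])
    return failed


# Hypothetical sample of `ip -6 neigh show` output from inside a Pod.
SAMPLE = (
    "fe80::1 dev eth0 lladdr 02:00:00:00:00:01 router STALE\n"
    "2001:db8:0:1::1 dev eth0 FAILED\n"
)

if __name__ == "__main__":
    for addr in failed_neighbors(SAMPLE):
        print("neighbor in FAILED state:", addr)
```

If the address reported here is the gateway address of the Pod's subnet, it matches the symptom described in this issue.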
Expected
A Pod should always be able to access an external address in an IPv6 cluster.
Actual behavior
Access fails after several attempts.
Versions:
Antrea: latest
Additional context