Antrea-1.4 connectivity between Windows Pods and Linux Pods fails with different CNIs #3081
Comments
I am assigning this to @tnqn because this is related to #2161, which he worked on. #2161 relies on Linux Nodes annotating their Node resource with […]. @shettyg, is there any possibility to annotate the Nodes with […]? I understand why the path is asymmetric (this is by design when the annotation is missing). I do not understand, however, why the Windows host would do SNAT on the reply traffic (Pod-to-Pod traffic in encapMode...).
Thank you for the immediate response. We will consider adding the "node.antrea.io/mac-address" annotation if there is no easy fix to avoid it. (In this case, we are using noEncap mode.)
@antoninbas is correct. To get better dataplane performance, Antrea bypasses the Windows host network when possible, but that requires overwriting the dst MAC with OpenFlow rules. If it doesn't know the dst MAC, it falls through to the host network to forward the traffic. I guess the Windows host didn't even know this was reply traffic, as it was the first packet it saw for this connection. Even if it could check TCP flags, that wouldn't work for UDP. Apart from the workaround, a possible solution is to not bypass the host network for incoming traffic when we don't know the MAC of the Node it comes from, which makes the path symmetric. But we need to check whether Windows will still do SNAT when it can see the request packet, and how much this will affect dataplane performance.
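The forwarding decision described above can be modeled as a tiny Go sketch. It is not Antrea code: `Path`, `egressPath`, `ingressPath`, and `symmetric` are hypothetical names, and the assumption (taken from this thread) is that incoming traffic currently always bypasses the host, while outgoing traffic bypasses it only when the peer Node's MAC is known. It shows why an unannotated Linux Node gets an asymmetric path today and a symmetric one under the proposal.

```go
package main

import "fmt"

// Path says how a Windows Node forwards a packet after the OVS pipeline.
type Path string

const (
	Bypass  Path = "bypass host network" // OVS rewrites the dst MAC and outputs to the uplink
	ViaHost Path = "via host network"    // packet goes to antrea-gw0 and the Windows host forwards it
)

// egressPath: outgoing Pod traffic bypasses the host only when the peer Node's MAC is known.
func egressPath(peerMACKnown bool) Path {
	if peerMACKnown {
		return Bypass
	}
	return ViaHost
}

// ingressPath: incoming traffic bypasses the host today (assumption); with the
// proposal it goes through the host when the source Node's MAC is unknown.
func ingressPath(srcMACKnown, proposal bool) Path {
	if proposal && !srcMACKnown {
		return ViaHost
	}
	return Bypass
}

// symmetric reports whether request and reply take the same path.
func symmetric(peerMACKnown, proposal bool) bool {
	return ingressPath(peerMACKnown, proposal) == egressPath(peerMACKnown)
}

func main() {
	// A Linux Node running another CNI has no MAC annotation, so its MAC is unknown.
	for _, proposal := range []bool{false, true} {
		fmt.Printf("proposal=%v: in=%q out=%q symmetric=%v\n",
			proposal, ingressPath(false, proposal), egressPath(false), symmetric(false, proposal))
	}
}
```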
@tnqn, why is there SNAT in this case, given that it is Pod-to-Pod traffic (even if the host doesn't know this is reply traffic)?
@antoninbas I forgot this is Pod-to-Pod. You are right: it shouldn't do SNAT at all, regardless of the direction. Then we should look at the NAT configuration of the Windows host. For Linux, we use …
Windows NetNat configuration doesn't support "exclude" options on either internal addresses (PodCIDR) or external addresses (SNATed address), so I don't think we can do on the Windows host what we have done on Linux. As an alternative, the Agent could query the peer Node's MAC (using "Get-NetNeighbor" on Windows) if the Node's annotation is not set. What do you think?
There may be no neighbor cache entry if the two Nodes have never communicated. If we need to add another step to trigger communication anyway, maybe it makes more sense to just send an ARP query to retrieve the MAC address. There should be some Go libraries for doing that. It also needs to handle the cross-subnet case. Bypassing the Windows host network used to be optional and only affected performance; now it becomes necessary, because without it destination Pods don't see the source Pod IPs.
If we send ARP, I have thought of 3 options to leverage the ARP reply: 1) send the reply to the Antrea Agent via packet-in, so the Antrea Agent can install OpenFlow entries for the PodCIDR of the peer Node; 2) use the "learn" action to dynamically use the src MAC in the reply, so the Antrea Agent does not need to wait for the ARP reply but directly installs a flow on OVS; 3) let the Windows host learn the ARP reply (by outputting the reply packet to the OVS bridge interface), and use the "Get-NetNeighbor" command to learn the MAC from the Windows host. Options 1 and 3 are similar; the difference is whether packet-in is used, and in both the Antrea Agent has to wait for the ARP reply. Option 2 leverages OVS and doesn't need to wait in the Antrea Agent. A disadvantage is that the Antrea Agent doesn't know the MAC of the peer Node and is not able to cache the L3 flow entry, so if the connection to OVS is lost, the Antrea Agent cannot replay the flows. Which option would you prefer, or do you have other suggestions? @tnqn @jianjuns @lzhecheng @XinShuYang
But Windows should have a way to skip NAT for specific IPs, no? Using ARP to discover MACs is much more complex, and can be another source of traffic issues, especially when we have a large number of Nodes.
I haven't found a valid configuration to exclude specific IPs/CIDRs in Windows NetNat yet.
The CIDR for Windows NAT could be 169.254.0.128/25, which can be configured on Windows NetNat.
After some offline discussion and testing with @hongliangl and @XinShuYang, we have another solution: add a new OVS internal port for the traffic that doesn't need SNAT, enable IP forwarding on that interface, and then add OpenFlow entries in OVS to ensure the packets are output to the new interface. For this issue, we could add an OpenFlow entry for traffic to Pods on a Node that is not annotated with its MAC, set the packet's dst MAC to the new interface's MAC in L3ForwardTable, and then set the output ofport number in L2ForwardCalcTable. Since IP forwarding is enabled on the new interface, the packet can be forwarded from the interface back to br-int and then output to the uplink.
Any thoughts about this option? @jianjuns @tnqn @antoninbas |
I feel the new proposal sounds better than the previous two. So there is no other way, like a packet mark, to identify a packet that requires NAT?
I didn't find a valid configuration on the Windows host. Windows SNAT uses the source CIDR as its filter, so we can't use it to differentiate Pod traffic to an external destination from Pod traffic to a Node.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days |
Describe the bug
We have a setup with Windows Nodes using Antrea, while Linux Nodes use a different CNI plugin. In this setup, with Antrea 1.4, connectivity between Linux Pods and Windows Pods does not work.
There is asymmetry in the packet path. A ping from a Linux Pod reaches the Windows Pod, but the response from the Windows Pod ends up being sent to antrea-gw0 and then gets SNATed to the host IP.
This behavior appears to happen because table 70 of the OpenFlow pipeline sets the destination MAC address of the packet to the antrea-gw0 MAC for the PodCIDR of the Linux Node, whereas for the PodCIDRs of other Windows Nodes, it sets it to the MAC address of those Nodes' physical interfaces.
e.g.:
Relevant ofproto/trace for the failure case:
To Reproduce
Expected
Connection should work. It worked correctly with Antrea 1.2.
Actual behavior
See the description above.
Versions:
Antrea 1.4
Kubernetes v1.21.5
containerd