Fix EIP inconsistency and ESVC inconsistency
This PR fixes two issues:

EgressIP inconsistency:
If a pod is on an egressNode and it tries to reach another node in the cluster, or a service backed by a host-networked backend that is another node in the cluster, the packet is SNAT-ed to the egressIP. If the pod is on a non-egressNode, the packet is SNAT-ed to the nodeIP.
Desired solution: in-cluster packets shouldn't be SNAT-ed to the egressIP.

EgressSVC inconsistency:
If a backend pod is on an egressSVC node and it tries to reach another node in the cluster, or a service backed by a host-networked backend that is another node in the cluster, the packet is SNAT-ed to the svcVIP of the LB service. If the pod is on a non-egressNode, the packet is SNAT-ed to the nodeIP. This completely breaks pod2othernode traffic: if the traffic is SNAT-ed to the svcVIP, the reply goes to the serviceVIP, which could end up on any backend, and conntrack cannot match it to the existing entry for the outbound direction.
Desired solution: in-cluster packets shouldn't be SNAT-ed to the svcVIP of the egressSVC.

Design:
Change the default-allow priority-102 policies on the ovn_cluster_router to look like this:

    102 (ip4.src == $a12749576804119081385 || ip4.src == $a16335301576733828072) && ip4.dst == $a11079093880111560446 allow pkt_mark=1008

The first address set contains the IPs of all egressIP pods, and the second contains the IPs of all egressSVC pods. The destination address set contains all the nodeIPs in the cluster. In addition to simply "allow"-ing these packets so that the reroute policies are not matched, we also mark them with the 1008 mark.

To ensure packets going out for egressIP (SGW mode topology for both modes) are SNAT-ed to the nodeIP, we add a flow on br-ex:

    cookie=0xdeff105, duration=759.429s, table=0, n_packets=0, n_bytes=0, idle_age=759, priority=105,pkt_mark=0x3f0,ip,in_port=2 actions=ct(commit,zone=64000,nat(src=172.19.0.2),exec(load:0x1->NXM_NX_CT_MARK[])),output:1

This ensures that even if the SNAT to the egressIP has already been done, we do another SNAT on top.
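As a rough illustration of the design above, the sketch below assembles the match expression of the priority-102 policy from three address-set names and shows that the policy mark 1008 is the same value as the `pkt_mark=0x3f0` matched on br-ex. The function names `build_allow_match` and `br_ex_match` are hypothetical and are not taken from the ovn-kubernetes code; the address-set names, priority, and in_port are simply the example values quoted in this commit message.

```python
# Minimal sketch, not the actual ovn-kubernetes implementation.
# All helper names here are hypothetical and exist only for illustration.

PKT_MARK = 1008  # mark set by the priority-102 allow policy (per the commit message)


def build_allow_match(eip_pods_as: str, esvc_pods_as: str, node_ips_as: str) -> str:
    """Build the logical router policy match: traffic from egressIP pods or
    egressSVC pods destined to any node IP in the cluster."""
    return (
        f"(ip4.src == ${eip_pods_as} || ip4.src == ${esvc_pods_as})"
        f" && ip4.dst == ${node_ips_as}"
    )


def br_ex_match(mark: int, in_port: int) -> str:
    """Build the match portion of the br-ex flow; OpenFlow dumps show the
    packet mark in hexadecimal, so the decimal mark is converted here."""
    return f"priority=105,pkt_mark={hex(mark)},ip,in_port={in_port}"


if __name__ == "__main__":
    print(build_allow_match(
        "a12749576804119081385",
        "a16335301576733828072",
        "a11079093880111560446",
    ))
    print(br_ex_match(PKT_MARK, in_port=2))
```

The point of the sketch is only that one mark value ties the two layers together: the OVN policy writes it in decimal, and the br-ex flow matches it in hexadecimal.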
To ensure packets going out for egressSVC (LGW mode topology for both modes) are SNAT-ed to the nodeIP, we add a new iptables rule on the OVN-KUBE-EGRESS-SVC chain:

    [1:64] -A OVN-KUBE-EGRESS-SVC -m mark --mark 0x3f0 -j RETURN

This ensures the other SNAT rules are skipped.

NOTE: 0x3f0 = 1008.
NOTE2: The EIP SNAT issue occurs only in SGW mode, while the ESVC SNAT issue occurs only in LGW mode; the opposite combinations work because of the way the egress traffic topology is laid out.

Does this seem like a design change/feature for a bugfix? YES. The bug is complicated, so unless we ask for a conditionalSNAT change in OVN, which was deemed unnecessary for this small corner case the last time we discussed it, we need this fix.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Showing 22 changed files with 710 additions and 194 deletions.