Fix incorrect MTU configurations #5880
Conversation
@hjiajing the sign-off info is missing. DCO check failed.
- Please check all usages of the calculation result globally to make sure no new problems are introduced.
- If possible, we could add a validation to the existing WireGuard e2e test to check that packets of MTU size are not dropped.
- Link to the original issue
pkg/agent/config/node_config.go
Outdated
if nc.TrafficEncryptionMode == TrafficEncryptionModeWireGuard {
	mtuDeduction = WireGuardOverhead
} else if nc.TrafficEncryptionMode == TrafficEncryptionModeIPSec {
	mtuDeduction = IPSecESPOverhead
The handling will cause some other issues:
- The comment of NetworkConfig.MTUDeduction says it doesn't count IPsec and WireGuard overhead: "MTUDeduction only counts IPv4 tunnel overhead, no IPsec and WireGuard overhead."
- mc_route_controller.go will deduct the overhead another time:
  MTU: controller.nodeConfig.NodeTransportInterfaceMTU - controller.networkConfig.MTUDeduction - config.WireGuardOverhead,
- getInterfaceMTU will deduct the overhead another time:
  if i.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeIPSec {
      mtu -= config.IPSecESPOverhead
  }
Thanks for the comment. I removed the return value of CalculateMTUDeduction; when we need to calculate MTUs, the caller just uses networkConfig.MTUDeduction directly. And when the TrafficEncryptionMode is WireGuard, it will deduct the WG overhead.
- If possible, we could add a validation to the existing WireGuard e2e test to check that packets of MTU size are not dropped.
- Link to the original issue
We need both unit tests and e2e tests to prove the fix works as expected and avoid another regression in the future.
pkg/agent/agent.go
Outdated
if i.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeWireGuard {
	mtu -= config.WireGuardOverhead
} else {
	mtu -= i.networkConfig.MTUDeduction
	if i.networkConfig.TrafficEncryptionMode == config.TrafficEncryptionModeIPSec {
		mtu -= config.IPSecESPOverhead
	}
I feel it's still a bit unconsolidated to spread the deduction logic across several places, which is error-prone and caused the previous bug. Could we take all cases into consideration in CalculateMTUDeduction and use the result everywhere?
By all cases, I mean wireguard only, wireguard + encap (multi-cluster case), ipsec, overlay.
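A consolidated calculator along these lines might look like the sketch below. The function name, parameter shapes, and the concrete overhead values (Geneve/VXLAN 50, GRE 42, WireGuard 80, IPsec ESP 38) are assumptions for illustration, not Antrea's actual API:

```go
package main

import "fmt"

// Illustrative overhead values; the real constants live in Antrea's
// pkg/agent/config package and may differ.
const (
	vxlanOverhead     = 50
	geneveOverhead    = 50
	greOverhead       = 42
	wireGuardOverhead = 80
	ipsecESPOverhead  = 38
)

// calculateMTUDeduction sketches the consolidated approach: one function
// covering all four cases named above (wireguard only, wireguard + encap
// for multi-cluster, ipsec, plain overlay).
func calculateMTUDeduction(encap, multiclusterGW bool, tunnelType, encryptionMode string) int {
	// WireGuard-only mode encrypts instead of encapsulating, so only the
	// WireGuard overhead applies.
	if encryptionMode == "wireguard" && !multiclusterGW {
		return wireGuardOverhead
	}
	d := 0
	if encap || multiclusterGW {
		switch tunnelType {
		case "vxlan":
			d += vxlanOverhead
		case "geneve":
			d += geneveOverhead
		case "gre":
			d += greOverhead
		}
	}
	switch encryptionMode {
	case "wireguard": // multi-cluster: encapsulated, then encrypted
		d += wireGuardOverhead
	case "ipsec": // IPsec works together with the encap tunnel
		d += ipsecESPOverhead
	}
	return d
}

func main() {
	fmt.Println(calculateMTUDeduction(false, false, "", "wireguard"))     // WireGuard only: 80
	fmt.Println(calculateMTUDeduction(true, false, "geneve", ""))         // plain overlay: 50
	fmt.Println(calculateMTUDeduction(true, true, "geneve", "wireguard")) // multi-cluster: 130
}
```

Callers would then subtract this single value from the transport interface MTU instead of applying per-case deductions themselves.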
Sure, I will refactor this part to one MTU calculator for all scenarios.
I added a new variable "EncryptionDeduction" in NetworkConfig, and the MTU will deduct both TunnelDeduction and EncryptionDeduction.
Why don't we just calculate a single deduction value?
The TunnelDeduction is also used in calculating the multi-cluster Pod MTU, so I divided the deduction into two parts.
Discussed with @hjiajing, he will change to a single deduction value.
/test-e2e
/test-e2e
/test-e2e
pkg/agent/config/node_config.go
Outdated
// When Multi-cluster Gateway is enabled, we need to reduce MTU for potential cross-cluster traffic.
if nc.TrafficEncapMode.SupportsEncap() || nc.EnableMulticlusterGW {
	if nc.TunnelType == ovsconfig.VXLANTunnel {
-		mtuDeduction = vxlanOverhead
+		nc.MTUDeduction = vxlanOverhead
IPsec works together with those encap tunnel modes, so I think this should be "nc.MTUDeduction += vxlanOverhead". cc @xliuxu
I have a question regarding IPsec with Encap mode and Multicluster enabled: have we verified this kind of traffic before for MC?
+1 to @luolanzone for the IPsec case.
Fixed. Changed = to +=.
test/e2e/framework.go
Outdated
// Example stdout:
// 22: vxlan-29acf3@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default
//     link/ether be:41:93:69:87:02 brd ff:ff:ff:ff:ff:ff link-netns cni-320ae61f-5e51-d123-0169-9f1807390500
fields := strings.Fields(strings.Split(strings.TrimSpace(stdout), "\n")[0])
It might be simpler to get the MTU from the system file directly: cat /sys/class/net/<interface_name>/mtu
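As a sketch of that suggestion, reading sysfs directly avoids parsing `ip link` output entirely (Linux-only; the helper name is made up for illustration):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// getInterfaceMTU reads /sys/class/net/<name>/mtu, which the kernel exposes
// as a decimal number followed by a newline.
func getInterfaceMTU(name string) (int, error) {
	b, err := os.ReadFile("/sys/class/net/" + name + "/mtu")
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

func main() {
	mtu, err := getInterfaceMTU("lo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mtu)
}
```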
Agreed, done.
/test-e2e
@@ -129,7 +129,7 @@ func NewMCDefaultRouteController(
	controller.wireGuardConfig = &config.WireGuardConfig{
		Port: multiclusterConfig.WireGuard.Port,
		Name: multiclusterWireGuardInterface,
-		MTU:  controller.nodeConfig.NodeTransportInterfaceMTU - controller.networkConfig.MTUDeduction - config.WireGuardOverhead,
+		MTU:  controller.nodeConfig.NodeTransportInterfaceMTU - controller.networkConfig.MTUDeduction,
Isn't it just controller.networkConfig.InterfaceMTU?
Fixed.
test/e2e/framework.go
Outdated
@@ -2481,6 +2481,47 @@ func (data *TestData) GetTransportInterface() (string, error) {
	return "", fmt.Errorf("no interface was assigned with Node IP %s", nodeIP)
}

+func (data *TestData) GetPodInterfaceName(nodeName string, podName string) (string, error) {
I don't mean checking the MTU value in e2e. The calculation has already been validated in unit tests; a mistake in the calculation can be caught by a unit test. What we really need is to ensure the calculation works by testing real traffic, which was not covered before.
My suggestion for the e2e test:
Just change the existing connectivity test to verify that packets of MTU size can be forwarded. The test should fail without the fix and succeed with the fix.
For example, in runPingMesh, change the packet size to MTU-28 and specify -M do to set the Don't Fragment bit.
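The suggested change could be sketched as below. The helper name is hypothetical; 28 is the IPv4 header (20 bytes) plus the ICMP echo header (8 bytes), so a payload of MTU-28 produces a packet of exactly MTU size:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pingMTUArgs builds ping arguments that send an MTU-size packet with the
// Don't Fragment bit set ("-M do"), so the ping fails instead of being
// fragmented when any hop's MTU is smaller than expected.
func pingMTUArgs(target string, mtu int) []string {
	payload := strconv.Itoa(mtu - 28) // 20-byte IPv4 header + 8-byte ICMP header
	return []string{"-c", "3", "-M", "do", "-s", payload, target}
}

func main() {
	fmt.Println(strings.Join(pingMTUArgs("10.0.0.1", 1450), " "))
	// -c 3 -M do -s 1422 10.0.0.1
}
```

An e2e test would then pass these arguments to the ping binary executed inside the client Pod.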
Sure, I will add this part.
E2E test added.
cmd/antrea-agent/agent.go
Outdated
@@ -205,7 +205,7 @@ func run(o *Options) error {
		IPsecConfig: config.IPsecConfig{
			AuthenticationMode: ipsecAuthenticationMode,
		},
-		EnableMulticlusterGW: enableMulticlusterGW,
+		MulticlusterConfig: o.config.Multicluster,
- Do not just replace it without removing the original attribute.
- It's inconsistent to use o.config.Multicluster as the source of whether the multicluster gateway is enabled. If you look at how enableMulticlusterGW is calculated, there are other conditions.
I believe we should keep passing enableMulticlusterGW, and add a parsed nc.MulticlusterConfig.TrafficEncryptionMode.
Sorry, I removed this line by mistake; added it back.
/test-e2e
Please also check golang-lint failures.
/test-e2e
@antoninbas could you take a look at the PR?
LGTM
pkg/agent/config/node_config.go
Outdated
}
if nc.TrafficEncryptionMode == TrafficEncryptionModeWireGuard {
	// When WireGuard is enabled, cross-node traffic will only be encrypted, just reduce MTU for encryption.
	nc.MTUDeduction = WireGuardOverhead
Do you want to do 60B when IPv6 is disabled and 80B when IPv6 is enabled?
see https://lists.zx2c4.com/pipermail/wireguard/2017-December/002201.html
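The numbers in the linked thread break down as outer IP header + UDP header + WireGuard framing (type, receiver index, counter, and the 16-byte authentication tag). A sketch of family-aware overhead selection, with a hypothetical helper name:

```go
package main

import "fmt"

// wireGuardOverhead returns the per-packet overhead to deduct from the MTU:
// outer IP header + 8-byte UDP header + 32 bytes of WireGuard framing
// (4-byte type, 4-byte receiver index, 8-byte counter, 16-byte auth tag).
func wireGuardOverhead(ipv6Enabled bool) int {
	const udpHeader, wgFraming = 8, 32
	ipHeader := 20 // IPv4 outer header
	if ipv6Enabled {
		ipHeader = 40 // IPv6 outer header
	}
	return ipHeader + udpHeader + wgFraming
}

func main() {
	fmt.Println(wireGuardOverhead(false)) // 60
	fmt.Println(wireGuardOverhead(true))  // 80
}
```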
Thanks for the info, will add it.
done
/test-e2e
/test-e2e
test/e2e/connectivity_test.go
Outdated
if t.Failed() {
	time.Sleep(15 * time.Minute)
}
for testing only
/test-e2e
Signed-off-by: Quan Tian <qtian@vmware.com>
/test-e2e
It may be related to the MTU calculation bug in specific versions, created #5922
/test-conformance
/test-windows-e2e
/test-multicluster-e2e
/test-multicluster-e2e
The commit fixes 3 incorrect MTU configurations:
1. When using the WireGuard encryption mode, the Pod eth0's MTU was not correct. The MTU deducted the Geneve overhead because the default tunnel type is Geneve, while it should deduct the WireGuard overhead, as traffic will be encrypted instead of encapsulated.
2. When using the GRE tunnel type, the Pod eth0's MTU was not correct. The actual overhead is 14 bytes outer MAC, 20 bytes outer IP, and 8 bytes GRE header (4-byte standard header + 4-byte key field), summing up to 42 bytes.
3. When enabling WireGuard for Multi-cluster, the MTU of all Pod interfaces and the WireGuard interface was reduced by 130 bytes (50 for Geneve + 80 for WireGuard); however, cross-cluster traffic sent from Pods was not forwarded by the WireGuard interface. This is because traffic originating from Pods is encapsulated on the gateway Node, and it is the encapsulated packet that is encrypted. If the WireGuard interface is set with the same MTU as the Pod interface, the encapsulated packet will exceed the WireGuard interface's MTU.
Signed-off-by: Jiajing Hu <hjiajing@vmware.com>
Signed-off-by: Quan Tian <qtian@vmware.com>
Co-authored-by: Quan Tian <qtian@vmware.com>
Fixes #5868
Fixes #5913
Fixes #5914
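Assuming a 1500-byte transport MTU and the overhead values discussed in this PR (Geneve 50, GRE 42, WireGuard 80), the three fixes translate to arithmetic like the sketch below (function and variable names are illustrative, not Antrea's actual code):

```go
package main

import "fmt"

const (
	geneveOverhead    = 50
	greOverhead       = 42 // 14 outer MAC + 20 outer IP + 8 GRE header
	wireGuardOverhead = 80
)

// correctedMTUs applies the three corrected computations from the commit
// message to a given transport interface MTU.
func correctedMTUs(transportMTU int) (wgPodMTU, grePodMTU, mcWGIfaceMTU, mcPodMTU int) {
	// Fix 1: WireGuard mode encrypts rather than encapsulates, so deduct
	// the WireGuard overhead, not the (default) Geneve overhead.
	wgPodMTU = transportMTU - wireGuardOverhead
	// Fix 2: GRE overhead is 42 bytes, not the Geneve value.
	grePodMTU = transportMTU - greOverhead
	// Fix 3: for Multi-cluster, the WireGuard interface must leave room for
	// the Geneve encapsulation added on the gateway Node...
	mcWGIfaceMTU = transportMTU - wireGuardOverhead
	// ...while Pod interfaces deduct both, since Pod traffic is
	// encapsulated first and the encapsulated packet is then encrypted.
	mcPodMTU = transportMTU - wireGuardOverhead - geneveOverhead
	return
}

func main() {
	fmt.Println(correctedMTUs(1500)) // 1420 1458 1420 1370
}
```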