MTU is wrong for Pod's eth0 interface when using Wireguard #5868

Closed
antoninbas opened this issue Jan 11, 2024 · 1 comment · Fixed by #5880
Labels
area/transit/encapsulation Issues or PRs related to encapsulation. area/transit/encryption Issues or PRs related to transit encryption (IPSec, SSL). kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

antoninbas commented Jan 11, 2024

Describe the bug
When using WireGuard, the MTU configured for each Pod's eth0 interface is too large.
It should be set to the same value as for the antrea-wg0 interface: transportMTU - 80.

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr 4677:d46a:24b1::
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 6a:9f:1c:f5:5b:2c brd ff:ff:ff:ff:ff:ff
5: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 46:ab:5d:06:3f:85 brd ff:ff:ff:ff:ff:ff
6: antrea-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 6a:46:b8:75:db:ef brd ff:ff:ff:ff:ff:ff
    inet 10.10.1.1/24 brd 10.10.1.255 scope global antrea-gw0
       valid_lft forever preferred_lft forever
7: antrea-wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 65455 qdisc noqueue state UNKNOWN group default
    link/none
    inet 10.10.1.1/32 scope global antrea-wg0
       valid_lft forever preferred_lft forever
8: antrea-egress0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 7a:74:3e:b5:66:11 brd ff:ff:ff:ff:ff:ff
9: coredns--70cbe6@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether ae:b6:f9:99:20:c7 brd ff:ff:ff:ff:ff:ff link-netnsid 1
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::3/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link
       valid_lft forever preferred_lft forever
11: local-pa-daf9ad@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether 26:33:e9:2b:1d:2a brd ff:ff:ff:ff:ff:ff link-netnsid 2
12: coredns--3bcc0c@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether b6:f5:55:02:8e:5d brd ff:ff:ff:ff:ff:ff link-netnsid 3
13: antrea-t-579008@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether 22:78:19:2f:c2:81 brd ff:ff:ff:ff:ff:ff link-netnsid 4

This is a Kind cluster, which is why the MTU for the transport interface (eth0) is 65535.
For antrea-wg0, the MTU is set to 65535 - 80 = 65455, but for Pod veths, it is set to 65535 - 50 = 65485.

The MTU for Pod interfaces is computed by

func (nc *NetworkConfig) CalculateMTUDeduction(isIPv6 bool) int {
	var mtuDeduction int
	// When Multi-cluster Gateway is enabled, we need to reduce MTU for potential cross-cluster traffic.
	if nc.TrafficEncapMode.SupportsEncap() || nc.EnableMulticlusterGW {
		if nc.TunnelType == ovsconfig.VXLANTunnel {
			mtuDeduction = vxlanOverhead
		} else if nc.TunnelType == ovsconfig.GeneveTunnel {
			mtuDeduction = geneveOverhead
		} else if nc.TunnelType == ovsconfig.GRETunnel {
			mtuDeduction = greOverhead
		}
	}
	if nc.TrafficEncapMode.SupportsEncap() && isIPv6 {
		mtuDeduction += ipv6ExtraOverhead
	}
	nc.MTUDeduction = mtuDeduction
	return mtuDeduction
}

which ignores the fact that we are using WireGuard and deducts the Geneve overhead, even though Geneve is not actually in use here: it is only the default value of tunnelType in the config.

The function should be updated so that the MTU is reduced by 80B instead of 50B when using WireGuard.
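
As a rough illustration of the change proposed above, here is a minimal, self-contained sketch. It is not the actual Antrea fix: `mtuConfig`, `mtuDeduction`, and the constant names are hypothetical stand-ins, and the sketch assumes that WireGuard traffic is not additionally tunnelled (as is the case in the setup described here). Only the 50B and 80B figures come from this issue.

```go
package main

import "fmt"

// Illustrative constants, in bytes. Only the 50B (Geneve) and 80B (WireGuard)
// figures come from the issue; the names are hypothetical.
const (
	geneveOverheadBytes    = 50
	wireGuardOverheadBytes = 80
)

// mtuConfig is a hypothetical, simplified stand-in for Antrea's NetworkConfig.
type mtuConfig struct {
	EncapEnabled     bool
	WireGuardEnabled bool
}

// mtuDeduction returns the number of bytes to subtract from the transport MTU.
// Assumption: when WireGuard is enabled, its overhead replaces the tunnel
// overhead instead of being added to it.
func mtuDeduction(c mtuConfig) int {
	if c.WireGuardEnabled {
		return wireGuardOverheadBytes
	}
	if c.EncapEnabled {
		return geneveOverheadBytes
	}
	return 0
}

func main() {
	const transportMTU = 65535 // transport interface MTU in the Kind cluster above
	c := mtuConfig{EncapEnabled: true, WireGuardEnabled: true}
	fmt.Println(transportMTU - mtuDeduction(c)) // 65455, the same MTU as antrea-wg0
}
```

With this ordering, the default tunnelType value no longer affects the result once WireGuard is enabled.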

To Reproduce
1. Create a Kind cluster and install Antrea with helm install -n kube-system antrea antrea/antrea --set trafficEncryptionMode="wireGuard".
2. Deploy an iperf client Pod and an iperf server Pod on 2 different Nodes.
3. On the server Pod, run iperf3 -s.
4. On the client Pod, run iperf3 -c <server IP> -u -b 0 -l 65457. The value 65457 is the MTU of the Pod's eth0 interface (65485) minus 28B (UDP and IPv4 headers).
5. You should get one of the following:

  • An error:
iperf3: error - unable to write to stream socket: Message too long
  • Or, iperf runs successfully, but packets are fragmented (this can be confirmed by capturing traffic on the receiver Node):
22:27:04.528644 IP (tos 0x0, ttl 64, id 44941, offset 0, flags [none], proto UDP (17), length 65515)
    172.18.0.2.51820 > 172.18.0.3.51820: [bad udp cksum 0x5813 -> 0xbaa9!] UDP, length 65487
22:27:04.528649 IP (tos 0x0, ttl 64, id 44942, offset 0, flags [none], proto UDP (17), length 124)
    172.18.0.2.51820 > 172.18.0.3.51820: [bad udp cksum 0x58a3 -> 0x8feb!] UDP, length 96

If you reduce the datagram length by 30B (to compensate for the MTU misconfiguration described above), iperf works fine and you can maximize throughput:

# iperf3 -c 10.10.1.5 -u -b 0 -l 65427
warning: UDP block size 65427 exceeds TCP MSS 32716, may result in fragmentation / drops
Connecting to host 10.10.1.5, port 5201
[  5] local 10.10.2.2 port 59552 connected to 10.10.1.5 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  1.44 GBytes  12.3 Gbits/sec  23600
[  5]   1.00-2.00   sec  1.50 GBytes  12.9 Gbits/sec  24560
[  5]   2.00-3.00   sec  1.51 GBytes  13.0 Gbits/sec  24780
[  5]   3.00-4.00   sec  1.42 GBytes  12.2 Gbits/sec  23310
[  5]   4.00-5.00   sec  1.38 GBytes  11.9 Gbits/sec  22720
[  5]   5.00-6.00   sec  1.17 GBytes  10.0 Gbits/sec  19190
[  5]   6.00-7.00   sec  1.33 GBytes  11.4 Gbits/sec  21850
[  5]   7.00-8.00   sec  1.40 GBytes  12.0 Gbits/sec  22910
[  5]   8.00-9.00   sec  1.33 GBytes  11.5 Gbits/sec  21880
[  5]   9.00-10.00  sec  1.38 GBytes  11.9 Gbits/sec  22720
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  13.9 GBytes  11.9 Gbits/sec  0.000 ms  0/227520 (0%)  sender
[  5]   0.00-10.04  sec  8.58 GBytes  7.34 Gbits/sec  0.073 ms  86706/227519 (38%)  receiver
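
The two datagram lengths used in this report follow directly from the interface MTUs. The sketch below reproduces that arithmetic, assuming an IPv4 header without options; the function and constant names are illustrative only.

```go
package main

import "fmt"

const (
	ipv4HeaderBytes = 20 // IPv4 header without options
	udpHeaderBytes  = 8
)

// maxUDPPayload returns the largest UDP payload that fits in a single
// IPv4 packet of size mtu without fragmentation.
func maxUDPPayload(mtu int) int {
	return mtu - ipv4HeaderBytes - udpHeaderBytes
}

func main() {
	fmt.Println(maxUDPPayload(65485)) // 65457: based on the misconfigured Pod MTU
	fmt.Println(maxUDPPayload(65455)) // 65427: based on the antrea-wg0 MTU
}
```

The 30B difference between the two results is exactly the gap between the 50B that is currently deducted and the 80B that WireGuard needs.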

Versions:
Antrea 1.14.x, and probably earlier minor versions as well

Additional context
Note that we can either use 80B unconditionally for Wireguard overhead, or we can use 60B when only IPv4 is in use.
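
For reference, the 60B and 80B figures can be broken down as follows. This is a sketch based on the WireGuard transport data message layout (16B header plus 16B authentication tag), sent over UDP; it assumes the overhead depends on the IP family of the outer (transport) packet and ignores payload padding. The names are illustrative.

```go
package main

import "fmt"

const (
	ipv4HeaderBytes   = 20
	ipv6HeaderBytes   = 40
	udpHeaderBytes    = 8
	wgDataHeaderBytes = 16 // type and reserved (4), receiver index (4), counter (8)
	wgAuthTagBytes    = 16 // Poly1305 authentication tag
)

// wireGuardOverhead returns the per-packet overhead in bytes added by
// WireGuard, depending on whether the outer packet is IPv6.
func wireGuardOverhead(outerIPv6 bool) int {
	ipHeader := ipv4HeaderBytes
	if outerIPv6 {
		ipHeader = ipv6HeaderBytes
	}
	return ipHeader + udpHeaderBytes + wgDataHeaderBytes + wgAuthTagBytes
}

func main() {
	fmt.Println(wireGuardOverhead(false)) // 60
	fmt.Println(wireGuardOverhead(true))  // 80
}
```

Using 80B unconditionally is the simpler, more conservative option; using 60B for IPv4-only setups recovers 20B of MTU at the cost of a family check.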

We may want to adjust the MTU for antrea-gw0 as well, assuming it is possible for traffic that will eventually be encapsulated to be output through this interface. Currently we set it to the same value as for Pod's veth interfaces.

Credit goes to @AlexisDucastel for reporting this.

@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. area/transit/encapsulation Issues or PRs related to encapsulation. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jan 11, 2024

@xliuxu @tnqn @luolanzone

@antoninbas antoninbas added this to the Antrea v1.15 release milestone Jan 11, 2024
@antoninbas antoninbas added area/transit/encryption Issues or PRs related to transit encryption (IPSec, SSL). area/transit/encapsulation Issues or PRs related to encapsulation. and removed area/transit/encapsulation Issues or PRs related to encapsulation. labels Jan 11, 2024
@tnqn tnqn added action/backport Indicates a PR that requires backports. and removed action/backport Indicates a PR that requires backports. labels Jan 12, 2024
@hjiajing hjiajing linked a pull request Jan 16, 2024 that will close this issue