MTU is wrong for Pod's eth0 interface when using Wireguard #5868

antoninbas · 2024-01-11T22:31:34Z

Describe the bug
When using Wireguard, the MTU configured for each Pod's eth0 interface does not look correct.
It should be set to the same value as for the antrea-wg0 interface: transportMTU - 80.

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr 4677:d46a:24b1::
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 6a:9f:1c:f5:5b:2c brd ff:ff:ff:ff:ff:ff
5: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 46:ab:5d:06:3f:85 brd ff:ff:ff:ff:ff:ff
6: antrea-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 6a:46:b8:75:db:ef brd ff:ff:ff:ff:ff:ff
    inet 10.10.1.1/24 brd 10.10.1.255 scope global antrea-gw0
       valid_lft forever preferred_lft forever
7: antrea-wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 65455 qdisc noqueue state UNKNOWN group default
    link/none
    inet 10.10.1.1/32 scope global antrea-wg0
       valid_lft forever preferred_lft forever
8: antrea-egress0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 7a:74:3e:b5:66:11 brd ff:ff:ff:ff:ff:ff
9: coredns--70cbe6@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether ae:b6:f9:99:20:c7 brd ff:ff:ff:ff:ff:ff link-netnsid 1
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::3/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link
       valid_lft forever preferred_lft forever
11: local-pa-daf9ad@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether 26:33:e9:2b:1d:2a brd ff:ff:ff:ff:ff:ff link-netnsid 2
12: coredns--3bcc0c@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether b6:f5:55:02:8e:5d brd ff:ff:ff:ff:ff:ff link-netnsid 3
13: antrea-t-579008@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master ovs-system state UP group default
    link/ether 22:78:19:2f:c2:81 brd ff:ff:ff:ff:ff:ff link-netnsid 4

This is a Kind cluster, which is why the MTU for the transport interface (eth0) is 65535.
For antrea-wg0, the MTU is set to 65535 - 80 = 65455, but for Pod veths, it is set to 65535 - 50 = 65485.

The MTU for Pod interfaces is computed by

antrea/pkg/agent/config/node_config.go

Lines 267 to 285 in 792e244

    
    func (nc *NetworkConfig) CalculateMTUDeduction(isIPv6 bool) int {
    	var mtuDeduction int
    	// When Multi-cluster Gateway is enabled, we need to reduce MTU for potential cross-cluster traffic.
    	if nc.TrafficEncapMode.SupportsEncap() || nc.EnableMulticlusterGW {
    		if nc.TunnelType == ovsconfig.VXLANTunnel {
    			mtuDeduction = vxlanOverhead
    		} else if nc.TunnelType == ovsconfig.GeneveTunnel {
    			mtuDeduction = geneveOverhead
    		} else if nc.TunnelType == ovsconfig.GRETunnel {
    			mtuDeduction = greOverhead
    		}
    	}
    	if nc.TrafficEncapMode.SupportsEncap() && isIPv6 {
    		mtuDeduction += ipv6ExtraOverhead
    	}
    	nc.MTUDeduction = mtuDeduction
    	return mtuDeduction
    }

which ignores the fact that we are using Wireguard and deducts the Geneve overhead instead. We are not actually using Geneve here; it is just the default value of tunnelType in the config.

The function should be updated, and the MTU should be reduced by 80B instead of 50B when using Wireguard.
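
As a rough illustration of the requested behavior (not the actual fix), here is a minimal standalone sketch in Go. The type encryptionMode, the constant names and the function mtuDeduction are hypothetical and do not exist in Antrea; only the 50B and 80B values come from this issue. It assumes that the Wireguard overhead simply replaces the tunnel overhead when Wireguard is enabled.

```go
package main

import "fmt"

// Overhead values in bytes, as quoted in this issue.
// The constant names are hypothetical.
const (
	geneveOverhead    = 50
	wireGuardOverhead = 80
)

// encryptionMode is a hypothetical stand-in for the traffic encryption setting.
type encryptionMode int

const (
	encryptionNone encryptionMode = iota
	encryptionWireGuard
)

// mtuDeduction returns the number of bytes to subtract from the transport MTU.
// Assumption: with Wireguard enabled, the Wireguard overhead is used instead of
// the overhead of the (default) Geneve tunnel type.
func mtuDeduction(mode encryptionMode) int {
	if mode == encryptionWireGuard {
		return wireGuardOverhead
	}
	return geneveOverhead
}

func main() {
	const transportMTU = 65535 // MTU of eth0 on the Kind Node shown above

	// MTU that Pod interfaces should get when Wireguard is enabled.
	fmt.Println(transportMTU - mtuDeduction(encryptionWireGuard)) // 65455
	// MTU that results from deducting the Geneve overhead only.
	fmt.Println(transportMTU - mtuDeduction(encryptionNone)) // 65485
}
```

With the 65535 transport MTU of this Kind cluster, the first value matches antrea-wg0 and the second matches what the Pod veths currently get.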

To Reproduce
Create a Kind cluster and install Antrea with helm install -n kube-system antrea antrea/antrea --set trafficEncryptionMode="wireGuard".
Deploy an iperf client Pod and an iperf server Pod on 2 different Nodes.
On the server Pod, run iperf3 -s.
On the client Pod, run iperf3 -c <server IP> -u -b 0 -l 65457. 65457 is computed by taking the MTU of the eth0 interface (of the Pod) and subtracting 28B (UDP & IPv4 headers).
You should get one of the following:

An error:

iperf3: error - unable to write to stream socket: Message too long

Or, iperf runs successfully, but packets are fragmented (can be confirmed by capturing traffic on receiver Node):

22:27:04.528644 IP (tos 0x0, ttl 64, id 44941, offset 0, flags [none], proto UDP (17), length 65515)
    172.18.0.2.51820 > 172.18.0.3.51820: [bad udp cksum 0x5813 -> 0xbaa9!] UDP, length 65487
22:27:04.528649 IP (tos 0x0, ttl 64, id 44942, offset 0, flags [none], proto UDP (17), length 124)
    172.18.0.2.51820 > 172.18.0.3.51820: [bad udp cksum 0x58a3 -> 0x8feb!] UDP, length 96

If you reduce the datagram length by 30B (to compensate for the MTU misconfiguration described above), iperf works fine and you can maximize throughput:

# iperf3 -c 10.10.1.5 -u -b 0 -l 65427
warning: UDP block size 65427 exceeds TCP MSS 32716, may result in fragmentation / drops
Connecting to host 10.10.1.5, port 5201
[  5] local 10.10.2.2 port 59552 connected to 10.10.1.5 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  1.44 GBytes  12.3 Gbits/sec  23600
[  5]   1.00-2.00   sec  1.50 GBytes  12.9 Gbits/sec  24560
[  5]   2.00-3.00   sec  1.51 GBytes  13.0 Gbits/sec  24780
[  5]   3.00-4.00   sec  1.42 GBytes  12.2 Gbits/sec  23310
[  5]   4.00-5.00   sec  1.38 GBytes  11.9 Gbits/sec  22720
[  5]   5.00-6.00   sec  1.17 GBytes  10.0 Gbits/sec  19190
[  5]   6.00-7.00   sec  1.33 GBytes  11.4 Gbits/sec  21850
[  5]   7.00-8.00   sec  1.40 GBytes  12.0 Gbits/sec  22910
[  5]   8.00-9.00   sec  1.33 GBytes  11.5 Gbits/sec  21880
[  5]   9.00-10.00  sec  1.38 GBytes  11.9 Gbits/sec  22720
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  13.9 GBytes  11.9 Gbits/sec  0.000 ms  0/227520 (0%)  sender
[  5]   0.00-10.04  sec  8.58 GBytes  7.34 Gbits/sec  0.073 ms  86706/227519 (38%)  receiver
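
The two datagram lengths used above follow from the interface MTU minus the IPv4 and UDP headers. A small sketch of that arithmetic in Go, assuming an IPv4 header without options (20B); the helper name maxUDPPayload is made up for this example.

```go
package main

import "fmt"

const (
	ipv4HeaderLen = 20 // IPv4 header without options
	udpHeaderLen  = 8
)

// maxUDPPayload returns the largest UDP payload that fits in a single
// IPv4 packet of size mtu.
func maxUDPPayload(mtu int) int {
	return mtu - ipv4HeaderLen - udpHeaderLen
}

func main() {
	// MTU currently configured on the Pod's eth0 (transport MTU - 50).
	fmt.Println(maxUDPPayload(65485)) // 65457
	// MTU of antrea-wg0 (transport MTU - 80), which the Pod should be using.
	fmt.Println(maxUDPPayload(65455)) // 65427
}
```

The 30B difference between the two results is the gap between the 80B and 50B deductions.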

Versions:
Antrea 1.14.x, but also probably earlier minor versions

Additional context
Note that we can either use 80B unconditionally for Wireguard overhead, or we can use 60B when only IPv4 is in use.
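
For reference, the 60B and 80B figures can be reproduced from header sizes. The sketch below assumes the difference comes from the outer IP header (20B for IPv4, 40B for IPv6), on top of 8B of UDP and 32B of Wireguard data-message framing (16B header plus 16B authentication tag). The function and constant names are illustrative only and are not taken from the Antrea code base.

```go
package main

import "fmt"

const (
	ipv4HeaderLen   = 20
	ipv6HeaderLen   = 40
	udpHeaderLen    = 8
	wgDataHeaderLen = 16 // message type + reserved (4B), receiver index (4B), counter (8B)
	wgAuthTagLen    = 16 // Poly1305 authentication tag
)

// wireGuardOverhead returns the per-packet overhead in bytes, depending on
// whether the outer (transport) header is IPv6 or IPv4.
func wireGuardOverhead(outerIPv6 bool) int {
	ipHeaderLen := ipv4HeaderLen
	if outerIPv6 {
		ipHeaderLen = ipv6HeaderLen
	}
	return ipHeaderLen + udpHeaderLen + wgDataHeaderLen + wgAuthTagLen
}

func main() {
	fmt.Println(wireGuardOverhead(false)) // 60
	fmt.Println(wireGuardOverhead(true))  // 80
}
```

Using 80B unconditionally is the conservative choice: it costs 20B of MTU on IPv4-only transport but is safe in both cases.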

We may want to adjust the MTU for antrea-gw0 as well, assuming it is possible for traffic that will eventually be encapsulated to be output through this interface. Currently we set it to the same value as for Pod's veth interfaces.

Credit goes to @AlexisDucastel for reporting this.

antoninbas · 2024-01-11T23:25:47Z

@xliuxu @tnqn @luolanzone

antoninbas added the kind/bug, area/transit/encapsulation, and priority/critical-urgent labels Jan 11, 2024

antoninbas mentioned this issue Jan 11, 2024

Should we create the antrea-tun0 OVS port when using Wireguard? #5869

Closed

antoninbas added this to the Antrea v1.15 release milestone Jan 11, 2024

antoninbas added the area/transit/encryption and area/transit/encapsulation labels and removed the area/transit/encapsulation label Jan 11, 2024

tnqn added and removed the action/backport label Jan 12, 2024

antoninbas assigned hjiajing Jan 12, 2024

hjiajing linked a pull request Jan 16, 2024 that will close this issue

Fix incorrect MTU configurations #5880

Merged

This was referenced Jan 24, 2024

MTU is wrong for Pod's eth0 interface when using GRE tunnel #5913

Closed

MTU is wrong when enabling Wireguard for Multicluster #5914

Closed

tnqn closed this as completed in #5880 Jan 26, 2024
