UDP: bad checksum on VXLAN interface #1279
Comments
We hit this too, and it was an absolute pain to figure out. It seems to affect only service IPs (I'm guessing because of masquerading), and specifically UDP: nslookup doesn't work, but nslookup in TCP mode does. If anyone knows where this bug comes from (besides checksum offloading), I'd be very interested. Our environment:
The weird thing is that we're running almost the same versions of everything (etcd is different, but I really doubt that matters) on a Fedora 30 server, and things work fine there. The settings are the same, and while the routing tables differ, the base idea is the same...
Check the iptables versions on CentOS 7 and 8.
We are facing the same issue as mentioned by @dmitry-irtegov and @CMajeri. Environment:
This workaround worked for us:
It's definitely related to this one: kubernetes/kubernetes#88986. The fix, kubernetes/kubernetes#92035, has a good description of the issue. A change to the iptables rules exposes an existing kernel bug, especially on RHEL 7. Here is another workaround that does not require turning off checksum offload:
UDP port 8472 is flannel's default port for encapsulated (VXLAN) packets. The rule clears the mark so that the encapsulating packet is not SNATed, which avoids a double SNAT.
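A minimal sketch of a mark-clearing rule of the kind described above, in case it helps anyone reading along. The `unmark_vxlan` helper, the `APPLY` switch, and the choice of the mangle table are my assumptions, not necessarily what was originally posted; the port is flannel's default 8472. It only prints the command unless `APPLY=1` is set, so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper: clear the packet mark on outgoing flannel VXLAN
# traffic so that the encapsulating packet is not matched by the
# mark-based masquerade rule a second time.
# Assumption: the rule goes in the mangle table's OUTPUT chain.
unmark_vxlan() {
    port="${1:-8472}"   # flannel's default VXLAN UDP port
    if [ "${APPLY:-0}" = "1" ]; then
        # Needs root. --set-xmark 0x0 with the default mask zeroes the mark.
        iptables -t mangle -A OUTPUT -p udp -m udp --dport "$port" \
            -j MARK --set-xmark 0x0
    else
        # Dry run: show the command that would be executed.
        echo "iptables -t mangle -A OUTPUT -p udp -m udp --dport $port -j MARK --set-xmark 0x0"
    fi
}

unmark_vxlan 8472
```

Run it as-is to see the command, or with `APPLY=1` as root to add the rule. A rule added this way does not survive a reboot or an iptables flush.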
Similar to flannel-io/flannel#1279, unmark output to bypass kernel bug and enable checksum for better performance.
Similar to flannel-io/flannel#1279, unmark output to bypass kernel bug and enable checksum for better performance. (cherry picked from commit dcda11d)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Can you tell me why, or give a link, please?
On a k8s 1.17 cluster with RHEL 7 nodes, service IPs backed by pods on other nodes are not accessible.
Pod IPs seem to work fine. Most noticeably, CoreDNS does not work.
The target node's dmesg is filled with messages like:
Turning off IP checksum offloading on the flannel.1 interface fixes the issue:
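For reference, a sketch of how the checksum-offload workaround is usually applied with `ethtool`. The `disable_tx_csum` helper and the `APPLY`/`IFACE` variables are hypothetical names for illustration; `tx-checksum-ip-generic` is the ethtool feature name I would expect to be relevant for a VXLAN device, but check `ethtool -k flannel.1` on your own node first. The script only prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
# Hypothetical helper: disable TX checksum offload on the flannel VXLAN
# device so that checksums are computed in software.
disable_tx_csum() {
    iface="${1:-flannel.1}"   # interface name taken from this issue
    if [ "${APPLY:-0}" = "1" ]; then
        # Needs root and the ethtool package installed.
        ethtool -K "$iface" tx-checksum-ip-generic off
    else
        # Dry run: show the command that would be executed.
        echo "ethtool -K $iface tx-checksum-ip-generic off"
    fi
}

disable_tx_csum "${IFACE:-flannel.1}"
```

Settings made with `ethtool -K` are not persistent, so this would need to be reapplied after a reboot or whenever the interface is recreated.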
Other people also hit this: https://t.du9l.com/2020/03/kubernetes-flannel-udp-packets-dropped-for-wrong-checksum-workaround/
This happens both with cni-canal and pure cni-flannel, so we decided to report the issue here.
Expected Behavior
I do not have to adjust interface settings to get flannel to work.
Your Environment