
Can't access anything on the 10.43.x.x range #1247

Closed
BlackTurtle123 opened this issue Dec 25, 2019 · 37 comments

Comments

@BlackTurtle123

BlackTurtle123 commented Dec 25, 2019

Version:
v1.0.1

Describe the bug
The internal connection on the 10.43.x.x range doesn't seem to work. Legacy (old) iptables has been enabled; the system is Debian-based.

To Reproduce
Use the playbook under contributions in the repo, update the version, and install only the server (master) service, not the node.

Expected behavior
The K3s cluster installs and starts up

Actual behavior
Pods fail and time out on 10.43.x.x IPs

Additional context
Error: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.43.0.1:443: i/o timeout

panic: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.43.0.1:443: i/o timeout

@maxirus

maxirus commented Jan 28, 2020

This was happening to me while trying to "auto-install" a Helm Chart via

apiVersion: helm.cattle.io/v1
kind: HelmChart

The helm-install pod was scheduled to a Raspberry Pi 4 node. I deleted the pod, which caused it to be rescheduled onto an x86_64 node, and it ran fine (I'm running a mixed CPU architecture cluster). Running iptables-save on the Raspberry Pi yielded no rules pertaining to Kubernetes. Not sure why yet...
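
For reference, a minimal HelmChart manifest of this kind looks roughly like the sketch below; the name, repo, and chart are illustrative placeholders, not the values from my setup:

cat <<'EOF' | kubectl apply -f -
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: example-chart        # placeholder name
  namespace: kube-system     # k3s' packaged HelmChart objects live here
spec:
  repo: https://charts.example.com   # placeholder chart repository
  chart: example                     # placeholder chart name
  targetNamespace: default
EOF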

@pennywise53

If you run iptables-save, it tells you that you need to run iptables-legacy-save to see the rest of the rule output. That's where I am seeing all of the Kubernetes rules listed.
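
A quick way to see which backend actually holds the Kubernetes rules is something like this (just a sketch; counts will differ per system):

sudo iptables-save | grep -c 'KUBE-'          # rules visible to the nft-backed tools
sudo iptables-legacy-save | grep -c 'KUBE-'   # rules in the legacy tables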

@pennywise53

After being plagued by the issue for several days, I decided to try removing iptables, and all of a sudden my services are working and I am able to get DNS resolution on everything. The issue referenced above, #977, talks about some iptables conflicts and the placement of the REJECT rule. I haven't dug into the correct rule order on Raspbian yet, but this was a quick fix to get everything up and running.

@ghost

ghost commented Sep 8, 2020

Hi there, is there any news here? I am experiencing this issue and have been trying to solve it on and off for several weeks with no success. I see things like

 1 reflector.go:322] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to watch *v1.Service: Get https://172.18.0.1:443/api/v1/services?resourceVersion=2205&timeoutSeconds=460&watch=true: dial tcp 172.18.0.1:443: connect: connection refused

in the Traefik log or

Failed to watch *v1.Namespace: Get "https://172.18.0.1:443/api/v1/namespaces?allowWatchBookmarks=true&resourceVersion=2205&timeout=8m20s&timeoutSeconds=500&watch=true": dial tcp 172.18.0.1:443: connect: connection refused

in the coredns log.
I am using k3s v1.18.8+k3s1 on CentOS 8.

 iptables --version
iptables v1.8.2 (nf_tables)

I have tried modprobe br_netfilter and also tried to add nftables rules:

nft add rule filter INPUT ip saddr 172.17.0.0/24 iif cni0 accept
nft add rule filter OUTPUT ip saddr 172.18.0.0/24 accept

I run k3s server --pause-image k8s.gcr.io/pause:3.1 --cluster-cidr=172.17.0.0/24 --service-cidr=172.18.0.0/24.
Any idea how to solve it? Is k3s supposed to work on CentOS 8 with nftables at all, or only with firewalld/iptables?

@Berndinox

Berndinox commented Dec 4, 2020

I got the same issue with Ubuntu 20!
Details can be found here: https://serverfault.com/questions/1044971/k3s-dial-tcp-10-43-0-1443-connect-connection-refused

Error
E1204 11:42:25.216392 8 leaderelection.go:321] error retrieving resource lock ingress-nginx/ingress-controller-leader-nginx: Get "https://10.43.0.1:443/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx": dial tcp 10.43.0.1:443: connect: connection refused

None of the 10.43.x.x IPs seem to be working!

Any ideas or solutions for this?

EDIT:
It seems K3s does not work when colocating master and node on the same host... at least that was the problem for me.

@Id2ndR

Id2ndR commented Feb 9, 2021

You can try to run sudo iptables -I FORWARD -j ACCEPT and see if it (temporarily) solves the issue.

K3s (I'm not sure which component exactly) regenerates iptables rules every 30 seconds and adds KUBE-POD-FW-* chain rules above the manually inserted one, so the issue will still be there.

Currently I have the issue with Traefik for some ingresses, but not all. I'm still digging to find out how the rules are generated so I can diagnose it.
EDIT: in my case, it was a NetworkPolicy that blocked my Traefik egress traffic. I found it by narrowing the problem down through the iptables chain rules, roughly as in the sketch below.
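
The kind of inspection I mean is roughly this (chain name prefixes are what k3s' network policy controller tends to use, so treat it as a sketch, not an exact recipe):

sudo iptables -L FORWARD -v -n --line-numbers | head -n 25        # see where forwarded packets get dropped/rejected
sudo iptables-save | grep -E 'KUBE-POD-FW|KUBE-NWPLCY'            # per-pod firewall chains generated from network policies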

@brandond
Member

brandond commented Feb 9, 2021

The kubelet is responsible for most of the forwarding rules. The remainder are handled by the network policy controller, although their tables will likely be empty if you don't have any policies in your cluster to restrict communication.

Do you perhaps have your host-based firewall (ufw, firewalld, etc) enabled?
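
If you're not sure, a couple of quick checks (service names vary by distro, so adjust as needed):

systemctl is-active firewalld 2>/dev/null   # "active" means firewalld is running
sudo ufw status 2>/dev/null                 # "Status: active" means ufw is filtering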

@Id2ndR

Id2ndR commented Feb 9, 2021

In my case, there was no problem at all: everything worked as expected, but I just did not know it.
To be more specific, a NetworkPolicy was added by the Helm chart I used, but I had not noticed it.
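
For anyone hitting the same thing, this is an easy way to spot policies a chart may have added (name/namespace below are placeholders):

kubectl get networkpolicy --all-namespaces
kubectl describe networkpolicy <name> -n <namespace>   # check the podSelector and ingress/egress rules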

@brandond
Member

brandond commented Feb 9, 2021

Ah yeah. Kubernetes network policy would definitely block it, by design.

@brunnels

I found that the problem on my Arch ARM and regular Arch systems is related to #1812.

It doesn't look like iptables-detect.sh properly supports Arch. When I run it on one of my nodes:

[k8s@k8s-master-01 ~]$ sudo find / -type f -name iptables-detect.sh
/var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh

[k8s@k8s-master-01 ~]$ sudo /var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh
mode is legacy detected via rules and containerized is false

But it's nft, not legacy:

[k8s@k8s-master-01 ~]$ ls -l /sbin/iptables
lrwxrwxrwx 1 root root 12 Mar 10 15:12 /sbin/iptables -> iptables-nft

A quick workaround is to change the symlinks and then restart your cluster:

sudo -s
cd /bin
rm iptables && ln -s iptables-legacy iptables && rm ip6tables && ln -s ip6tables-legacy ip6tables

You can test whether it's working by doing this:

[k8s@k8s-master-01 ~]$ kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
[k8s@k8s-master-01 ~]$ kubectl exec -it dnsutils -n default -- nslookup google.com
Server:		10.43.0.10
Address:	10.43.0.10#53

Non-authoritative answer:
Name:	google.com
Address: 172.217.8.174
Name:	google.com
Address: 2607:f8b0:4000:803::200e

If it's broken you'll get ;; connection timed out; no servers could be reached

@brandond
Member

[k8s@k8s-master-01 ~]$ sudo /var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh
mode is legacy detected via rules and containerized is false

detected via rules indicates that you have legacy iptables rules present on the system. This is determined by running iptables-legacy-save and ip6tables-legacy-save - if these return more than 10 lines of output between the two of them then it is assumed that you are using legacy iptables. Can you determine what it is that was creating these legacy rules?
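
In other words, the detection amounts to something like this (a rough sketch based on the description above, not the exact script):

legacy_rules=$( (iptables-legacy-save; ip6tables-legacy-save) 2>/dev/null | grep -c '^-' )
if [ "$legacy_rules" -gt 10 ]; then
  echo "legacy rules present -> mode is legacy detected via rules"
else
  echo "too few legacy rules -> fall back to other detection methods"
fi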

@brunnels

brunnels commented Mar 10, 2021

I just cleared out the cluster and did k3s-uninstall.sh on all nodes. Then I did the following and rebooted to make sure there were no legacy rules.

# Flush every rule and reset the built-in chain policies to ACCEPT, keeping only the table/chain headers:
iptables-save | awk '/^[*]/ { print $1 }
/^:[A-Z]+ [^-]/ { print $1 " ACCEPT" ; }
/COMMIT/ { print $0; }' | iptables-restore

ip6tables-save | awk '/^[*]/ { print $1 }
/^:[A-Z]+ [^-]/ { print $1 " ACCEPT" ; }
/COMMIT/ { print $0; }' | ip6tables-restore

# Unload the legacy iptables kernel modules so no empty legacy tables remain:
rmmod iptable_filter iptable_mangle iptable_nat iptable_raw iptable_security
rmmod ip6table_filter ip6table_mangle ip6table_nat ip6table_raw ip6table_security

I also ensured the original symlinks were restored:

[k8s@k8s-master-01 ~]$ ls -l /sbin/iptables
lrwxrwxrwx 1 root root 12 Mar 10 15:12 /sbin/iptables -> iptables-nft

Then I brought the cluster back up. I'm using this Ansible role and example, except with one worker node, servicelb disabled, and Traefik disabled: https://github.com/PyratLabs/ansible-role-k3s/blob/main/documentation/quickstart-ha-cluster.md

Once it's back up, I'm still getting mode is legacy detected via rules and containerized is false. iptables-legacy-save shows lots of rules, and iptables-nft-save shows the warning # Warning: iptables-legacy tables present, use iptables-legacy-save to see them, but all of those rules were added by k3s.

@brunnels

brunnels commented Mar 10, 2021

I tested as detailed before, and google.com can't be resolved by the dnsutils pod.

Then I went to each node and changed the symlinks for iptables and ip6tables to point to iptables-legacy and ip6tables-legacy, ran k3s-uninstall.sh on each node of the cluster, rebuilt the cluster with Ansible, and tested again. Now it resolves properly.

@brandond
Member

The rules check is here; can you compare the output on your systems?
https://github.com/k3s-io/k3s-root/blob/e2afbdfc30e9bc2f020b307504cc5d1a31b35404/iptables-detect/iptables-detect.sh#L73

(iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | grep '^-'

@brunnels

I reset the cluster, updated all nodes, cleared all iptables rules, and re-installed iptables. I downloaded iptables-detect.sh and ran it before installing k3s. Here's what I get on Arch armv7l and Arch amd64.

[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh 
mode is nft detected via unknownos and containerized is false

@brandond
Member

I'm curious what was initially there that led it to detect legacy iptables, though.

@brunnels

Okay, I think this might be the issue then? After re-installing iptables, it changes the symlinks, but the detect script still says nft.

[root@k8s-master-01 k8s]# ls -l /sbin/iptables
lrwxrwxrwx 1 root root 20 Jan 21 22:56 /sbin/iptables -> xtables-legacy-multi
[root@k8s-master-01 k8s]# ./iptables-detect.sh 
mode is nft detected via unknownos and containerized is false

To test this, I just spun up the cluster again, and I can't resolve anything from a pod. It is creating legacy rules, though.

[root@k8s-master-01 k8s]# /sbin/iptables-nft-save 
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them

@brunnels

I found another Arch package, iptables-nft, which conflicts with the iptables package, so I installed it.

After installing it I have this:

[k8s@k8s-master-01 ~]$ sudo ls -l /sbin/iptables
lrwxrwxrwx 1 root root 17 Jan 21 22:56 /sbin/iptables -> xtables-nft-multi
[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh 
mode is nft detected via unknownos and containerized is false

Spinning up the cluster now to see if this resolves things.

@brunnels

Now after spinning up the cluster I get this

[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh 
mode is nft detected via rules and containerized is false
[k8s@k8s-master-01 ~]$ sudo ls -l /sbin/iptables
lrwxrwxrwx 1 root root 17 Jan 21 22:56 /sbin/iptables -> xtables-nft-multi

Which is good, I think. But something else is going on now. If I open a shell into a busybox container in the default namespace, I get an nslookup timeout, and I can't ping the DNS server that nslookup is using, which is the IP of the kube-dns service.
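
To double-check the service side of this, something like the following helps (assuming the dnsutils pod from earlier is still around; 10.43.0.10 is the default kube-dns ClusterIP in k3s):

kubectl -n kube-system get svc kube-dns                                # confirm the ClusterIP
kubectl -n kube-system get endpoints kube-dns                          # make sure CoreDNS pods are behind it
kubectl exec -it dnsutils -- nslookup kubernetes.default 10.43.0.10    # query the service IP directly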

@brandond
Member

Most distros have an update-alternatives script that you are supposed to use to do this sort of thing, as opposed to symlinking things manually. You might check to see if Arch has a similar tool that you're intended to use.
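
On Debian/Ubuntu-style systems that would be something like the commands below; I'm not sure Arch ships update-alternatives at all, so this is just for illustration:

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --config iptables    # or pick the backend interactively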

@brunnels

I was only changing symlinks to test. When I removed iptables and installed iptables-nft, it removed all the old executables and symlinks, so everything is as intended on all nodes now. I'm going through these steps now, so hopefully that will shed some light on the problem: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/

@brunnels

I've re-imaged a few times now and tried both the iptables and iptables-nft packages. I'm pretty certain at this point that something is wrong with the iptables rules that k3s is adding, because I can get outside the pod and resolve DNS if I set the server in nslookup to 1.1.1.1 or 8.8.8.8. I just can't communicate with the CoreDNS service on 10.43.0.10.

I don't know how it started working before, but it's certainly not working now, no matter what I try.

@brandond
Member

So in your current state, should your system be using iptables or nftables? Which one is k3s adding rules to?

@brunnels

brunnels commented Mar 13, 2021 via email

@brunnels

I was able to get this working, which uses standard k8s and iptables: https://github.com/raspbernetes/k8s-cluster-installation

@ChristianCiach

ChristianCiach commented Apr 15, 2021

I have exactly the same issue after a fresh installation of k3s on a fresh CentOS 8 VM (Virtualbox). Is k3s even supposed to work with CentOS 8?

@ChristianCiach

On my CentOS 8 machine, the package iptables-services, which installs the systemd service iptables, was the issue. After uninstalling this package, everything works fine. See here: #1817 (comment)
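
Roughly, on CentOS/RHEL 8 that means something like this (the unit names may differ slightly on your system):

sudo systemctl disable --now iptables ip6tables   # the services shipped by iptables-services
sudo dnf remove iptables-services
# then restart or reinstall k3s so it regenerates its rules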

@corpix

corpix commented Jul 9, 2021

Same issue on NixOS, iptables v1.8.7 (legacy)

@Timvissers

Same issue on RHEL 8.4 (AWS AMI) without iptables, k3s v1.20.2+k3s1.
The detect script returns mode is nft detected via os and containerized is false.
On CentOS 8 the result is the same; however, the problem does not occur there.
I tried some of the suggestions: modprobes, IP forwarding. Installing iptables-services did not help. No success yet.

@brandond
Member

Have you seen https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-red-hat-centos-enterprise-linux
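
For quick reference, the preparation steps on that page boil down to roughly the following (check the linked docs for the exact, current wording):

sudo systemctl disable firewalld --now                               # firewalld conflicts with k3s' iptables rules
sudo systemctl disable nm-cloud-setup.service nm-cloud-setup.timer   # if present (RHEL 8.4+ cloud images)
sudo reboot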

@Timvissers

Have you seen https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-red-hat-centos-enterprise-linux

Oh my. I thought this didn't apply because I have no firewall. But
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
fixed this problem for me. Thank you!

@rajivml

rajivml commented Jul 14, 2021

Shouldn't setting disable-cloud-controller to true disable both of the services below, if they are enabled?

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer

@brandond
Member

NetworkManager's interference with container virtual interfaces is a separate issue from firewalld/ufw blocking traffic... so you need to ensure both are disabled.

@rajivml

rajivml commented Jul 15, 2021

@brandond, if we stop these services before the RKE2 install, would it still require a reboot? If it doesn't, then as part of our infra automation we will stop them, if they exist and are active, before the RKE2 install:

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot

@Id2ndR

Id2ndR commented Jul 15, 2021

IMO this issue has turned into a FAQ about network configuration. The comments either are already covered in the documentation or should be added to it.
So, can we close this issue now?

@brandond
Member

It is covered in the documentation; I linked it above.

@stale

stale bot commented Jan 11, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jan 11, 2022
@stale stale bot closed this as completed Jan 25, 2022