DNS not working after reboot #2383

Open
hobyte opened this issue Jul 23, 2021 · 12 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@hobyte

hobyte commented Jul 23, 2021

What happened:
I created a new kind cluster, then rebooted my computer. After the reboot, DNS cannot resolve addresses.

What you expected to happen:

DNS can resolve addresses after the reboot.

How to reproduce it (as minimally and precisely as possible):

  • create a new kind cluster
  • test DNS: it's working
  • reboot your machine (don't stop Docker before the reboot)
  • test DNS again:
#APISERVER=https://kubernetes.default.svc
#SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
#NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
#TOKEN=$(cat ${SERVICEACCOUNT}/token)
#CACERT=${SERVICEACCOUNT}/ca.crt
#curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api
curl: (6) Could not resolve host: kubernetes.default.svc

Taken from https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/#without-using-a-proxy
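As an alternative quick check, cluster DNS can also be exercised from a throwaway pod (a minimal sketch; the pod name dnstest and the busybox image are illustrative choices, not part of the original report):

kubectl run dnstest --rm -it --restart=Never --image=busybox -- nslookup kubernetes.default.svc.cluster.local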

Anything else we need to know?:

  • DNS pods are running
  • DNS logs:
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae

DNS lookup:

#nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1
#nslookup kubernetes.default.svc
;; connection timed out; no servers could be reached

resolv.conf:

#cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local fritz.box
nameserver 10.96.

Environment:

  • kind version: (use kind version): kind v0.11.1 go1.16.4 linux/amd64
  • Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-21T23:01:33Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info): Client:
    Context: default
    Debug Mode: false

Server:
Containers: 5
Running: 2
Paused: 0
Stopped: 3
Images: 11
Server Version: 20.10.6-ce
Storage Driver: btrfs
Build Version: Btrfs v4.15
Library Version: 102
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: oci runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Default Runtime: runc
Init Binary: docker-init
containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
init version:
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.3.18-59.16-default
Operating System: openSUSE Leap 15.3
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.552GiB
Name: Proxima-Centauri
ID: M6J5:OLHQ:FXVM:M7WG:2OUA:SKGW:UCF5:DWJZ:4M7T:YA2W:6FBT:DOLG
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

  • OS (e.g. from /etc/os-release): NAME="openSUSE Leap"
    VERSION="15.3"
    ID="opensuse-leap"
    ID_LIKE="suse opensuse"
    VERSION_ID="15.3"
    PRETTY_NAME="openSUSE Leap 15.3"
    ANSI_COLOR="0;32"
    CPE_NAME="cpe:/o:opensuse:leap:15.3"
    BUG_REPORT_URL="https://bugs.opensuse.org"
    HOME_URL="https://www.opensuse.org/"
@hobyte added the kind/bug label on Jul 23, 2021
@aojea
Contributor

aojea commented Jul 27, 2021

I assume this snippet has a copy-paste error; it is missing the last digits of the IP address:

#cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local fritz.box
nameserver 10.96.

Are you using one node or multiple nodes in the cluster?
Clusters with multiple nodes don't handle reboots.
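For comparison, a pod's /etc/resolv.conf in a default kind cluster normally carries the full cluster DNS service IP. A sketch of the expected shape, using the 10.96.0.10 address shown in the nslookup output above (the options line is illustrative and may differ):

search default.svc.cluster.local svc.cluster.local cluster.local fritz.box
nameserver 10.96.0.10
options ndots:5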

@BenTheElder added the triage/needs-information label on Jul 29, 2021
@faiq
Contributor

faiq commented Aug 25, 2021

Hi, I'm also running into this issue, although I'm not sure it is necessarily caused by a restart in my case.

$  kubectl run -it --rm --restart=Never busybox1 --image=busybox sh
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes.default
Server:		10.96.0.10
Address:	10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

/ # 

Here is what I get when I inspect the kind network:

$ docker network inspect kind
[
    {
        "Name": "kind",
        "Id": "7d815ef0d0c4adc297aa523aa3336ba89bc6d7212373d3098f12169618c16563",
        "Created": "2021-08-24T16:41:41.258730207-07:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": true,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                },
                {
                    "Subnet": "fc00:f853:ccd:e793::/64"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "1c47d1b38fe7b0b75e71c21c150aba4d5110ade54d74e2f3db45c5d15d013c59": {
                "Name": "konvoy-capi-bootstrapper-control-plane",
                "EndpointID": "4b176452133a1881380cae8b3fc55963ec0427ee809bc1b678d261f3c1711931",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": "fc00:f853:ccd:e793::2/64"
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.driver.mtu": "1454"
        },
        "Labels": {}
    }
]
$ kind get nodes --name konvoy-capi-bootstrapper
konvoy-capi-bootstrapper-control-plane

Output from ip addr:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 48:2a:e3:0a:7a:8c brd ff:ff:ff:ff:ff:ff
3: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 30:24:32:43:a0:e9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.42.76/24 brd 192.168.42.255 scope global dynamic noprefixroute wlp2s0
       valid_lft 83634sec preferred_lft 83634sec
    inet6 fe80::c3e2:7427:34c8:c265/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
25: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1400 qdisc noqueue state DOWN group default 
    link/ether 02:42:0c:bc:be:aa brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
28: br-7d815ef0d0c4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1454 qdisc noqueue state UP group default 
    link/ether 02:42:08:aa:2f:bb brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-7d815ef0d0c4
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:8ff:feaa:2fbb/64 scope link 
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link 
       valid_lft forever preferred_lft forever
30: vethba7cc46@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1454 qdisc noqueue master br-7d815ef0d0c4 state UP group default 
    link/ether 82:3a:43:df:a0:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::803a:43ff:fedf:a0c1/64 scope link 
       valid_lft forever preferred_lft forever

Finally, logs from a CoreDNS pod:

35365->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. AAAA: read udp 10.244.0.6:36799->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. A: read udp 10.244.0.6:55841->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. AAAA: read udp 10.244.0.6:38716->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. A: read udp 10.244.0.6:51342->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. AAAA: read udp 10.244.0.6:46009->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. A: read udp 10.244.0.6:33070->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. AAAA: read udp 10.244.0.6:34194->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. A: read udp 10.244.0.6:56925->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. AAAA: read udp 10.244.0.6:35681->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. A: read udp 10.244.0.6:42683->172.18.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 rhel82-tester-faiq2-apiserver-1592573265.us-west-2.elb.amazonaws.com.gateway.sonic.net. A: read udp 10.244.0.6:40842->172.18.0.1:53: i/o timeout

@AlmogBaku
Member

AlmogBaku commented Nov 6, 2021

Hey, for us the same issue happens after stopping/rebooting Docker.
The same issue keeps reproducing on 2 different hosts (cc @RomansWorks).

Edit: we're running a single-node setup with the following config (copied from the website):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
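For reference, a config like this is typically applied at cluster creation time (a sketch; the file name kind-config.yaml is an assumption):

kind create cluster --config kind-config.yaml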

@BenTheElder
Member

@AlmogBaku I still can't reproduce this in any of our environments. We need to know more about yours.
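For anyone hitting this, the environment details that help with triage can be gathered roughly like this (a sketch; the ./kind-logs output directory is an arbitrary choice):

kind version
docker info
kubectl -n kube-system get pods -o wide
kind export logs ./kind-logs    # collects node and cluster logs to attach to the issue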

@AlmogBaku
Member

It usually happens after I've stopped Docker a few times.

Both @RomansWorks and I are using macOS.

@alexandresgf

alexandresgf commented Dec 8, 2021

I have the same issue in my dev environment. The weird thing is that when I connect to the pod with bash and try nslookup, DNS works, as you can see in the screenshot below:

[screenshot: nslookup from the pod shell resolving the name successfully]

But when my application tries the lookup, the name cannot be resolved and nothing works, and no error is returned either (which is also weird):

[screenshot: the application failing to resolve the name]

However, if I use the pod IP directly, it works normally:

[screenshot: the request to the pod IP succeeding]
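One way to narrow down a shell-vs-application difference like this is to compare a direct DNS query with a lookup that goes through the libc/NSS path, since nslookup talks to the DNS server directly while most applications resolve through the C library. A minimal sketch, assuming getent is available in the pod image and using the kubernetes service name as an example:

cat /etc/resolv.conf                                  # which resolver and search domains the pod sees
nslookup kubernetes.default.svc.cluster.local         # queries the DNS server directly
getent hosts kubernetes.default.svc.cluster.local     # resolves through libc/NSS, like most applications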

My stack is:

  • Docker 20.10.11
  • K8s 1.21.1 (kindest/node default, but I already tested all other supported versions)
  • Kind 0.11.1 (single cluster)


@aojea
Contributor

aojea commented Dec 8, 2021

@alexandresgf please don't use screenshots; they are hard to read.

Is this problem happening after a reboot, or did it never work?

@alexandresgf

alexandresgf commented Dec 10, 2021

> @alexandresgf please don't use screenshots; they are hard to read.

Sorry for that!

> Is this problem happening after a reboot, or did it never work?

At first it worked for a while; then it suddenly happened after a reboot, and DNS never worked again, even after removing kind completely and doing a fresh install.

@brpaz

brpaz commented Oct 17, 2022

I have a similar problem. I created a local kind cluster and it worked fine over the entire weekend, but today, when I rebooted my PC, DNS was completely down. I tried restarting Docker, and even restarting the CoreDNS container manually, but that doesn't fix the issue.

I got errors like this all over my containers:

 dial tcp: lookup notification-controller.flux-system.svc.cluster.local. on 10.96.0.10:53: read udp 10.244.0.3:52830->10.96.0.10:53: read: connection refused"

And it's not only the internal network; even external requests are failing with the same error.

dial tcp: lookup github.com on 10.96.0.10:53: read udp 10.244.0.15:41035->10.96.0.10:53: read: connection refused'

Any ideas?
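Not a confirmed fix, but a few commands that can help narrow down whether the cluster DNS service itself survived the reboot (a sketch; the label, service, and deployment names are the kind/kubeadm defaults):

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide   # are the CoreDNS pods running?
kubectl -n kube-system get svc kube-dns                       # does the service behind 10.96.0.10 still exist?
kubectl -n kube-system rollout restart deployment coredns     # restart CoreDNS instead of the raw container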

@ben-foxmoore

I observe the same issues when using KinD in a WSL2/Windows 11 environment. Example logs from the CoreDNS pod:

[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0202 14:14:20.711784       1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: network is unreachable
E0202 14:14:22.917864       1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: network is unreachable

@aojea
Contributor

aojea commented Feb 2, 2023

pkg/mod/k8s.io/client-go@v0.19.2

This is an old version. Also, WSL2/Windows 11 environments had some known issues. Are you using the latest version?

This bug is starting to become a placeholder. I wonder if we should close it and open more specific bugs; a cluster not working after a reboot on Windows is not the same as with podman, or with lima, ...

@ben-foxmoore

Hi @aojea, which component are you saying is outdated?

I'm using kind 0.17.0 and I created the cluster using the command kind create cluster --image kindest/node:v1.21.14@sha256:9d9eb5fb26b4fbc0c6d95fa8c790414f9750dd583f5d7cee45d92e8c26670aa1 which is listed as a supported image in the 0.17.0 release.

I don't believe any of the WSL2 known issues are related to this; they all seem to be related to Docker Desktop behaviour.
