
i/o timeout in coredns pod #1427

Closed

knkski opened this issue Jul 22, 2020 · 28 comments

@knkski
Contributor

knkski commented Jul 22, 2020

Copying from #958 (comment), as this looks to be the crux of the issue. For some reason, some queries to 8.8.8.8 and 8.8.4.4 for api.jujucharms.com are failing:

[ERROR] plugin/errors: 2 2577415620770853004.8780143945028532492. HINFO: read udp 10.1.21.36:44631->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:33385 - 46047 "AAAA IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000755655s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. AAAA: read udp 10.1.21.36:57766->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:36178 - 22767 "A IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000677982s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. A: read udp 10.1.21.36:44211->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:60455 - 56232 "AAAA IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000352953s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. AAAA: read udp 10.1.21.36:45297->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:50299 - 52151 "A IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000257257s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. A: read udp 10.1.21.36:40710->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:39253 - 3642 "AAAA IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000834815s
[ERROR] plugin/errors: 2 api.jujucharms.com. AAAA: read udp 10.1.21.36:38538->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:47372 - 10457 "A IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000768237s
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:33745->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:47852 - 7227 "AAAA IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000442768s
[ERROR] plugin/errors: 2 api.jujucharms.com. AAAA: read udp 10.1.21.36:42672->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:60290 - 23521 "A IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000334072s
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:52312->8.8.8.8:53: i/o timeout
@ktsakalozos
Member

ktsakalozos commented Jul 23, 2020

@davigar15 has reported the same issue.

This is sporadic, right? Some things we could try: use the host as the resolver (forward . /etc/resolv.conf), or increase the cache to something longer than 30 secs [1].

Could anyone try any of the above and report back any results?

[1] https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
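For reference, a minimal sketch of what those two changes could look like in the CoreDNS Corefile (in microk8s this normally lives in the kube-system/coredns configmap and can be edited with microk8s kubectl -n kube-system edit configmap/coredns; the surrounding plugins are omitted here):

.:53 {
    # forward to the host's resolver instead of the hard-coded 8.8.8.8/8.8.4.4
    forward . /etc/resolv.conf
    # keep answers cached longer than the default 30 seconds
    cache 300
}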

@bipinm

bipinm commented Jul 23, 2020

I do not think this is a sporadic/transient issue. I have tried for 3 days continuously (microk8s.enable kubeflow), sometimes at different times of day, and got the same error each time. I tried to debug a little after N failures and found the errors in the coredns pod. I also went into another pod and tried ping 8.8.8.8, which worked fine, but pings to other public IPs/api.jujucharms.com failed.

As mentioned in my comment here, I did not face this issue on an AWS EC2 instance with the Ubuntu 20.04 Server image. Until yesterday I was trying on Ubuntu Desktop 20.04 (running in VMware Player); today I tested with Ubuntu 20.04 Server locally (running in VMware Player) and, not very surprisingly, Kubeflow was deployed successfully. Similar behavior is mentioned here.

@atamahjoubfar

atamahjoubfar commented Jul 23, 2020

I just tried microk8s enable kubeflow on another machine with Ubuntu 18.04. Same error:

ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-89": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-89": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
Command '('microk8s-juju.wrapper', 'deploy', 'cs:kubeflow-195', '--channel', 'stable', '--overlay', '/tmp/tmpfdpsap90')' returned non-zero exit status 1
Failed to enable kubeflow

@knkski
Contributor Author

knkski commented Jul 27, 2020

@bipinm: can you expand a little more on this?

I also went into another pod and tried ping 8.8.8.8, which worked fine, but pings to other public IPs/api.jujucharms.com failed.

Which pod(s) did you go into, and which other public IPs did you try?

@knkski
Contributor Author

knkski commented Jul 27, 2020

After debugging with @davigar15, I think this issue is not actually Kubeflow-specific, and is a general networking issue that starts happening after a computer with microk8s is rebooted. @bipinm, @atamahjoubfar, can you verify that this is related to rebooting the host machine for microk8s? @davigar15 says that when he runs into this issue, reinstalling the microk8s snap fixes things for him.
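For anyone who wants to try the same workaround, a rough sketch of the reinstall (this assumes a single-node setup whose cluster state you can afford to lose):

# remove the snap, reinstall it, and re-enable dns
sudo snap remove microk8s
sudo snap install microk8s --classic
microk8s status --wait-ready
microk8s enable dns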

@atamahjoubfar

I reinstalled microk8s and enabled kubeflow without a reboot. I still get the same error message.

@bipinm

bipinm commented Jul 28, 2020

Without a restart after sudo snap install microk8s --classic, Kubeflow was deployed successfully. So far I had always been restarting after seeing this message:

bipinm@ubuntu:~$ microk8s.status --wait-ready
Insufficient permissions to access MicroK8s.
You can either try again with sudo or add the user bipinm to the 'microk8s' group:

    sudo usermod -a -G microk8s bipinm
    sudo chown -f -R bipinm ~/.kube

The new group will be available on the user's next login.

This time, instead of a restart, I ran the sudo usermod ... and gnome-session-quit commands, followed by the rest:

microk8s.status --wait-ready
microk8s.enable dns dashboard storage
microk8s.enable kubeflow

In my previous tests on Ubuntu 20.04 Server I followed a similar restart step, but did not encounter this problem. It only seems to occur on the 20.04 Desktop version.

@knkski: The pod I used for the ping test was nginx-ingress-microk8s-controller

Now:
  • Without a restart, I go into the pod and can ping public domain names (google.com, jujucharms.com etc.)
  • After a restart, I cannot ping any public domain names, but I can ping their IP addresses

Test after install of microk8s + enable kubeflow, but without a restart (from within nginx-ingress-microk8s-controller):

bash-5.0$ nslookup google.com
Server:         10.152.183.10
Address:        10.152.183.10:53

Non-authoritative answer:
Name:   google.com
Address: 142.250.67.78

Non-authoritative answer:
Name:   google.com
Address: 2404:6800:4007:807::200e

bash-5.0$ nslookup jujucharms.com
Server:         10.152.183.10
Address:        10.152.183.10:53

Non-authoritative answer:
Name:   jujucharms.com
Address: 91.189.88.181
Name:   jujucharms.com
Address: 91.189.91.45
Name:   jujucharms.com
Address: 91.189.91.44
Name:   jujucharms.com
Address: 91.189.88.180

Non-authoritative answer:
Name:   jujucharms.com
Address: 2001:67c:1562::20
Name:   jujucharms.com
Address: 2001:67c:1360:8001::2c
Name:   jujucharms.com
Address: 2001:67c:1562::1f
Name:   jujucharms.com
Address: 2001:67c:1360:8001::2b

Test after install of microk8s + restart (and a failed enable kubeflow):

bash-5.0$ nslookup google.com
Server:         10.152.183.10
Address:        10.152.183.10:53

;; connection timed out; no servers could be reached

bash-5.0$ nslookup api.jujucharms.com
Server:         10.152.183.10
Address:        10.152.183.10:53

;; connection timed out; no servers could be reached

@ktsakalozos
Member

@bipinm immediately after a reboot the k8s networking is not correctly set up. During that period I was getting:

bash-5.0$ nslookup jujucharms.com
nslookup: write to '10.152.183.10': Operation not permitted
;; connection timed out; no servers could be reached

Within 2 to 3 minutes the pods were reporting state Unknown and the control plane was rescheduling them, getting them into the Ready state again. After that point, name resolution was working again.

@bipinm

bipinm commented Jul 28, 2020

@ktsakalozos, I was running nslookup more than 15 minutes post reboot. The sequence of steps that always failed for me:

  1. sudo snap install microk8s --classic
  2. sudo usermod -a -G microk8s user
  3. Reboot
  4. After system is up, check status with microk8s.status --wait-ready
  5. Also confirm all pods are running
  6. microk8s.enable dns dashboard storage
  7. Open dashboard UI and confirm everything is fine
  8. microk8s.enable kubeflow
  9. ping test and nslookup from pod nginx-ingress-microk8s-controller-xxxx (this is probably 15+ minutes after reboot)

I will try this once again to confirm; I am not absolutely sure whether I was rebooting after step 6.

@LinoBert

LinoBert commented Jul 29, 2020

@ktsakalozos, same here.
I'm running microk8s on my Ubuntu 18.04 dev machine. After booting the machine, none of the pods start because they need to download some files but can't resolve the domain names, so they can't resolve the imports (Deno).

After manually stopping and starting the cluster via microk8s stop / microk8s start, DNS resolution works perfectly fine, so something in the automatic startup process seems not to work as expected.

@atamahjoubfar

atamahjoubfar commented Jul 30, 2020

@ktsakalozos neither avoiding a reboot of the machine nor microk8s stop/start resolved the issue for me. I have confirmed that juju can deploy apps on the host machine, so it should not be a networking issue of the host:

+ microk8s-juju.wrapper --debug add-model kubeflow microk8s

02:01:45 INFO  juju.cmd supercommand.go:83 running juju [2.7.6 4da406fb326d7a1255f97a7391056641ee86715b gc go1.12.17]
02:01:45 DEBUG juju.cmd supercommand.go:84   args: []string{"/snap/microk8s/1551/bin/juju", "--debug", "add-model", "kubeflow", "microk8s"}
02:01:45 INFO  juju.juju api.go:67 connecting to API addresses: [10.152.183.24:17070]
02:01:45 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.152.183.24:17070/api"
02:01:45 INFO  juju.api apiclient.go:624 connection established to "wss://10.152.183.24:17070/api"
02:01:45 INFO  cmd authkeys.go:114 Adding contents of "/var/snap/microk8s/1551/juju/share/juju/ssh/juju_id_rsa.pub" to authorized-keys
02:01:45 INFO  cmd addmodel.go:301 Added 'kubeflow' model on microk8s/localhost with credential 'microk8s' for user 'admin'
02:01:45 DEBUG juju.api monitor.go:35 RPC connection died
02:01:45 INFO  cmd supercommand.go:525 command finished

+ microk8s-juju.wrapper --debug deploy cs:kubeflow-195 --channel stable --overlay /tmp/tmpt7h9ykaa
Kubeflow could not be enabled:
02:01:45 INFO  juju.cmd supercommand.go:83 running juju [2.7.6 4da406fb326d7a1255f97a7391056641ee86715b gc go1.12.17]
02:01:45 DEBUG juju.cmd supercommand.go:84   args: []string{"/snap/microk8s/1551/bin/juju", "--debug", "deploy", "cs:kubeflow-195", "--channel", "stable", "--overlay", "/tmp/tmpt7h9ykaa"}
02:01:45 INFO  juju.juju api.go:67 connecting to API addresses: [10.152.183.24:17070]
02:01:45 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.152.183.24:17070/model/644c781a-2e54-4ea7-8f5a-13448c037141/api"
02:01:45 INFO  juju.api apiclient.go:624 connection established to "wss://10.152.183.24:17070/model/644c781a-2e54-4ea7-8f5a-13448c037141/api"
02:01:46 INFO  juju.juju api.go:67 connecting to API addresses: [10.152.183.24:17070]
02:01:46 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.152.183.24:17070/api"
02:01:46 INFO  juju.api apiclient.go:624 connection established to "wss://10.152.183.24:17070/api"
02:01:46 DEBUG juju.cmd.juju.application deploy.go:1442 cannot interpret as local charm: file does not exist
02:01:46 DEBUG juju.cmd.juju.application deploy.go:1294 cannot interpret as a redeployment of a local charm from the controller
02:01:46 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/kubeflow-195/meta/any?channel=stable&include=id&include=supported-series&include=published {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/bundle/kubeflow-195/archive {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 INFO  cmd deploy.go:1546 Located bundle "cs:bundle/kubeflow-195"
02:01:47 DEBUG juju.cmd.juju.application bundle.go:312 model: &bundlechanges.Model{
    Applications: {
    },
    Machines: {
    },
    Relations:        nil,
    ConstraintsEqual: func(string, string) bool {...},
    Sequence:         {},
    sequence:         {},
    MachineMap:       {},
    logger:           nil,
}
02:01:47 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/ambassador-89
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/meta/any?include=id&include=supported-series&include=published {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/argo-controller-173
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/argo-controller-173/meta/any?include=id&include=supported-series&include=published {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/argo-ui-89
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/argo-ui-89/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/dex-auth-32
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/dex-auth-32/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/jupyter-controller-187
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/jupyter-controller-187/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/jupyter-web-93
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/jupyter-web-93/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/katib-controller-87
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/katib-controller-87/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/katib-manager-86
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/katib-manager-86/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/katib-ui-82
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/katib-ui-82/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/kubeflow-dashboard-47
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/kubeflow-dashboard-47/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/kubeflow-profiles-53
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/kubeflow-profiles-53/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metacontroller-79
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metacontroller-79/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-api-42
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-api-42/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-envoy-25
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-envoy-25/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-grpc-25
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-grpc-25/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-ui-45
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-ui-45/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/minio-89
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/minio-89/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/modeldb-backend-86
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/modeldb-backend-86/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/modeldb-store-80
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/modeldb-store-80/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/modeldb-ui-80
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/modeldb-ui-80/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/oidc-gatekeeper-30
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/oidc-gatekeeper-30/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-api-93
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-api-93/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-persistence-178
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-persistence-178/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-scheduledworkflow-174
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-scheduledworkflow-174/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-ui-89
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-ui-89/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-viewer-114
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-viewer-114/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-visualization-24
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-visualization-24/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pytorch-operator-174
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pytorch-operator-174/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/seldon-core-27
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/seldon-core-27/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/tf-job-operator-170
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/tf-job-operator-170/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:59 DEBUG juju.api monitor.go:35 RPC connection died
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-89": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-89": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
02:01:59 DEBUG cmd supercommand.go:519 error stack: 
cannot retrieve charm "cs:~kubeflow-charmers/ambassador-89": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
/workspace/_build/src/github.com/juju/juju/rpc/client.go:178: 
/workspace/_build/src/github.com/juju/juju/api/apiclient.go:1187: 
/workspace/_build/src/github.com/juju/juju/api/client.go:459: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/store.go:68: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:549: cannot add charm "cs:~kubeflow-charmers/ambassador-89"
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:481: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:165: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/deploy.go:960: cannot deploy bundle
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/deploy.go:1548: 

Command '('microk8s-juju.wrapper', '--debug', 'deploy', 'cs:kubeflow-195', '--channel', 'stable', '--overlay', '/tmp/tmpt7h9ykaa')' returned non-zero exit status 1
Failed to enable kubeflow

@dkolbly

dkolbly commented Sep 17, 2020

I am having this issue as well; I noticed it after a reboot. Restarting (disable/enable) dns did not fix it for me either. I also tried switching away from 8.8.x.x, but that did not help either.

Stopping all of microk8s (microk8s stop / microk8s start) did get things back to a working state 👍

root@jupiter:~# microk8s.kubectl logs coredns-86f78bb79c-9k4f4 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = be0f52d3c13480652e0c73672f2fa263
CoreDNS-1.6.6
linux/amd64, go1.13.5, 6a7a75e
[INFO] 127.0.0.1:46538 - 47659 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 6.002244698s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:35934->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:42170 - 16039 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 6.001102944s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:44386->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:33814 - 33621 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000673367s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:52975->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:58732 - 22568 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 6.001175924s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:47671->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:40808 - 32260 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000327329s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:44845->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:60227 - 27681 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000475429s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:38552->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:53987 - 2865 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000480646s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:54745->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:53931 - 13702 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000535349s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:33555->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:32825 - 40340 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000450614s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:39063->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:37796 - 25979 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000400431s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:59722->8.8.4.4:53: i/o timeout
[INFO] 10.1.71.250:53768 - 48206 "AAAA IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000422322s
[ERROR] plugin/errors: 2 www.googleapis.com. AAAA: read udp 10.1.71.251:40001->8.8.8.8:53: i/o timeout
[INFO] 10.1.71.250:38399 - 7349 "A IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000599888s
[ERROR] plugin/errors: 2 www.googleapis.com. A: read udp 10.1.71.251:36263->8.8.8.8:53: i/o timeout
[INFO] 10.1.71.250:43097 - 40033 "AAAA IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000564206s
[ERROR] plugin/errors: 2 www.googleapis.com. AAAA: read udp 10.1.71.251:55784->8.8.8.8:53: i/o timeout
[INFO] 10.1.71.250:60260 - 65419 "A IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000480316s
[ERROR] plugin/errors: 2 www.googleapis.com. A: read udp 10.1.71.251:57656->8.8.4.4:53: i/o timeout

The symptom for me is that other services running in the cluster get errors like dial tcp: lookup www.googleapis.com on 10.152.183.10:53: server misbehaving

@ktsakalozos
Member

Hi @dkolbly, could you attach the tarball produced by microk8s.inspect?

There is also this page that may help in debugging DNS resolution issues: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
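For anyone who hits this before reading that page, a quick in-cluster sketch of the kind of checks it walks through, using a throwaway pod (busybox:1.28 is chosen here because its nslookup is well-behaved; the k8s-app=kube-dns label is what the stock coredns manifest uses):

# resolve an in-cluster name and an external name from inside the cluster
microk8s kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
microk8s kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup api.jujucharms.com
# then look for matching i/o timeout errors in the coredns logs
microk8s kubectl logs -n kube-system -l k8s-app=kube-dns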

@dkolbly

dkolbly commented Sep 18, 2020

Thanks @ktsakalozos, I was not aware of that debugging page, and I didn't think to grab an inspect while it was broken, but here is the current state of the system in case it helps.
inspection-report-20200918_151701.tar.gz

FWIW, I'm going to need to power-cycle the system this weekend to put it on a UPS, so I'll keep an eye out for a recurrence of the problem.

@exi

exi commented Oct 23, 2020

Any updates on this? I have run into this issue multiple times as well.
Last time, a clean uninstall/reinstall of microk8s fixed it.
This time I reverted from an HA cluster to a non-HA cluster and I have the same issue.

In my case coredns cannot even talk to the master on the same machine:
E1023 22:52:10.292560 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.152.183.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.152.183.1:443: i/o timeout

@knkski
Contributor Author

knkski commented Nov 20, 2020

I believe this issue is fixed in #1635, which introduces handling around the calico networking. If anybody wants to try it out, it'll be available via latest/edge as soon as CD is done pushing that out, otherwise that fix will come in 1.20.
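For anyone wanting to test it before 1.20 lands, switching an existing install to that channel is a single snap refresh:

sudo snap refresh microk8s --channel=latest/edge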

@kingman

kingman commented Nov 20, 2020

I'm currently getting this in the coredns pod with latest/edge; could my issue be related to this one?

  Warning  FailedCreatePodSandBox  2m20s  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7292c565f6c5da128168e118eac02eb869b47a9191ec320c923153e2dcd41ef6": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout
  Warning  FailedCreatePodSandBox  95s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e7a30d0d8b3d09b4a25957ef267ea2dd548b70552f42c043aa63d3f7cf9172ea": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout
  Warning  FailedCreatePodSandBox  54s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dd29dc68affcdcb012860dd9fd7a620c238e2d850c18dbe1834a6a0970c1e5e6": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout
  Warning  FailedCreatePodSandBox  12s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9deb3e6c4c955202232dbecf389987c479a0e8605db47cdb04b5053b9a9b5a75": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout

@brendanmckenzie

brendanmckenzie commented May 31, 2021

I'm also experiencing this issue on Ubuntu 20.10 (GNU/Linux 5.8.0-26-generic x86_64)

I run:

sudo snap install microk8s --classic
microk8s enable dns

Wait for it to do its thing, then:

$ mkctl --namespace kube-system logs coredns-7f9c69c78c-pkmjt

And it's full of:

[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:60257->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:45683 - 51409 "HINFO IN 5968981741059125262.7191380029860206540. udp 57 false 512" NOERROR - 0 2.000492982s
[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:45638->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:47762 - 48500 "HINFO IN 5968981741059125262.7191380029860206540. udp 57 false 512" NOERROR - 0 2.000423213s
[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:39596->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:60306 - 65170 "HINFO IN 5968981741059125262.7191380029860206540. udp 57 false 512" NOERROR - 0 2.000368166s
[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:33063->8.8.8.8:53: i/o timeout

microk8s inspect doesn't show any issue, and iptables is configured to allow forwarding.

I've tried both v1.21 and v1.22-alpha.1 and the issue is present in both.

@ktsakalozos
Member

@brendanmckenzie I feel this might be the dns pod not being able to reach 8.8.8.8. Did you try setting a different forward dns as described in [1]?

[1] https://discuss.kubernetes.io/t/add-on-dns/11287
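For reference, a sketch of the two usual ways to change the upstream resolver; if I recall correctly, the dns addon accepts forwarders after a colon (1.1.1.1 below is only an example), and editing the coredns configmap by hand always works:

# re-enable the addon with an explicit forwarder
microk8s disable dns
microk8s enable dns:1.1.1.1

# or change the forward line in the CoreDNS config directly
microk8s kubectl -n kube-system edit configmap/coredns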

@brendanmckenzie

The issue is present no matter what forwarding DNS server I use.

Additionally, other pods are able to ping 8.8.8.8 (so is the host machine).

$ mkctl run test --image=alpine -ti -- sh
If you don't see a command prompt, try pressing enter.
/ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=111 time=1.079 ms
64 bytes from 8.8.8.8: seq=1 ttl=111 time=1.216 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 1.079/1.147/1.216 ms
/ # nslookup dns.google
Server:		10.152.183.10
Address:	10.152.183.10:53

;; connection timed out; no servers could be reached

And the subsequent logs from coredns:

[ERROR] plugin/errors: 2 dns.google. AAAA: read udp 10.1.89.66:45472->8.8.4.4:53: i/o timeout
[ERROR] plugin/errors: 2 dns.google. A: read udp 10.1.89.66:57037->8.8.4.4:53: i/o timeout

@brendanmckenzie

🤦‍♂️ For some reason, outbound port 53 requests from my server were being blocked. I switched to using my hosting provider's DNS and now coredns is working as expected.
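If anyone else wants to rule out the same thing, a quick host-side check (dig comes from the dnsutils/bind-utils package; 1.1.1.1 is just a second public resolver for comparison):

# query public resolvers directly from the host, bypassing the cluster
dig @8.8.8.8 api.jujucharms.com +time=2 +tries=1
dig @1.1.1.1 api.jujucharms.com +time=2 +tries=1
# if both time out, outbound UDP/53 is being filtered somewhere upstream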

@mohammedi-haroune

mohammedi-haroune commented Mar 25, 2022

I'm facing the exact same issue; has anyone come up with a solution to this?

@sdarvell

I fixed this in my setup by changing the CIDR as detailed in the documentation below. It appears there are network and DNS resolution issues when your host's DNS / local network is within the default microk8s pod subnet of 10.1.0.0/16.

https://microk8s.io/docs/change-cidr
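A quick way to spot that overlap before going through the CIDR change is to check whether any host interface already sits inside 10.1.0.0/16 (a rough check that only greps for the 10.1. prefix):

# any hit here means the host network overlaps the default 10.1.0.0/16 pod CIDR
ip -4 addr show | grep -E 'inet 10\.1\.'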

@SidMorad

I did face this issue in the following environment:

  • Ubuntu 20.04
  • Microk8s channel 1.24/stable

Upgrading to Ubuntu 22.04 solved the issue for me. I hope this is useful for fixing it.

@amandahla

amandahla commented Mar 13, 2023

I did face this issue after rebooting my machine in the following environment:

  • Ubuntu 22.10
  • MicroK8s v1.26.1 revision 4595

Fixed by what was already suggested here: microk8s stop/start

@akzov

akzov commented Jul 14, 2023

Didn't help

@logici

logici commented Sep 12, 2023

I also faced this issue in the following environment:

Ubuntu 20.04
Microk8s channel 1.24/stable


stale bot commented Aug 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the inactive label Aug 7, 2024
stale bot closed this as completed Sep 6, 2024