Microk8s.Kubeflow dashboard not accessible after EC2 instance stopped + re-started #1315

Closed
rui-vas opened this issue Jun 15, 2020 · 3 comments
rui-vas commented Jun 15, 2020

This error occurred after the following steps:

  1. EC2 instance running microk8s + kubeflow with no issues
  2. Stopped EC2 instance from AWS console (without disabling kubeflow)
  3. Started instance again

All services seem to come back up fine:

$ microk8s.kubectl get all --all-namespaces
NAMESPACE         NAME                                                  READY   STATUS    RESTARTS   AGE
admin             pod/kf-demo-0                                         1/1     Running   1          7d1h
admin             pod/rui-nb-0                                          1/1     Running   1          7d11h
controller-uk8s   pod/controller-0                                      2/2     Running   1          34h
ingress           pod/nginx-ingress-microk8s-controller-rxgsd           1/1     Running   1          7d12h
kube-system       pod/coredns-588fd544bf-m82wj                          1/1     Running   1          7d12h
kube-system       pod/dashboard-metrics-scraper-db65b9c6f-vfblw         1/1     Running   1          7d12h
kube-system       pod/heapster-v1.5.2-58fdbb6f4d-5vg6g                  4/4     Running   4          7d12h
kube-system       pod/hostpath-provisioner-75fdc8fccd-7nrzf             1/1     Running   1          7d12h
kube-system       pod/kubernetes-dashboard-67765b55f5-qb96c             1/1     Running   1          7d12h
kube-system       pod/monitoring-influxdb-grafana-v4-6dc675bf8c-nlfs5   2/2     Running   2          7d12h
kubeflow          pod/ambassador-ddc587cfc-5g2mm                        1/1     Running   0          34h
kubeflow          pod/argo-controller-768d775887-qfh9q                  1/1     Running   0          34h
kubeflow          pod/argo-ui-8df7d5959-7vwfj                           1/1     Running   0          34h
kubeflow          pod/cert-manager-webhook-operator-0                   1/1     Running   0          34h
kubeflow          pod/dex-auth-747545cb7d-2nsfw                         1/1     Running   0          34h
kubeflow          pod/jupyter-web-557bbbb54d-sr4s7                      1/1     Running   0          34h
kubeflow          pod/katib-db-0                                        1/1     Running   0          34h
kubeflow          pod/katib-manager-844d6794f9-cqpfq                    1/1     Running   0          34h
kubeflow          pod/katib-ui-65756df8c4-4jc2f                         1/1     Running   0          34h
kubeflow          pod/kubeflow-dashboard-7b77777dc8-q5fpj               1/1     Running   0          34h
kubeflow          pod/kubeflow-profiles-7b96c8cd8d-bm2ld                2/2     Running   0          34h
kubeflow          pod/metacontroller-d7d7d475c-zl22n                    1/1     Running   0          34h
kubeflow          pod/metadata-api-698dcf77d5-64znc                     1/1     Running   0          34h
kubeflow          pod/metadata-db-0                                     1/1     Running   0          34h
kubeflow          pod/metadata-envoy-85d94685b8-gc58n                   1/1     Running   0          34h
kubeflow          pod/metadata-grpc-667777bd-792pq                      1/1     Running   0          34h
kubeflow          pod/metadata-ui-644b5d8667-wwdtm                      1/1     Running   0          34h
kubeflow          pod/minio-0                                           1/1     Running   0          34h
kubeflow          pod/modeldb-backend-7bb66bf5b8-kxz5w                  2/2     Running   0          34h
kubeflow          pod/modeldb-db-0                                      1/1     Running   0          34h
kubeflow          pod/modeldb-store-fc777db77-7g8l4                     1/1     Running   0          34h
kubeflow          pod/modeldb-ui-8687cc7847-x8tl9                       1/1     Running   0          34h
kubeflow          pod/oidc-gatekeeper-54d94b78d5-mznhn                  1/1     Running   0          34h
kubeflow          pod/pipelines-api-869d8c5966-4kvlq                    1/1     Running   0          34h
kubeflow          pod/pipelines-db-0                                    1/1     Running   0          34h
kubeflow          pod/pipelines-persistence-7888d8649b-kf9f4            1/1     Running   1          34h
kubeflow          pod/pipelines-scheduledworkflow-d559c6b6-nwrbg        1/1     Running   0          34h
kubeflow          pod/pipelines-ui-9fd8df454-qkpzx                      1/1     Running   0          34h
kubeflow          pod/pipelines-viewer-5566947b98-zjwbr                 1/1     Running   0          34h
kubeflow          pod/pipelines-visualization-59b8dddc4f-l5n5p          1/1     Running   0          34h
kubeflow          pod/pytorch-operator-74b47d68f4-4p5sl                 1/1     Running   0          34h
kubeflow          pod/seldon-core-54d7fdb449-spwb5                      1/1     Running   0          34h
kubeflow          pod/tf-job-operator-5d6d8dd568-mgw2q                  1/1     Running   0          34h
metallb-system    pod/controller-5f98465b6b-cqhc4                       1/1     Running   1          7d12h
metallb-system    pod/speaker-jg52n                                     1/1     Running   1          7d12h

NAMESPACE         NAME                                           TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                  AGE
admin             service/kf-demo                                ClusterIP      10.152.183.188   <none>         80/TCP                   7d1h
admin             service/rui-nb                                 ClusterIP      10.152.183.221   <none>         80/TCP                   7d11h
controller-uk8s   service/controller-service                     ClusterIP      10.152.183.75    <none>         17070/TCP                34h
default           service/kubernetes                             ClusterIP      10.152.183.1     <none>         443/TCP                  7d12h
kube-system       service/dashboard-metrics-scraper              ClusterIP      10.152.183.225   <none>         8000/TCP                 7d12h
kube-system       service/heapster                               ClusterIP      10.152.183.62    <none>         80/TCP                   7d12h
kube-system       service/kube-dns                               ClusterIP      10.152.183.10    <none>         53/UDP,53/TCP,9153/TCP   7d12h
kube-system       service/kubernetes-dashboard                   ClusterIP      10.152.183.160   <none>         443/TCP                  7d12h
kube-system       service/monitoring-grafana                     ClusterIP      10.152.183.74    <none>         80/TCP                   7d12h
kube-system       service/monitoring-influxdb                    ClusterIP      10.152.183.109   <none>         8083/TCP,8086/TCP        7d12h
kubeflow          service/ambassador                             LoadBalancer   10.152.183.61    10.64.140.43   80:31433/TCP             34h
kubeflow          service/ambassador-operator                    ClusterIP      10.152.183.121   <none>         30666/TCP                34h
kubeflow          service/argo-controller-operator               ClusterIP      10.152.183.166   <none>         30666/TCP                34h
kubeflow          service/argo-ui                                ClusterIP      10.152.183.228   <none>         8001/TCP                 34h
kubeflow          service/argo-ui-operator                       ClusterIP      10.152.183.184   <none>         30666/TCP                34h
kubeflow          service/cert-manager-controller-operator       ClusterIP      10.152.183.190   <none>         30666/TCP                34h
kubeflow          service/cert-manager-webhook-operator          ClusterIP      10.152.183.60    <none>         30666/TCP                34h
kubeflow          service/dex-auth                               ClusterIP      10.152.183.69    <none>         5556/TCP                 34h
kubeflow          service/dex-auth-operator                      ClusterIP      10.152.183.72    <none>         30666/TCP                34h
kubeflow          service/jupyter-controller-operator            ClusterIP      10.152.183.12    <none>         30666/TCP                34h
kubeflow          service/jupyter-web                            ClusterIP      10.152.183.24    <none>         5000/TCP                 34h
kubeflow          service/jupyter-web-operator                   ClusterIP      10.152.183.180   <none>         30666/TCP                34h
kubeflow          service/katib-controller                       ClusterIP      10.152.183.138   <none>         443/TCP                  34h
kubeflow          service/katib-controller-operator              ClusterIP      10.152.183.102   <none>         30666/TCP                34h
kubeflow          service/katib-db                               ClusterIP      10.152.183.231   <none>         3306/TCP                 34h
kubeflow          service/katib-db-endpoints                     ClusterIP      None             <none>         <none>                   34h
kubeflow          service/katib-db-operator                      ClusterIP      10.152.183.117   <none>         30666/TCP                34h
kubeflow          service/katib-manager                          ClusterIP      10.152.183.151   <none>         6789/TCP                 34h
kubeflow          service/katib-manager-operator                 ClusterIP      10.152.183.76    <none>         30666/TCP                34h
kubeflow          service/katib-ui                               ClusterIP      10.152.183.16    <none>         8000/TCP                 34h
kubeflow          service/katib-ui-operator                      ClusterIP      10.152.183.64    <none>         30666/TCP                34h
kubeflow          service/kubeflow-dashboard                     ClusterIP      10.152.183.191   <none>         8082/TCP                 34h
kubeflow          service/kubeflow-dashboard-operator            ClusterIP      10.152.183.52    <none>         30666/TCP                34h
kubeflow          service/kubeflow-profiles                      ClusterIP      10.152.183.71    <none>         8081/TCP                 34h
kubeflow          service/kubeflow-profiles-operator             ClusterIP      10.152.183.152   <none>         30666/TCP                34h
kubeflow          service/metacontroller                         ClusterIP      10.152.183.45    <none>         9999/TCP                 34h
kubeflow          service/metacontroller-operator                ClusterIP      10.152.183.237   <none>         30666/TCP                34h
kubeflow          service/metadata-api                           ClusterIP      10.152.183.53    <none>         8080/TCP                 34h
kubeflow          service/metadata-api-operator                  ClusterIP      10.152.183.199   <none>         30666/TCP                34h
kubeflow          service/metadata-db                            ClusterIP      10.152.183.223   <none>         3306/TCP                 34h
kubeflow          service/metadata-db-endpoints                  ClusterIP      None             <none>         <none>                   34h
kubeflow          service/metadata-db-operator                   ClusterIP      10.152.183.112   <none>         30666/TCP                34h
kubeflow          service/metadata-envoy                         ClusterIP      10.152.183.29    <none>         9090/TCP,9091/TCP        34h
kubeflow          service/metadata-envoy-operator                ClusterIP      10.152.183.206   <none>         30666/TCP                34h
kubeflow          service/metadata-grpc                          ClusterIP      10.152.183.77    <none>         8080/TCP                 34h
kubeflow          service/metadata-grpc-operator                 ClusterIP      10.152.183.178   <none>         30666/TCP                34h
kubeflow          service/metadata-ui                            ClusterIP      10.152.183.170   <none>         3000/TCP                 34h
kubeflow          service/metadata-ui-operator                   ClusterIP      10.152.183.220   <none>         30666/TCP                34h
kubeflow          service/minio                                  ClusterIP      10.152.183.187   <none>         9000/TCP                 34h
kubeflow          service/minio-endpoints                        ClusterIP      None             <none>         <none>                   34h
kubeflow          service/minio-operator                         ClusterIP      10.152.183.134   <none>         30666/TCP                34h
kubeflow          service/modeldb-backend                        ClusterIP      10.152.183.208   <none>         8085/TCP,8080/TCP        34h
kubeflow          service/modeldb-backend-operator               ClusterIP      10.152.183.122   <none>         30666/TCP                34h
kubeflow          service/modeldb-db                             ClusterIP      10.152.183.65    <none>         3306/TCP                 34h
kubeflow          service/modeldb-db-endpoints                   ClusterIP      None             <none>         <none>                   34h
kubeflow          service/modeldb-db-operator                    ClusterIP      10.152.183.7     <none>         30666/TCP                34h
kubeflow          service/modeldb-store                          ClusterIP      10.152.183.207   <none>         8086/TCP                 34h
kubeflow          service/modeldb-store-operator                 ClusterIP      10.152.183.189   <none>         30666/TCP                34h
kubeflow          service/modeldb-ui                             ClusterIP      10.152.183.35    <none>         3000/TCP                 34h
kubeflow          service/modeldb-ui-operator                    ClusterIP      10.152.183.219   <none>         30666/TCP                34h
kubeflow          service/oidc-gatekeeper                        ClusterIP      10.152.183.205   <none>         8080/TCP                 34h
kubeflow          service/oidc-gatekeeper-operator               ClusterIP      10.152.183.33    <none>         30666/TCP                34h
kubeflow          service/pipelines-api                          ClusterIP      10.152.183.118   <none>         8887/TCP,8888/TCP        34h
kubeflow          service/pipelines-api-operator                 ClusterIP      10.152.183.66    <none>         30666/TCP                34h
kubeflow          service/pipelines-db                           ClusterIP      10.152.183.238   <none>         3306/TCP                 34h
kubeflow          service/pipelines-db-endpoints                 ClusterIP      None             <none>         <none>                   34h
kubeflow          service/pipelines-db-operator                  ClusterIP      10.152.183.251   <none>         30666/TCP                34h
kubeflow          service/pipelines-persistence-operator         ClusterIP      10.152.183.34    <none>         30666/TCP                34h
kubeflow          service/pipelines-scheduledworkflow-operator   ClusterIP      10.152.183.19    <none>         30666/TCP                34h
kubeflow          service/pipelines-ui                           ClusterIP      10.152.183.159   <none>         3000/TCP                 34h
kubeflow          service/pipelines-ui-operator                  ClusterIP      10.152.183.99    <none>         30666/TCP                34h
kubeflow          service/pipelines-viewer-operator              ClusterIP      10.152.183.192   <none>         30666/TCP                34h
kubeflow          service/pipelines-visualization                ClusterIP      10.152.183.44    <none>         8888/TCP                 34h
kubeflow          service/pipelines-visualization-operator       ClusterIP      10.152.183.30    <none>         30666/TCP                34h
kubeflow          service/pytorch-operator-operator              ClusterIP      10.152.183.136   <none>         30666/TCP                34h
kubeflow          service/seldon-core                            ClusterIP      10.152.183.202   <none>         8080/TCP,9876/TCP        34h
kubeflow          service/seldon-core-operator                   ClusterIP      10.152.183.89    <none>         30666/TCP                34h
kubeflow          service/tf-job-operator-operator               ClusterIP      10.152.183.63    <none>         30666/TCP                34h

NAMESPACE        NAME                                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
ingress          daemonset.apps/nginx-ingress-microk8s-controller   1         1         1       1            1           <none>                        7d12h
metallb-system   daemonset.apps/speaker                             1         1         1       1            1           beta.kubernetes.io/os=linux   7d12h

NAMESPACE        NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
kube-system      deployment.apps/coredns                          1/1     1            1           7d12h
kube-system      deployment.apps/dashboard-metrics-scraper        1/1     1            1           7d12h
kube-system      deployment.apps/heapster-v1.5.2                  1/1     1            1           7d12h
kube-system      deployment.apps/hostpath-provisioner             1/1     1            1           7d12h
kube-system      deployment.apps/kubernetes-dashboard             1/1     1            1           7d12h
kube-system      deployment.apps/monitoring-influxdb-grafana-v4   1/1     1            1           7d12h
kubeflow         deployment.apps/ambassador                       1/1     1            1           34h
kubeflow         deployment.apps/argo-controller                  1/1     1            1           34h
kubeflow         deployment.apps/argo-ui                          1/1     1            1           34h
kubeflow         deployment.apps/dex-auth                         1/1     1            1           34h
kubeflow         deployment.apps/jupyter-web                      1/1     1            1           34h
kubeflow         deployment.apps/katib-manager                    1/1     1            1           34h
kubeflow         deployment.apps/katib-ui                         1/1     1            1           34h
kubeflow         deployment.apps/kubeflow-dashboard               1/1     1            1           34h
kubeflow         deployment.apps/kubeflow-profiles                1/1     1            1           34h
kubeflow         deployment.apps/metacontroller                   1/1     1            1           34h
kubeflow         deployment.apps/metadata-api                     1/1     1            1           34h
kubeflow         deployment.apps/metadata-envoy                   1/1     1            1           34h
kubeflow         deployment.apps/metadata-grpc                    1/1     1            1           34h
kubeflow         deployment.apps/metadata-ui                      1/1     1            1           34h
kubeflow         deployment.apps/modeldb-backend                  1/1     1            1           34h
kubeflow         deployment.apps/modeldb-store                    1/1     1            1           34h
kubeflow         deployment.apps/modeldb-ui                       1/1     1            1           34h
kubeflow         deployment.apps/oidc-gatekeeper                  1/1     1            1           34h
kubeflow         deployment.apps/pipelines-api                    1/1     1            1           34h
kubeflow         deployment.apps/pipelines-persistence            1/1     1            1           34h
kubeflow         deployment.apps/pipelines-scheduledworkflow      1/1     1            1           34h
kubeflow         deployment.apps/pipelines-ui                     1/1     1            1           34h
kubeflow         deployment.apps/pipelines-viewer                 1/1     1            1           34h
kubeflow         deployment.apps/pipelines-visualization          1/1     1            1           34h
kubeflow         deployment.apps/pytorch-operator                 1/1     1            1           34h
kubeflow         deployment.apps/seldon-core                      1/1     1            1           34h
kubeflow         deployment.apps/tf-job-operator                  1/1     1            1           34h
metallb-system   deployment.apps/controller                       1/1     1            1           7d12h

NAMESPACE        NAME                                                        DESIRED   CURRENT   READY   AGE
kube-system      replicaset.apps/coredns-588fd544bf                          1         1         1       7d12h
kube-system      replicaset.apps/dashboard-metrics-scraper-db65b9c6f         1         1         1       7d12h
kube-system      replicaset.apps/heapster-v1.5.2-58fdbb6f4d                  1         1         1       7d12h
kube-system      replicaset.apps/hostpath-provisioner-75fdc8fccd             1         1         1       7d12h
kube-system      replicaset.apps/kubernetes-dashboard-67765b55f5             1         1         1       7d12h
kube-system      replicaset.apps/monitoring-influxdb-grafana-v4-6dc675bf8c   1         1         1       7d12h
kubeflow         replicaset.apps/ambassador-ddc587cfc                        1         1         1       34h
kubeflow         replicaset.apps/argo-controller-768d775887                  1         1         1       34h
kubeflow         replicaset.apps/argo-ui-8df7d5959                           1         1         1       34h
kubeflow         replicaset.apps/dex-auth-6d56d6ff8                          0         0         0       34h
kubeflow         replicaset.apps/dex-auth-747545cb7d                         1         1         1       34h
kubeflow         replicaset.apps/jupyter-web-557bbbb54d                      1         1         1       34h
kubeflow         replicaset.apps/katib-manager-844d6794f9                    1         1         1       34h
kubeflow         replicaset.apps/katib-ui-65756df8c4                         1         1         1       34h
kubeflow         replicaset.apps/kubeflow-dashboard-7b77777dc8               1         1         1       34h
kubeflow         replicaset.apps/kubeflow-profiles-7b96c8cd8d                1         1         1       34h
kubeflow         replicaset.apps/metacontroller-d7d7d475c                    1         1         1       34h
kubeflow         replicaset.apps/metadata-api-698dcf77d5                     1         1         1       34h
kubeflow         replicaset.apps/metadata-envoy-85d94685b8                   1         1         1       34h
kubeflow         replicaset.apps/metadata-grpc-667777bd                      1         1         1       34h
kubeflow         replicaset.apps/metadata-ui-644b5d8667                      1         1         1       34h
kubeflow         replicaset.apps/modeldb-backend-7bb66bf5b8                  1         1         1       34h
kubeflow         replicaset.apps/modeldb-store-fc777db77                     1         1         1       34h
kubeflow         replicaset.apps/modeldb-ui-8687cc7847                       1         1         1       34h
kubeflow         replicaset.apps/oidc-gatekeeper-54d94b78d5                  1         1         1       34h
kubeflow         replicaset.apps/pipelines-api-869d8c5966                    1         1         1       34h
kubeflow         replicaset.apps/pipelines-persistence-7888d8649b            1         1         1       34h
kubeflow         replicaset.apps/pipelines-scheduledworkflow-d559c6b6        1         1         1       34h
kubeflow         replicaset.apps/pipelines-ui-9fd8df454                      1         1         1       34h
kubeflow         replicaset.apps/pipelines-viewer-5566947b98                 1         1         1       34h
kubeflow         replicaset.apps/pipelines-visualization-59b8dddc4f          1         1         1       34h
kubeflow         replicaset.apps/pytorch-operator-74b47d68f4                 1         1         1       34h
kubeflow         replicaset.apps/seldon-core-54d7fdb449                      1         1         1       34h
kubeflow         replicaset.apps/tf-job-operator-5d6d8dd568                  1         1         1       34h
metallb-system   replicaset.apps/controller-5f98465b6b                       1         1         1       7d12h

NAMESPACE         NAME                                             READY   AGE
admin             statefulset.apps/kf-demo                         1/1     7d1h
admin             statefulset.apps/rui-nb                          1/1     7d11h
controller-uk8s   statefulset.apps/controller                      1/1     34h
kubeflow          statefulset.apps/cert-manager-webhook-operator   1/1     34h
kubeflow          statefulset.apps/katib-db                        1/1     34h
kubeflow          statefulset.apps/metadata-db                     1/1     34h
kubeflow          statefulset.apps/minio                           1/1     34h
kubeflow          statefulset.apps/modeldb-db                      1/1     34h
kubeflow          statefulset.apps/pipelines-db                    1/1     34h

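A listing this long makes it easy to miss a pod that did not come back. The following is a minimal sketch for filtering it, assuming the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...); the function name unhealthy_pods is hypothetical and not part of microk8s or kubectl:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print pods that are not Running or not fully ready.
unhealthy_pods() {
  awk '{
    split($3, r, "/")   # READY column, e.g. "1/1" -> r[1]=ready, r[2]=total
    if ($4 != "Running" || r[1] != r[2]) print $1 "/" $2 " " $3 " " $4
  }'
}

# Usage (on the microk8s host):
#   microk8s.kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Every pod in the listing above is Running with all containers ready, so this filter would be expected to print nothing here, which is consistent with the problem lying outside pod health.
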
But the dashboard cannot be accessed:

[image]

And juju status reports cert-manager-webhook as terminated:

$ microk8s.juju status
Model     Controller  Cloud/Region        Version  SLA          Timestamp  Notes
kubeflow  uk8s        microk8s/localhost  2.7.3    unsupported  21:06:27Z  attempt 43 to destroy model failed (will retry):  model not empty, found 1 application (model not empty)

App                   Version  Status      Scale  Charm                 Store       Rev  OS          Address  Notes
cert-manager-webhook           terminated    0/1  cert-manager-webhook  jujucharms    9  kubernetes           

Unit                    Workload    Agent      Address  Ports  Message
cert-manager-webhook/0  terminated  executing                  (stop) 

ktsakalozos (Member) commented

@knkski is it possible that the machine got a new IP on the second start and this affects Kubeflow?
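
One way to test this hypothesis is to record the host's addresses before stopping the instance and compare them after it starts again. This is a minimal sketch, assuming a Linux host where hostname -I is available; the state file path and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical state file; any persistent location on the instance works.
STATE_FILE="${STATE_FILE:-$HOME/.last_host_ips}"

# Print the host's current IP addresses, one per line, sorted.
# Assumes `hostname -I` (Linux).
current_ips() {
  hostname -I | tr -s ' ' '\n' | sed '/^$/d' | sort
}

# Compare the addresses passed as $1 with the saved ones, then save $1.
# Prints "first-run", "unchanged" or "changed".
check_ip_change() {
  local now="$1" result
  if [ ! -f "$STATE_FILE" ]; then
    result="first-run"
  elif [ "$now" = "$(cat "$STATE_FILE")" ]; then
    result="unchanged"
  else
    result="changed"
  fi
  printf '%s\n' "$now" > "$STATE_FILE"
  printf '%s\n' "$result"
}

# Usage: run once before stopping the instance and once after starting it:
#   check_ip_change "$(current_ips)"
```

This only sees addresses configured on the instance's own interfaces. An auto-assigned EC2 public IPv4 address is not among them, so a change to the public address would have to be checked from the AWS console or the instance metadata instead.
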

ktsakalozos assigned ktsakalozos and knkski and unassigned ktsakalozos Jun 16, 2020

knkski commented Jul 27, 2020

I'm going to close this as a duplicate of #1427, since that issue is also triggered by the microk8s host machine being restarted. @RFMVasconcelos, if you run into this again, can you paste the output of microk8s.kubectl logs -n kube-system -l k8s-app=kube-dns? Please reopen this issue if none of the error messages look like the logs from that issue.
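
The check requested above can be sketched as follows. The microk8s.kubectl logs invocation is the one quoted in the comment; the filter function and its keyword list are assumptions for illustration and are not the specific messages from #1427:

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep only log lines that look like errors.
# The keyword list is an assumption, not taken from issue #1427.
error_lines() {
  # `|| true` keeps the exit status zero when nothing matches.
  grep -iE 'error|fail|timeout|refused' || true
}

# Usage (on the microk8s host):
#   microk8s.kubectl logs -n kube-system -l k8s-app=kube-dns | error_lines
```

If this prints nothing, the kube-dns logs probably contain no obvious errors, which would be the case for reopening rather than treating the problem as a duplicate.
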

knkski closed this as completed Jul 27, 2020

rui-vas commented Jul 28, 2020

Trying to replicate the error. Will send the logs if I can replicate it.
