
kube-dns and kubernetes-dashboard crashing after deploying k8s dashboard on ver 1.8.0-beta.1 #3845

Closed
roeera opened this issue Nov 13, 2017 · 18 comments

@roeera

roeera commented Nov 13, 2017

-------------BUG REPORT --------------------

  1. Fill in as much of the template below as you can.

  2. What kops version are you running? (use kops version)
    Version 1.8.0-beta.1 (git-9b71713)

  3. What Kubernetes version are you running? (use kubectl version)
    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T21:07:53Z", GoVersion:"go1.9.1", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:46:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  4. What cloud provider are you using?
    AWS

  5. What commands did you execute (please provide the cluster manifest from kops get --name my.example.com, if available), and what happened after the commands were executed?
    a. #!/bin/bash

export KOPS_STATE_STORE=s3://XXXXXXXX
export CLUSTER_NAME=test-k8s.gett.io # Edit with your cluster name
export VPC_ID=vpc-XXXXXXXX # Edit with your VPC id
export NETWORK_CIDR=XXXXXXXXX # Edit with the cidr of your VPC

kops create cluster \
  --zones=eu-west-1a,eu-west-1b,eu-west-1c \
  --master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
  --name=${CLUSTER_NAME} \
  --vpc=${VPC_ID} \
  --cloud=aws \
  --cloud-labels "Environment=test,Name=${CLUSTER_NAME},Role=node,Provisioner=kops" \
  --ssh-public-key=/Users/roee/.ssh/gett-20150327.pub \
  --node-count=2 \
  --networking=flannel \
  --node-size=m4.large \
  --master-size=m4.large \
  --dns-zone=gett.io \
  --image=ami-785db401 \
  --topology private
b. kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
c. Before the dashboard is created everything works well (i.e. the cluster is ready); after deploying it, things start crashing:

roee@Roees-MacBook-Pro  ~  kops validate cluster --name test-k8s.gett.io
Validating cluster test-k8s.gett.io

INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
master-eu-west-1a Master m4.large 1 1 eu-west-1a
master-eu-west-1b Master m4.large 1 1 eu-west-1b
master-eu-west-1c Master m4.large 1 1 eu-west-1c
nodes Node m4.large 2 2 eu-west-1a,eu-west-1b,eu-west-1c

NODE STATUS
NAME ROLE READY
ip-172-23-77-167.eu-west-1.compute.internal master True
ip-172-23-78-218.eu-west-1.compute.internal master True
ip-172-23-79-214.eu-west-1.compute.internal node True
ip-172-23-80-34.eu-west-1.compute.internal node True
ip-172-23-81-65.eu-west-1.compute.internal master True

Pod Failures in kube-system
NAME
cluster-autoscaler-7f877d6965-5nnbx
kube-dns-7f56f9f8c7-prdhs
kubernetes-dashboard-747c4f7cf-4k8vc

Validation Failed
Ready Master(s) 3 out of 3.
Ready Node(s) 2 out of 2.

(output from: kops get --name my.example.com)
✘ roee@Roees-MacBook-Pro  ~/Temp/heapster   master  kops get --name test-k8s.gett.io
Cluster
NAME CLOUD ZONES
test-k8s.gett.io aws eu-west-1a,eu-west-1b,eu-west-1c

Instance Groups
NAME ROLE MACHINETYPE MIN MAX ZONES
master-eu-west-1a Master m4.large 1 1 eu-west-1a
master-eu-west-1b Master m4.large 1 1 eu-west-1b
master-eu-west-1c Master m4.large 1 1 eu-west-1c
nodes Node m4.large 2 2 eu-west-1a,eu-west-1b,eu-west-1c

  1. How can we reproduce it (as minimally and precisely as possible):
    Install the cluster using kops 1.8.0-beta.1
    Deploy the dashboard using:
    kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
  2. Anything else we need to know:
    kops config file:
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-10-23T12:47:58Z
  name: test-k8s.gett.io
spec:
  api:
    loadBalancer:
      type: Internal
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudLabels:
    Environment: test
    Name: test-k8s.gett.io
    Provisioner: kops
    Role: node
  cloudProvider: aws
  configBase: s3://xxxx
  dnsZone: gett.io
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authorizationMode: RBAC
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.8.0
  masterInternalName: api.internal.test-k8s.gett.io
  masterPublicName: api.test-k8s.gett.io
  networkCIDR: 172.23.0.0/16
  networkID: vpc-ed80478a
  networking:
    flannel:
      backend: vxlan
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.23.76.0/23
    egress: nat-987654321
    id: subnet-789c061f
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 172.23.78.0/23
    egress: nat-987654321
    id: subnet-929efcdb
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 172.23.80.0/23
    egress: nat-987654321
    id: subnet-59f05602
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

@roeera

roeera commented Nov 13, 2017

Finally (after a long day) I found that running the following command inside my nodes solved the problem:

iptables -P FORWARD ACCEPT

The open question is why this command needs to be run when upgrading a cluster from 1.7.10 to 1.8.0. @chrislovecnm any idea?
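
For anyone hitting the same symptom, checking the default policy of the FORWARD chain shows whether this is the cause. A minimal sketch using standard iptables commands (note the change does not persist across reboots unless you save it with your distribution's mechanism):

# Show the default policy of the FORWARD chain; "-P FORWARD DROP" here means
# pod-to-pod traffic routed through the node is being dropped.
sudo iptables -S FORWARD | head -1

# Workaround from above: set the default policy back to ACCEPT.
sudo iptables -P FORWARD ACCEPT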

@chrislovecnm

We have an update for flannel going in. I am not certain about the iptables command, as I have not used flannel at all.

@justinsb / @apenney thoughts?

@roeera

roeera commented Nov 19, 2017

@justinsb @apenney your comments would be appreciated.
Thanks

@chrislovecnm

@justinsb I do not want to address flannel issues this late in the game. Can we fix this after 1.8.0?

@greg-jaunt

Without working flannel-vxlan, is there any option for deploying a fast overlay network (one that is not restricted to layer-2-only topologies) in kops 1.8?

justinsb added this to the 1.8.0 milestone Nov 26, 2017
@iMartyn

iMartyn commented Nov 27, 2017

The workaround we're using for now is to add a hook that applies the iptables change, as per the following:

  hooks:
  - execContainer:
      command:
      - sh
      - -c
      - chroot /rootfs /sbin/iptables -P FORWARD ACCEPT
      image: busybox
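
For context, if I understand the spec right, the snippet above sits under spec: in the kops cluster manifest (edited via kops edit cluster). Roughly like this, sketched against the cluster spec from the original report:

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: test-k8s.gett.io
spec:
  hooks:
  - execContainer:
      command:
      - sh
      - -c
      - chroot /rootfs /sbin/iptables -P FORWARD ACCEPT
      image: busybox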

@chrislovecnm

So it sounds like flannel expects iptables changes. Really, flannel should do this itself, and this bug needs to be filed against flannel.

@chrislovecnm

#3880 is an example of how to update flannel. We need to bump the version.

We need a sidecar pod or an init container, probably an init container.
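
For anyone sketching this out, an init container added to the flannel DaemonSet's pod spec could look roughly like the following. This is only a sketch, not the actual manifest: the container name is made up, the image is a placeholder that would have to ship an iptables binary (stock busybox generally does not), and it relies on the flannel pod already running with hostNetwork and enough privilege to modify the node's iptables.

      # under the DaemonSet's spec.template.spec
      initContainers:
      - name: allow-forward                 # illustrative name
        image: some-image-with-iptables     # placeholder; must contain the iptables binary
        command: ["sh", "-c", "iptables -P FORWARD ACCEPT"]
        securityContext:
          privileged: true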

@iMartyn

iMartyn commented Nov 27, 2017

I'm working on an init container in the flannel manifest that functions just like the hook above, to see if it fixes the issue. Not a nice solution, but it hopefully lets us get across the 1.8 release line.

@jkemp101

I'm in the middle of this myself and want to clarify what the plan is. Doesn't #3880 fix this issue, since it pulls in flannel v0.9.1, which includes this fix: flannel-io/flannel#872? Will #3880 not be in the final kops 1.8?

@chrislovecnm

What is in master is what will be in kops 1.8.x. Can someone please test a master build and report back?

@chrislovecnm

chrislovecnm commented Nov 27, 2017

@tomdee can you assist here? I think there is some confusion.

To be clear for everyone else, please download and compile kops from master for testing. Instructions are under the dev folder in docs.
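
Roughly, that looks something like the following (just a sketch assuming a working Go toolchain and GOPATH; the instructions in the repo's docs are authoritative):

# clone into the expected GOPATH location and build
mkdir -p $GOPATH/src/k8s.io && cd $GOPATH/src/k8s.io
git clone https://github.com/kubernetes/kops.git
cd kops
make    # builds the kops binary from master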

@tomdee

tomdee commented Nov 28, 2017

Yes, #3880 pulls in flannel v0.9.1, which fixes this exact problem.

@chrislovecnm

Can someone please test master ;)

@chrislovecnm

So master was sort of fixed ;) we missed bumping the flannel version, but we just PR’ed that fix. I’m on mobile, so I am not going to look up the PR number, but the PR is closed and will go into the release.

Again, we are feature-frozen for 1.8.0; if some people can test upgrading and new clusters, it will help ensure that this release is stable.

Thanks

@arielkung

Upgraded a 1.7 + flannel vxlan cluster to k8s 1.8.4 with kops version 1.8.0-beta.2 (git-1bcf467b) and everything worked fine. Also upgraded the Dashboard addon without issues.

@chrislovecnm

Closing ... yay

@iMartyn

iMartyn commented Nov 29, 2017

Apologies for dropping off on this one; I got busy with other stuff. Hopefully my next PR will have a quicker turnaround.
