
kube-dns and kubernetes-dashboard crashing after deploying k8s dashboard on ver 1.8.0-beta.1 #3845

Closed
roeera opened this issue Nov 13, 2017 · 18 comments

@roeera

roeera commented Nov 13, 2017

-------------BUG REPORT --------------------

  1. Fill in as much of the template below as you can.

  2. What kops version are you running? (use kops version)
    Version 1.8.0-beta.1 (git-9b71713)

  3. What Kubernetes version are you running? (use kubectl version)
    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T21:07:53Z", GoVersion:"go1.9.1", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:46:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  4. What cloud provider are you using?
    AWS

  5. What commands did you execute (please provide the cluster manifest from kops get --name my.example.com, if available), and what happened after the commands were executed?
    a. #!/bin/bash

export KOPS_STATE_STORE=s3://XXXXXXXX
export CLUSTER_NAME=test-k8s.gett.io # Edit with your cluster name
export VPC_ID=vpc-XXXXXXXX # Edit with your VPC id
export NETWORK_CIDR=XXXXXXXXX # Edit with the cidr of your VPC

kops create cluster \
  --zones=eu-west-1a,eu-west-1b,eu-west-1c \
  --master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
  --name=${CLUSTER_NAME} \
  --vpc=${VPC_ID} \
  --cloud=aws \
  --cloud-labels "Environment=test,Name=${CLUSTER_NAME},Role=node,Provisioner=kops" \
  --ssh-public-key=/Users/roee/.ssh/gett-20150327.pub \
  --node-count=2 \
  --networking=flannel \
  --node-size=m4.large \
  --master-size=m4.large \
  --dns-zone=gett.io \
  --image=ami-785db401 \
  --topology private
b. kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
c. Before the dashboard is created everything works well (i.e. the cluster is ready); after deploying it, things start crashing:

roee@Roees-MacBook-Pro  ~  kops validate cluster --name test-k8s.gett.io
Validating cluster test-k8s.gett.io

INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
master-eu-west-1a Master m4.large 1 1 eu-west-1a
master-eu-west-1b Master m4.large 1 1 eu-west-1b
master-eu-west-1c Master m4.large 1 1 eu-west-1c
nodes Node m4.large 2 2 eu-west-1a,eu-west-1b,eu-west-1c

NODE STATUS
NAME ROLE READY
ip-172-23-77-167.eu-west-1.compute.internal master True
ip-172-23-78-218.eu-west-1.compute.internal master True
ip-172-23-79-214.eu-west-1.compute.internal node True
ip-172-23-80-34.eu-west-1.compute.internal node True
ip-172-23-81-65.eu-west-1.compute.internal master True

Pod Failures in kube-system
NAME
cluster-autoscaler-7f877d6965-5nnbx
kube-dns-7f56f9f8c7-prdhs
kubernetes-dashboard-747c4f7cf-4k8vc

Validation Failed
Ready Master(s) 3 out of 3.
Ready Node(s) 2 out of 2.

(output from: kops get --name my.example.com)
✘ roee@Roees-MacBook-Pro  ~/Temp/heapster   master  kops get --name test-k8s.gett.io
Cluster
NAME CLOUD ZONES
test-k8s.gett.io aws eu-west-1a,eu-west-1b,eu-west-1c

Instance Groups
NAME ROLE MACHINETYPE MIN MAX ZONES
master-eu-west-1a Master m4.large 1 1 eu-west-1a
master-eu-west-1b Master m4.large 1 1 eu-west-1b
master-eu-west-1c Master m4.large 1 1 eu-west-1c
nodes Node m4.large 2 2 eu-west-1a,eu-west-1b,eu-west-1c

  1. How can we reproduce it (as minimally and precisely as possible):
    Install the cluster using kops 1.8.0-beta.1
    Deploy the dashboard using:
    kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
  2. Anything else we need to know:
    kops config file:
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-10-23T12:47:58Z
  name: test-k8s.gett.io
spec:
  api:
    loadBalancer:
      type: Internal
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudLabels:
    Environment: test
    Name: test-k8s.gett.io
    Provisioner: kops
    Role: node
  cloudProvider: aws
  configBase: s3://xxxx
  dnsZone: gett.io
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-west-1a
      name: a
    - instanceGroup: master-eu-west-1b
      name: b
    - instanceGroup: master-eu-west-1c
      name: c
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authorizationMode: RBAC
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.8.0
  masterInternalName: api.internal.test-k8s.gett.io
  masterPublicName: api.test-k8s.gett.io
  networkCIDR: 172.23.0.0/16
  networkID: vpc-ed80478a
  networking:
    flannel:
      backend: vxlan
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.23.76.0/23
    egress: nat-987654321
    id: subnet-789c061f
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 172.23.78.0/23
    egress: nat-987654321
    id: subnet-929efcdb
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 172.23.80.0/23
    egress: nat-987654321
    id: subnet-59f05602
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

@roeera

roeera commented Nov 13, 2017

Finally (after a long day) I found that running the following command inside my nodes solved the problem:

iptables -P FORWARD ACCEPT

The open question is why this command needs to be run when upgrading a cluster from 1.7.10 to 1.8.0. @chrislovecnm any idea?
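
For anyone hitting the same symptom, checking the default policy of the FORWARD chain shows whether this is the cause. A minimal sketch using standard iptables commands (note the change does not persist across reboots unless you save it with your distribution's mechanism):

# Show the default policy of the FORWARD chain; "-P FORWARD DROP" here means
# pod-to-pod traffic routed through the node is being dropped.
sudo iptables -S FORWARD | head -1

# Workaround from above: set the default policy back to ACCEPT.
sudo iptables -P FORWARD ACCEPT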

@chrislovecnm

We have an update for flannel going in. I am not certain about the iptables command, as I have not used flannel at all.

@justinsb / @apenney thoughts?

@roeera

roeera commented Nov 19, 2017

@justinsb @apenney your comments would be appreciated.
Thanks

@chrislovecnm

@justinsb I do not want to address flannel issues this late in the game. Can we fix this after 1.8.0?

@greg-jaunt

Without working flannel-vxlan, is there any option for deploying a fast overlay network (one that is not restricted to layer-2-only topologies) in kops 1.8?

justinsb added this to the 1.8.0 milestone Nov 26, 2017
@iMartyn

iMartyn commented Nov 27, 2017

The workaround we're using for now is to add a hook that applies the iptables change, as per the following:

  hooks:
  - execContainer:
      command:
      - sh
      - -c
      - chroot /rootfs /sbin/iptables -P FORWARD ACCEPT
      image: busybox
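
For context, if I understand the spec right, the snippet above sits under spec: in the kops cluster manifest (edited via kops edit cluster). Roughly like this, sketched against the cluster spec from the original report:

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: test-k8s.gett.io
spec:
  hooks:
  - execContainer:
      command:
      - sh
      - -c
      - chroot /rootfs /sbin/iptables -P FORWARD ACCEPT
      image: busybox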

@chrislovecnm

So it sounds like flannel expects iptables changes. Really, flannel should do this itself, and this bug needs to be filed against flannel.

@chrislovecnm

#3880 is an example of how to update flannel. We need to bump the version.

We need a sidecar pod or an init container, probably an init container.
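
For anyone sketching this out, an init container added to the flannel DaemonSet's pod spec could look roughly like the following. This is only a sketch, not the actual manifest: the container name is made up, the image is a placeholder that would have to ship an iptables binary (stock busybox generally does not), and it relies on the flannel pod already running with hostNetwork and enough privilege to modify the node's iptables.

      # under the DaemonSet's spec.template.spec
      initContainers:
      - name: allow-forward                 # illustrative name
        image: some-image-with-iptables     # placeholder; must contain the iptables binary
        command: ["sh", "-c", "iptables -P FORWARD ACCEPT"]
        securityContext:
          privileged: true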

@iMartyn

iMartyn commented Nov 27, 2017

I'm working on an init container in the flannel manifest that functions just like the hook above, to see if it fixes the issue. Not a nice solution, but it hopefully lets us get across the 1.8 release line.

@jkemp101

I'm in the middle of this myself and want to clarify what the plan is. Doesn't #3880 fix this issue, since it pulls in flannel v0.9.1, which includes this fix: flannel-io/flannel#872? Will #3880 not be in the final kops 1.8?

@chrislovecnm

What is in master is what will be in kops 1.8.x. Can someone please test a master build and report back?

@chrislovecnm

chrislovecnm commented Nov 27, 2017

@tomdee can you assist here? I think there is some confusion.

To be clear for everyone else, please download and compile kops from master for testing. Instructions are under the dev folder in docs.
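
Roughly, that looks something like the following (just a sketch assuming a working Go toolchain and GOPATH; the instructions in the repo's docs are authoritative):

# clone into the expected GOPATH location and build
mkdir -p $GOPATH/src/k8s.io && cd $GOPATH/src/k8s.io
git clone https://github.com/kubernetes/kops.git
cd kops
make    # builds the kops binary from master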

@tomdee

tomdee commented Nov 28, 2017

Yes, #3880 pulls in flannel v0.9.1, which fixes this exact problem.

@chrislovecnm

Can someone please test master ;)

@chrislovecnm

So master was sort of fixed ;) we missed bumping the flannel version, but we just PR’ed that fix. I’m on mobile, so I am not going to look up the PR number, but the PR is closed and will go into the release.

Again, we are feature-frozen for 1.8.0; if some people can test upgrading and new clusters, it will help ensure that this release is stable.

Thanks

@arielkung

Upgraded a 1.7 + flannel vxlan cluster to k8s 1.8.4 with kops version 1.8.0-beta.2 (git-1bcf467b) and everything worked fine. Also upgraded the Dashboard addon without issues.

@chrislovecnm

Closing ... yay

@iMartyn

iMartyn commented Nov 29, 2017

Apologies for dropping off on this one; I got busy with other stuff. Hopefully my next PR will have a quicker turnaround.
