
Instructions for multi master #311

Closed
rmenn opened this issue Feb 16, 2017 · 13 comments
Assignees
Labels
kind/documentation Categorizes issue or PR as related to documentation. kind/support Categorizes issue or PR as a support question. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/P1

Comments

@rmenn
Contributor

rmenn commented Feb 16, 2017

Greetings,

I have been trying to ask this on IRC as well as the k8s Slack, but am resorting to a ticket here. I apologize.

I wanted to know if a multi-master setup is possible with bootkube, and if so, how to do it, especially with the experimental etcd flag set.

Just need someone to point me in the right direction.

Thanks

@bzub

bzub commented Feb 16, 2017

For Master k8s components you should only need to give another node the master=true label.

kubectl label node node2.zbrbdl "master=true"

Then you can either create a LoadBalancer with external IP to the kubernetes service (default namespace) or point external DNS to one or all of the API server nodes for kubectl clients to use.
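As a rough sketch, a client's kubeconfig would then point at that single address rather than at any one master; the hostname apiserver.example.com and the file paths below are placeholders, not something bootkube generates for you:

    apiVersion: v1
    kind: Config
    clusters:
    - name: my-cluster
      cluster:
        # point clients at the load balancer / round-robin DNS name,
        # not at an individual master's IP
        server: https://apiserver.example.com:443
        certificate-authority: /etc/kubernetes/ca.crt
    users:
    - name: admin
      user:
        client-certificate: /etc/kubernetes/admin.crt
        client-key: /etc/kubernetes/admin.key
    contexts:
    - name: my-cluster
      context:
        cluster: my-cluster
        user: admin
    current-context: my-cluster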

For self-hosted-etcd you can try the steps in the etcd-operator README. I haven't tried that yet myself.

@aaronlevy
Contributor

As @bzub points out, simply labeling the node as a master will start master components on these nodes. The main change is that you need some way of addressing your multiple api-servers from a single address. kubeconfig only supports a single api-server address, and even though you can specify multiple on the kubelet command line, only the first is really used.

So a loadbalancer which fronts all api-servers (master nodes), or DNS entry which maps to those nodes are usually good options. You would then set your api server address in the kubeconfig to point to the dns or loadbalancer.

There is also somewhat of a limitation in the internal kubernetes service, where multiple api-servers will all overwrite each other as the only endpoint. (To see this, run kubectl get endpoints kubernetes repeatedly - if you have multiple apiservers running, the endpoint will constantly change.)

This isn't the worst thing in the world, but it's not ideal (there is work to resolve this upstream). In the interim, an option is to use your loadbalancer/DNS entry for this endpoint as well, which can be done by setting the apiserver flag --advertise-address=<your loadbalancer> - so the endpoint will always point to the same location.
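For illustration only, the change amounts to one flag in the self-hosted kube-apiserver pod spec; the image tag, the other flags, and the 10.3.0.50 address are placeholders standing in for your load balancer:

    # excerpt from the kube-apiserver container spec (sketch)
    containers:
    - name: kube-apiserver
      image: quay.io/coreos/hyperkube:v1.5.3_coreos.0   # placeholder image/tag
      command:
      - /hyperkube
      - apiserver
      - --secure-port=443
      - --advertise-address=10.3.0.50   # address fronted by your loadbalancer/DNS entry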

@aaronlevy aaronlevy added the kind/support Categorizes issue or PR as a support question. label Feb 17, 2017
@aaronlevy aaronlevy self-assigned this Feb 17, 2017
@bzub

bzub commented Feb 18, 2017

I'm just starting to implement an automated HA failover system for kube-apiserver with keepalived-vip, and @aaronlevy, your comment about the default kubernetes service was very enlightening. I really would have overlooked that issue, limited as it is.

Looking into it further, I found that the correct behavior for the kubernetes API service is enabled by editing the kube-apiserver DaemonSet and passing --apiserver-count=<int> to the apiserver, with the correct number of master nodes. Once that's applied, delete each apiserver pod one at a time so the change takes effect, and add the master label to a non-master node if desired. This way you can keep --advertise-address=$(POD_IP) the way it is.
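A sketch of that procedure, assuming a typical bootkube layout (the DaemonSet name, namespace, label, and the count of 3 are assumptions to adapt to your cluster):

    # open the self-hosted apiserver DaemonSet for editing
    kubectl -n kube-system edit daemonset kube-apiserver

    # in the container args, add the count while keeping the per-pod advertise address:
    #   - --apiserver-count=3
    #   - --advertise-address=$(POD_IP)

    # then roll the change out by deleting the apiserver pods one at a time
    kubectl -n kube-system get pods -l k8s-app=kube-apiserver
    kubectl -n kube-system delete pod <one-apiserver-pod-at-a-time>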

It's unfortunate that this isn't mentioned in the primary Kubernetes High-Availability document.

Also, be warned if you try keepalived-vip's README example: the examples/echoheaders.yaml manifest has an improper ---- separator in the YAML; I had to remove one hyphen so there are only three. I'll file an issue there.

@aaronlevy
Contributor

@bzub be careful about using the --apiserver-count flag -- its behavior is a little less than desirable (see: kubernetes/kubernetes#22609)

Essentially you're putting a fixed number of endpoints into the kubernetes service, and if those endpoints happen to be down, a certain percentage of requests just fail (because the endpoints are not cleaned up).

@aaronlevy aaronlevy added kind/documentation Categorizes issue or PR as related to documentation. priority/P1 labels Mar 6, 2017
This was referenced Mar 10, 2017
@klausenbusk
Contributor

klausenbusk commented Jun 19, 2017

So a loadbalancer which fronts all api-servers (master nodes), or DNS entry which maps to those nodes are usually good options. You would then set your api server address in the kubeconfig to point to the dns or loadbalancer.

I'm currently using nginx-proxy (pod, template, nginx.conf) from kubespray (with hard-coded upstreams) in my coreos-kubernetes cluster (I'm probably going to set up a new 1.6 cluster with bootkube; the current cluster is 1.5.4, iirc).

Maybe we could add that pod to bootkube? We're just missing a dynamic config writer that includes all the master nodes in the nginx.conf.
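For reference, the proxy config in that design is a plain TCP pass-through; a minimal sketch with hard-coded upstreams (the master IPs and ports are placeholders, and kubespray's actual template differs in detail):

    # /etc/nginx/nginx.conf (sketch)
    error_log stderr notice;
    worker_processes auto;
    events {
      worker_connections 1024;
    }
    stream {
      upstream kube_apiserver {
        least_conn;
        server 10.3.0.11:443;   # master 1
        server 10.3.0.12:443;   # master 2
        server 10.3.0.13:443;   # master 3
      }
      server {
        # node-local components reach the apiservers via localhost
        listen 127.0.0.1:6443;
        proxy_pass kube_apiserver;
        proxy_connect_timeout 1s;
        proxy_timeout 10m;
      }
    }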

@klausenbusk
Contributor

Proof of concept: #684

@omkensey

Correct me if I'm wrong -- this results in a single nginx pod, yes? Won't that potentially move around the cluster, so the HA endpoint IP will change? (Actually, if it's just a Pod with no Deployment in front of it, won't it just go down if the current node fails? Pods explicitly do not survive node failures.) I like something like keepalived-vip better, or even a full cluster service like Pacemaker managing things, so that the IP never changes but follows the LB provider around. Alternatively, maybe the Pod could be a single-replica Deployment with an init container to handle registering the current IP with DNS.

@klausenbusk
Contributor

Correct me if I'm wrong -- this results in a single nginx pod, yes?

Are you referring to #684? #684 runs an nginx-proxy pod on every node, listening on localhost; you then connect to the API server through localhost.
I'm using that design (but not #684) right now in a 12-node cluster (3 masters + 7 workers + 2 "vpn" servers), and it works pretty well.
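For context, the per-node proxy in that design is roughly a host-network pod on every node, along the lines of the sketch below (the names, image tag, and config path are illustrative assumptions, not what #684 actually ships):

    apiVersion: extensions/v1beta1   # DaemonSet API group of that era
    kind: DaemonSet
    metadata:
      name: nginx-proxy
      namespace: kube-system
    spec:
      template:
        metadata:
          labels:
            k8s-app: nginx-proxy
        spec:
          hostNetwork: true          # so host processes can reach localhost:6443
          containers:
          - name: nginx-proxy
            image: nginx:1.11-alpine
            volumeMounts:
            - name: conf
              mountPath: /etc/nginx
              readOnly: true
          volumes:
          - name: conf
            hostPath:
              path: /etc/nginx-proxy   # holds an nginx.conf like the one sketched above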

@anguslees

anguslees commented Dec 7, 2018

I realise this is an ancient bug, but just in case anyone is still reading it:

  • Add your 3 masters to a round-robin (external) DNS record, and use that DNS name from outside the cluster (or from net=host pods, importantly including kube-proxy).
  • Configure kube-controller-manager and kube-scheduler to talk to the apiserver via localhost, since these jobs always run on master nodes. They could also use the external DNS name, but localhost is simpler.
  • Add readiness probes to the apiserver manifest and change the default.kubernetes Service to refer to the self-hosted apiserver pods by selector, like a regular internal k8s Service (see the sketch after this comment).
  • Use default.kubernetes (via kube-proxy) from inside the cluster as usual.

No need for nginx or keepalived; failover is automatic (with at worst a TCP connect retry for external clients) - except for updating the external round-robin DNS record. Just make updating that DNS entry part of your master node replacement process (which is already somewhat special-cased).
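A rough sketch of the readiness-probe and selector-based Service parts of this approach, assuming the self-hosted apiserver pods carry a k8s-app=kube-apiserver label (the probe path, ports, and label are assumptions to adapt to your manifests):

    # added to the kube-apiserver pod spec
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080          # local insecure port, if your manifest exposes one
      initialDelaySeconds: 15
      timeoutSeconds: 15
    ---
    # default.kubernetes Service pointed at the apiserver pods by selector
    apiVersion: v1
    kind: Service
    metadata:
      name: kubernetes
      namespace: default
    spec:
      selector:
        k8s-app: kube-apiserver
      ports:
      - name: https
        port: 443
        targetPort: 443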

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
