
Masters should not be excluded from service load balancers #65618

Closed
ljani opened this issue Jun 29, 2018 · 51 comments
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • sig/cloud-provider: Categorizes an issue or PR as relevant to SIG Cloud Provider.
  • sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
  • sig/network: Categorizes an issue or PR as relevant to SIG Network.

Comments

@ljani commented Jun 29, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
I'm running a single-node cluster on AWS EC2, i.e. scheduling work on the master as well. When I try to add an ELB for my service, the EC2 instance is not associated with the ELB: the ELB gets created, but there are 0 EC2 instances behind it. (I'm using the ELB for SSL termination, in case you wonder why I'd want to load balance a single-node cluster.)

The service controller logs this message:

I0628 17:54:48.853175       1 event.go:221] Event(v1.ObjectReference{Kind:"Service", Namespace:"default", Name:"nginx", UID:"59980933-7afc-11e8-9a8c-0621b993d602", APIVersion:"v1", ResourceVersion:"2052", FieldPath:""}): type: 'Warning' reason: 'UnAvailableLoadBalancer' There are no available nodes for LoadBalancer service default/nginx

What you expected to happen:
The EC2 instance is associated with the ELB.

How to reproduce it (as minimally and precisely as possible):

  • Boot an EC2 instance with correct IAM permissions
  • Install a single node cluster using kubeadm and cloud-provider=aws
  • Untaint the node
  • Schedule e.g. nginx
  • Create an external load balancer for nginx (a minimal sketch of these two steps follows this list)
  • View the ELB in AWS console and note the Status: 0 of 0 instances in service
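
For reference, a minimal sketch of the last two steps, assuming a plain nginx Deployment (the image and names are illustrative, not from the original report):

# Schedule an example workload (nginx).
kubectl create deployment nginx --image=nginx

# Ask the cloud provider to provision an external load balancer (an ELB on AWS).
kubectl expose deployment nginx --port=80 --type=LoadBalancer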

Anything else we need to know?:
The reason for this seems to be the node-role.kubernetes.io/master label, which blocks associating a load balancer with the node. On the other hand, this changed what is included, because includeNodeFromNodeList previously did not check whether a node is a master. I'm not sure what the correct fix would be. I could try to submit a PR if you give me some guidance on how this should behave. Is my scenario even a supported one?

I think this bug should be reproducible on other clouds as well.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a):
Linux ip-x.eu-central-1.compute.internal 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux
  • Install tools: kubeadm
@k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Jun 29, 2018
@ljani (Author) commented Jun 29, 2018

#33884 has area/nodecontroller, so I think the correct sig is @kubernetes/sig-node-bugs.

@k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 29, 2018
@k8s-ci-robot (Contributor)

@ljani: Reiterating the mentions to trigger a notification:
@kubernetes/sig-node-bugs

In response to this:

#33884 has area/nodecontroller, so I think the correct sig is @kubernetes/sig-node-bugs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jhorwit2 (Contributor) commented Jul 5, 2018

@ljani This is by design: load balancers do not include master nodes in the available pool of backend servers. You should remove the node-role.kubernetes.io/master label from your node and ensure it's schedulable and ready in order for it to be a backend. (exact filter logic lives here)
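
A rough sketch of what that looks like on a kubeadm single-node cluster (the node name is a placeholder; removing the role label is the non-standard part):

# Allow regular workloads on the control-plane node (the usual kubeadm step).
kubectl taint nodes <node-name> node-role.kubernetes.io/master:NoSchedule-

# Additionally drop the master role label so the service controller stops
# excluding this node from load-balancer backends.
kubectl label nodes <node-name> node-role.kubernetes.io/master-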

I'm going to close this issue as it's not a bug. If you would like the ability to use masters as available backends for LBs, please create another ticket with a feature request.

/close

@ljani (Author) commented Jul 5, 2018

@jhorwit2 Thanks for the response. So it's okay to remove the node-role.kubernetes.io/master label even if there's only one node in the cluster? Are there any drawbacks? I was a bit wary of that, because I thought each cluster needs at least one master and that you should only remove the NoSchedule taint from it. The kubeadm guide only refers to removing the taint from the master, not the label itself, but then the guide is for a single-master cluster, so removing the label there would be a little contradictory.

Searching for masterless does not really yield any related results.

Anyhow, if that's a supported scenario, then I'm very happy with it. Otherwise I should open the ticket.

@jhorwit2 (Contributor) commented Jul 5, 2018

It all depends on how the cluster is set up and whether anything depends on those labels. Single-node clusters will probably always have some quirks.

@ljani (Author) commented Jul 5, 2018

I've been following the Creating a single master cluster with kubeadm guide and thus I'm using kubeadm. I'll try and see how it goes.

@jhorwit2 (Contributor) commented Jul 5, 2018

The docs are probably incorrect when it comes to this because it wasn't tested. They should be updated.

/reopen
/remove-sig node
/sig cluster-lifecycle

@k8s-ci-robot reopened this Jul 5, 2018
@k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jul 5, 2018
@k8s-ci-robot (Contributor)

@jhorwit2: Those labels are not set on the issue: sig/node

In response to this:

The docs are probably incorrect when it comes to this because it wasn't tested. They should be updated.

/reopen
/remove-sig node
/sig cluster-lifecycle

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123 (Member) commented Jul 7, 2018

Please note that the kubeadm documentation will stop adding cloud-provider-specific documentation in the future and will instead link to external resources.

/cc @kubernetes/sig-cluster-lifecycle-bugs
/cc @kubernetes/sig-cloud-provider-bugs

@k8s-ci-robot added the sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. label Jul 7, 2018
@smarterclayton (Contributor)

My comment in the other issue:

I'm not sure this is correct, as we run more things on the master: for example a service I want to expose like an API proxy, an aggregated API server that wants its own LB, or a workload that's not quite control plane but not quite end user either. We shouldn't bias towards the master (i.e. a lot of node-port services for regular workloads should bypass the master), but this PR as it stands prevents using service load balancers with services that run on masters, which limits options like self-hosting.

Now that node load balancing is more nuanced (with health checking endpoints), do we even need this? Masters should not be marked as healthy when pods aren't run on them, which means no traffic would go to masters?

I think excluding masters from service load balancers is a bug and needs a more nuanced design. I'm going to bump priority here because I've got users trying to run pods on masters behind a service LB, and they can't use service LBs at all.

@smarterclayton added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 21, 2018
@smarterclayton changed the title A single node cluster cannot use external load balancer Masters should not be excluded from service load balancers Aug 21, 2018
@smarterclayton (Contributor)

Changing title to be more accurate.

@smarterclayton (Contributor)

I think the correct fix here was the service LB health-check support that narrows the serving pool to the nodes that actually hold the pods; now that we have that, we don't need this hack.

@andrewsykim (Member)

To be clear, this would only apply for Services with externalTrafficPolicy: Local right? Is it okay to assume that users will always use "Local" for this case?

@andrewsykim (Member) commented Aug 21, 2018

I'm sure we could add logic in the service controller to include masters if externalTrafficPolicy == Local and exclude them otherwise, but that doesn't seem like an elegant solution 🤔

@smarterclayton (Contributor)

For masters, it seems likely. I mean, in general I expect externalTrafficPolicy: Local to be the correct setting for the vast majority of LB services - is there a reason I would not want that policy in general use?

@andrewsykim (Member) commented Aug 21, 2018

In general I would agree that LBs would use externalTrafficPolicy: Local. One example where I wouldn't, though, is a service that needs to ingress traffic both from an LB and from another service in the same cluster. In that case I would consider using externalTrafficPolicy: Cluster, and then I may not want the master to be in the pool of LB backends. Though to be fair, this may be a rare case not worth looking into.
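
For context, a minimal sketch of the kind of Service being discussed, assuming an nginx workload (names and ports are illustrative):

# With externalTrafficPolicy: Local, the cloud LB health-checks each node's
# healthCheckNodePort and only keeps nodes that run a ready endpoint, so
# traffic never lands on a node without the pod.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF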

@jhorwit2 (Contributor)

@smarterclayton this should perhaps be merged with #65013

The service controller today does some filtering based on masters in interesting ways. Historically it seems that masters were marked unschedulable to exclude them from LB services; then, once role labels became popular, filtering on those got added as well.

@jhorwit2 (Contributor)

I'm sure we could add logic in the service controller to include masters if externalTrafficPolicy == Local and exclude them otherwise, but that doesn't seem like an elegant solution 🤔

This gets a little hacky with how backends are updated today. We would need to keep track of two different backend sets while updating each service.

@k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 5, 2019
@thockin removed their assignment Aug 9, 2019
wking added a commit to wking/openshift-installer that referenced this issue Sep 24, 2019
We grew this in c22d042 (docs/user/aws/install_upi: Add 'sed' call
to zero compute replicas, 2019-05-02, openshift#1649) to set the stage for
changing the 'replicas: 0' semantics from "we'll make you some dummy
MachineSets" to "we won't make you MachineSets".  But that hasn't
happened yet, and since 64f96df (scheduler: Use schedulable masters
if no compute hosts defined, 2019-07-16, openshift#2004) 'replicas: 0' for
compute has also meant "add the 'worker' role to control-plane nodes".
That leads to racy problems when ingress comes through a load
balancer, because Kubernetes load balancers exclude control-plane
nodes from their target set [1,2] (although this may get relaxed
soonish [3]).  If the router pods get scheduled on the control plane
machines due to the 'worker' role, they are not reachable from the
load balancer and ingress routing breaks [4].  Seth says:

> pod nodeSelectors are not like taints/tolerations.  They only have
> effect at scheduling time.  They are not continually enforced.

which means that attempting to address this issue as a day-2 operation
would mean removing the 'worker' role from the control-plane nodes and
then manually evicting the router pods to force rescheduling.  So
until we get the changes from [3], it's easier to just drop this
section and keep the 'worker' role off the control-plane machines
entirely.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1671136#c1
[2]: kubernetes/kubernetes#65618
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1744370#c6
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1755073
wking added a commit to wking/openshift-installer that referenced this issue Oct 1, 2019
We grew replicas-zeroing in c22d042 (docs/user/aws/install_upi: Add
'sed' call to zero compute replicas, 2019-05-02, openshift#1649) to set the
stage for changing the 'replicas: 0' semantics from "we'll make you
some dummy MachineSets" to "we won't make you MachineSets".  But that
hasn't happened yet, and since 64f96df (scheduler: Use schedulable
masters if no compute hosts defined, 2019-07-16, openshift#2004) 'replicas: 0'
for compute has also meant "add the 'worker' role to control-plane
nodes".  That leads to racy problems when ingress comes through a load
balancer, because Kubernetes load balancers exclude control-plane
nodes from their target set [1,2] (although this may get relaxed
soonish [3]).  If the router pods get scheduled on the control plane
machines due to the 'worker' role, they are not reachable from the
load balancer and ingress routing breaks [4].  Seth says:

> pod nodeSelectors are not like taints/tolerations.  They only have
> effect at scheduling time.  They are not continually enforced.

which means that attempting to address this issue as a day-2 operation
would mean removing the 'worker' role from the control-plane nodes and
then manually evicting the router pods to force rescheduling.  So
until we get the changes from [3], we can either drop the zeroing [5]
or adjust the scheduler configuration to remove the effect of the
zeroing.  In both cases, this is a change we'll want to revert later
once we bump Kubernetes to pick up a fix for the service load-balancer
targets.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1671136#c1
[2]: kubernetes/kubernetes#65618
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1744370#c6
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1755073
[5]: openshift#2402
wking added a commit to wking/openshift-installer that referenced this issue Oct 2, 2019
wking added a commit to wking/openshift-installer that referenced this issue Oct 2, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2019
alaypatel07 pushed a commit to alaypatel07/installer that referenced this issue Nov 13, 2019
jhixson74 pushed a commit to jhixson74/installer that referenced this issue Dec 6, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 7, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking (Contributor) commented May 14, 2020

This was addressed via kubernetes/enhancements#1144 (see here) and #90126.
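
For anyone landing here later, a rough sketch of the mechanism those changes introduce, as I understand it (the node name is a placeholder): exclusion from external load balancers becomes opt-in via a dedicated label instead of being implied by the master role label.

# Explicitly exclude a node from external load-balancer backends.
kubectl label nodes <node-name> node.kubernetes.io/exclude-from-external-load-balancers=true

# Remove the label to put the node back into the candidate backend set.
kubectl label nodes <node-name> node.kubernetes.io/exclude-from-external-load-balancers-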

stephenfin added a commit to shiftstack/installer that referenced this issue Jun 13, 2022
There should no longer be any issues running router pods on control
plane nodes (i.e. kubernetes/kubernetes#65618
which was resolved in
kubernetes/enhancements#1144). Remove this
limitation from the docs.

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
mandre pushed a commit to shiftstack/installer that referenced this issue Sep 23, 2022