Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug that caused agents to bypass local loadbalancer #10280

Merged
merged 1 commit into from
Jun 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 22 additions & 8 deletions pkg/agent/config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -375,9 +375,10 @@ func get(ctx context.Context, envInfo *cmds.Agent, proxy proxy.Proxy) (*config.N
if controlConfig.SupervisorPort != controlConfig.HTTPSPort {
isIPv6 := utilsnet.IsIPv6(net.ParseIP([]string{envInfo.NodeIP.String()}[0]))
if err := proxy.SetAPIServerPort(controlConfig.HTTPSPort, isIPv6); err != nil {
return nil, errors.Wrapf(err, "failed to setup access to API Server port %d on at %s", controlConfig.HTTPSPort, proxy.SupervisorURL())
return nil, errors.Wrapf(err, "failed to set apiserver port to %d", controlConfig.HTTPSPort)
}
}
apiServerURL := proxy.APIServerURL()

var flannelIface *net.Interface
if controlConfig.FlannelBackend != config.FlannelBackendNone && len(envInfo.FlannelIface) > 0 {
Expand Down Expand Up @@ -482,40 +483,53 @@ func get(ctx context.Context, envInfo *cmds.Agent, proxy proxy.Proxy) (*config.N

os.Setenv("NODE_NAME", nodeName)

// Ensure that the kubelet's server certificate is valid for all configured node IPs. Note
// that in the case of an external CCM, additional IPs may be added by the infra provider
// that the cert will not be valid for, as they are not present in the list collected here.
nodeExternalAndInternalIPs := append(nodeIPs, nodeExternalIPs...)

// Ask the server to generate a kubelet server cert+key. These files are unique to this node.
servingCert, err := getServingCert(nodeName, nodeExternalAndInternalIPs, servingKubeletCert, servingKubeletKey, newNodePasswordFile, info)
if err != nil {
return nil, err
return nil, errors.Wrap(err, servingKubeletCert)
}

// Ask the server to genrate a kubelet client cert+key. These files are unique to this node.
if err := getNodeNamedHostFile(clientKubeletCert, clientKubeletKey, nodeName, nodeIPs, newNodePasswordFile, info); err != nil {
return nil, err
return nil, errors.Wrap(err, clientKubeletCert)
}

// Generate a kubeconfig for the kubelet.
kubeconfigKubelet := filepath.Join(envInfo.DataDir, "agent", "kubelet.kubeconfig")
if err := deps.KubeConfig(kubeconfigKubelet, proxy.APIServerURL(), serverCAFile, clientKubeletCert, clientKubeletKey); err != nil {
if err := deps.KubeConfig(kubeconfigKubelet, apiServerURL, serverCAFile, clientKubeletCert, clientKubeletKey); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: given that the bug in how we set proxy.APIServerURL() is fixed, both proxy.APIServerURL() or apiServerURL will point to the same server, right? The former via the loadBalancer but at this "booting" step I expect it to only have one server backend

Copy link
Member Author

@brandond brandond Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I figured it made a little more sense to just retrieve it once and then use a local variable? It shouldn't really make a difference, no since the APIServerURL function itself just exposes a field on the proxy struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I just wanted to doublecheck if my understanding was correct :)

return nil, err
}

clientKubeProxyCert := filepath.Join(envInfo.DataDir, "agent", "client-kube-proxy.crt")
clientKubeProxyKey := filepath.Join(envInfo.DataDir, "agent", "client-kube-proxy.key")

// Ask the server to send us its kube-proxy client cert+key. These files are not unique to this node.
if err := getHostFile(clientKubeProxyCert, clientKubeProxyKey, info); err != nil {
return nil, err
return nil, errors.Wrap(err, clientKubeProxyCert)
}

// Generate a kubeconfig for kube-proxy.
kubeconfigKubeproxy := filepath.Join(envInfo.DataDir, "agent", "kubeproxy.kubeconfig")
if err := deps.KubeConfig(kubeconfigKubeproxy, proxy.APIServerURL(), serverCAFile, clientKubeProxyCert, clientKubeProxyKey); err != nil {
if err := deps.KubeConfig(kubeconfigKubeproxy, apiServerURL, serverCAFile, clientKubeProxyCert, clientKubeProxyKey); err != nil {
return nil, err
}

clientK3sControllerCert := filepath.Join(envInfo.DataDir, "agent", "client-"+version.Program+"-controller.crt")
clientK3sControllerKey := filepath.Join(envInfo.DataDir, "agent", "client-"+version.Program+"-controller.key")

// Ask the server to send us its agent controller client cert+key. These files are not unique to this node.
if err := getHostFile(clientK3sControllerCert, clientK3sControllerKey, info); err != nil {
return nil, err
return nil, errors.Wrap(err, clientK3sControllerCert)
}

// Generate a kubeconfig for the agent controller.
kubeconfigK3sController := filepath.Join(envInfo.DataDir, "agent", version.Program+"controller.kubeconfig")
if err := deps.KubeConfig(kubeconfigK3sController, proxy.APIServerURL(), serverCAFile, clientK3sControllerCert, clientK3sControllerKey); err != nil {
if err := deps.KubeConfig(kubeconfigK3sController, apiServerURL, serverCAFile, clientK3sControllerCert, clientK3sControllerKey); err != nil {
return nil, err
}

Expand Down
7 changes: 6 additions & 1 deletion pkg/agent/loadbalancer/loadbalancer.go
Original file line number Diff line number Diff line change
Expand Up @@ -172,14 +172,19 @@ func (lb *LoadBalancer) dialContext(ctx context.Context, network, _ string) (net
return conn, nil
}
logrus.Debugf("Dial error from load balancer %s: %s", lb.serviceName, err)
// Don't close connections to the failed server if we're retrying with health checks ignored.
// We don't want to disrupt active connections if it is unlikely they will have anywhere to go.
if !allChecksFailed {
defer server.closeAll()
}
}

newServer, err := lb.nextServer(targetServer)
if err != nil {
return nil, err
}
if targetServer != newServer {
logrus.Debugf("Failed over to new server for load balancer %s: %s", lb.serviceName, newServer)
logrus.Debugf("Failed over to new server for load balancer %s: %s -> %s", lb.serviceName, targetServer, newServer)
}
if ctx.Err() != nil {
return nil, ctx.Err()
Expand Down
17 changes: 12 additions & 5 deletions pkg/agent/proxy/apiproxy.go
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ func NewSupervisorProxy(ctx context.Context, lbEnabled bool, dataDir, supervisor
p.fallbackSupervisorAddress = u.Host
p.supervisorPort = u.Port()

logrus.Debugf("Supervisor proxy using supervisor=%s apiserver=%s lb=%v", p.supervisorURL, p.apiServerURL, p.lbEnabled)
return &p, nil
}

Expand Down Expand Up @@ -132,29 +133,35 @@ func (p *proxy) setSupervisorPort(addresses []string) []string {
// load-balancer, and the address of this load-balancer is returned instead of the actual apiserver
// addresses.
func (p *proxy) SetAPIServerPort(port int, isIPv6 bool) error {
if p.apiServerEnabled {
logrus.Debugf("Supervisor proxy apiserver port already set")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this is the agent proxy, not the supervisor proxy. By supervisor proxy, I understand the one we use for egressSelector in the supervisor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call this the "supervisor proxy" because it is a proxy for retrieving supervisor (and now apiserver as well) addresses. This is covered in the doc comment:

// NewSupervisorProxy sets up a new proxy for retrieving supervisor and apiserver addresses. If
// lbEnabled is true, a load-balancer is started on the requested port to connect to the supervisor
// address, and the address of this local load-balancer is returned instead of the actual supervisor
// and apiserver addresses.
// NOTE: This is a proxy in the API sense - it returns either actual server URLs, or the URL of the
// local load-balancer. It is not actually responsible for proxying requests at the network level;
// this is handled by the load-balancers that the proxy optionally steers connections towards.
func NewSupervisorProxy(ctx context.Context, lbEnabled bool, dataDir, supervisorURL string, lbServerPort int, isIPv6 bool) (Proxy, error) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaaah ok, ufff it can easily be confusing. The supervisorproxy carries the supervisor and the kube-api addresses and load-balancers

return nil
}

u, err := url.Parse(p.initialSupervisorURL)
if err != nil {
return errors.Wrapf(err, "failed to parse server URL %s", p.initialSupervisorURL)
}
p.apiServerPort = strconv.Itoa(port)
u.Host = sysnet.JoinHostPort(u.Hostname(), p.apiServerPort)

p.apiServerURL = u.String()
p.apiServerEnabled = true

if p.lbEnabled && p.apiServerLB == nil {
lbServerPort := p.lbServerPort
if lbServerPort != 0 {
lbServerPort = lbServerPort - 1
}
lb, err := loadbalancer.New(p.context, p.dataDir, loadbalancer.APIServerServiceName, p.apiServerURL, lbServerPort, isIPv6)
lb, err := loadbalancer.New(p.context, p.dataDir, loadbalancer.APIServerServiceName, u.String(), lbServerPort, isIPv6)
if err != nil {
return err
}
p.apiServerURL = lb.LoadBalancerServerURL()
p.apiServerLB = lb
p.apiServerURL = lb.LoadBalancerServerURL()
} else {
p.apiServerURL = u.String()
}

logrus.Debugf("Supervisor proxy apiserver port changed; apiserver=%s lb=%v", p.apiServerURL, p.lbEnabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment. I think supervisor proxy is confusing here

Copy link
Member Author

@brandond brandond Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is a little confusing that we have so many proxies. This one is a proxy for retrieving supervisor and apiserver addresses without having to know if the loadbalancer is enabled, in that it provides a layer of abstraction that allows us to avoid hardcoding addresses.

We have a similar "proxy" for etcd here as well:

// NewETCDProxy initializes a new proxy structure that contain a load balancer
// which listens on port 2379 and proxy between etcd cluster members
func NewETCDProxy(ctx context.Context, supervisorPort int, dataDir, etcdURL string, isIPv6 bool) (Proxy, error) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks for the clarification

p.apiServerEnabled = true
return nil
}

Expand Down
2 changes: 1 addition & 1 deletion pkg/agent/tunnel/tunnel.go
Original file line number Diff line number Diff line change
Expand Up @@ -451,7 +451,7 @@ func (a *agentTunnel) connect(rootCtx context.Context, waitGroup *sync.WaitGroup
err := remotedialer.ConnectToProxy(ctx, wsURL, nil, auth, ws, onConnect)
connected = false
if err != nil && !errors.Is(err, context.Canceled) {
logrus.WithField("url", wsURL).WithError(err).Error("Remotedialer proxy error; reconecting...")
logrus.WithField("url", wsURL).WithError(err).Error("Remotedialer proxy error; reconnecting...")
// wait between reconnection attempts to avoid hammering the server
time.Sleep(endpointDebounceDelay)
}
Expand Down