
Agent certificate generation retry causes agents to bypass local loadbalancer #10279

Closed · brandond opened this issue Jun 3, 2024 · 1 comment

brandond (Member) commented Jun 3, 2024
Created based on incorrect behavior noticed when trying to reproduce rancher/rke2#5949. This was also reported in rancher/rke2#2101 but I failed to properly investigate the behavior.

This cannot be reproduced on K3s; it only affects RKE2 where the apiserver is not colocated with the supervisor.

If the server successfully returns agent config, but fails to generate agent certificates, config.get will be retried, which results in proxy.SetAPIServerPort being called multiple times:

if controlConfig.SupervisorPort != controlConfig.HTTPSPort {
	isIPv6 := utilsnet.IsIPv6(net.ParseIP([]string{envInfo.NodeIP.String()}[0]))
	if err := proxy.SetAPIServerPort(controlConfig.HTTPSPort, isIPv6); err != nil {
		return nil, errors.Wrapf(err, "failed to setup access to API Server port %d on at %s", controlConfig.HTTPSPort, proxy.SupervisorURL())
	}
}

When called multiple times, the agent will actually bypass the load balancer and instead use the server directly. This is caused by a bug in SetAPIServerPort:

func (p *proxy) SetAPIServerPort(port int, isIPv6 bool) error {
	u, err := url.Parse(p.initialSupervisorURL)
	if err != nil {
		return errors.Wrapf(err, "failed to parse server URL %s", p.initialSupervisorURL)
	}
	p.apiServerPort = strconv.Itoa(port)
	u.Host = sysnet.JoinHostPort(u.Hostname(), p.apiServerPort)
	p.apiServerURL = u.String()
	p.apiServerEnabled = true
	if p.lbEnabled && p.apiServerLB == nil {
		lbServerPort := p.lbServerPort
		if lbServerPort != 0 {
			lbServerPort = lbServerPort - 1
		}
		lb, err := loadbalancer.New(p.context, p.dataDir, loadbalancer.APIServerServiceName, p.apiServerURL, lbServerPort, isIPv6)
		if err != nil {
			return err
		}
		p.apiServerURL = lb.LoadBalancerServerURL()
		p.apiServerLB = lb
	}
	return nil
}

During the first call, p.apiServerURL is temporarily set to the default server URL, but because p.lbEnabled && p.apiServerLB == nil is true, it is then replaced with the LoadBalancer address. On subsequent calls, p.apiServerLB is no longer nil, so the temporary assignment is left in place. This causes the kubeconfigs for various components to be generated pointing directly at the server URL instead of the loadbalancer URL.

This can be seen with some additional debug logging:

INFO[0000] Starting rke2 agent v1.30.1+dev.d40e03c0 (d40e03c0b9a2ad9bd56d147272567c280278cf06)
INFO[0000] Adding server to load balancer rke2-agent-load-balancer: 172.17.0.8:9345
INFO[0000] Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [172.17.0.8:9345] [default: 172.17.0.8:9345]
DEBU[0000] Supervisor proxy started with supervisor=https://127.0.0.1:6444 apiserver=https://127.0.0.1:6444 lb=true
WARN[0000] Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation.
INFO[0000] Adding server to load balancer rke2-api-server-agent-load-balancer: 172.17.0.8:6443
INFO[0000] Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [172.17.0.8:6443] [default: 172.17.0.8:6443]
DEBU[0000] Supervisor proxy apiserver port changed; apiserver=https://127.0.0.1:6443 lb=true
INFO[0000] Waiting to retrieve agent configuration; server is not ready: get /var/lib/rancher/rke2/agent/serving-kubelet.crt: https://127.0.0.1:6444/v1-rke2/serving-kubelet.crt: 503 Service Unavailable
DEBU[0007] Supervisor proxy apiserver port changed; apiserver=https://172.17.0.8:6443 lb=true
INFO[0007] Waiting to retrieve agent configuration; server is not ready: get /var/lib/rancher/rke2/agent/serving-kubelet.crt: https://127.0.0.1:6444/v1-rke2/serving-kubelet.crt: 503 Service Unavailable
DEBU[0015] Supervisor proxy apiserver port changed; apiserver=https://172.17.0.8:6443 lb=true
INFO[0015] Waiting to retrieve agent configuration; server is not ready: get /var/lib/rancher/rke2/agent/serving-kubelet.crt: https://127.0.0.1:6444/v1-rke2/serving-kubelet.crt: 503 Service Unavailable
DEBU[0021] Supervisor proxy apiserver port changed; apiserver=https://172.17.0.8:6443 lb=true
INFO[0022] Waiting to retrieve agent configuration; server is not ready: get /var/lib/rancher/rke2/agent/serving-kubelet.crt: https://127.0.0.1:6444/v1-rke2/serving-kubelet.crt: 503 Service Unavailable
DEBU[0027] Supervisor proxy apiserver port changed; apiserver=https://172.17.0.8:6443 lb=true
INFO[0029] Using private registry config file at /etc/rancher/rke2/registries.yaml
@rancher-max (Contributor) commented:

This was validated in rke2, where it can be reproduced, so closing this stub out without explicit testing.
