Fix bug that caused agents to bypass local loadbalancer #10280
LGTM
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #10280 +/- ##
==========================================
- Coverage 47.89% 41.69% -6.20%
==========================================
Files 177 177
Lines 14792 14801 +9
==========================================
- Hits 7085 6172 -913
- Misses 6362 7450 +1088
+ Partials 1345 1179 -166
If proxy.SetAPIServerPort was called multiple times, all calls after the first one would cause the apiserver address to be set to the default server address, bypassing the local load-balancer. This was most likely to occur on RKE2, where the supervisor may be up for a period of time before it is ready to manage node password secrets, causing the agent to retry. Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
4496622 to 967cc14
LGTM and that's a great catch! Just a few comments. BTW, is it possible to "de-colocate" kube-api and supervisor in K3s via config?
@@ -132,29 +133,35 @@ func (p *proxy) setSupervisorPort(addresses []string) []string {
// load-balancer, and the address of this load-balancer is returned instead of the actual apiserver
// addresses.
func (p *proxy) SetAPIServerPort(port int, isIPv6 bool) error {
	if p.apiServerEnabled {
		logrus.Debugf("Supervisor proxy apiserver port already set")
IIUC, this is the agent proxy, not the supervisor proxy. By "supervisor proxy", I understand the one we use for egressSelector in the supervisor.
We call this the "supervisor proxy" because it is a proxy for retrieving supervisor (and now apiserver as well) addresses. This is covered in the doc comment:
k3s/pkg/agent/proxy/apiproxy.go
Lines 28 to 35 in 79ba10f
// NewSupervisorProxy sets up a new proxy for retrieving supervisor and apiserver addresses. If
// lbEnabled is true, a load-balancer is started on the requested port to connect to the supervisor
// address, and the address of this local load-balancer is returned instead of the actual supervisor
// and apiserver addresses.
// NOTE: This is a proxy in the API sense - it returns either actual server URLs, or the URL of the
// local load-balancer. It is not actually responsible for proxying requests at the network level;
// this is handled by the load-balancers that the proxy optionally steers connections towards.
func NewSupervisorProxy(ctx context.Context, lbEnabled bool, dataDir, supervisorURL string, lbServerPort int, isIPv6 bool) (Proxy, error) {
Aaaah ok, ufff, it can easily be confusing. The supervisor proxy carries the supervisor and kube-api addresses and load-balancers.
}

logrus.Debugf("Supervisor proxy apiserver port changed; apiserver=%s lb=%v", p.apiServerURL, p.lbEnabled)
Same comment. I think "supervisor proxy" is confusing here.
Yeah it is a little confusing that we have so many proxies. This one is a proxy for retrieving supervisor and apiserver addresses without having to know if the loadbalancer is enabled, in that it provides a layer of abstraction that allows us to avoid hardcoding addresses.
We have a similar "proxy" for etcd here as well:
Lines 34 to 36 in 79ba10f
// NewETCDProxy initializes a new proxy structure that contain a load balancer
// which listens on port 2379 and proxy between etcd cluster members
func NewETCDProxy(ctx context.Context, supervisorPort int, dataDir, etcdURL string, isIPv6 bool) (Proxy, error) {
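To make the "proxy in the API sense" idea from this thread concrete, here is a minimal sketch, not the K3s implementation. The `addressSource` interface, the `staticSource` type, the `newAddressSource` and `kubeconfigServer` helpers, and every address and port below are hypothetical and chosen only for illustration. The point is that callers ask for an address and never need to know whether a local load-balancer sits in front of the server.

```go
package main

import "fmt"

// addressSource is a hypothetical, trimmed-down stand-in for the idea behind
// the Proxy interface: it hands out addresses, it does not forward traffic.
type addressSource interface {
	SupervisorURL() string
	APIServerURL() string
}

// staticSource simply returns the addresses it was constructed with.
type staticSource struct {
	supervisorURL string
	apiServerURL  string
}

func (s staticSource) SupervisorURL() string { return s.supervisorURL }
func (s staticSource) APIServerURL() string  { return s.apiServerURL }

// newAddressSource picks which addresses callers will see. All addresses and
// ports here are made up for the example.
func newAddressSource(lbEnabled bool) addressSource {
	if lbEnabled {
		// Callers are steered to local load-balancer listeners.
		return staticSource{
			supervisorURL: "https://127.0.0.1:6444",
			apiServerURL:  "https://127.0.0.1:6445",
		}
	}
	// Callers talk to the remote server directly.
	return staticSource{
		supervisorURL: "https://10.0.0.1:9345",
		apiServerURL:  "https://10.0.0.1:6443",
	}
}

// kubeconfigServer shows a caller that is unaware of the load-balancer setting.
func kubeconfigServer(src addressSource) string {
	return src.APIServerURL()
}

func main() {
	fmt.Println(kubeconfigServer(newAddressSource(true)))
	fmt.Println(kubeconfigServer(newAddressSource(false)))
}
```

With this shape, switching the load-balancer on or off only changes what `newAddressSource` returns; nothing that consumes the addresses has to change.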
ok, thanks for the clarification
  kubeconfigKubelet := filepath.Join(envInfo.DataDir, "agent", "kubelet.kubeconfig")
- if err := deps.KubeConfig(kubeconfigKubelet, proxy.APIServerURL(), serverCAFile, clientKubeletCert, clientKubeletKey); err != nil {
+ if err := deps.KubeConfig(kubeconfigKubelet, apiServerURL, serverCAFile, clientKubeletCert, clientKubeletKey); err != nil {
Out of curiosity: given that the bug in how we set proxy.APIServerURL() is fixed, both proxy.APIServerURL() and apiServerURL will point to the same server, right? The former goes via the load-balancer, but at this "booting" step I expect it to have only one server backend.
Yeah, but I figured it made a little more sense to just retrieve it once and then use a local variable? It shouldn't really make a difference, since the APIServerURL function itself just exposes a field on the proxy struct.
Agreed. I just wanted to double-check that my understanding was correct :)
Proposed Changes
Fix bug that caused agents to bypass local loadbalancer
If proxy.SetAPIServerPort was called multiple times, all calls after the first one would cause the apiserver address to be set to the default server address, bypassing the local load-balancer. This was most likely to occur on RKE2, where the supervisor may be up for a longer period of time before it is ready to manage node password secrets, causing the agent to retry. Also, K3s does not ever take the affected code path due to the apiserver and supervisor always using the same port.
I also added some comments to the code as I was stepping through it trying to figure out what's going on.
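As a rough illustration of the guard described above, here is a minimal, self-contained sketch. It is not the code from this PR: `toyProxy`, its fields, the `setupCalls` counter, and all addresses and ports are hypothetical, and the real failure mode (the address being reset to the default server address) is not reproduced. The sketch only shows how an early return on an "already set" flag makes repeated calls harmless.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// toyProxy is a hypothetical stand-in for the agent's supervisor proxy.
type toyProxy struct {
	lbEnabled        bool
	apiServerEnabled bool   // set once the apiserver port has been configured
	supervisorURL    string // address of the remote server
	lbURL            string // address of the local load-balancer
	apiServerURL     string // address handed out to callers
	setupCalls       int    // counts how often the setup path actually ran
}

// SetAPIServerPort configures the apiserver address once. Later calls, for
// example from an agent retry loop, return early and leave the address alone.
func (p *toyProxy) SetAPIServerPort(port int) error {
	if p.apiServerEnabled {
		return nil // already configured; do not touch apiServerURL again
	}

	u, err := url.Parse(p.supervisorURL)
	if err != nil {
		return err
	}
	u.Host = net.JoinHostPort(u.Hostname(), strconv.Itoa(port))

	if p.lbEnabled {
		// Hand out the local load-balancer address instead of the real one.
		p.apiServerURL = p.lbURL
	} else {
		p.apiServerURL = u.String()
	}
	p.apiServerEnabled = true
	p.setupCalls++
	return nil
}

func main() {
	p := &toyProxy{
		lbEnabled:     true,
		supervisorURL: "https://10.0.0.1:9345",
		lbURL:         "https://127.0.0.1:6444",
	}
	// Simulate the agent retrying several times.
	for i := 0; i < 3; i++ {
		if err := p.SetAPIServerPort(6443); err != nil {
			panic(err)
		}
	}
	fmt.Println(p.apiServerURL, p.setupCalls)
}
```

In this toy, three calls leave `apiServerURL` pointing at the load-balancer address and run the setup path exactly once.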
Types of Changes
bugfix
Verification
Testing
Linked Issues
User-Facing Change
Further Comments