-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multi-node cluster not working after restarting docker #2671
Conversation
Welcome @tnqn! |
Hi @tnqn. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
# kubeconfig should be updated. However, the server certificate isn't valid for the new IP. We update it to | ||
# loopback address as it's an alternative address of the certificate. | ||
if [[ "${old_ipv4}" != "${curr_ipv4}" ]]; then | ||
sed -i "s#${old_ipv4}#127.0.0.1#" /etc/kubernetes/controller-manager.conf || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, @neolit123 how are these files generated?
can we influence them?
it seems easier to generate them with the localhost IP instead of the node IP, they will only contact the local apiserver ... in HA we have one of these in each CP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I'm saying is to modify the code in KIND so the scheduler and the controller-manager always use API server ip address = localhost
, maybe there is a knob or a flag to do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aojea. I agree with you and I actually tried it but it seems more complicated.
The address in controller-manager.conf and scheduler.conf comes from localAPIEndpoint.advertiseAddress
:
kind/pkg/cluster/internal/kubeadm/config.go
Lines 210 to 211 in b4ab02c
localAPIEndpoint: | |
advertiseAddress: "{{ .AdvertiseAddress }}" |
https://github.com/kubernetes/kubernetes/blob/cdee77a4a9ba77a01bab6b462c1780d7a52af8c3/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L436
But localAPIEndpoint.advertiseAddress
is used in many places in kubeadm so we can't simply replace it with localhost IP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the commit says kubernetes/kubernetes@d944190
Instead of using the CPE for these components, use the LocalAPIEndpoint.
This guarantees that the components would talk to the local
kube-apiserver, which should be the same version, unless the user
explicitly patched manifests.
wonder if we can patch it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess kubeadm uses the local node IP instead of localhost because kubeadm doesn't add localhost to alternative addresses of the server certificate by default.
kind adds it for specific reason:
kind/pkg/cluster/internal/kubeadm/config.go
Lines 167 to 171 in b4ab02c
# on docker for mac we have to expose the api server via port forward, | |
# so we need to ensure the cert is valid for localhost so we can talk | |
# to the cluster after rewriting the kubeconfig to point to localhost | |
apiServer: | |
certSANs: [localhost, "{{.APIServerAddress}}"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
localAPIEndpoint.advertiseAddress is an IP because that's what kube-apiserver supports. Advertise / bind addresses for k8s components are IPs...they cannot be hostnames. So it must be 127.0.0.1 and not localhost. Just wanted to be clear about that.
I think I've seen users binding everything to loopback and it should work. I haven't tried it. If something in kubeadm is blocking you let me know. But also check if the manifest patches work for you ALA initconfiguration.patches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah, this is not about advertise addess, is about setting the destination address for the controller-manager and the scheduler ... but I understand now, it uses the advertise address for that , so I found the circle now ...
If kubeadm was fine until now this way I don't think we should modify it just for this
/ok-to-test I would prefer to do less magic |
# controller-manager and scheduler connect to local API endpoint. If the Node's IP changes, server address in | ||
# kubeconfig should be updated. However, the server certificate isn't valid for the new IP. We update it to | ||
# loopback address as it's an alternative address of the certificate. | ||
if [[ "${old_ipv4}" != "${curr_ipv4}" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are unconditionally changing the IP to 127.0.0.1 , do we need the if?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this check because if docker restart doesn't change the Node's IP, it can continue working with the original IP. I'm not sure which is better: use the trick when it has to, or use it unconditionally as long as the container is restarted. Would like to hear your opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that sounds safer, we don't change the current behavior
/lgtm I can't think in a better solution , we are already doing "magic" for reboots, this is building on that so 🤷 |
Thank you for this! I think we should probably be altering the config during the initial bringup if at all possible, rather than on-reboot, where we have static changes like this (substituting loopback). If kubeadm config doesn't enable this directly, we may have to move into using the phases and generating, which will be trickier but probably the better longterm approach 🤔 |
@BenTheElder Thanks for the suggestion. I investigated several approaches to alter the config in the beginning but found they seem not ideal:
|
yes, sorry. this change will be more involved, alternatively if we can make it possible to specify everything to kubeadm between patches and other inputs that would be reasonable. |
[I just realized I never replied to this, I don't know how it slipped through ... so many notifications, I apologize] I think most ideal is one of:
Doing it only upon reboot will be trickier to debug / surprising and be more likely to leave this broken in the future as we won't be doing that in CI. |
@BenTheElder @neolit123 @aojea I didn't find a suitable option name to specify the server address for kube-controller-manager and kube-scheduler separately in kubeadm, and the change may sound too specific to kind. So I followed @neolit123's suggestion at #2671 (comment) and extended kubeadm's patching support to kubeconfig files (kubernetes/kubernetes#109886). This PR has been updated accordingly. Please let me know your opinions. |
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tnqn The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
In a multi-node cluster with single controlplane node, if the controlplane node's IP changes, kube-controller-manager and kube-scheduler would fail to connect kube-apiserver. Updating the server address to new IP doesn't work because the API server's certificate isn't valid for it. This patch uses "patches" option of kubeadm to replace the server address in the kubeconfig files of kube-controller-manager and kube-scheduler with loopback address, which is an alternative address of the API server's certificate. Signed-off-by: Quan Tian <qtian@vmware.com>
The kubeadm patches feature has a KEP so ideally we should document the change and rationale before adding it. Personally i do not see a problem with adding kubeconfig patches but i need to double check. Phases should already support what is required here:
Then again, one of the reasons we added patches was because users did not like phases for some customizations. I can comment more on kubernetes/kubernetes#109886 Trade off - If it merges, only kubeadm 1.25 will support this, while using phases is supported for a number of old kubeadm versions already. |
@neolit123 thanks for your comment. I thought about using phases but found:
It seems it would become a bit coupled with the content of phases and may need some adjustment when kubeadm phases change, not sure if it's a suitable long term solution. For kubernetes/kubernetes#109886, if we agree to this approach, I would be happy to follow your guidance to add documentation for the change. |
can you clarify what is the blocker for using this in kind via
errors are not returned for so unless i'm missing something kubeadm should allow you to bind all components and kubeconfigs (except kubelets) to loopback by setting it in |
these seem tolerable. patches also have complexity as kubeadm would need to load files from disk (i.e. from the kind nodes).
phases are CLI and have a GA deprecation policy (unless marked as experimental) in kubeadm (> 1 year / change if it happens).
i think the biggest issue is this trade off that i've mentioned:
|
It would return the error "unable to select an IP from lo network interface" because setting loopback address has a special meaning as explained by the below comment and it expects another global IP on loopback device. |
i vaguely recall when that logic was added and also recall the "-1"s from people when "just allowing loopback" was discussed. so one question here would be whether we want to change the logic and just allow loopback addresses. |
It seems even kubeadm allows loopback address, kube-apiserver validation doesn't allow it (when it tries to sync
|
Does this mean that even if we use kubeadm phases or patches, apiserver will just fail if we use loopback address? How would the kind multi CP node restart feature work in that case? |
I lost track and could not read all messages, so I may miss some context
that validation is for Services and Endpoints, those IPs should be reachable, localhost is not valid, but it's orthogonal to what we want to do , that is tell controller-manager and scheduler to use the 127.0.0.1 to communicate with the apiserver. The thing is that the apiserver address is obtained from ... it is something like controller-manager, scheduler --use-loopback |
Yes, what the patch does only affects which address kube-controller-manager and kube-scheduler use to connect to kube-apiserver, not the address kube-apiserver listens to. |
Incidentally, this is why our contributing guide asks that features first be discussed in issues prior to implementation, it's hard to keep up with review + approach discussion on PRs. https://kind.sigs.k8s.io/docs/contributing/getting-started/ I think this is also the case with the kubeadm OWNERS. I really appreciate the effort on this though. |
|
||
patches := map[string]string{} | ||
// Kubernetes older than v1.25 don't support patching kubeconfig files. | ||
if ver.AtLeast(version.MustParseSemantic("v1.25.0")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll want to make this more precise to the version that supports this, so we can start testing it sooner. we can get a version after merge if it merges with hack/print-workspace-status.sh
if ver.AtLeast(version.MustParseSemantic("v1.25.0")) { | ||
// controller-manager and scheduler connect to local API endpoint, which defaults to the advertise address of the | ||
// API server. If the Node's IP changes (which could happen after docker restarts), the server address in KubeConfig | ||
// should be updated. However, the server certificate isn't valid for the new IP. To resolve it, we update the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably fix this, perhaps in the entrypoint we can re-roll the cert if it exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. It reminds me that since we have the CA cert and key, we could just re-roll the cert and replace the server address in kubeconfig with the new Node IP like the existing magic:
kind/images/base/files/usr/local/bin/entrypoint
Lines 413 to 414 in 1a6bfa7
# kubernetes manifests are only present on control-plane nodes | |
sed -i "s#${old_ipv4}#${curr_ipv4}#" /etc/kubernetes/manifests/*.yaml || true |
Then everything should just work in theory. Do you think this is a better solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the above approach and it worked as expected, created issue #2759 to discuss the approaches.
loopbackAddress := "127.0.0.1" | ||
if data.IPFamily == config.IPv6Family { | ||
loopbackAddress = "[::1]" | ||
} | ||
jsonPatch := fmt.Sprintf(`[{"op": "replace", "path": "/clusters/0/cluster/server", "value": "https://%s:%d"}]`, loopbackAddress, data.APIBindPort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loopbackAddress := "127.0.0.1" | |
if data.IPFamily == config.IPv6Family { | |
loopbackAddress = "[::1]" | |
} | |
jsonPatch := fmt.Sprintf(`[{"op": "replace", "path": "/clusters/0/cluster/server", "value": "https://%s:%d"}]`, loopbackAddress, data.APIBindPort) | |
loopbackAddress := "127.0.0.1" | |
if data.IPFamily == config.IPv6Family { | |
loopbackAddress = "::1" | |
} | |
jsonPatch := fmt.Sprintf(`[{"op": "replace", "path": "/clusters/0/cluster/server", "value": "https://%s"}]`, net.JoinHostPort(loopbackAddress, strconv.Itoa(data.APIBindPort))) |
@tnqn: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@tnqn: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
#2775, thanks! |
This seems to still occur on v0.18.0, is there a specific usage change needed to use #2775 ? |
@SiegfredRodriguez please file a bug with cluster logs etc unless it's #1689 which is a known limitation. |
In a multi-node cluster with single controlplane node, if the
controlplane node's IP changes, kube-controller-manager and
kube-scheduler would fail to connect kube-apiserver. Updating the
server address to new IP doesn't work because the API server's
certificate isn't valid for it.
This patch uses "patches" option of kubeadm to replace the server
address in the kubeconfig files of kube-controller-manager and
kube-scheduler with loopback address, which is an alternative address
of the API server's certificate.
Signed-off-by: Quan Tian qtian@vmware.com
For #1685
Depends on kubernetes/kubernetes#109886