Cannot join control plane to existing cluster with automatic copy of certificates (1.18) #2386

Closed
vbouchaud opened this issue Feb 2, 2021 · 6 comments
Labels
kind/support Categorizes issue or PR as a support question.


@vbouchaud

vbouchaud commented Feb 2, 2021

What keywords did you search in kubeadm issues before filing this one?

join control plane cert certificate automatic

Multiple combinations of the above keywords.

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.10+2.el8", GitCommit:"829f162e7aeba291f1ecaffb509f387d81f58273", GitTreeState:"clean", BuildDate:"2020-10-30T20:29:26Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.10+2.el8", GitCommit:"829f162e7aeba291f1ecaffb509f387d81f58273", GitTreeState:"clean", BuildDate:"2020-10-30T20:29:26Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.15", GitCommit:"73dd5c840662bb066a146d0871216333181f4b64", GitTreeState:"clean", BuildDate:"2021-01-13T13:14:05Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Oracle Cloud VM for each node (Virtual Machine, 2 core OCPU, 16 GB memory)
  • OS (e.g. from /etc/os-release):
    Oracle Linux Server 8.3
  • Kernel (e.g. uname -a):
    5.4.17-2036.102.0.2.el8uek.x86_64
  • Others:

What happened?

While working on an Ansible playbook to initialize, configure and join Kubernetes clusters, I created my first control plane with kubeadm init --config="init.yaml" using the following configuration file:

---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controlPlaneEndpoint: "kube.test.local"
networking:
  podSubnet: "10.244.0.0/16"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
certificateKey: "{{ kubernetes_cluster_certificate_key }}"
localAPIEndpoint:
  advertiseAddress: 10.0.0.18
  bindPort: 6443
nodeRegistration:
  name: ol8-5
  criSocket: /var/run/crio/crio.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  kubeletExtraArgs:
    cgroup-driver: systemd

(Not relevant to this issue: I know the cgroup-driver setting is redundant in this configuration. I'm not sure yet which one to remove, and for now it only triggers a warning. I'm still in a test phase, so I left both in.)
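
For context on the certificateKey value templated in the configuration above, here is a minimal Go sketch for generating one. It assumes kubeadm expects a hex-encoded 32-byte key (the format used to encrypt the certificates stored in the kubeadm-certs Secret); newCertificateKey and certificateKeySize are hypothetical names, not kubeadm APIs.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// certificateKeySize is the assumed key length in bytes (a 256-bit key).
const certificateKeySize = 32

// newCertificateKey returns a random hex-encoded key that could be used as
// the certificateKey value. The 32-byte length is an assumption of this sketch.
func newCertificateKey() (string, error) {
	key := make([]byte, certificateKeySize)
	if _, err := rand.Read(key); err != nil {
		return "", err
	}
	return hex.EncodeToString(key), nil
}
```

The resulting 64-character string would be stored in the kubernetes_cluster_certificate_key variable and reused for both kubeadm init and kubeadm token create.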

I then generated a token (and a join command) so that additional control planes could join the new cluster:

kubeadm token create --description='ansible generated token for control planes' --ttl=1h --print-join-command --certificate-key='{{ kubernetes_cluster_certificate_key }}'

This printed the following command, which I ran on a second machine:

kubeadm join kube.test.local:6443 --token ja3yw2.whwccz9hz7upm6ow --discovery-token-ca-cert-hash sha256:REDACTED --control-plane --certificate-key REDACTED

I verified that the token exists with kubeadm token list:

TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
ge6abq.ewqv5qpjx3ni902g   23h         2021-02-03T10:22:42Z   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token
ja3yw2.whwccz9hz7upm6ow   55m         2021-02-02T11:58:06Z   authentication,signing   ansible generated token for control planes                 system:bootstrappers:kubeadm:default-node-token

The output of the join command with -v=5 is as follows (truncated):

[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
secrets "kubeadm-certs" is forbidden: User "system:bootstrap:ja3yw2" cannot get resource "secrets" in API group "" in the namespace "kube-system"
error downloading the secret
k8s.io/kubernetes/cmd/kubeadm/app/phases/copycerts.DownloadCerts
	/root/rpmbuild/BUILD/kubernetes-1.18.10/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/copycerts/copycerts.go:228
...
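
The user named in the error, system:bootstrap:ja3yw2, is derived from the ID part of the bootstrap token (the six characters before the dot). A minimal Go sketch of that mapping, assuming the standard "<id>.<secret>" token format; bootstrapUser is a hypothetical helper name:

```go
package main

import (
	"errors"
	"regexp"
)

// tokenPattern matches "<id>.<secret>": 6 and 16 lowercase alphanumerics.
var tokenPattern = regexp.MustCompile(`^([a-z0-9]{6})\.([a-z0-9]{16})$`)

// bootstrapUser returns the user name that a bootstrap token authenticates as.
// Only the token ID appears in the name; the secret part is never exposed.
func bootstrapUser(token string) (string, error) {
	m := tokenPattern.FindStringSubmatch(token)
	if m == nil {
		return "", errors.New("not a valid bootstrap token")
	}
	return "system:bootstrap:" + m[1], nil
}
```

This is why the RBAC rules that grant access to the kubeadm-certs Secret are bound to the token's group (system:bootstrappers:kubeadm:default-node-token in the token list above) rather than to an individual user.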

I had to add a ClusterRole and ClusterRoleBinding as follows (I used ClusterRole{,Binding}/kubeadm:get-nodes as a template):

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeadm:get-secrets
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubeadm:get-secrets
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeadm:get-secrets
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers:kubeadm:default-node-token

I then ran the same kubeadm join ... command, which successfully added this node to the cluster as a control plane.

What you expected to happen?

I expected the second control plane to join the first time and not to have to add the ClusterRole and ClusterRoleBinding.

How to reproduce it (as minimally and precisely as possible)?

By using the configuration file and the two or three commands I wrote in this issue.

(Basically, just initialize a cluster where the certificates are uploaded in a secret and try to join other control planes.)

Anything else we need to know?

  • Pod networking is done through kube-router.
  • I was asked to use the rpm provided by https://yum.oracle.com/repo/OracleLinux/OL8/olcne12/x86_64/ so for now I'm stuck on Kubernetes 1.18.x and have not tested this issue with the latest 1.20.x or any other release.
  • I can create a MR to add the ClusterRole and ClusterRoleBinding if needed.
@neolit123
Member

neolit123 commented Feb 2, 2021

[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
secrets "kubeadm-certs" is forbidden: User "system:bootstrap:ja3yw2" cannot get resource "secrets" in API group "" in the namespace "kube-system"
error downloading the secret
k8s.io/kubernetes/cmd/kubeadm/app/phases/copycerts.DownloadCerts
/root/rpmbuild/BUILD/kubernetes-1.18.10/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/copycerts/copycerts.go:228

kubeadm already creates the required RBAC rules during kubeadm init, when the certs are uploaded:
https://github.com/kubernetes/kubernetes/blob/release-1.18/cmd/kubeadm/app/phases/copycerts/copycerts.go#L123-L159

make sure that you are not skipping the "upload-certs" phase of "kubeadm init".

this also works fine in all of our e2e tests.

/kind support

k8s-ci-robot added the kind/support label Feb 2, 2021
@neolit123
Member

i cannot see anything wrong in your steps. make sure that you are not joining the node to a different cluster e.g. a cluster not created by kubeadm.

i tried reproducing your problem with the exact steps here https://www.katacoda.com/courses/kubernetes/playground (1.18) and it worked fine. closing as we usually don't provide support in this issue tracker, but feel free to update here with your findings.

#kubeadm on k8s slack or the support channels are a good way to get help with kubeadm:
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

@vbouchaud
Author

Thank you for your answer.

make sure that you are not joining the node to a different cluster

I'm 100% positive this is not what happens: the VMs I use for my tests are short-lived, and there is nothing besides them in the environment where they are created.

make sure that you are not skipping the "upload-certs" phase of "kubeadm init".

I did not skip any phase intentionally. The command my ansible role is using is exactly kubeadm init --config="init.yaml" with the configuration file I provided.

Is it possible that the configuration file I provided implicitly skips some phase? I'm not really comfortable with its syntax.

I'll try to do some more tests and will join #kubeadm channel on slack.

@neolit123
Member

skipping phases via the config is not supported yet:
#2261

given you have:

certificateKey: "{{ kubernetes_cluster_certificate_key }}"

it should call the "upload-certs" phase.
look at the output of "kubeadm init" for lines such as this:
https://github.com/kubernetes/kubernetes/blob/release-1.18/cmd/kubeadm/app/phases/copycerts/copycerts.go#L89

@vbouchaud
Author

I ran the same command again on a new VM only to find in the logs

[upload-certs] Skipping phase. Please see --upload-certs

I created a new VM to try kubeadm init --config="/etc/kubernetes/.init.yaml" --upload-certs and it did upload the certificates to the cluster.

[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
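
A playbook could catch this early by checking the kubeadm init output for the two log lines quoted in this thread. A minimal Go sketch, assuming the message wording shown above (it may differ in other releases); certsUploaded is a hypothetical helper name:

```go
package main

import (
	"bufio"
	"strings"
)

// storingPrefix is the start of the log line seen when certificates are uploaded.
const storingPrefix = "[upload-certs] Storing the certificates in Secret"

// certsUploaded reports whether the kubeadm init output contains the
// "Storing the certificates" line. A "Skipping phase" line, or no
// upload-certs line at all, yields false.
func certsUploaded(initOutput string) bool {
	sc := bufio.NewScanner(strings.NewReader(initOutput))
	for sc.Scan() {
		if strings.HasPrefix(strings.TrimSpace(sc.Text()), storingPrefix) {
			return true
		}
	}
	return false
}
```

If this returns false for a cluster that is meant to have several control planes, kubeadm init was most likely run without --upload-certs.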

My bad, I guess this is the intended behavior and I feel stupid for overlooking this.

Thanks again for your time.

@neolit123
Member

neolit123 commented Feb 2, 2021

no problem.
the intended behavior is to not upload the certs for clusters that don't need it, e.g. clusters that have only one CP node.

one potential improvement would be on this line:
https://github.com/kubernetes/kubernetes/blob/1119a505aca14467accedf850daf30aa9c532ef2/cmd/kubeadm/app/cmd/init.go#L401

to add the following:

// If the certificate key is defined, assume the user wants to upload the control plane certificates 
if len(cfg.CertificateKey) != 0 {
   options.uploadCerts = true
}

and the user would not have to pass the --upload-certs flag explicitly.
this will also work when executing the phase on demand with kubeadm init phase upload-certs.
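
The proposed defaulting can be tried in isolation with a self-contained Go sketch. The initConfig and initOptions types below are simplified, hypothetical stand-ins for the real kubeadm types, and applyCertificateKeyDefault is an illustrative name, not an existing kubeadm function:

```go
package main

// initConfig is a hypothetical stand-in for the kubeadm init configuration.
type initConfig struct {
	CertificateKey string
}

// initOptions is a hypothetical stand-in for the kubeadm init command options.
type initOptions struct {
	uploadCerts bool
}

// applyCertificateKeyDefault turns on the certificate upload when a key is
// configured. It never turns the option off, so an explicit --upload-certs
// without a key in the config stays in effect.
func applyCertificateKeyDefault(cfg *initConfig, options *initOptions) {
	if len(cfg.CertificateKey) != 0 {
		options.uploadCerts = true
	}
}
```

With this behavior, the configuration file from the original report would have uploaded the certificates without the extra flag.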

cc @fabriziopandini
