
Add k8s endpoint check to markNodeReady #615

Merged 1 commit into main from KU-1175/csr-purge-problem on Aug 27, 2024

Conversation

@HomayoonAlimohammadi (Contributor) commented on Aug 21, 2024

Summary

The onStart hook runs before onBootstrap. Because of this, on non-fresh machines (non-fresh == /etc/kubernetes/admin.conf already exists) we use an invalid/stale admin.conf kubeconfig for the csrsigning controller client. This PR makes sure we prevent running controllers until the correct .conf files are in place and we can reach the k8s cluster.

How to test

Build and install k8s on a fresh machine, run bootstrap, then check the logs and confirm that the csrsigning controller is running, e.g.:

Aug 21 08:28:20 test k8s.k8sd[1642]: I0821 08:28:20.897429    1642 controller/controller.go:181] "Starting Controller" logger="k8sd.csrsigning" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest"
Aug 21 08:28:21 test k8s.k8sd[1642]: I0821 08:28:21.005667    1642 controller/controller.go:215] "Starting workers" logger="k8sd.csrsigning" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" worker count=1

Also confirm that kubeconfigs are available in /etc/kubernetes/, specifically admin.conf.
Now remove the k8s snap and reinstall it (same snap). Run bootstrap and, as above, confirm that the csrsigning controller is started and running.

@HomayoonAlimohammadi HomayoonAlimohammadi requested a review from a team as a code owner August 21, 2024 08:45
@neoaggelos (Contributor) left a comment

I do not think this is what we need here. The onStart hook runs every time k8sd starts. Will this not remove the kubeconfigs of a running cluster?

@HomayoonAlimohammadi (Contributor, Author)

Right, I didn't think about that. What if we run the csrsigning controller in the onBootstrap hook? @neoaggelos

@neoaggelos (Contributor)

Perhaps an alternative would be to block the csrsigning controller from starting until we can validate that a proper kubeconfig is in place.

@neoaggelos (Contributor)

For example, we could return the rest config from here only if we can successfully call an endpoint (e.g. /readyz or a GetNode()); otherwise, keep looping.

e.g. here is what we do on bootstrap (without the loop, since we don't want to keep using stale kubeconfigs):

// Returning the error would abort the retry, so we check for nil instead.
endpoint, err := c.CoreV1().Endpoints("default").Get(ctx, "kubernetes", metav1.GetOptions{})
return err == nil && endpoint != nil, nil

@neoaggelos (Contributor) left a comment

Going in the right direction, but let's keep things simple rather than adding layers on top of layers.

Review comments (outdated, resolved) on:
- src/k8s/pkg/client/kubernetes/status.go
- src/k8s/pkg/k8sd/app/app.go
- src/k8s/pkg/k8sd/controllers/csrsigning/controller.go
@HomayoonAlimohammadi HomayoonAlimohammadi marked this pull request as draft August 21, 2024 13:32
@HomayoonAlimohammadi (Contributor, Author)

Adding a centralized wait didn't seem to be possible: for the node to be marked as ready, onStart has to finish, so we couldn't wait there. I think adding the k8s check in markNodeReady is the best way to go.

@HomayoonAlimohammadi HomayoonAlimohammadi marked this pull request as ready for review August 21, 2024 15:42
@HomayoonAlimohammadi HomayoonAlimohammadi changed the title Remove kubeconfigs in onStart hook Add k8s endpoint check to markNodeReady Aug 22, 2024
@bschimke95 (Contributor) left a comment

LGTM

Review comment on src/k8s/pkg/k8sd/app/app.go (resolved)
@HomayoonAlimohammadi HomayoonAlimohammadi merged commit edb4cc1 into main Aug 27, 2024
19 checks passed
@HomayoonAlimohammadi HomayoonAlimohammadi deleted the KU-1175/csr-purge-problem branch August 27, 2024 06:50