Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update operator Pod to speed up secret propagation #5519

Merged
merged 17 commits into from
Apr 6, 2022

Conversation

miriam-eid
Copy link
Contributor

Mark the operator pod as updated to speed up secret propagation.

Fixes #3321.

@miriam-eid miriam-eid added >bug Something isn't working v2.2.0 labels Mar 24, 2022
Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left some minor comments.

CI reported some linter issues. You should be able to run make lint locally to verify that you have fixed them.

pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile_test.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work ! 👍
I did a few tests and it seems to work as expected. I just left a few nits.

pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved

// updateOperatorPod updates a specific annotation on a single pod to speed up secret propagation.
func updateOperatorPod(ctx context.Context, pod corev1.Pod, clientset kubernetes.Interface) {
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the ReconcileResources function is outside the reconciliation loop, I implemented this retry mechanism. Would it be worth it?

@miriam-eid miriam-eid requested a review from thbkrkr April 4, 2022 11:06
Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran a few other tests and it works great. In a new/cleaned namespace the first start of the operator is much more faster. It only takes ~1 sec for the operator to generate the certificates and resume its startup:

{
	"log.level": "info",
	"@timestamp": "2022-04-06T06:17:12.288Z",
	"message": "Polling for the webhook certificate to be available",

}
{
	"log.level": "info",
	"@timestamp": "2022-04-06T06:17:13.289Z",
	"message": "Starting the manager",
}

I also tested the retry mechanism (by generating conflicts using annotations), it feels a bit "over engineered" but it works as expected.

Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/controller/webhook/reconcile.go Outdated Show resolved Hide resolved
pkg/controller/webhook/reconcile_test.go Outdated Show resolved Hide resolved
@thbkrkr thbkrkr merged commit 49138b7 into elastic:main Apr 6, 2022
@miriam-eid miriam-eid deleted the operator-crash branch April 7, 2022 06:10
@thbkrkr thbkrkr changed the title Update operator pod to speed up secret propagation Update operator Pod to speed up secret propagation Apr 20, 2022
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
…ion (elastic#5519)

When the operator is deployed for the first time, the Secret which contains the webhook certificate is populated
 by the operator (if the webhook certificate is managed by ECK, which is the case by default). Unfortunately it can
take some time for the content of the Secret to be propagated into the operator container. This commit marks the
operator pod as updated using an annotation to speed up the secret propagation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug Something isn't working v2.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operator may crash on the first start
4 participants