-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update operator Pod to speed up secret propagation #5519
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left some minor comments.
CI reported some linter issues. You should be able to run make lint
locally to verify that you have fixed them.
2926c98
to
afa51b2
Compare
03374bc
to
92de3f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work ! 👍
I did a few tests and it seems to work as expected. I just left a few nits.
|
||
// updateOperatorPod updates a specific annotation on a single pod to speed up secret propagation. | ||
func updateOperatorPod(ctx context.Context, pod corev1.Pod, clientset kubernetes.Interface) { | ||
err := retry.RetryOnConflict(retry.DefaultRetry, func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the ReconcileResources function is outside the reconciliation loop, I implemented this retry mechanism. Would it be worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a few other tests and it works great. In a new/cleaned namespace the first start of the operator is much more faster. It only takes ~1 sec for the operator to generate the certificates and resume its startup:
{
"log.level": "info",
"@timestamp": "2022-04-06T06:17:12.288Z",
"message": "Polling for the webhook certificate to be available",
}
{
"log.level": "info",
"@timestamp": "2022-04-06T06:17:13.289Z",
"message": "Starting the manager",
}
I also tested the retry mechanism (by generating conflicts using annotations), it feels a bit "over engineered" but it works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ion (elastic#5519) When the operator is deployed for the first time, the Secret which contains the webhook certificate is populated by the operator (if the webhook certificate is managed by ECK, which is the case by default). Unfortunately it can take some time for the content of the Secret to be propagated into the operator container. This commit marks the operator pod as updated using an annotation to speed up the secret propagation.
Mark the operator pod as updated to speed up secret propagation.
Fixes #3321.