Allow overriding ctrl manager graceful shutdown timeout #570

relu · 2022-11-25T11:30:51Z

This overrides the default GracefulShutdownTimeout option given to the controller manager with a default of 0 (no timeout), since Helm operations are sensitive to interruption and can leave the HelmRelease in a bad state.

It also lets users set how long to wait before forcibly exiting, via the CLI flag `-graceful-shutdown-timeout`.

Related to #569

Signed-off-by: Aurel Canciu <aurelcanciu@gmail.com>

relu · 2022-11-28T13:40:16Z

I've added a commit to remove the readinessProbe as well, since without this change graceful shutdown won't work anyway.
The alternative would be to raise its failureThreshold so the container can keep running longer before being terminated (long enough to match terminationGracePeriodSeconds, which we set to 600 by default), but I don't think that is quite the right solution.
All in all, I don't see much value in configuring the readinessProbe: the helm-controller does not expose a Service endpoint, so readiness doesn't really make sense in its runtime lifecycle.

hiddeco · 2022-12-08T13:10:42Z

Sorry for the late reply, was floored by COVID the past couple of days.

I am concerned about removing the readiness probe for setups in which someone has configured multiple replicas, as this would report the non-elected Pod as Ready while it is not actually handling reconciliation requests. That does affect the runtime lifecycle, even without a Service endpoint being exposed.

stefanprodan · 2022-12-08T13:21:21Z

I don't think we should remove readiness in this PR. If we decide to do this, it should be done across all controllers that don't have ClusterIP services, such as KC, IRC and IAC (kustomize-controller, image-reflector-controller and image-automation-controller).

relu · 2022-12-08T15:07:16Z

> Sorry for the late reply, was floored by COVID the past couple of days.

No worries, I'm sorry to hear that, hope you're feeling better now!

Thanks for the feedback. I agree with both of your assessments. Ideally, I think this should be addressed in controller-runtime itself, by changing its behavior so the probes are not shut down during graceful termination.

Going to remove the second commit.

stefanprodan

LGTM

Thanks @relu

P.S. Please add the new flag to the controller options documentation: https://github.com/fluxcd/website/blob/main/content/en/flux/components/helm/options.md

This is the correct default value as intended in #570.

xref: https://github.com/kubernetes-sigs/controller-runtime/blob/92234b3c49a315a1aed54dc0655c3570d02faa38/pkg/manager/manager.go#L292-L293

Signed-off-by: Hidde Beydals <hello@hidde.co>

hiddeco approved these changes Nov 25, 2022


hiddeco added the enhancement New feature or request label Nov 25, 2022

relu requested a review from hiddeco November 28, 2022 13:40

pjbgf mentioned this pull request Dec 8, 2022 in "Prepare for v0.38 release" fluxcd/flux2#3344 (closed)

relu force-pushed the fix-graceful-shutdown branch from 20176bb to e242bb0 December 8, 2022 15:09

stefanprodan approved these changes Dec 9, 2022


hiddeco merged commit 3340022 into main Dec 9, 2022

hiddeco deleted the fix-graceful-shutdown branch December 9, 2022 10:25

hiddeco mentioned this pull request Dec 20, 2022 in "Set --graceful-shutdown-timeout default to -1" #582 (merged)
