Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow not using finalizers for timer #896

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gregsterin
Copy link

I'd like to have to option to not use finalizers in timers. The reason is that finalizers, while perfectly reasonable for a daemon/timer/change event handler acting as the operator for a CRD, are not ideal when watching other resources (ex: a secret). My use case is im using the timer to periodically check on Secrets with a custom TTL annotation, and delete them after they have been alive for TTL time. I don't want to have a finalizer on the secrets, as that couples successful deletion of a secret with this operator running. Its not a necessary coupling, and in the case where the operator is not running or was turned off for any reason it creates a very strange behavior for someone using kubectl to manually delete the secret for whatever reason to see the deletion hung. I don't need the finalizer for any practical reason (they should be used to perform custom cleanup before the object disappears from K8s, but I have no such actions to do), so it seems like in this case, the finalizer is really just there for convenience of detection of deletion in the code, rather than a practical use-case for it.

In #548, finalizers were required on all spawning handlers (daemon and timer handlers). This needed to be done the fix the bug in #390, but only because the code in https://github.com/nolar/kopf/blob/main/kopf/_core/reactor/processing.py#L335 has an assumption that finalizers are enabled, as checking for is_deletion_ongoing will ever really return True if finalizers are there, otherwise you never see a resource with a deletionTimestamp, it just disappears faster than it can be detected.

There's still a way to detect that the object was deleted, namely, the watchers see the deletion event and raw_event['type'] == 'DELETED'.

This diff introduces

  1. Awareness of raw_event_type in the SpawningCause, to detect deletion even without finalizers on
  2. Checking for which daemons to stop by checking for raw_event_type == 'DELETED' in the case where handler.requires_finalizers is False

This allows some daemons/timers to have finalizers and others not to, while still stopping the daemons/timers in both cases correctly.

I have tested manually against a kind cluster, and added a unit test to verify the timer stops when a deletion event is observed when requires_finalizer=False.

@gregsterin
Copy link
Author

@nolar any thoughts / feedback on this would be appreciated! Currently running this on my own branch but would love to contribute this upstream!

@nolar
Copy link
Owner

nolar commented Apr 2, 2022

Hello. Sorry for the delayed review — I was a bit busy with work & life in the past months.

I generally like the idea that daemons & timers can run without finalizers! The solution seems to be addressing it properly. Give me some time to think on it thoroughly, especially on the edge cases.

Regarding the edge cases:

I don't need the finalizer for any practical reason (they should be used to perform custom cleanup before the object disappears from K8s, but I have no such actions to do), so it seems like in this case, the finalizer is really just there for convenience of detection of deletion in the code, rather than a practical use-case for it.

The framework is made with the assumption that operator developers can do harm to themselves on occasion, and Kopf should not amplify that. In particular, they can write the daemons/timers in such a way that they do not terminate properly and instantly on the first signal to do that.

The finalizers are needed to keep the resource in the cluster as long as it takes for a daemon/timer to finish, and to do the regular "pings" (termination retries) every few seconds. If the daemon/timer does not exit after some cancellation_timeout, the operator gives up on it with a warning on a potential resource leak.

This scenario is documented here:

This whole approach also covers the scenario when a daemon/timer was started for a resource object X, the object was deleted and replaced with the same-named object X' — but the daemon has stuck on termination and continues operating on X' as if it were X.

So, these are a few edge cases to consider.


Going back to the original story with the total absence of finalizers: taking the edge cases described above, such a scenario would be a "shooting oneself in the foot" case. This can be allowed, but not as the default.

There is one example of allowing risky behaviour — the optional flag for the deletion handlers where operator developers explicitly declare that they take the risks of misbehaving.

As a rule of thumb, all defaults should be None; if not possible, then False or other forms of zero. So the "no-finalizer" case should be explicitly set to True. I recommend reversing the meaning of the flag (and finding a better name for it: optional does not fit here semantically).


Meanwhile, as a quick hack if you do not need the finalizers and Kopf's safety nets, such an operator can be implemented quite easily without storing anything on the resource at all:

import kopf
import time
import threading


@kopf.on.event(..., when=lambda type, **_: type!='DELETED')
def start_a_thread(memo, body, **_):
    if memo.get('thread') is None:
        gone = threading.Event()
        thread = threading.Thread(target=careless_daemon, args=(body, gone))
        thread.start()
        memo.gone = gone
        memo.thread = thread


@kopf.on.event(..., when=lambda type, **_: type=='DELETED')
def stop_the_thread(memo, logger, **_):
    if memo.get('thread') is not None:
        memo.gone.set()
        memo.thread.join(timeout=5)
        if memo.thread.is_alive():
            logger.warning("The careless daemon is not stopped after 5s. A resource leak?")


def careless_daemon(body: kopf.Body, gone: threading.Event):
    while not gone.is_set():
        print('ping')
        time.sleep(1)

@nolar nolar added the enhancement New feature or request label Apr 2, 2022
@gregsterin
Copy link
Author

no problem, I can definitely invert the requires_finalizer to no_finalizers so that it defaults to False - meaning yes finalizers.

I'll hold off until you have looks through more thoroughly and give that additional feedback. Thank you for the initial look!

@clee421
Copy link

clee421 commented Feb 13, 2023

I'm following up on the pull request. Was having the option to not add finalizers on a timer dropped?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants