-
Notifications
You must be signed in to change notification settings - Fork 717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't set an ownerRef on secrets users are susceptible to copy around #3992
Conversation
A k8s bug (kubernetes/kubernetes#65200) may cause the k8s garbage collection to delete undesired resources in case users manually copy an operator-managed secret to another namespace. To avoid that situation, this commit ensures no ownerRef is set on a subset of managed secrets users are susceptible to copy around: - the elastic user password secret - elasticsearch public transport certs - elasticsearch, kibana, enterprise search, apm server public http certs Existing ownerReferences set with earlier ECK versions will be removed when reconciled. Since they do not have an ownerRef anymore, those secrets are not automatically deleted when the Elasticsearch resource is deleted. To work around that situation, the secret reconciliation logic adds an additional set of labels to the reconciled secrets that don't have an ownerRef specified. These labels reference the "soft" owner ("soft" as in handled through some custom code and not through k8s builtin garbage collection logic). Once a controller receives a deletion event for the resource it manages, it will automatically remove the soft-owned secrets. This is done as best-effort. Secrets will remain orphan if: - the operator is not running when the owner is deleted - an error happens while deleting the soft-owned secrets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just gave it a try, lgtm so far. I left a few nits and one question about the use of the UID instead of the object name.
// - we remove the existing owner | ||
// - we set additional labels to perform garbage collection on owner deletion (best-effort) | ||
expected.Labels[SoftOwnerNamespaceLabel] = theoreticalOwner.GetNamespace() | ||
expected.Labels[SoftOwnerNameLabel] = theoreticalOwner.GetName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it makes sense to rely on the owner's UID, and not on the object name, to be consistent with what is done by the K8S GC ? It would require for the object to have been stored first but I think it is a fair assumption ?
expected.Labels[SoftOwnerNameLabel] = theoreticalOwner.GetName() | |
expected.Labels[SoftOwnerUID] = string(theoreticalOwner.GetUID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I think the problem is that the garbage collection part of the code only knows about the deleted resource name + namespace (as part of the reconcile.Request
). We don't know anything about the parent ID since it doesn't exist anymore? (Maybe I'm misunderstanding what you mean).
|
Current status: this PR is missing unit tests, but otherwise I don't expect any code change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, did a few simple tests that all seemed to work. Quirk: If you delete your es cluster while the operator is not running and recreate another cluster with the same name it adopts the left over resources from the previous incarnation which means password stays the same etc.
run full pr build |
We're missing a watch on soft-owned secrets to automatically trigger reconciliations on any change (e.g. deleting the elastic user secret). |
Good catch! I totally forgot that we are basing our watches on the owner reference :-( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, so happy we have these e2e tests...
Jenkins test this please My bad still not fixed #3617 |
run full pr build |
…elastic#3992) * Don't set an ownerRef on secrets users are susceptible to copy around A k8s bug (kubernetes/kubernetes#65200) may cause the k8s garbage collection to delete undesired resources in case users manually copy an operator-managed secret to another namespace. To avoid that situation, this commit ensures no ownerRef is set on a subset of managed secrets users are susceptible to copy around: - the elastic user password secret - elasticsearch public transport certs - elasticsearch, kibana, enterprise search, apm server public http certs Existing ownerReferences set with earlier ECK versions will be removed when reconciled. Since they do not have an ownerRef anymore, those secrets are not automatically deleted when the Elasticsearch resource is deleted. To work around that situation, the secret reconciliation logic adds an additional set of labels to the reconciled secrets that don't have an ownerRef specified. These labels reference the "soft" owner ("soft" as in handled through some custom code and not through k8s builtin garbage collection logic). Once a controller receives a deletion event for the resource it manages, it will automatically remove the soft-owned secrets. This is done as best-effort. Secrets will remain orphan if: - the operator is not running when the owner is deleted - an error happens while deleting the soft-owned secrets * Error out the reconciliation on garbage collection errors * Improvements from PR review * Label soft-owned secrets with the owner Kind * Reuse Kind constants instead of hardcoded string * Fix existing unit tests * Garbage collect orphan soft-owned secrets on operator startup * Fix linter warnings * Add unit tests * Cover soft owned secrets gc in e2e tests * Fix linter warning (typo) * Improvements from PR review * Trigger reconciliations on soft-owned secrets events * Fix linter stuff
…#3992) (#4008) * Don't set an ownerRef on secrets users are susceptible to copy around A k8s bug (kubernetes/kubernetes#65200) may cause the k8s garbage collection to delete undesired resources in case users manually copy an operator-managed secret to another namespace. To avoid that situation, this commit ensures no ownerRef is set on a subset of managed secrets users are susceptible to copy around: - the elastic user password secret - elasticsearch public transport certs - elasticsearch, kibana, enterprise search, apm server public http certs Existing ownerReferences set with earlier ECK versions will be removed when reconciled. Since they do not have an ownerRef anymore, those secrets are not automatically deleted when the Elasticsearch resource is deleted. To work around that situation, the secret reconciliation logic adds an additional set of labels to the reconciled secrets that don't have an ownerRef specified. These labels reference the "soft" owner ("soft" as in handled through some custom code and not through k8s builtin garbage collection logic). Once a controller receives a deletion event for the resource it manages, it will automatically remove the soft-owned secrets. This is done as best-effort. Secrets will remain orphan if: - the operator is not running when the owner is deleted - an error happens while deleting the soft-owned secrets * Error out the reconciliation on garbage collection errors * Improvements from PR review * Label soft-owned secrets with the owner Kind * Reuse Kind constants instead of hardcoded string * Fix existing unit tests * Garbage collect orphan soft-owned secrets on operator startup * Fix linter warnings * Add unit tests * Cover soft owned secrets gc in e2e tests * Fix linter warning (typo) * Improvements from PR review * Trigger reconciliations on soft-owned secrets events * Fix linter stuff
I guess I understand why this is done however it does mean that those generated resources do not show up in tools like Argo. When lots of resources are generated being visible from tooling is helpful so as to understand what has even been generated. Would there be any appetite to revisit this? |
A k8s bug (kubernetes/kubernetes#65200) may cause the
k8s garbage collection to delete undesired resources in case users manually
copy an operator-managed secret to another namespace.
To avoid that situation, this commit ensures no ownerRef is set on a subset of
managed secrets users are susceptible to copy around:
Existing ownerReferences set with earlier ECK versions will be removed when
reconciled.
Since they do not have an ownerRef anymore, those secrets are not automatically
deleted when the Elasticsearch resource is deleted. To work around that situation,
the secret reconciliation logic adds an additional set of labels to the reconciled
secrets that don't have an ownerRef specified. These labels reference the "soft"
owner ("soft" as in handled through some custom code and not through k8s builtin
garbage collection logic).
Once a controller receives a deletion event for the resource it manages, it will
automatically remove the soft-owned secrets.
Additionally, garbage collection runs at operator startup to cover the case where
owners were deleted while the operator was down.
TODO:
Fixes #3986.