Emit events on 'Failed' daemon pods #40720
Conversation
msg := fmt.Sprintf("Found failed daemon pod %s/%s on node %s, will try to kill it", pod.Namespace, pod.Name, node.Name)
glog.V(2).Infof(msg)
// Emit an event so that it's discoverable to users.
dsc.eventRecorder.Eventf(ds, v1.EventTypeWarning, "FailedDaemonPod", msg)
Make sure this pod is not already marked for deletion (pod.DeletionTimestamp == nil) so that we won't end up with duplicate events.
Good catch. Fixed
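For illustration, a minimal sketch of how the failed-pod branch could look with that DeletionTimestamp guard in place. This is not the PR's actual diff: the helper name, its parameters, and the podsToDelete return value are assumptions; only the logging/event lines come from the snippet above.

```go
// Sketch only (not the PR's diff): handle failed daemon pods on a node,
// skipping pods that are already being deleted so we don't emit duplicate events.
func failedDaemonPodsToDelete(dsc *DaemonSetsController, ds *extensions.DaemonSet, node *v1.Node, daemonPods []*v1.Pod) []string {
	var podsToDelete []string
	for _, pod := range daemonPods {
		// Pods already marked for deletion have been handled; skip them.
		if pod.DeletionTimestamp != nil {
			continue
		}
		if pod.Status.Phase == v1.PodFailed {
			msg := fmt.Sprintf("Found failed daemon pod %s/%s on node %s, will try to kill it", pod.Namespace, pod.Name, node.Name)
			glog.V(2).Infof(msg)
			// Emit an event so that it's discoverable to users.
			dsc.eventRecorder.Eventf(ds, v1.EventTypeWarning, "FailedDaemonPod", msg)
			podsToDelete = append(podsToDelete, pod.Name)
		}
	}
	return podsToDelete
}
```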
@Kargakis after adding the check for DeletionTimestamp I started thinking that maybe it's better to filter for active pods (i.e. not failed, succeeded, or terminating) earlier, when generating the node-to-daemon-pods map.
How are those going to be cleaned up?
We can either not clean them up and let users observe and delete them manually, or we can clean them up for the users periodically (separating this cleanup loop from the shouldSchedule/shouldContinueRunning switch case).
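The filtering approach floated above could look roughly like the sketch below when building the node-to-daemon-pods map. The function and variable names are illustrative, not the PR's code; "active" here follows the definition in the comment above (not failed, not succeeded, not terminating).

```go
// Sketch only: keep only active pods while building the node -> daemon pods map,
// instead of special-casing failed/terminating pods later in the per-node switch.
func isActiveDaemonPod(pod *v1.Pod) bool {
	return pod.Status.Phase != v1.PodFailed &&
		pod.Status.Phase != v1.PodSucceeded &&
		pod.DeletionTimestamp == nil
}

func nodesToActiveDaemonPods(pods []*v1.Pod) map[string][]*v1.Pod {
	nodeToDaemonPods := make(map[string][]*v1.Pod)
	for _, pod := range pods {
		if !isActiveDaemonPod(pod) {
			continue
		}
		nodeToDaemonPods[pod.Spec.NodeName] = append(nodeToDaemonPods[pod.Spec.NodeName], pod)
	}
	return nodeToDaemonPods
}
```

The trade-off is exactly the cleanup question raised above: pods filtered out of the map would still exist in the API and would need a separate path to delete them.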
/approve
[APPROVALNOTIFIER] This PR is APPROVED
The following people have approved this PR: janetkuo, mikedanese
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing /approve in a comment.
I don't think letting admins clean up is an option, and periodically killing them seems more complicated than what you have here. I am lgtm-ing this; we can discuss further if you think what you have here is not good enough. /lgtm
@k8s-bot test this
[submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 40556, 40720)
Please squash and ask for squashes of fix-up commits.
Follow-up to #40330. @erictune @mikedanese @Kargakis @lukaszo @kubernetes/sig-apps-bugs