-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix stopping of modules started by kubernetes autodiscover #10476
Fix stopping of modules started by kubernetes autodiscover #10476
Conversation
12df0f0
to
b67d810
Compare
well done @jsoriano, thank you for taking this! ❤️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Could you rebase to get CI green?
When deleting pods, they are first marked for deletion and then, after a grace period or after all containers have stopped, they are actually deleted. We were stopping modules only on pod deletion, but at this moment the container can lack of information of containers that were running when the modules started, but have been stopping during the grace period. This change schedules additional stops as soon as we receive an event with a deletion timestamp. At this moment the pods still contains the information of all containers and thus the stop is actually done.
b67d810
to
5a6ce23
Compare
TLDR; My original assumption was wrong, modules were not being re-started because of a race condition between last update and delete, but they weren't being stopped because on delete we may not have enough information to stop the modules. On my first implementation of this change I messed up with the |
Writting my previous comment made me think that we may need an identifier for containers that don't depend on its state, so we can start and stop modules normally. |
Yes, much simpler, it works by using another ID, description updated. |
|
||
// This must be an id that doesn't depend on the state of the container | ||
// so it works also on `stop` if containers have been already deleted. | ||
eventId := fmt.Sprintf("%s.%s", pod.Metadata.GetUid(), c.GetName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var eventId should be eventID
Except for the adjustement in an event id, failures were legit, I was overwriting also the value for |
jenkins, test this again please |
jenkins, test this again |
Oh, this was not the build I wanted to relaunch 🤦♂️ |
jenkins, test this again |
I am merging this to start with the backports, I'll do follow-ups with further fixes if needed. |
…0476) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…0476) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…0476) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…0476) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…10643) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…10642) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…10641) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…10644) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 15f2f26)
…0476) (elastic#10644) Kubernetes autodiscover only emits events for containers with an ID in pods with an IP, but when a pod is being terminated, their containers can lack of ID and the pod itself can lack of IP. This leads to modules that are never stopped because the delete event that should stop them lacks of the needed information. This change makes two things to avoid this problem: * Don't require the pod to have an IP on stop events. * Use IDs for containers that don't depend on its state. (cherry picked from commit 4162aa9)
Kubernetes autodiscover only emits events for containers with
an ID in pods with an IP, but when a pod is being terminated,
their containers can lack of ID and the pod itself can lack of IP.
This leads to modules that are never stopped because the
delete event that should stop them lacks of the needed
information.
This change makes two things to avoid this problem: