add_resource_metadata.cronjob overloads the memory usage #31
Comments
@ChrsMark instead of making a request to the API for every event, what about using a watcher/informer for Jobs, similar to what we already do for Pods?
Thanks @jsoriano! Yes, that would also work! I wouldn't expect much of an increase if you consider that we have a similar watcher/informer for Pods. In case of having too many Jobs, you will have the same amount of events for their respective Pods, so I would expect roughly an "x2" factor in usage, which is far better compared to what we observe with the current implementation. At the same time I'm not sure if it's worth having a complete watcher/informer/store implementation just to retrieve a single field 🤔. But we can evaluate this of course, since it seems promising.
Maybe a bit hacky, but the list/watch cache that we use for informers allows implementing the List and Watch functions to return arbitrary objects.
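For illustration only, here is a rough sketch of what such a metadata-only Job informer could look like with client-go. This is my reading of the idea, not the actual Beats change; it assumes the `k8s.io/client-go/metadata` client and a plain `cache.NewInformer`, so the store only ever holds `PartialObjectMetadata` (names, labels, owner references) instead of full Job objects:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/metadata"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// Metadata-only client: List/Watch return PartialObjectMetadata instead of
	// full Job objects, which is enough to map a Job to its owning CronJob.
	mc, err := metadata.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	jobs := schema.GroupVersionResource{Group: "batch", Version: "v1", Resource: "jobs"}

	// Custom List/Watch functions backed by the metadata client, as suggested.
	lw := &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			return mc.Resource(jobs).Namespace(metav1.NamespaceAll).List(context.Background(), options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			return mc.Resource(jobs).Namespace(metav1.NamespaceAll).Watch(context.Background(), options)
		},
	}

	_, controller := cache.NewInformer(lw, &metav1.PartialObjectMetadata{}, 0, cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			m := obj.(*metav1.PartialObjectMetadata)
			for _, owner := range m.OwnerReferences {
				if owner.Kind == "CronJob" {
					fmt.Printf("job %s/%s is owned by cronjob %s\n", m.Namespace, m.Name, owner.Name)
				}
			}
		},
	})
	// Blocks forever in this sketch; a real integration would wire a stop channel.
	controller.Run(make(chan struct{}))
}
```

The memory cost of such a cache scales with the number of Job metadata objects rather than with the number of Pod events, which is what makes it attractive here.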
@jsoriano, @ChrsMark: I feel I'm missing something here... could anyone explain in what way the problem we are facing is NOT a memory leak (because it's perceived like that)? I'm not a fan of actions like disabling the feature by default.
Probably I'm missing the details of what's exactly going on, in order to understand why a beat with 200 simple CronJobs needs so much memory. Is there a technical explanation for this to be considered a huge memory need rather than a memory leak? In my example, 200 CronJobs is a static number that for sure will create a lot of new pods and jobs per minute, but the pods and jobs will eventually be deleted (otherwise the error would be on the Kubernetes side), so the final number of existing elements should be stable, if I'm not mistaken. Hence I don't understand why the beat memory needs increase over time when it's doing nothing more than checking what's running in the system via the autodiscover implementation. Is it possible to get a clear explanation of the issue and why exactly the memory grows and grows?
@eedugon the problem lies inside the k8s library implementation, according to our observations, so we are not 100% sure of what can go wrong here. From my observations using https://github.com/ChrsMark/k8sdiscovery I see that after some point the memory usage stabilizes at a high, but stable, level. I guess that in Beats this stable level is quite high since we fetch the metadata from various places. So I would say it's not a leak unless we prove that it is :). In any case, the idea of querying and fetching the Job objects from the k8s API every time a related Pod is added/updated/deleted is not good.
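To make the cost concrete, the pattern being criticized is roughly the following. This is a simplified sketch of my own, not the actual Beats code: one extra API round-trip per Pod event just to read the Job's owner references and find the CronJob name.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cronJobNameForPod illustrates the per-event lookup pattern (assumed, not the
// actual Beats implementation): every Pod add/update/delete triggers a GET
// against the API server to fetch the owning Job, only to read its owner
// references. With many short-lived CronJob Pods this adds up quickly, which
// is why a shared watcher/informer is preferable.
func cronJobNameForPod(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) (string, error) {
	for _, owner := range pod.OwnerReferences {
		if owner.Kind != "Job" {
			continue
		}
		job, err := client.BatchV1().Jobs(pod.Namespace).Get(ctx, owner.Name, metav1.GetOptions{})
		if err != nil {
			return "", err
		}
		for _, jobOwner := range job.OwnerReferences {
			if jobOwner.Kind == "CronJob" {
				return jobOwner.Name, nil
			}
		}
	}
	return "", nil
}
```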
@ChrsMark, thanks for the explanation, it really helps! Losing the cronjob metadata doesn't sound ideal, though.
Based on your description the issue could lie in the k8s library. In that case it might make sense to raise an issue there with some proofs of concept (maybe with your k8sdiscovery tool). I will do some extra tests to see if I get the same behavior as you (memory usage stabilizing at a high level).
Update: a simple test confirms the growth: with a node of 32G, after 4 days running, the beat memory is still growing. @ChrsMark, I don't see here a high utilization but a continuous growth, and even in the hypothetical case of the memory usage being stabilized, this amount of memory could be considered a leak in my opinion.
We have another report in a Discuss thread: https://discuss.elastic.co/t/filebeat-memory-leak-via-filebeat-autodiscover-and-200-000-goroutines/. @gizas I think it's about time to prioritize this :)
FYI @eedugon I'm working on a possible fix for this issue:
I'm testing by deploying on a 3-node GKE cluster with the following CronJob-related resources:

```
➜ ~ k get job -A | wc -l
966
➜ ~ k get cronjob -A | wc -l
201
➜ ~ k get pod -A | wc -l
771
```

I'm testing with an image built from the source PRs. Below are some screenshots that indicate positive results. @eedugon would you maybe have any environments where you could also test it, so as to double-check that I'm actually solving the issue?
I revive the discussion here for the option to disable the deployment and cronjob metadata enrichment by default.

On one hand, the enhancement with watchers has been part of 8.9.0 in elastic/elastic-agent#2711.

The enrichment only adds the deployment name and the cronjob name, so I think it is not a big loss to have those disabled. On the contrary, the addition of only two fields can possibly cause many problems at big scale. Additionally, the extra names (the deployment for a replicaset and the cronjob for a job/pod) can be derived by removing the suffixes, either programmatically in code or with an ingest pipeline (see the sketch after this comment), e.g.:

```
❯ k get jobs
NAME   COMPLETIONS   DURATION   AGE
pi     0/1           42s        42s

❯ k get pods -A
default   pi-hlc8c   0/1   Completed   0   52s

❯ k describe pod pi-hlc8c
Name:           pi-hlc8c
...
Controlled By:  Job/pi
```

So for Jobs: `<job name>-<suffix>` = pod.name

```
❯ k get deployments.apps -n kube-system
coredns   2/2   2   2   19d

❯ k get replicasets.apps -n kube-system
coredns-5d78c9869d   2   2   2   19d

❯ k describe replicasets.apps -n kube-system coredns-5d78c9869d
Name:           coredns-5d78c9869d
...
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/coredns
```

So for ReplicaSets: `<deployment name>-<suffix>` = replicaset.name
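As a small sketch of the suffix-stripping idea above (the helper name and the "trim after the last dash" heuristic are mine, not existing Beats code; a production version would need to validate that the trailing token really is a generated suffix, since resource names may themselves contain dashes):

```go
package main

import (
	"fmt"
	"strings"
)

// ownerNameFromGeneratedName derives the parent resource name by trimming the
// generated suffix after the last "-". For example:
//   replicaset "coredns-5d78c9869d" -> deployment "coredns"
//   job        "hello-1234"         -> cronjob    "hello"
// This relies on the controllers' naming convention instead of an API lookup.
func ownerNameFromGeneratedName(name string) string {
	i := strings.LastIndex(name, "-")
	if i <= 0 {
		return name
	}
	return name[:i]
}

func main() {
	fmt.Println(ownerNameFromGeneratedName("coredns-5d78c9869d")) // coredns
	fmt.Println(ownerNameFromGeneratedName("hello-1234"))         // hello
}
```

The same transformation could equally be expressed as a gsub/script processor in an ingest pipeline, which is the alternative mentioned above.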
The trade-off between the added value of these 2 fields and the performance overhead we pay to retrieve them makes me think that we should disable them by default.
It's a fact that changing the default setting would be a breaking change, but in this case it is for a good reason, after we have investigated all the possible technical solutions. Indeed, querying the API just for 2 fields is overkill to my mind and should be disabled by default.
Some more findings from looking at kube-state-metrics.

For pod: kube-state-metrics exposes an analogous kube_pod_owner metric.

For replicaset:

```
curl http://kube-state-metrics:8080/metrics | grep -i kube_replicaset_owner
kube_replicaset_owner{namespace="kube-system",replicaset="coredns-5d78c9869d",owner_kind="Deployment",owner_name="coredns",owner_is_controller="true"} 1
```

So I can even open a new enhancement request to solve both cases.
Disabling AddResourceMetadataConfig.Deployment and AddResourceMetadataConfig.Cronjob by default (setting both to false), as per the discussion here: #31 (comment)
As reported at elastic/beats#33307, the kubernetes autodiscovery provider can lead to OOM kills for Beats Pods in clusters with specific types of workloads, i.e. Cronjobs.
The purpose of this issue is the following:

- Make `add_resource_metadata.cronjob: false` the default, since we know it's an "expensive" feature.
- Document how users can derive `hello` (the name of the cronjob) out of `hello-1234` (the name of the job).

For full context and previous analysis see the summary at elastic/beats#33307 (comment).
cc: @eedugon @gizas @jsoriano