CPU and Memory Requests and Limits? #14
I think that real memory requirements will depend on the number of filters and output plugins defined; using the approach described earlier should work. For CPU we need to do some tests.
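For orientation, here is a minimal sketch of where those knobs sit in a fluent-bit configuration (the section contents are illustrative assumptions, not this repo's actual config); each input carries its own buffer limit and tail database, and every additional filter or output adds its own buffering on top:

    # Each input buffers up to Mem_Buf_Limit and tracks file offsets in DB
    [INPUT]
        Name             tail
        Tag              kube.*
        Path             /var/log/containers/*.log
        Parser           docker
        DB               /var/log/flb_kube.db
        Mem_Buf_Limit    5MB

    # Each filter adds per-record processing and some memory on top
    [FILTER]
        Name             kubernetes
        Match            kube.*

    # Each output keeps its own in-flight buffers beyond the input limit
    [OUTPUT]
        Name             stdout
        Match            *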
#18 increased the memory limits but did not try to address spikes. #19 tries to restrict the producer buffers, which, if I interpret #16 (comment) correctly, should allow a memory limit as discussed in http://fluentbit.io/documentation/0.12/configuration/memory_usage.html#estimating. Sadly I have no good test environment with unprocessed logs, so I'll just have to keep this running for a couple of days. Probably won't validate startup.
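I haven't restated the exact change from #19 here, but for a librdkafka-based producer, restricting the buffers generally comes down to the queue.buffering.* properties. A sketch, assuming fluent-bit's kafka output plugin (which forwards rdkafka.* keys to librdkafka); the brokers, topic and numbers are placeholders, not this repo's values:

    [OUTPUT]
        Name       kafka
        Match      *
        Brokers    kafka:9092
        Topics     fluent-bit
        # Cap librdkafka's producer queue so output buffering stays bounded
        # (illustrative values, not the ones chosen in #19)
        rdkafka.queue.buffering.max.kbytes      2048
        rdkafka.queue.buffering.max.messages    10000
        rdkafka.queue.buffering.max.ms          1000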
@solsson If you delete the fluent-bit daemonset (and pods) and then delete the flb_kube.db file on each node, fluent-bit will re-process the existing logs from the start when it comes back, which should give a better picture of memory use during catch-up.
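A sketch of that procedure with kubectl; the daemonset name, namespace, manifest file and DB path are assumptions based on a typical setup:

    # Remove the daemonset so no pod keeps the offset database open
    kubectl delete daemonset fluent-bit --namespace logging

    # On each node, drop the tail offset database so logs are re-read from the start
    sudo rm /var/log/flb_kube.db

    # Re-create the daemonset; fluent-bit will now re-process all existing container logs
    kubectl apply -f fluent-bit-daemonset.yaml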
Did so on two nodes now. It's regular GKE, and I guess logs are rotated, because I have no node with more than 100M of container logs. I charted the 10 min rate of bytes in for the two pods I tested, and the memory consumption over the same period. The result won't mean much without higher log volumes, but the two containers that just started processing without flb_kube.db do have the highest memory use. I did see one of them restart initially (no evidence of OOMKilled, though), but it then caught up successfully. Actually, I've never seen the memory use of a fluent-bit pod go down.
@solsson I got confused by the graphics; in the memory consumption chart, what does each line represent?
A container, and because there's only one container in fluent-bit pods, it also represents a pod. They're all from the daemonset, so there's one pod per node. The end of a line means the pod got killed. In the case above I did three rolling upgrades with revert, hence some pods survived and some were re-created (at which point they get a new name). I don't think we can draw many conclusions from the graphs above; higher log volumes would be useful. But memory and CPU limits must always be adapted to the workload, and to me it looks like the current limits from #18 work fine, as the current spike used about 65% of the max. I think that the capping of Kafka's buffer in #19 led to less dramatic spikes, but it could also be the CPU cap or the IO situation that does so, because ingestion gets limited.
What would be a good set of CPU and memory requests and limits? For comparison, this is filebeat:
I know that the documentation talks about memory limits being dependent on the buffer amount. Given the Mem_Buf_Limit 5MB, would we need 13 MB? I have very little insight as to an appropriate CPU request and limit.
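As a starting point, here is a hedged sketch of what the container's resources block could look like; the numbers are illustrative assumptions rather than values from this repo, with the memory limit kept well above Mem_Buf_Limit plus filter/output buffering, and the CPU limit leaving headroom for catch-up after restarts:

    resources:
      requests:
        cpu: 100m        # steady-state tailing is usually cheap
        memory: 100Mi    # above Mem_Buf_Limit 5MB plus filter/output buffers
      limits:
        cpu: 500m        # room for catch-up spikes after a restart or DB reset
        memory: 200Mi    # hard cap: prefer an OOMKill over starving the node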