
CPU and Memory Requests and Limits? #14

Open
StevenACoffman opened this issue Dec 21, 2017 · 6 comments

Comments


StevenACoffman commented Dec 21, 2017

What would be a good set of CPU and memory requests and limits? For comparison, this is filebeat:

        resources:
          requests:
            cpu: 2m
            memory: 10Mi
          limits:
            cpu: 10m
            memory: 20Mi

I know that the documentation says memory limits depend on the configured buffer sizes.

So, if we impose a limit of 10MB for the input plugins and consider the worst-case scenario of the output plugin consuming 20MB extra, as a minimum we need (30MB x 1.2) = 36MB.

Given a Mem_Buf_Limit of 5MB, would we need 13 MB?
I have very little insight into an appropriate CPU request and limit.
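By the same formula, a 5MB Mem_Buf_Limit would come out to (5MB + 20MB) x 1.2 = 30MB. A resources block along those lines might look like the sketch below; the memory limit follows that estimate, while the request and CPU figures are only placeholders, not measured values for fluent-bit:

        # Illustrative sketch only. The memory limit follows the documented
        # estimate: (5MB Mem_Buf_Limit + 20MB output worst case) x 1.2 = 30MB.
        # The request and CPU values are placeholders, not measured figures.
        resources:
          requests:
            cpu: 10m
            memory: 15Mi
          limits:
            cpu: 100m
            memory: 30Mi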

edsiper (Member) commented Jan 16, 2018

@StevenACoffman

I think the real memory requirements will depend on the number of filters and output plugins defined; using the approach described earlier should work.

For CPU we need to do some tests.

solsson (Contributor) commented Feb 6, 2018

#18 increased the memory limits but did not try to address spikes. #19 tries to restrict the producer buffers, which, if I interpret #16 (comment) correctly, should allow a memory limit as discussed in http://fluentbit.io/documentation/0.12/configuration/memory_usage.html#estimating. Sadly I have no good test environment with unprocessed logs, so I'll just have to keep this running for a couple of days. That probably won't validate startup behavior.
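The estimate on that page hinges on capping input-side buffering with Mem_Buf_Limit per input. For context, here is a minimal sketch of such a cap in a ConfigMap; the names, path and 5MB value are assumptions for illustration, not this repository's actual configuration:

    # Hypothetical ConfigMap fragment: input buffering capped via Mem_Buf_Limit.
    # Names, path and the 5MB value are illustrative, not this repo's real config.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluent-bit-config
    data:
      fluent-bit.conf: |
        [INPUT]
            Name          tail
            Path          /var/log/containers/*.log
            Mem_Buf_Limit 5MB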

StevenACoffman (Author) commented Feb 7, 2018

@solsson If you delete the fluent-bit daemonset (and its pods), delete the file /var/lib/docker/containers/flb_kube.db on the host filesystem (it maps to /var/log/flb_kube.db inside the container), and then re-apply the daemonset, the new fluent-bit pods will reprocess all the host logs, so you can retest spikes.

solsson (Contributor) commented Feb 7, 2018

I did this on two nodes now. It's regular GKE, and I guess logs are rotated, because no node has more than 100M of container logs.

Here's the 10 min rate of bytes in for the two pods I tested:

And this is the memory consumption:

[screenshot: memory consumption chart]

The results won't mean much without higher log volumes, but the two containers that just started processing without flb_kube.db do have the highest memory use. I did see one of them restart initially (no evidence of OOMKilled, though), but it then caught up successfully.

Actually I've never seen the memory use of a fluent-bit pod go down.

edsiper (Member) commented Feb 7, 2018

@solsson I got confused by the graphs: in the memory consumption chart, what does each line represent?

solsson (Contributor) commented Feb 8, 2018

A container, and since there's only one container in a fluent-bit pod, each line also represents a pod. They're all from the daemonset, so there's one pod per node. The end of a line means the pod was killed. In the case above I did three rolling upgrades with reverts, hence some pods survived and some were re-created (at which point they get a new name).

I don't think we can draw many conclusions from the graphs above. Higher log volumes would be useful. But memory and CPU limits must always be adapted to the workload, and to me it looks like the current limits from #18 work fine, as the latest spike used about 65% of the max. I think the capping of Kafka's buffer in #19 led to less dramatic spikes, but it could also be the CPU cap or the I/O situation that does so, because ingestion gets limited.
