
Excessive memory usage on kubernetes #894

Closed
tomstreet opened this issue Nov 8, 2018 · 10 comments
@tomstreet

Running on Azure Kubernetes Service (Kubernetes v1.11.3) as a DaemonSet using the fluent/fluent-bit:0.14.6 image. The nodes are quite small, each running roughly 15 containers that send JSON logs over TCP. The pod memory limit is currently set to 200Mi, and Fluent Bit keeps hitting it and restarting. Any suggestions? Here is the config:

[SERVICE]
    Flush         5
    Log_Level     info
    Daemon        off
    Parsers_File  parsers.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020

[INPUT]
    Name      tcp
    Listen    0.0.0.0
    Port      5170

[OUTPUT]
    Name      null
    Match     *

parsers.conf:

[PARSER]
    Name   json
    Format json
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z
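
For context, each container writes one JSON object per line over the TCP socket; a record matching this parser's Time_Key/Time_Format looks roughly like this (field names are illustrative, not our exact schema):

    {"time": "08/Nov/2018:14:21:07 +0000", "level": "info", "message": "handled request"}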
@tomstreet
Author

Is this even excessive memory usage? Are there any recommendations for what the resource limits/requests should be?

@edsiper
Member

edsiper commented Nov 21, 2018

If all Pods send around 200MB of data within 5 seconds, yeah, it will be killed.

While Fluent Bit receives data, it does not deliver the logs until the Flush interval expires. My suggestion is to set Flush to 1 (one second) and add a Mem_Buf_Limit option to the TCP input plugin as protection. You can read more about memory handling here:

https://docs.fluentbit.io/manual/configuration/backpressure
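
Something like this, keeping the rest of your [SERVICE] section as-is (the 5MB value is only an example, adjust it for your traffic):

[SERVICE]
    Flush         1

[INPUT]
    Name          tcp
    Listen        0.0.0.0
    Port          5170
    Mem_Buf_Limit 5MB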

@tomstreet
Author

@edsiper It takes about 10-15 minutes for the Fluent Bit pod to be killed. Its memory usage is a straight, steadily climbing line, as if it never frees any memory:
[screenshot: steadily increasing memory usage]
The drop in memory usage is when the pod gets killed:
[screenshot: memory usage dropping when the pod is killed]

I have tried various settings for the Mem_Buf_Limit but none of them make any difference.

@edsiper
Member

edsiper commented Nov 30, 2018

Did you try Flush 1?

@tomstreet
Author

Yes, those graphs are with it set to Flush 1.

@tomstreet
Author

Looks like the issue is with our app: it never closed the TCP connection to Fluent Bit and instead reused it for each batch of logs. Now we close the connection after each batch, and that has fixed the issue.

@edsiper
Member

edsiper commented Nov 30, 2018

@tomstreet I am curious to learn more about the issue. My expectation is that Fluent Bit will protect itself from that scenario. Would you please share some steps to reproduce the problem?

@tomstreet
Author

tomstreet commented Nov 30, 2018

Sure. The config is above; here is the DaemonSet YAML:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app: fluent-bit-logging
    kubernetes.io/cluster-service: "true"
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fluent-bit-logging
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:0.14.7
        imagePullPolicy: Always
        ports:
          - containerPort: 2020
          - containerPort: 5170
            hostPort: 5170
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        resources:
          limits:
            cpu: 2
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

Our app is written in C#; here is a simplified version of the log emitter:

using System.Net.Sockets;
using System.Threading.Tasks;

public class Emitter
{
    private TcpClient _client;
    private FluentBitSettings _settings; // holds the Fluent Bit Host and Port

    private async Task Connect()
    {
        // Reuse the connection if it is still open; otherwise dispose of it
        // and open a new one.
        if (_client != null)
        {
            if (_client.Connected)
            {
                return;
            }

            _client.Dispose();
            _client = null;
        }

        _client = new TcpClient();

        await _client.ConnectAsync(_settings.Host, _settings.Port);
    }

    private void Disconnect()
    {
        _client?.Dispose();
        _client = null;
    }

    public async Task Emit(byte[] logsBatch)
    {
        try
        {
            await Connect();

            var tcpStream = _client.GetStream();

            await tcpStream.WriteAsync(logsBatch, 0, logsBatch.Length);
            await tcpStream.FlushAsync();
        }
        finally
        {
            // Closing the connection after every batch is what fixed the issue.
            Disconnect();
        }
    }
}

If we remove the Disconnect() call in the finally block of the Emit method, the TCP connection is reused without ever being closed between calls to Emit, and that is what causes the memory issue in Fluent Bit. Adding it back not only stopped the issue in Fluent Bit but also reduced the memory usage of our own service.
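
For completeness, the variant that reproduces the leak is just the same class with the finally/Disconnect() removed, so the same TcpClient gets reused for every batch (a rough sketch, not our exact code):

    // Leaky variant: the connection is opened once and then reused for
    // every batch; the client never closes it.
    public async Task Emit(byte[] logsBatch)
    {
        await Connect();

        var tcpStream = _client.GetStream();

        await tcpStream.WriteAsync(logsBatch, 0, logsBatch.Length);
        await tcpStream.FlushAsync();
        // no Disconnect() here
    }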

@HarishHothi

Similar problem: we are observing very high memory usage on the Fluent Bit pod, around 10GB. We have not specified a resource limit on the pod for this testing.

kubectl top po -n logging fluent-bit-gmnrt 
NAME               CPU(cores)   MEMORY(bytes) 
fluent-bit-gmnrt    46m             9861Mi 

When Elasticsearch is heavily loaded it returns an HTTP 429 error to Fluent Bit, and Fluent Bit keeps the unsent logs in main memory for retry. Fluent Bit retries X times (as configured by the output plugin's Retry_Limit setting) and should discard the message after that, but I am not sure whether it actually discards it or keeps it in memory.
Mem_Buf_Limit is also set to 5MB, yet Fluent Bit is still using 10GB.

To Reproduce
Start the application and Fluent Bit while Elasticsearch is heavily loaded.

Expected behavior
Once the retry limit is reached, Fluent Bit should not keep the record in memory.

Your Environment
Kubernetes version: v1.12.2
Fluent Bit version: 0.14.7
Snippet of the Fluent Bit configuration:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10
    Ignore_Older      1d

[OUTPUT]
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Retry_Limit     2
    Buffer_Size     False

[FILTER]
    Name        record_modifier
    Match       *
    Remove_key  time

[FILTER]
    Name   grep
    Match  *
    Regex  log [a-zA-Z1-9]*SOME_STRING[a-zA-Z1-9]*
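
For reference, tomstreet's DaemonSet earlier in the thread caps the container's memory; something like the snippet below (values copied from that manifest) would at least bound Fluent Bit instead of letting it grow to ~10GB, although it is only a guard, not a fix for the retention itself:

        resources:
          limits:
            cpu: 2
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi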

@mcauto

mcauto commented Sep 26, 2022

I don't think this issue has been resolved.

rawahars pushed a commit to rawahars/fluent-bit that referenced this issue Oct 24, 2022