Excessive memory usage on kubernetes #894
Comments
Is this even excessive memory usage? Are there any recommendations for what the resource limits/requests should be?
If all Pods send around 200MB of data within 5 seconds, yeah, it will be killed. While Fluent Bit receives data it does not deliver the logs until the Flush time expires. My suggestion is to set Flush to 1 (one second) and add a Mem_Buf_Limit option to the TCP input plugin just for protection; you can read more about memory handling here:
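For reference, a minimal sketch of that suggestion (the listen address, json format, and the 5MB limit are example values, not taken from this issue's actual config; port 5170 matches the hostPort in the DaemonSet further down):

[SERVICE]
    Flush 1

[INPUT]
    Name          tcp
    Listen        0.0.0.0
    Port          5170
    Format        json
    Mem_Buf_Limit 5MB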
@edsiper It takes about 10-15 minutes for the Fluent Bit pod to be killed; its memory graph is a nice straight line that never looks like it frees any memory. I have tried various settings for Mem_Buf_Limit but none of them make any difference.
Did you try Flush 1?
Yes, those graphs are with it set to Flush 1.
Looks like the issue is with our app: it never closed the TCP connection to Fluent Bit and instead just reused it for each batch of logs. Now we close the connection after each batch of logs and that has fixed the issue.
@tomstreet I am curious to learn more about the issue. My expectation is that Fluent Bit will protect itself from that scenario. Would you please share some steps to reproduce the problem?
Sure, so the config is above; here is the DaemonSet YAML:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app: fluent-bit-logging
    kubernetes.io/cluster-service: "true"
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fluent-bit-logging
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:0.14.7
        imagePullPolicy: Always
        ports:
        - containerPort: 2020
        - containerPort: 5170
          hostPort: 5170
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        resources:
          limits:
            cpu: 2
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

Our app is written in C# and here is a simplified version of the log emitter:

using System.Net.Sockets;
using System.Threading.Tasks;

public class Emitter
{
    private TcpClient _client;
    private readonly FluentBitSettings _settings;

    // Constructor added here so the sample is self-contained; the snippet in
    // the issue did not show how _settings gets initialized.
    public Emitter(FluentBitSettings settings)
    {
        _settings = settings;
    }

    // Reuse the existing connection if it is still open, otherwise dispose it
    // and establish a new one.
    private async Task Connect()
    {
        if (_client != null)
        {
            if (_client.Connected)
            {
                return;
            }
            _client.Dispose();
            _client = null;
        }
        _client = new TcpClient();
        await _client.ConnectAsync(_settings.Host, _settings.Port);
    }

    private void Disconnect()
    {
        _client?.Dispose();
        _client = null;
    }

    public async Task Emit(byte[] logsBatch)
    {
        try
        {
            await Connect();
            var tcpStream = _client.GetStream();
            await tcpStream.WriteAsync(logsBatch);
            await tcpStream.FlushAsync();
        }
        finally
        {
            // Closing the connection after every batch is the fix mentioned above;
            // keeping one long-lived connection open caused Fluent Bit's memory to grow.
            Disconnect();
        }
    }
}

If we remove the Disconnect() call in the finally block and keep reusing the connection, Fluent Bit's memory starts climbing again until the pod is killed.
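For context, a hypothetical caller for the emitter above; FluentBitSettings is assumed to be a simple settings class with Host and Port properties (it is not shown in the issue), and the address value is a placeholder for the node hostPort from the DaemonSet:

// Requires: using System.Text; (for Encoding)
var emitter = new Emitter(new FluentBitSettings { Host = "<node-ip>", Port = 5170 });
byte[] batch = Encoding.UTF8.GetBytes("{\"log\":\"hello from the app\"}\n");
// Each call opens a TCP connection, writes one batch, and disposes the
// connection in the finally block, so Fluent Bit sees the socket close.
await emitter.Emit(batch);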
Similar problem.

When Elasticsearch is heavily loaded it returns HTTP 429 errors to Fluent Bit, and Fluent Bit keeps the unsent logs in its main memory for retry. Fluent Bit retries the configured number of times (as set in the output plugin's retry option).
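For that scenario, a hedged sketch of the relevant knobs (the tail input and all values are illustrative, not the reporter's config): Mem_Buf_Limit caps how much an input buffers in memory, and Retry_Limit caps how many times a failed chunk is re-sent before it is dropped:

[INPUT]
    Name          tail
    Path          /var/log/containers/*.log
    Mem_Buf_Limit 5MB

[OUTPUT]
    Name        es
    Match       *
    Host        elasticsearch
    Port        9200
    Retry_Limit 3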
I don't think this issue has been resolved.
Running on Azure Kubernetes Service (Kubernetes v1.11.3) as a DaemonSet using the fluent/fluent-bit:0.14.6 image. The nodes are quite small, with each one running roughly 15 containers that send JSON logs over TCP. The pod memory limit is currently set to 200Mi and Fluent Bit keeps hitting this and restarting. Any suggestions? Here is the config:
parsers.conf: