From 1dbdc16099ed2d1d6dbb32f40962f55c72fd8621 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marcin=20=27Perk=27=20Sto=C5=BCek?=
Date: Mon, 14 Feb 2022 17:07:45 +0100
Subject: [PATCH] docs: add fluentd buffers vs DPM calculations info for metrics

---
 deploy/docs/Best_Practices.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/deploy/docs/Best_Practices.md b/deploy/docs/Best_Practices.md
index d9596d53d8..14517b973a 100644
--- a/deploy/docs/Best_Practices.md
+++ b/deploy/docs/Best_Practices.md
@@ -255,6 +255,34 @@ See the following links to official Fluentd buffer documentation:
 - https://docs.fluentd.org/configuration/buffer-section
 - https://docs.fluentd.org/buffer/file
 
+### Fluentd buffer size for metrics
+
+If you run into connectivity problems, the buffer size determines how long your setup can survive without data loss,
+delivering the buffered data later, once everything is operational again.
+
+To calculate this time you need to know how much data you send. The calculations below assume that a single metric data point is
+around 1 kilobyte in size, including metadata. This assumption is based on the average data we ingest. By default, file-based
+buffering uses gzip compression, which gives us a compression ratio of around 3:1.
+
+That means `1 DPM` (Data Points per Minute) uses around `333 bytes` of buffer per minute. That is `333 kilobytes` per minute for 1 thousand DPM and `333 megabytes` per minute for 1 million DPM.
+
+This buffer size can be spread across multiple Fluentd instances. For best results, balance the metrics load across them with the
+built-in nginx load balancer. It can be enabled with the following setting: `sumologic.metrics.remoteWriteProxy.enabled=true`.
+
+The formula to calculate the buffering time:
+
+```
+minutes = (PV size in bytes * Fluentd instances) / (DPM * 333 bytes)
+```
+
+Example 1:
+My cluster sends 10 thousand DPM to Sumo. I'm using the default buffer size of 10 GB and 3 Fluentd instances. That gives me 30 GB of buffer (3 * 10 GB).
+I'm using 3.33 MB of buffer per minute. My setup should be able to hold data for around 9000 minutes, that is 150 hours or 6.25 days.
+
+Example 2:
+My cluster sends 1 million DPM to Sumo. I'm using a buffer size of 20 GB and 20 Fluentd instances. I have 400 GB of total buffer (20 * 20 GB).
+I'm using 333 MB of buffer every minute. My setup should be able to hold data for around 1200 minutes, that is 20 hours.
+
 ## Excluding Logs From Specific Components
 
 You can exclude specific logs from specific components from being sent to Sumo Logic