From 1dbdc16099ed2d1d6dbb32f40962f55c72fd8621 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marcin=20=27Perk=27=20Sto=C5=BCek?= <perk@sumologic.com>
Date: Mon, 14 Feb 2022 17:07:45 +0100
Subject: [PATCH] docs: add fluentd buffers vs DPM calculations info for
 metrics

---
 deploy/docs/Best_Practices.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/deploy/docs/Best_Practices.md b/deploy/docs/Best_Practices.md
index d9596d53d8..14517b973a 100644
--- a/deploy/docs/Best_Practices.md
+++ b/deploy/docs/Best_Practices.md
@@ -255,6 +255,34 @@ See the following links to official Fluentd buffer documentation:
 - https://docs.fluentd.org/configuration/buffer-section
 - https://docs.fluentd.org/buffer/file
 
+### Fluentd buffer size for metrics
+
+If connectivity to Sumo Logic is interrupted, Fluentd buffers the data and delivers it later, once everything is operational again.
+How long your setup can survive without data loss depends on the buffer size.
+
+To calculate this time you need to know how much data you send. The calculations below assume that a single metric data point is around 1
+kilobyte in size, including metadata. This assumption is based on the average data we ingest. By default, file-based buffering uses gzip compression,
+which gives a compression ratio of around 3:1.
+
+As a result, `1 DPM` (Data Points per Minute) uses around `333 bytes of buffer` every minute. That is `333 kilobytes per minute for 1 thousand DPM` and `333 megabytes per minute for 1 million DPM`.
+
+This buffer size can be spread between multiple Fluentd instances. For best results, balance the metrics load across them using the built-in
+nginx load balancer. It can be enabled with the following setting: `sumologic.metrics.remoteWriteProxy.enabled=true`.
+
+The formula to calculate the buffering time, where the PV (persistent volume) size is the buffer size of a single Fluentd instance:
+
+```
+minutes = (PV size in bytes * Fluentd instances) / (DPM * 333 bytes)
+```
+
+Example 1:  
+My cluster sends 10 thousand DPM to Sumo. I'm using the default buffer size of 10 GB and 3 Fluentd instances. That gives me 30 GB of buffer (3 * 10 GB).
+I'm using 3.33 MB of buffer per minute. My setup should be able to hold data for around 9000 minutes, that is 150 hours or 6.25 days.
+
+Example 2:  
+My cluster sends 1 million DPM to Sumo. I'm using a buffer size of 20 GB and 20 Fluentd instances. I have 400 GB of total buffers (20 * 20 GB). I'm using
+333 MB of buffer every minute. My setup should be able to hold data for around 1200 minutes, that is 20 hours.
+
 ## Excluding Logs From Specific Components
 
 You can exclude specific logs from specific components from being sent to Sumo Logic