docs: add fluentd buffers vs DPM calculations info for metrics
perk-sumo committed Feb 14, 2022
1 parent 1b1dc5e commit 1dbdc16
Showing 1 changed file with 28 additions and 0 deletions.
28 changes: 28 additions & 0 deletions deploy/docs/Best_Practices.md
@@ -255,6 +255,34 @@ See the following links to official Fluentd buffer documentation:
- https://docs.fluentd.org/configuration/buffer-section
- https://docs.fluentd.org/buffer/file

### Fluentd buffer size for metrics

Should you run into connectivity problems, your setup can survive for a certain amount of time without data loss, depending on the buffer size, and deliver
the buffered data once everything is operational again.

To calculate this time you need to know how much data you send. The calculations below assume that a single metric data point is around 1 kilobyte in size,
including metadata. This assumption is based on the average data we ingest. By default, file-based buffering uses gzip compression, which gives around a 3:1
compression ratio.

That results in `1 DPM` (Data Points per Minute) using around `333 bytes of buffer` (1 kilobyte divided by the 3:1 compression ratio). That is `333 kilobytes for 1 thousand DPM` and `333 megabytes for 1 million DPM`.

This buffer size can be spread across multiple Fluentd instances. For best results, use metrics load balancing with the help of the built-in nginx load
balancer. It can be enabled with the following setting: `sumologic.metrics.remoteWriteProxy.enabled=true`.
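
If you manage the chart configuration through a values file, the same setting corresponds to the following entry (a minimal sketch; the nesting simply
mirrors the dotted key above):

```yaml
sumologic:
  metrics:
    remoteWriteProxy:
      enabled: true
```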

The formula to calculate the buffering time:

```
minutes = (PV size in bytes * Fluentd instances) / (DPM * 333 bytes)
```

Example 1:
My cluster sends 10 thousand DPM to Sumo. I'm using the default 10 GB of buffer size and 3 Fluentd instances. That gives me 30 GB of buffer (3 * 10 GB).
I'm using 3.33 MB of buffer per minute, so my setup should be able to hold data for around 9000 minutes, that is 150 hours or 6.25 days.

Example 2:
My cluster sends 1 million DPM to Sumo. I'm using 20 GB of buffer size and 20 Fluentd instances. I have 400 GB of total buffer (20 * 20 GB). I'm using
333 MB of buffer every minute, so my setup should be able to hold data for around 1200 minutes, that is 20 hours.

## Excluding Logs From Specific Components

You can exclude specific logs from specific components from being sent to Sumo Logic