docs: add fluentd buffers vs DPM calculations info for metrics
perk-sumo committed Feb 15, 2022
1 parent 1d2efb0 commit d2631b0
Showing 2 changed files with 43 additions and 0 deletions.
CHANGELOG.md: 2 additions, 0 deletions
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- feat: add otelcol's liveness and readiness probes configuration [#2105][#2105]
- docs: add fluentd buffers vs DPM calculations info for metrics [#2128][#2128]

### Changed

@@ -23,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
[#2105]: https://github.com/SumoLogic/sumologic-kubernetes-collection/pull/2105
[#2114]: https://github.com/SumoLogic/sumologic-kubernetes-collection/pull/2114
[#2127]: https://github.com/SumoLogic/sumologic-kubernetes-collection/pull/2127
[#2128]: https://github.com/SumoLogic/sumologic-kubernetes-collection/pull/2128

## [v2.5.1]

deploy/docs/Best_Practices.md: 41 additions, 0 deletions
@@ -255,6 +255,47 @@ See the following links to official Fluentd buffer documentation:
- https://docs.fluentd.org/configuration/buffer-section
- https://docs.fluentd.org/buffer/file

### Fluentd buffer size for metrics

In case of connectivity problems, your setup will be able to survive for a certain amount
of time without data loss, depending on the buffer size, and will deliver the buffered data
once everything is operational again.

To calculate this time you need to know how much data you send. The calculations below
assume that a single metric data point is around 1 kilobyte in size, including metadata.
This assumption is based on the average data we ingest. By default, file-based buffering
uses gzip compression, which gives around a 3:1 compression ratio.

That results in `1 DPM` (Data Points per Minute) using around `333 bytes of buffer`. That is
`333 kilobytes for 1 thousand DPM` and `333 megabytes for 1 million DPM`. In other words, storing
a million data points per minute uses 333 megabytes of buffer every minute.
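
As a minimal sketch of this arithmetic (assuming the ~1 kilobyte per data point and 3:1
compression figures above, not exact measurements):

```python
# Back-of-the-envelope buffer usage per minute, based on the assumptions above:
# ~1 kilobyte per metric data point (including metadata), ~3:1 gzip compression.
RAW_BYTES_PER_POINT = 1_000
COMPRESSION_RATIO = 3
BUFFER_BYTES_PER_POINT = RAW_BYTES_PER_POINT / COMPRESSION_RATIO  # ~333 bytes

for dpm in (1, 1_000, 1_000_000):
    per_minute = dpm * BUFFER_BYTES_PER_POINT
    print(f"{dpm:>9} DPM -> ~{per_minute:,.0f} bytes of buffer per minute")
# 1 DPM -> ~333 bytes, 1,000 DPM -> ~333 KB, 1,000,000 DPM -> ~333 MB
```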

This buffer size can be spread across multiple Fluentd instances. For best results you should
enable metrics load balancing with the following setting:
`sumologic.metrics.remoteWriteProxy.enabled=true`. It enables the remote write proxy, where nginx
is used to forward data from Prometheus to the Fluentd instances. We strongly recommend using this
setting, because with an uneven load your effective buffer storage is only as big as a single
Fluentd instance's buffer. Unfortunately, even with `remoteWriteProxy` enabled you might experience
uneven load, so we also `recommend making your buffers twice the calculated size`.

The formula to calculate the buffering time:

```
minutes = (PV size in bytes * Fluentd instances) / (DPM * 333 bytes)
```

Example 1:
My cluster sends 10 thousand DPM to Sumo. I'm using the default 10 GB of buffer size and
3 Fluentd instances. That gives me 30 GB of buffers in total (3 * 10 GB) and uses 3.33 MB of
buffer per minute. My setup should be able to hold data for around 9000 minutes, that is 150 hours
or 6.25 days. We recommend treating this as 4500 minutes, that is 75 hours or 3.12 days of buffer.

Example 2:
My cluster sends 1 million DPM to Sumo. I'm using 20 GB of buffer size and 20 Fluentd
instances. That gives me 400 GB of buffers in total (20 * 20 GB) and uses 333 MB of buffer every
minute. My setup should be able to hold data for around 1200 minutes, that is 20 hours. We recommend
treating this as 600 minutes, that is 10 hours of buffer.
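
The same calculation, expressed as a minimal Python sketch. The 333 bytes per data point, the 2x
safety margin, and treating a GB as 10^9 bytes are the assumptions behind the rounded numbers in
the examples above, not an official sizing tool:

```python
# Estimate how long Fluentd buffers can hold metrics data during an outage.
# Assumptions from the text above: ~333 bytes of buffer per data point
# (1 KB raw, ~3:1 gzip compression); GB here means 10^9 bytes.
BYTES_PER_DATA_POINT = 333
GB = 1_000_000_000

def buffering_minutes(pv_size_gb: float, fluentd_instances: int, dpm: int) -> float:
    """minutes = (PV size in bytes * Fluentd instances) / (DPM * 333 bytes)"""
    return (pv_size_gb * GB * fluentd_instances) / (dpm * BYTES_PER_DATA_POINT)

# Example 1: 10 thousand DPM, 3 instances with 10 GB of buffer each.
minutes = buffering_minutes(10, 3, 10_000)        # ~9000 minutes (150 hours)
recommended = minutes / 2                         # ~4500 minutes with the 2x safety margin

# Example 2: 1 million DPM, 20 instances with 20 GB of buffer each.
minutes = buffering_minutes(20, 20, 1_000_000)    # ~1200 minutes (20 hours)
recommended = minutes / 2                         # ~600 minutes with the 2x safety margin
```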

## Excluding Logs From Specific Components

You can exclude specific logs from specific components from being sent to Sumo Logic
