Prometheus Downsampler

This program collects Prometheus data for the last n minutes (default: 5 minutes), takes the average of each metric, and writes the result to a text file.
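For example, the output file contains one averaged sample per series in the Prometheus exposition text format (timestamps are milliseconds since epoch); the metric names, labels, values, and timestamp below are made up for illustration:

http_requests_total{instance="10.0.0.1:8080",job="api"} 1027 1620000000000
node_memory_Active_bytes{instance="10.0.0.2:9100",job="node"} 8.32e+09 1620000000000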

Solution

Use it together with another Prometheus server that stores the downsampled data. In our case, we set the long-term Prometheus retention to 2 years (still testing; hopefully it works as expected).

After testing for a while, memory usage on the long-term Prometheus kept growing. This may be because metrics that have not been updated recently, but have not yet reached retention, keep their index entries in memory. We are now going to try Thanos.

This program only outputs a text file to a Kubernetes emptyDir volume. An nginx container in the same pod then exposes that file to the long-term Prometheus. The long-term Prometheus scrape job must set honor_labels: true, otherwise conflicting labels will be renamed.
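A minimal sketch of such a scrape job on the long-term Prometheus (the job name, target, metrics_path, and scrape interval are assumptions that depend on how the nginx sidecar is exposed; honor_labels is the essential part):

scrape_configs:
  - job_name: downsampled                             # hypothetical job name
    honor_labels: true                                 # keep labels from the downsampled file as-is
    scrape_interval: 5m                                # match the downsample interval
    metrics_path: /prometheus_downsample_output.txt    # depends on the nginx location config
    static_configs:
      - targets: ['downsampler-nginx:80']              # hypothetical service name and port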

Architecture diagram: Downsampler with 2 Prometheus

Config

There are 4 configurable parameters. You can set them either with command-line arguments or environment variables.

  • Source Prometheus endpoint
    • Args: -s
  • Output file path
    • Default: /tmp/prometheus_downsample_output.txt
    • Args: -o
    • Environment variable: PDS_OUTPUT
  • Interval (in minutes) for collecting data from the source Prometheus
    • Default: 5m
    • Args: -i
    • Environment variable: PDS_INTERVAL
  • Maximum concurrent connections to the source Prometheus
    • Default: 50
    • Args: -c
    • Environment variable: PDS_CONCURRENT

Example: your Prometheus endpoint is http://192.168.1.20:9090 and you want to downsample data every 10 minutes:

go run prometheus-downsampler.go -s http://192.168.1.20:9090 -i 10m

or

./prometheus-downsampler -s http://192.168.1.20:9090 -i 10m
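
Or, using the interval environment variable instead of -i (the source endpoint is still passed with -s, since no environment variable for it is listed above):

PDS_INTERVAL=10m ./prometheus-downsampler -s http://192.168.1.20:9090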

How it works

  1. Call the Querying label values API to get all metric names
  2. Call the Range Queries API to fetch every metric with a 1-minute step
  3. Take the average of each metric
  4. Write all metrics in exposition format to a temp file
  5. Rename the temp file to the output file name
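
A minimal sketch of this flow in Go, assuming only the standard Prometheus HTTP API paths (/api/v1/label/__name__/values and /api/v1/query_range). The hard-coded endpoint, output path, and the lack of concurrency and real error handling are illustrative, not taken from the actual source:

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
    "os"
    "strconv"
    "strings"
    "time"
)

// series is one result of a range query: a label set plus [timestamp, "value"] pairs.
type series struct {
    Metric map[string]string `json:"metric"`
    Values [][2]interface{}  `json:"values"`
}

// getJSON fetches u and decodes the JSON response body into v.
func getJSON(u string, v interface{}) {
    resp, err := http.Get(u)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    if err := json.NewDecoder(resp.Body).Decode(v); err != nil {
        panic(err)
    }
}

func main() {
    prom := "http://192.168.1.20:9090"
    out := "/tmp/prometheus_downsample_output.txt"
    end := time.Now()
    start := end.Add(-5 * time.Minute)

    // 1. Get all metric names from the label values API.
    var names struct {
        Data []string `json:"data"`
    }
    getJSON(prom+"/api/v1/label/__name__/values", &names)

    tmp, err := os.Create(out + ".tmp")
    if err != nil {
        panic(err)
    }

    for _, name := range names.Data {
        // 2. Range query for this metric with a 1-minute step.
        q := url.Values{}
        q.Set("query", name)
        q.Set("start", strconv.FormatInt(start.Unix(), 10))
        q.Set("end", strconv.FormatInt(end.Unix(), 10))
        q.Set("step", "60")
        var res struct {
            Data struct {
                Result []series `json:"result"`
            } `json:"data"`
        }
        getJSON(prom+"/api/v1/query_range?"+q.Encode(), &res)

        for _, s := range res.Data.Result {
            // 3. Average the samples of this series.
            sum, n := 0.0, 0
            for _, v := range s.Values {
                f, _ := strconv.ParseFloat(v[1].(string), 64)
                sum += f
                n++
            }
            if n == 0 {
                continue
            }
            // 4. Write one exposition-format line per series.
            var labels []string
            for k, v := range s.Metric {
                if k != "__name__" {
                    labels = append(labels, fmt.Sprintf("%s=%q", k, v))
                }
            }
            labelStr := ""
            if len(labels) > 0 {
                labelStr = "{" + strings.Join(labels, ",") + "}"
            }
            fmt.Fprintf(tmp, "%s%s %g %d\n", name, labelStr, sum/float64(n), end.UnixMilli())
        }
    }
    tmp.Close()

    // 5. Atomically replace the output file.
    if err := os.Rename(out+".tmp", out); err != nil {
        panic(err)
    }
}

The real program adds the configurable source endpoint, interval, output path, and the limit on concurrent connections described in the Config section above.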

Issue

This program can also collect data over a longer time range, group it into n-minute buckets, and take the average of each bucket. But for the reasons below, it currently only processes a single time group.

  • The exposition format documentation says "Each line must have a unique combination of a metric name and labels. Otherwise, the ingestion behavior is undefined." but does not say whether it is safe when the timestamps differ.
  • We also tested for a while exporting 1 hour of data as 12 data points; the long-term Prometheus lost some of the data points.

Why not use remote_write with InfluxDB

Because we scrape over 650K metrics every 10 seconds (with 2 Prometheus servers for HA). We tried remote_write to InfluxDB (a single server, not the enterprise edition), but it caused very high CPU usage on InfluxDB, which stopped responding and was soon OOM-killed; it also brought down the operational Prometheus. So we tried a different approach (this project). It also only needs a small modification to the Grafana dashboards rather than rebuilding all of them (we do not know why using Prometheus remote_read from InfluxDB as a Grafana data source always hit a proxy timeout in Grafana).
