Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Default to Prometheus histograms, not summaries
Prometheus histograms / summaries are complex and hard to wrap one's head around. The difference between them is also quite subtle and confusing to the uninitiated. I'm sure I won't do a good job of explaining the difference in this commit message, so I'll just link to the docs[0]. The key difference for us is that summaries can't be aggregates, but histograms can. For example, if I'm looking at http_request_duration_seconds, we might want to know "what's the 95%ile request time for this controller action". We can answer this question for a particular application instance (i.e. a specific pod) with both histograms and summaries. If we want to know the 95%ile request time aggregated across all instances / pods however, we can only do that with histograms. This is because in summaries, quantiles are computed ahead of time by the prometheus client, so they can only see information for one particular app instance. Histograms on the other hand, defer the calculation of quantiles to query time, which means they can be aggregated (but are less precise). prometheus_exporter defaults to Summary[1], but in our case I think it makes more sense to default to Histogram. There may be some apps where we prefer Summary, so I've allowed it to be passed in as a configuration option. From the summary metrics we have at the moment, we can see that some controller actions take significantly longer than the 10 seconds which prometheus_exporter uses as it's default max bucket. Therefore, I've added a few more buckets so we can see the distribution between 10 and 50 seconds. [0] - https://prometheus.io/docs/practices/histograms/#quantiles [1] - https://github.com/discourse/prometheus_exporter#histogram-mode
- Loading branch information