Skip to content

Commit

Permalink
Default to Prometheus histograms, not summaries
Browse files Browse the repository at this point in the history
Prometheus histograms / summaries are complex and hard to wrap one's
head around. The difference between them is also quite subtle and
confusing to the uninitiated.  I'm sure I won't do a good job of
explaining the difference in this commit message, so I'll just link to
the docs[0].

The key difference for us is that summaries can't be aggregates, but
histograms can.

For example, if I'm looking at http_request_duration_seconds, we might
want to know "what's the 95%ile request time for this controller
action". We can answer this question for a particular application
instance (i.e. a specific pod) with both histograms and summaries. If we
want to know the 95%ile request time aggregated across all instances /
pods however, we can only do that with histograms. This is because in
summaries, quantiles are computed ahead of time by the prometheus
client, so they can only see information for one particular app
instance. Histograms on the other hand, defer the calculation of
quantiles to query time, which means they can be aggregated (but are
less precise).

prometheus_exporter defaults to Summary[1], but in our case I think it
makes more sense to default to Histogram. There may be some apps where
we prefer Summary, so I've allowed it to be passed in as a configuration
option.

From the summary metrics we have at the moment, we can see that some
controller actions take significantly longer than the 10 seconds which
prometheus_exporter uses as it's default max bucket. Therefore, I've
added a few more buckets so we can see the distribution between 10 and
50 seconds.

[0] - https://prometheus.io/docs/practices/histograms/#quantiles
[1] - https://github.com/discourse/prometheus_exporter#histogram-mode
  • Loading branch information
richardTowers committed Sep 13, 2023
1 parent b9ec948 commit b03fe38
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion lib/govuk_app_config/govuk_prometheus_exporter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,19 @@ def self.should_configure
end
end

def self.configure(collectors: [])
def self.configure(collectors: [], default_aggregation: Prometheus::Metric::Histogram)
return unless should_configure

require "prometheus_exporter"
require "prometheus_exporter/server"
require "prometheus_exporter/middleware"

# PrometheusExporter::Metric::Histogram.DEFAULT_BUCKETS tops out at 10 but
# we have a few controller actions which are slower than this, so we add a
# few extra buckets for slower requests
PrometheusExporter::Metric::Histogram.default_buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 15, 25, 50].freeze
PrometheusExporter::Metric::Base.default_aggregation = default_aggregation

if defined?(Sidekiq)
Sidekiq.configure_server do |config|
require "sidekiq/api"
Expand Down

0 comments on commit b03fe38

Please sign in to comment.