Default to Prometheus histograms, not summaries

Prometheus histograms and summaries are hard to wrap one's head around, and the difference between them is subtle and confusing to the uninitiated. I won't do a good job of explaining the difference in this commit message, so I'll just link to the docs[0].

The key difference for us is that summaries can't be aggregated, but histograms can.

For example, looking at http_request_duration_seconds, we might want to know "what's the 95th percentile request time for this controller action?". We can answer this question for a particular application instance (i.e. a specific pod) with both histograms and summaries. If we want to know the 95th percentile request time aggregated across all instances / pods, however, we can only do that with histograms.

This is because a summary's quantiles are computed ahead of time by the Prometheus client, which can only see information for one particular app instance. Histograms, on the other hand, defer the calculation of quantiles to query time, which means they can be aggregated (but are less precise).

prometheus_exporter defaults to Summary[1], but in our case I think it makes more sense to default to Histogram. There may be some apps where we prefer Summary, so I've allowed it to be passed in as a configuration option.

From the summary metrics we have at the moment, we can see that some controller actions take significantly longer than the 10 seconds which prometheus_exporter uses as its default max bucket. Therefore, I've added a few more buckets so we can see the distribution between 10 and 50 seconds.

[0] - https://prometheus.io/docs/practices/histograms/#quantiles
[1] - https://github.com/discourse/prometheus_exporter#histogram-mode
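
To make the aggregation point concrete, here is a minimal sketch in plain Ruby (no Prometheus libraries involved). The bucket bounds, the two pods and their request durations are all made up for illustration, and estimate_quantile is a hypothetical helper that interpolates linearly within a bucket, similar in spirit to (but not a reimplementation of) PromQL's histogram_quantile. It shows that bucket counts from several pods can simply be added together before estimating a quantile, whereas quantiles already computed per pod cannot be meaningfully combined.

```ruby
# Bucket upper bounds in seconds (a shortened, hypothetical set).
BUCKETS = [0.5, 1, 2.5, 5, 10].freeze

# Cumulative counts, as a histogram stores them: each bucket counts the
# observations less than or equal to its upper bound.
def cumulative_counts(durations)
  BUCKETS.map { |le| durations.count { |d| d <= le } }
end

# Hypothetical helper: estimate quantile q from cumulative bucket counts by
# linear interpolation inside the bucket that contains the target rank.
# Assumes every observation fits within the largest bucket.
def estimate_quantile(q, counts)
  total = counts.last
  return nil if total.zero?

  rank = q * total
  index = counts.index { |c| c >= rank }
  lower_bound = index.zero? ? 0.0 : BUCKETS[index - 1]
  lower_count = index.zero? ? 0 : counts[index - 1]
  in_bucket = counts[index] - lower_count
  lower_bound + (BUCKETS[index] - lower_bound) * (rank - lower_count) / in_bucket.to_f
end

# Made-up request durations (seconds) for two pods serving the same action.
pod_a = Array.new(19, 0.2) + [0.8]               # almost all fast
pod_b = Array.new(10, 0.4) + Array.new(10, 8.0)  # half fast, half slow

counts_a = cumulative_counts(pod_a)
counts_b = cumulative_counts(pod_b)

# Histograms aggregate: add the per-pod bucket counts, then estimate.
combined = counts_a.zip(counts_b).map(&:sum)
aggregated_p95 = estimate_quantile(0.95, combined)

# Summaries only give us per-pod quantiles; averaging them is not a p95.
per_pod_p95 = [counts_a, counts_b].map { |c| estimate_quantile(0.95, c) }
averaged_p95 = per_pod_p95.sum / per_pod_p95.size

puts "aggregated p95 from summed buckets: #{aggregated_p95.round(2)}s"
puts "average of per-pod p95 values:      #{averaged_p95.round(2)}s"
```

With this made-up data the two numbers differ noticeably (roughly 9 seconds against 5 seconds), which is the gap the histogram default is meant to close. The estimate is only as fine-grained as the buckets, which is the "less precise" trade-off mentioned above.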
alphagov · Sep 13, 2023 · b03fe38
1 parent b9ec948
commit b03fe38
Showing 1 changed file with 7 additions and 1 deletion.
diff --git a/lib/govuk_app_config/govuk_prometheus_exporter.rb b/lib/govuk_app_config/govuk_prometheus_exporter.rb
@@ -11,13 +11,19 @@ def self.should_configure
     end
   end
 
-  def self.configure(collectors: [])
+  def self.configure(collectors: [], default_aggregation: PrometheusExporter::Metric::Histogram)
     return unless should_configure
 
     require "prometheus_exporter"
     require "prometheus_exporter/server"
     require "prometheus_exporter/middleware"
 
+    # PrometheusExporter::Metric::Histogram::DEFAULT_BUCKETS tops out at 10
+    # seconds, but we have a few controller actions which are slower than
+    # this, so we add a few extra buckets (up to 50 seconds) for slower requests
+    PrometheusExporter::Metric::Histogram.default_buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 15, 25, 50].freeze
+    PrometheusExporter::Metric::Base.default_aggregation = default_aggregation
+
     if defined?(Sidekiq)
       Sidekiq.configure_server do |config|
         require "sidekiq/api"