Optimize incrementing values in DirectFileStore adapter #280

splattael · 2023-02-28T14:01:36Z

Introduce FileMappedDict#increment_value
Check for internal_storage only once (Process.pid is slow on Linux)
Lookup file position only once

Benchmarks

TLDR; Speed up counter and histogram by ~6-7%.

Warming up --------------------------------------
    counter perf-inc    22.380k i/100ms
Calculating -------------------------------------
    counter perf-inc    224.104k (± 0.8%) i/s -      1.141M in   5.093434s

Comparison:
    counter perf-inc:   224103.9 i/s
        counter main:   212032.1 i/s - 1.06x  slower

Warming up --------------------------------------
      gauge perf-inc    29.079k i/100ms
Calculating -------------------------------------
      gauge perf-inc    289.211k (± 1.8%) i/s -      1.454M in   5.029133s

Comparison:
          gauge main:   292329.3 i/s
      gauge perf-inc:   289210.9 i/s - same-ish: difference falls within error

Warming up --------------------------------------
  histogram perf-inc     8.522k i/100ms
Calculating -------------------------------------
  histogram perf-inc     84.796k (± 2.8%) i/s -    426.100k in   5.029627s

Comparison:
  histogram perf-inc:    84795.9 i/s
      histogram main:    80946.5 i/s - same-ish: difference falls within error

# frozen_string_literal: true

require 'prometheus/client'
require 'prometheus/client/formats/text.rb'
require 'prometheus/client/data_stores/direct_file_store.rb'
require 'pp'
require 'benchmark/ips'

FileUtils.rm_rf "/tmp/prometheus_benchmark"
FileUtils.mkdir_p "/tmp/prometheus_benchmark"

Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(
    dir: "/tmp/prometheus_benchmark"
  )

# pp Prometheus::Client.config

prometheus = Prometheus::Client.registry

counter = Prometheus::Client::Counter.new(:mycounter,
  docstring: 'Example counter',
  labels: [:foo])

gauge = Prometheus::Client::Gauge.new(:mygauge,
  docstring: 'Example gauge',
  labels: [:foo],
  store_settings: { aggregation: :sum })

histogram = Prometheus::Client::Histogram.new(:myhistogram,
  docstring: 'Example histogram',
  labels: [:foo],
  buckets: [0, 1, 2])

prometheus.register(counter)
prometheus.register(gauge)
prometheus.register(histogram)

labels = {'foo': 'bar'}

branch = `git rev-parse --abbrev-ref HEAD`.chomp

Benchmark.ips do |x|
  x.report("counter #{branch}") do
    counter.increment(by: 1, labels: labels)
  end

  x.compare!
  x.save! 'counter'
end

Benchmark.ips do |x|
  x.report("gauge #{branch}") do
    gauge.set(1, labels: labels)
  end

  x.compare!
  x.save! 'gauge'
end

Benchmark.ips do |x|
  x.report("histogram #{branch}") do
    histogram.observe(1, labels: labels)
  end

  x.compare!
  x.save! 'histogram'
end

* Introduce FileMappedDict#increment_value * Check for internal_storage only once (Process.pid is slow on Linux) * Lookup file position only once Signed-off-by: Peter Leitzen <peter@leitzen.de>

SuperQ

Nice

dmagliola · 2023-02-28T18:36:40Z

I'm not opposed to this change, but:

I'm not sure it actually makes a measurable performance difference, two out of 3 benchmarks report no change. Note that in my experience, testing performance of the disk store is very hard to get meaningful results. We used to use a cloudvm to get realistic / repeatable results, testing locally was all over the place.
sounds like what's you're trying to avoid is the double call to internal_store. You could avoid that by assigning the result of internal_store to a local var, then you don't need the new method in FileMappedDict.
avoiding the double lookup in the positions hash should be below measurability. Especially when compared to writing to the file...

dmagliola · 2023-02-28T18:54:52Z

Keeping in mind my comment that this may or may not be a greatly realistic test, I'm getting this on my Macbook M2 Air:

Warming up --------------------------------------
        counter main    26.202k i/100ms
Calculating -------------------------------------
        counter main    261.357k (± 0.4%) i/s -      1.310M in   5.012766s

Comparison:
    counter perf-inc:   265807.5 i/s
        counter main:   261356.7 i/s - 1.02x  slower

Warming up --------------------------------------
          gauge main    37.286k i/100ms
Calculating -------------------------------------
          gauge main    336.417k (±10.3%) i/s -      1.678M in   5.040241s

Comparison:
          gauge main:   336417.0 i/s
      gauge perf-inc:   320784.3 i/s - same-ish: difference falls within error

Warming up --------------------------------------
      histogram main     9.862k i/100ms
Calculating -------------------------------------
      histogram main     98.633k (± 1.2%) i/s -    493.100k in   5.000117s

Comparison:
  histogram perf-inc:   101441.5 i/s
      histogram main:    98633.1 i/s - 1.03x  slower

So, 3% faster on histograms, 2% faster on counter, same on gauges.
Second run: 2% on counter, same on gauges and hist
On a third run, I got a freakishly high result for gauges (23%) 😳

I'm not sure how much to trust this benchmark... :/

Doing:

              st = internal_store
              value = st.read_value(key)
              st.write_value(key, value + by.to_f)

(so, sidestepping the double call to pid)

I get more or less the same result (so, either "same", or within 1 to 4% improvement to performance)

I'm going to try to see if I remember how my "heavy" benchmarking code worked, and see what results I get, but again, keeping in mind that testing on a Macbook is probably not super representative.

In the meantime, i'd recommend doing just the change to sidestep the double call to internal_store, which adds almost zero complexity, and doesn't hurt, really :)

dmagliola · 2023-02-28T23:30:58Z

Ok, I've changed my mind :)

I've run the datastores benchmarks, on a somewhat random m6i.2xlarge on EC2 that I have at hand, and they seem to be consistently faster. About 6% indeed.

Doing the caching of the variable as I was suggesting is also faster than baseline, but only like 2% (ish)

These seem pretty repeatable, and they consistently give me those results.

So I'm convinced, we should merge this.

@Sinjo, any objections to this?

dmagliola · 2023-02-28T23:33:42Z

Also, sorry, what I should've started with.

Thank you for looking into this and for the PR @splattael! :D

splattael · 2023-03-01T06:27:34Z

@dmagliola Thanks a ton for reviewing and merging 🙇

You are right regarding benchmarks - they were flaky for me locally too (on Linux) but in the end they gave me overall speed improvement after running several times. This improvement makes sense (not only because of side-stepping Process.pid) but also having less calls.

BTW, I am planning to reduce the amount of getpid (via Process.pid) syscalls by setting the PID on startup and override Kernel.fork (or Process._fork in Ruby 3.1+). See https://bugs.ruby-lang.org/issues/5446 and https://bugs.ruby-lang.org/issues/17795 for more context.

👋 @SuperQ ❤️

splattael · 2023-03-01T06:56:19Z

In the meantime, i'd recommend doing just the change to sidestep the double call to internal_store, which adds almost zero complexity, and doesn't hurt, really :)

I had this very refactoring ready in another branch but saw less speed improvements. 😅

I should have provided more context before, sorry about 🙇

dmagliola · 2023-03-04T11:18:25Z

No worries, and thanks for the PR!
I'm going to wait on @Sinjo to confirm he's cool with this, but this should go out in the next release :)

I am planning to reduce the amount of getpid (via Process.pid) syscalls by setting the PID on startup and override Kernel.fork (or Process._fork in Ruby 3.1+).

I may be misreading the docs here, but is this a method intended to be monkeypatched? 😅

That makes me a bit uncomfortable, but it seems like it's designed precisely to do that...
It's unfortunate the underscore in the name makes it hard to google for examples, all I can find is the test case where it gets introduced...

So I guess we could do that prepend in the constructor of DirectFileStore.

Would that look like:

def _fork
  super
  DirectFileStore.process_pid = Process.pid 
end

Am I understanding that correctly?

splattael · 2023-03-05T11:20:27Z

@dmagliola Yes, is the way I understood it as well 👍

It seems that Rails, for example, already does exactly this for Ruby 3.1+. See https://github.com/rails/rails/blob/v7.0.2/activesupport/lib/active_support/fork_tracker.rb#L50

dmagliola · 2023-03-05T11:39:39Z

Ah, that's amazing! You found exactly the kind of example I was hoping to get! 🙌
That sounds like a sensible thing to do, and i'm looking forward to the PR!

NOTE: I would do "half" of what Rails is doing. I'd prepend if _fork is defined, but if not (if Ruby < 3.1), i'd leave it alone and continue checking for Process.pid on every increment. So you get the ~4% perf boost if you're on a modern Ruby version only.

The logic here being that _fork is intended for you to do this, but the other 3 aren't. I'm open minded about this. The argument "look, Rails does it and here's exactly how to do it safely" is pretty compelling, I just think we should limit how much we pollute the global namespace with "surprising" stuff.

Sinjo

Sorry for the slow reply on my side. This all looks great!

I agree with @dmagliola about the _fork extension point - if it's there we should use it, and if it's not then people should upgrade their Ruby version if they want a performance boost. I'd rather not be the reason somebody ends up with weird behaviour in their application.

Open to changing my mind on that if we can convince ourselves really thoroughly that the options for older Rubies are safe.

dmagliola · 2023-03-06T20:07:42Z

Thank you @splattael for the perf improvement, and looking forward to future PRs! 🙌

splattael · 2023-03-07T20:10:15Z

@dmagliola @Sinjo I agree! We should use _fork automatically if available 👍

I wonder, though, if we could provide a way for users to opt-in to use cached process ids (via 🐒 patching Process.fork)? 🤔

So:

For Ruby 3.1+ - cached process id on by default via _fork hook
For <Ruby 3.1 - off by default but opt-in allowed if setup code is called explicitly

WDYT?

Optimize incrementing values in DirectFileStore adapter

3ee337e

* Introduce FileMappedDict#increment_value * Check for internal_storage only once (Process.pid is slow on Linux) * Lookup file position only once Signed-off-by: Peter Leitzen <peter@leitzen.de>

splattael force-pushed the perf-inc branch from 4c89123 to 3ee337e Compare February 28, 2023 14:03

SuperQ approved these changes Feb 28, 2023

View reviewed changes

dmagliola approved these changes Feb 28, 2023

View reviewed changes

Sinjo approved these changes Mar 6, 2023

View reviewed changes

dmagliola merged commit 2022a41 into prometheus:main Mar 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize incrementing values in DirectFileStore adapter #280

Optimize incrementing values in DirectFileStore adapter #280

splattael commented Feb 28, 2023 •

edited

Loading

SuperQ left a comment

dmagliola commented Feb 28, 2023

dmagliola commented Feb 28, 2023 •

edited

Loading

dmagliola commented Feb 28, 2023

dmagliola commented Feb 28, 2023

splattael commented Mar 1, 2023 •

edited

Loading

splattael commented Mar 1, 2023

dmagliola commented Mar 4, 2023

splattael commented Mar 5, 2023

dmagliola commented Mar 5, 2023 •

edited

Loading

Sinjo left a comment

dmagliola commented Mar 6, 2023

splattael commented Mar 7, 2023

Optimize incrementing values in DirectFileStore adapter #280

Optimize incrementing values in DirectFileStore adapter #280

Conversation

splattael commented Feb 28, 2023 • edited Loading

Benchmarks

SuperQ left a comment

Choose a reason for hiding this comment

dmagliola commented Feb 28, 2023

dmagliola commented Feb 28, 2023 • edited Loading

dmagliola commented Feb 28, 2023

dmagliola commented Feb 28, 2023

splattael commented Mar 1, 2023 • edited Loading

splattael commented Mar 1, 2023

dmagliola commented Mar 4, 2023

splattael commented Mar 5, 2023

dmagliola commented Mar 5, 2023 • edited Loading

Sinjo left a comment

Choose a reason for hiding this comment

dmagliola commented Mar 6, 2023

splattael commented Mar 7, 2023

splattael commented Feb 28, 2023 •

edited

Loading

dmagliola commented Feb 28, 2023 •

edited

Loading

splattael commented Mar 1, 2023 •

edited

Loading

dmagliola commented Mar 5, 2023 •

edited

Loading