Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis data store: Why are per-process keys required for MIN/MAX aggreggations? #1

Closed
olofhe opened this issue May 20, 2020 · 3 comments

Comments

@olofhe
Copy link

olofhe commented May 20, 2020

Hi!

I might be missing something, but I don't understand the need to keep individual Redis hash keys per process in order to support MIN and MAX aggregation of metrics.

It seems completely possible to only keep one key for the maximum or minimum reported value in Redis.

For MAX, simply just write the new value v_new to key k iff:

  • k has no old value v_old

OR

  • v_old < v_new

and the same for MIN, but with a flipped inequality sign.

AFAIK, there is no built-in Redis function to do this atomically in a single command (i.e. without the risk of some other process modifying the value in between read/write), but atomicity can easily be ensured using EVAL with a Lua script.

Here's an example ruby function that does this (extra verbose for readability).

def redis_set_if_greater(redis, key, val)
  lua_script =
    "local key = KEYS[1]; " \
    "local val = ARGV[1]; " \
    "local current_val = redis.call('get', key); " \
    "if (not current_val) or (tonumber(current_val) < tonumber(val)) then " \
      "redis.call('set', key, val); " \
    "end"
  redis.eval(lua_script, [key], [val])
end

Example program:

r = Redis.new
r.del "mykey"
[2, 1, 3, 2].each do |x|
  redis_set_if_greater(r, "mykey", x)
  puts "set #{x} => #{r.get("mykey")}"
end

Output:

set 2 => 2
set 1 => 2
set 3 => 3
set 2 => 3

Is there something I haven't considered? :)

Best,
Olof

@dmagliola
Copy link
Contributor

Hi!

Interesting question!
I'm not 100% confident on whether this is possible, but some thoughts:

  1. The main reason this wasn't done this way is not necessarily that it can't be done, but that we just didn't. In other worse, there's a lot that we haven't considered. :)
    The Redis Data Store was an experiment in performance for a data store that would share data between processes. Its performance was much worse than we would have expected, and much worse than other simpler options we had, so we abandoned it pretty early on. As such, it didn't get as much work and as much thought as the other data stores we are supporting.

  2. One thing your solution wouldn't work with is increment and decrement. If you are inc/dec'ing gauges in different processes, and you then want to aggregate them and get the MIN/MAX, it's not going to do what you wanted...
    This is similar to a discussion we had in a recent PR in the official client: Add :most_recent aggregation to DirectFileStore prometheus/client_ruby#172
    You may find that one interesting, and there may be other related insights in there.

Just out of curiosity... Are you using this Redis-based store?
I'd like to hear more about under what circumstances it it useful!

@olofhe
Copy link
Author

olofhe commented May 20, 2020

Hi!

Thanks for a very quick reply!

I hadn't considered incrementing/decrementing gauges. That indeed wouldn't work with my solution.

I'm surprised to hear that the Redis store performed so poorly! Instinctively, it felt like the more "natural" choice for a multi-process application. But I suppose I should focus on the officially supported DirectFileStore then :)

I'm very new to Prometheus. I was looking into setting up some metrics for a web application at work, using Rails and Sidekiq on a Docker Swarm, and was just looking at the various stores. So I haven't set up anything yet, and unfortunately can't speak for the usefulness of the Redis store. :)

Regards,
Olof

@dmagliola
Copy link
Contributor

Instinctively, it felt like the more "natural" choice for a multi-process application

I feel this too! So much so, that out of that instinct and the surprise of the result was born a conference talk, because I just couldn't believe it...

And yes, given your use cases, I'd recommend sticking to the "official" supported data stores until you have a more specific need. This repo was mostly for documenting experiments, results, and for anyone that might want to try the mmap store, which we don't officially support, but it's technically a bit faster than DirectFileStore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants