
Add :most_recent aggregation to DirectFileStore #172

Merged
5 commits merged from stefansundin:add-most_recent-aggregation into prometheus:master on Jun 22, 2020

Conversation

stefansundin
Contributor

Add :most_recent aggregation to DirectFileStore, which reports the value that was set by a process most recently.

I am using this for a small app called github-activity, which tracks the remaining GitHub rate limit using Prometheus (this is the only metric at the moment). The existing aggregations didn't really fit my use case, since the last value reported by GitHub is the one that I care about. So I decided to try to improve the gem to support my use case, and I came up with this.

Please let me know about nitpicks and other things. I wasn't sure what the best name for this aggregation would be; I was basically picking between :latest and :most_recent. I'm also unsure whether this should use CLOCK_MONOTONIC or whether Time.now.to_f is OK.
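
For reference, here's roughly how I'm wiring it up in github-activity (a sketch; the store and gauge options follow the existing client API, and the `aggregation:` key is what this PR proposes, so it may change in review):

```ruby
require 'prometheus/client'
require 'prometheus/client/data_stores/direct_file_store'

# Multi-process, file-backed store shared by all worker processes.
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_files')

# A gauge whose scraped value should be whatever was set most recently by
# any process, rather than the sum/max/min across processes.
ratelimit_remaining = Prometheus::Client::Gauge.new(
  :github_ratelimit_remaining,
  docstring: 'Remaining requests in the current GitHub rate limit window',
  store_settings: { aggregation: :most_recent }
)

# Each worker sets the value it last saw from GitHub's rate limit headers.
ratelimit_remaining.set(4999)
```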

Thanks!


@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 207654d on stefansundin:add-most_recent-aggregation into 7129453 on prometheus:master.

@coveralls

coveralls commented Dec 23, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling 826b32e on stefansundin:add-most_recent-aggregation into 204d8c9 on prometheus:master.

@stefansundin
Contributor Author

Hi @SuperQ @Sinjo @dmagliola. Feel free to let me know what you think. Thanks.

@SuperQ
Member

SuperQ commented Dec 24, 2019

Thinking about how the comparison of gauge values is handled in other languages: in single-threaded mode, "most recent" is the standard way to do things. I wonder if this should be the default way multi-process gauges work.

@brian-brazil

> I wonder if this should be the default way multi-process gauges work.

It requires cross-process locking, and we're trying to avoid that on performance grounds.

@dmagliola
Collaborator

First of all, thank you for this PR! This is a great contribution :-D
So, I have a number of thoughts on this. Not necessarily in order...

  • I like the idea, since it's something we were thinking of doing for our own purposes. @lawrencejones is this more or less what you had in mind? Are you happy with this approach?

  • Also @lawrencejones or @Sinjo do you have an opinion on CLOCK_MONOTONIC vs Time.now.to_f? You both have a clearer idea of this than I do.

  • I was a little concerned about the performance of this, but it seems to be roughly the same as what we currently have (2% to 3% slower, maybe), so 👍 on that. Files are probably going to be a bit bigger, but it doesn't seem to be a huge deal.

  • @SuperQ I kinda like the idea of making this the default, because our current default (all) creates a lot of time series, potentially unnecessarily. My main worry is that that's a backwards-incompatible change, so I'm not sure we can introduce it. Maybe we should start a list of breaking changes that we'd like to introduce next time we release a major version, and start with that one?

  • @brian-brazil Unless I'm misunderstanding your comment, this shouldn't be a problem the way this is implemented. This is not locking across processes. Instead, each value is paired with the timestamp of when it was set, and which one was latest gets resolved at scrape time, which I thought was quite an elegant approach. Does that alleviate that concern, or did I misunderstand?

  • @stefansundin The thing that gives me pause is that, in the cases where we wanted this feature, we got around it by either doing a min aggregation on the data store, or on the Prometheus Query. In your case, it feels like this would do the trick too, right? The "latest" and the "minimum" should be the same given that it's a rate limit that presumably decreases over time? In any case, you'd need the latest value across all your servers, which this wouldn't be able to give you, so you'd still need the aggregation at the query level. With that in mind, do we need this feature?

Finally, Merry Xmas to all of you :-)

@@ -182,13 +185,15 @@ def process_id

def aggregate_values(values)
if @values_aggregation_mode == SUM
-      values.inject { |sum, element| sum + element }
+      values.map { |element| element[0] }.inject { |sum, value| sum + value }
Collaborator

For all 3 of these, could we map(&:first) and then do the same thing we were doing before?
That feels a lot more readable to me.

(so: values.map(&:first).inject { |sum, element| sum + element }, values.map(&:first).max, etc)
I'm half wondering why we did this inject before, instead of just calling .sum... Is there a large performance penalty to map? Could we measure that?
The reason I mention that is that we already have some users with performance problems on scrape, and I'd thus like to make this as fast as possible.

If map is much slower, we should probably change this first one to the same inject as before, but with the body of the block changed to sum + element[0]...
And I'm not sure whether that's the fastest way for the ones below either.
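
For example, a quick way to measure it (a sketch using the benchmark-ips gem on synthetic [value, timestamp] pairs; not code from this PR):

```ruby
require 'benchmark/ips'

# Synthetic data shaped like the new file entries: [value, timestamp] pairs.
values = Array.new(10_000) { [rand * 100, Process.clock_gettime(Process::CLOCK_MONOTONIC)] }

Benchmark.ips do |x|
  x.report("inject with element[0]") do
    values.inject(0.0) { |sum, element| sum + element[0] }
  end

  x.report("map(&:first) then sum") do
    values.map(&:first).sum
  end

  x.compare!
end
```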

Contributor Author

I am very open to changing this code. I mostly just wanted to get it working without considering performance that much.

@stefansundin
Contributor Author

> First of all, thank you for this PR! This is a great contribution :-D

That's a great thing to hear.

>   • @stefansundin The thing that gives me pause is that, in the cases where we wanted this feature, we got around it by either doing a min aggregation on the data store, or on the Prometheus Query. In your case, it feels like this would do the trick too, right? The "latest" and the "minimum" should be the same given that it's a rate limit that presumably decreases over time? In any case, you'd need the latest value across all your servers, which this wouldn't be able to give you, so you'd still need the aggregation at the query level. With that in mind, do we need this feature?

So my rate limit starts at 5000 requests/hour and decreases, that is correct. But the rate limit is reset every hour, so if I used min then I'd always see the lowest value even after the reset. If I have multiple servers, they would probably have their own request IPs from GitHub's point of view, and in that case the rate limits would be independent, so I would need to track all of them individually. This change lets me omit metric series that I already know are "wrong" (in the sense that I already know, at scrape time, that the information is out of date).

> Finally, Merry Xmas to all of you :-)

Merry Christmas to all of you too! :)

@brian-brazil

> This is not locking across processes. Instead, each value is paired with the timestamp of when it was set, and which one was latest gets resolved at scrape time, which I thought was quite an elegant approach.

I hadn't looked at the code; I was responding to @SuperQ's comment. The issue with this approach is that it only works for set, not inc/dec. Thus it cannot be the default.

@lawrencejones
Contributor

I think banning inc/dec on this aggregation is probably the way to go. Hell, we could raise for the moment on inc/dec while allowing set, then add the inter-process locking for inc/dec later if people really want it.

@brian-brazil

If you want inc/dec then there's no need for inter-process coordination; each process can keep it locally and we sum on exposition.

@dmagliola
Collaborator

That's a really good point on inc/dec, Brian, thanks for raising that.

@lawrencejones I like your solution on first sight, but I'm a bit worried about two things:

  • the store behaves somewhat differently when observing, based on the aggregation method you choose, which feels a bit weird to me.
  • more importantly, I'm a bit hesitant about raising an exception, since it's highly possible you won't find out about it until it's running in production. I can very easily see how you'd use the in-memory stores for dev/test, and only use the file store in prod. (And the reason this distinction is important is that the only store that supports aggregation is the FileStore; the others wouldn't know they should raise.)

Side thought... Would / could this be solved by "Custom Collectors", instead?
I'm thinking of the value to be exposed getting calculated when the scrape happens, instead of storing separate values for each process, and then having to resolve which one is the latest...

@Sinjo
Member

Sinjo commented Jan 28, 2020

> do you have an opinion on CLOCK_MONOTONIC vs Time.now.to_f? You both have a clearer idea of this than I do.

I think either is fine for this use case. Each has its caveats, but I think CLOCK_MONOTONIC is ever so slightly better for what we're doing, and I've included my reasoning below.

The caveat I had to read up on with CLOCK_MONOTONIC was whether it was consistent between different processes. It turns out that as long as you're comparing values from the same boot of the same computer, you're fine. If you compare values from different computers or from different boots of the same computer, they will be meaningless.

Time.now.to_f will be susceptible to clock adjustments by NTP, but not to DST changes. I just tested with two irb sessions, leaving one in London time and setting TZ=Europe/Paris in the env of the other: there's no one-hour offset between them (unlike with Time.now by itself).
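
For illustration, the two options look like this (plain Ruby; the monotonic reading is only comparable against other readings from the same boot of the same machine):

```ruby
# Wall-clock seconds since the epoch: comparable across machines, but can
# jump when NTP adjusts the system clock.
wall_before = Time.now.to_f

# Monotonic clock: never jumps, but only meaningful relative to other
# readings taken since the same boot of the same machine.
mono_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)

sleep 0.1

puts "wall-clock elapsed: #{Time.now.to_f - wall_before}"
puts "monotonic elapsed:  #{Process.clock_gettime(Process::CLOCK_MONOTONIC) - mono_before}"
```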

@dmagliola
Collaborator

Hello @stefansundin,
Sorry about the delay in responding.

So, I think we should merge this feature, but I'd like to ask for a few changes, because as it stands, it's quite possible to use this in a way that does something the user definitely didn't mean.

Could we have:

  • a validation in validate_metric_settings that makes sure most_recent can only be used for Gauges? Using this aggregation with any other metric type is almost certainly not what the user intended, and it'll result in counters that go up and down, and completely inconsistent histograms.
  • a validation on increment that throws an exception if the aggregation type is most_recent. Incrementing would also result in something the user didn't want. (See the sketch after this list.)
  • documentation in the README of this aggregation mode, when it's useful, and a big note that it's only useful for Gauges, and that it doesn't allow using increment/decrement
  • I'd get rid of the filename prefix / renaming for this PR. The code so far assumes you're starting from an empty directory, and having that in this PR mixes the concerns we're trying to solve.
  • see my comment on adding a comment above aggregate_values documenting what it returns, which should make it much easier to follow the code. Also the code about mapping / how to aggregate the values.
  • let's switch to CLOCK_MONOTONIC as per @Sinjo's comment above.
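
To make the first two points concrete, here's the rough shape I have in mind (a sketch only; method and error names are illustrative, and the real code would live inside DirectFileStore):

```ruby
# Illustrative sketch, not the exact code to be merged.
class InvalidStoreSettingsError < StandardError; end

def validate_metric_settings(metric_type, metric_settings)
  if metric_settings[:aggregation] == :most_recent && metric_type != :gauge
    raise InvalidStoreSettingsError,
          ":most_recent aggregation can only be used with gauges"
  end
end

def increment(aggregation_mode, by: 1)
  if aggregation_mode == :most_recent
    raise InvalidStoreSettingsError,
          "increment/decrement is not supported with the :most_recent aggregation"
  end
  # ... fall through to the normal file-backed increment ...
end
```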

Notes on making this the default:

  • Since this has unobvious caveats, like the fact that you can't use inc/dec on gauges that have this aggregation, and that you might only find out about that in production if your test environment doesn't use the FileStore, I would like to make it a manual option that someone has to find out about by reading the docs, at which point we warn them about this. If this were the default, we'd really need to make inc/dec work, which is quite a lot of work if we don't want to be locking between processes.

Thank you!

@stefansundin
Contributor Author

@dmagliola I dropped the commit with the filename prefix change. I'd highly recommend testing the behavior when loading files in the old format. If nothing is done to prevent it, I'd bet you'll get reports of crashes (or whatever the behavior turns out to be) from confused users. Hopefully only from dev machines and not from production.

As for the rest, unfortunately I am short on time to work on this at the moment. Can you work on addressing the rest of the changes? This branch is allowing edits from maintainers, so if you want you can push new commits to it and then merge when desired. Thanks.

@dmagliola dmagliola force-pushed the add-most_recent-aggregation branch 3 times, most recently from 3a0ab34 to 4c00fe9 Compare February 25, 2020 11:24
@dmagliola
Collaborator

Alright, I've made all the changes mentioned above. It's now pending review, and we'll merge this and cut a new minor version.

@Sinjo (Member) left a comment

A few small changes and one thought on performance. Otherwise LGTM!

@@ -32,6 +32,31 @@
metric_type: :counter,
metric_settings: { some_setting: true })
end.to raise_error(Prometheus::Client::DataStores::DirectFileStore::InvalidStoreSettingsError)

# :most_recent aggregation can only be used for gauges
Member

I'd prefer this to be done with RSpec methods describing the tests rather than a single comment at the top of a bunch of expects.

Collaborator

I'm not really sure what shape the code you're thinking of has, sorry.
To me this is doing that; if we remove the comment, this is pretty much how the rest of the test case is written.

Can you mock up a quick example of what you mean by "done with RSpec methods"?
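
Is it something like this you mean, with each expectation in its own example? (Hypothetical mock-up, just guessing at the shape:)

```ruby
# Hypothetical mock-up, not actual spec code from this PR.
describe "with the :most_recent aggregation" do
  it "rejects metric types other than gauge" do
    expect do
      subject.for_metric(:some_counter,
                         metric_type: :counter,
                         metric_settings: { aggregation: :most_recent })
    end.to raise_error(Prometheus::Client::DataStores::DirectFileStore::InvalidStoreSettingsError)
  end

  it "accepts gauges" do
    expect do
      subject.for_metric(:some_gauge,
                         metric_type: :gauge,
                         metric_settings: { aggregation: :most_recent })
    end.not_to raise_error
  end
end
```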

metric_store2.set(labels: { foo: "baz" }, val: 2)
allow(Process).to receive(:pid).and_return(12345)
metric_store1.set(labels: { foo: "baz" }, val: 4)
allow(Process).to receive(:pid).and_return(23456)
Member

Not sure what these last two examples are here to test. I think all the behaviour is covered by the ones above?

Collaborator

Maybe I should leave only one of those two. What this is testing is a labelset that only exists in one of the stores; all the others are in both.

stefansundin and others added 3 commits June 22, 2020 17:37
This reports the value that was set by a process most recently.

The way this works is by tagging each value in the files with the
timestamp of when they were set.

For all existing aggregations, we ignore that timestamp and do
what we've been doing so far.
For `:most_recent`, we take the "maximum" entry according to
its timestamp (i.e. the latest) and then return its value.

Signed-off-by: Stefan Sundin <stefan@stefansundin.com>
Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
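
A minimal sketch of the selection logic described in the commit above, assuming each stored entry is a [value, timestamp] pair (illustrative, not the exact merged code):

```ruby
MOST_RECENT = :most_recent

# `values` is an array of [value, timestamp] pairs, one entry per process.
def aggregate_values(values, mode)
  if mode == MOST_RECENT
    # Pick the entry with the largest timestamp (the latest write) and
    # return only its value.
    values.max_by { |_value, timestamp| timestamp }.first
  else
    # Existing aggregations ignore the timestamp entirely (sum shown here).
    values.map(&:first).sum
  end
end

aggregate_values([[10.0, 100.2], [7.0, 105.9]], MOST_RECENT) # => 7.0
```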
Using this aggregation with any other metric type is almost certainly
not what the user intended, and it'll result in counters that go up
and down, and completely inconsistent histograms.

Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
…egation

If we do this, we'd be incrementing the value for *this* process, not the
global one, which is almost certainly not what the user wants to do.

This is not very pretty because we may end up raising an exception in
production (as test/dev tend to not use DirectFileStore), but we consider
it better than letting the user mangle their numbers and end up with
incorrect metrics.

Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
Daniel Magliola added 2 commits June 22, 2020 17:39
The monotonic clock is going to be more accurate in the few cases where
the distinction matters, but it's also somehow faster than `Time.now`.

Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
Instead of two `read` operations, we can do both at once.

Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
@dmagliola dmagliola force-pushed the add-most_recent-aggregation branch from 2c02e21 to 826b32e Compare June 22, 2020 16:39
@dmagliola dmagliola merged commit c1d9acd into prometheus:master Jun 22, 2020
csmarchbanks pushed a commit to prometheus/client_python that referenced this pull request Oct 24, 2023
In the multiprocess mode, the process that exposes the metrics needs to
aggregate the samples from other processes. Gauge metric allows users to
choose the aggregation mode. This implements 'mostrecent' (and
'livemostrecent') mode where the last observed value is exposed.

In order to support this, the file format is expanded to store the
timestamps in addition to the values. The stored timestamps are read by
the reader process and are used to find the latest value.

Closes #847

Consideration on the atomicity:

Previously, mmap_dict.py had a comment saying "We assume that reading
from an 8 byte aligned value is atomic". With this change, the value
write becomes a 16-byte, 8-byte-aligned write. The code author tried to
find a basis for the original assumption, but couldn't find any.
According to write(2), **if a file descriptor is shared**, the write
becomes atomic. However, we do not share the file descriptors in the
current architecture.

Considering that the Ruby implementation does the same and hasn't seen
an issue with it, this write atomicity problem might not be an issue
in practice.

See also:

* prometheus/client_ruby#172

  The approach and naming are taken from client_ruby.

* https://github.com/prometheus/client_golang/blob/v1.17.0/prometheus/metric.go#L149-L161

  client_golang has an API for setting timestamp already. It explains
  the use case for the timestamp beyond the client-local aggregation. In
  order to support the same use case in Python, further changes are
  needed.

Signed-off-by: Masaya Suzuki <draftcode@gmail.com>