chore(pubsub): additional buckets to consume_latency prom metrics #697
Conversation
Force-pushed bdd7e20 to aba3ae2 (compare)
pubsub/gcloud/aio/pubsub/metrics.py
Outdated
@@ -35,6 +35,8 @@
    namespace=_NAMESPACE,
    subsystem=_SUBSYSTEM,
    unit='seconds',
    buckets=(.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, …
Can probably trim off some of these smaller buckets; that level of granularity is going to be dominated by transit time anyway. Maybe just do something like:
buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 30.0, float('inf'))
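To make the trade-off concrete, here is a small sketch (not the library's internals, just an illustration) of how an observed latency maps onto bucket upper bounds. Prometheus histograms are cumulative at scrape time, but each observation effectively lands in the first bucket whose `le` bound is at or above the value; the `bucket_for` helper below is hypothetical and assumes the bucket tuple suggested above.

```python
import bisect

# Proposed bucket upper bounds (from the suggestion above)
BUCKETS = (.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 30.0, float('inf'))

def bucket_for(value, buckets=BUCKETS):
    """Return the `le` upper bound of the first bucket that can hold `value`.

    bisect_left finds the first bound >= value, matching Prometheus's
    less-than-or-equal bucket semantics.
    """
    i = bisect.bisect_left(buckets, value)
    return buckets[i]

# A 0.3 s consume latency lands in the 0.5 s bucket; anything past 30 s
# falls through to +Inf, which is why larger bounds are being discussed.
```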
Done!
nit: while you're here, float('inf') is unnecessary
Is it? I've seen it used in the prometheus libraries themselves as well
Yeah, we leave it out elsewhere and still receive values for those buckets. Interesting though, do you have a link? I only mention it to avoid confusion with the places we've omitted it, but it's fine to keep it here to be explicit.
edit: for reference https://github.com/prometheus/client_python/blob/4535ce0f43097aa48e44a65747d82064f2aadaf5/prometheus_client/metrics.py#L618-L619
But I also see they explicitly mention inf here: https://github.com/prometheus/client_python/blob/4535ce0f43097aa48e44a65747d82064f2aadaf5/prometheus_client/metrics.py#L586
I conclude with no conclusion.
The second link is the one I saw
I believe the client always appends an inf bucket at the end if it's missing, so including it ourselves is a no-op/stylistic choice.
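For anyone following along, the behavior in the linked client_python code can be sketched roughly like this. This is a standalone illustration of the normalization logic, not the library's actual function (names and error messages here are assumptions):

```python
import math

def normalize_buckets(buckets):
    """Sketch of prometheus_client's Histogram bucket handling:
    bounds must be sorted, and a +Inf bound is appended automatically
    when the caller leaves it out."""
    bounds = [float(b) for b in buckets]
    if bounds != sorted(bounds):
        raise ValueError('Buckets not in sorted order')
    if bounds and bounds[-1] != math.inf:
        # This append is why omitting float('inf') still works.
        bounds.append(math.inf)
    if len(bounds) < 2:
        raise ValueError('Must have at least two buckets')
    return tuple(bounds)
```

Under this model, passing `(.01, .1, 1.0)` and `(.01, .1, 1.0, float('inf'))` yields the same bucket layout, which matches the observation that we still receive values for the inf bucket either way.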
Force-pushed cf40913 to b2e0f9b (compare)
@@ -35,6 +35,8 @@
    namespace=_NAMESPACE,
    subsystem=_SUBSYSTEM,
    unit='seconds',
    buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, …
The default buckets seem to be:
from prometheus_client import Histogram
# Create a Histogram without specifying buckets to see the default ones
histogram = Histogram('request_duration_seconds', 'Description of histogram', unit='seconds')
# Access the default buckets
default_buckets = histogram._upper_bounds
default_buckets
Results:
[0.005,
0.01,
0.025,
0.05,
0.075,
0.1,
0.25,
0.5,
0.75,
1.0,
2.5,
5.0,
7.5,
10.0,
inf]
Given the 2+ minute latencies being reported, we might even need bigger buckets, say 60, 90, and 120 seconds.
I'd be fine going up to 2 minutes as well, though I'm not sure the 90-second bucket is necessarily useful. For our purposes, if we're something like 30 seconds behind, we're already way too far behind; I think we just wanted a better chance of capturing accurate upper bounds (or, more accurately, non-infinite upper bounds!). Also worth keeping in mind that this applies to all of our apps, plus the apps of any users who choose to make use of the metrics.
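If the minute-scale bounds were adopted, the tuple might look like the sketch below. To be clear, the 60- and 120-second cut-offs are illustrative of the discussion above, not necessarily the values that were merged:

```python
# Hypothetical extension of the proposed buckets with minute-scale
# upper bounds, so 2+ minute consume latencies no longer all collapse
# into the +Inf bucket. Exact cut-offs here are illustrative only.
CONSUME_LATENCY_BUCKETS = (
    .01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0,
    20.0, 30.0, 60.0, 120.0, float('inf'),
)
```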
Force-pushed 88c1cff to b19c2e9 (compare)
We're increasing the number of buckets in `gcloud_aio_pubsub_subscriber_consume_latency_seconds_bucket` to enable better tracking of tail-end latencies.
Force-pushed b19c2e9 to 52b397c (compare)
Summary
We're increasing the number of buckets in gcloud_aio_pubsub_subscriber_consume_latency_seconds_bucket to enable better tracking of tail-end latencies.