Support Prometheus metrics directly #29
Signed-off-by: Yuri Shkuro <ys@uber.com>
TODO - sanitize metric names, e.g. agent is failing with:
I was able to make this work in the agent, but it still required changes to the metric naming (which are actually reasonable; see jaegertracing/jaeger#516).
metrics/prometheus/cache.go
Outdated
c.lock.Lock()
defer c.lock.Unlock()

cacheKey := strings.Join(append([]string{opts.Name}, labelNames...), "||")
any value extracting this out into its own function?
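One way the extraction suggested here could look is sketched below. The helper name cacheKey and its signature are hypothetical, not taken from the PR. Sorting inside the helper is also an assumption, since the diff does not show where labelNames gets sorted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cacheKey is a hypothetical helper (not from the PR). It joins the metric
// name with the sorted label names, using the same "||" separator as the
// line under review.
func cacheKey(name string, labelNames []string) string {
	// Copy before sorting so the caller's slice is not reordered.
	sorted := append([]string(nil), labelNames...)
	sort.Strings(sorted)
	return strings.Join(append([]string{name}, sorted...), "||")
}

func main() {
	// Label order does not affect the resulting key.
	fmt.Println(cacheKey("spans", []string{"state", "group"}))
	fmt.Println(cacheKey("spans", []string{"group", "state"}))
}
```

Keeping the key construction in one place would also make it easy to unit test separately from the locking logic.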
metrics/prometheus/factory.go
Outdated
"github.com/uber/jaeger-lib/metrics"
)

// Factory implements metrics.Factory backed my Prometheus registry.
s/my/by?
metrics/prometheus/factory_test.go
Outdated
f3 := f2.Namespace("", map[string]string{"a": "b"}) // essentially same as f2
t1 := f2.Timer("rodriguez", map[string]string{"x": "y"})
t2 := f2.Timer("rodriguez", map[string]string{"x": "z"})
t3 := f3.Timer("rodriguez", map[string]string{"x": "z"}) // same as g2
same as t2
metrics/prometheus/factory_test.go
Outdated
for _, mf := range snapshot {
	if mf.GetName() == name {
		for _, m := range mf.GetMetric() {
			if len(m.GetLabel()) != len(tags) {
Can't a Prometheus metric with the same name have different labels? I guess this is a testing function and I shouldn't read too much into it.
The function findMetric only returns an exact match. Also, as a general point, the whole reason for this PR is that once you define a metric with a certain set of labels (tags), Prometheus will not allow another one with the same name but a different set of labels.
metrics/prometheus/factory_test.go
Outdated
func findMetric(t *testing.T, snapshot []*promModel.MetricFamily, name string, tags map[string]string) *promModel.Metric {
	for _, mf := range snapshot {
		if mf.GetName() == name {
can you reduce the level of nesting for this function? kinda hard to read
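A flatter version might use an early continue plus a small matching helper, as in the sketch below. This is not the PR's code. MetricFamily, Metric and Label are simplified stand-ins for the Prometheus client model types, labelsMatch is a hypothetical helper, and the sketch returns nil on a miss rather than using *testing.T.

```go
package main

import "fmt"

// Simplified stand-ins for the Prometheus client model types (assumption:
// the real test works with promModel.MetricFamily and its getters).
type Label struct{ Name, Value string }

type Metric struct {
	Labels []Label
	Value  float64
}

type MetricFamily struct {
	Name    string
	Metrics []*Metric
}

// labelsMatch (hypothetical helper) reports whether m carries exactly the
// given tags: same count, same names, same values.
func labelsMatch(m *Metric, tags map[string]string) bool {
	if len(m.Labels) != len(tags) {
		return false
	}
	for _, l := range m.Labels {
		if v, ok := tags[l.Name]; !ok || v != l.Value {
			return false
		}
	}
	return true
}

// findMetric returns the first metric with the given name and an exact
// label match, or nil if there is none.
func findMetric(snapshot []*MetricFamily, name string, tags map[string]string) *Metric {
	for _, mf := range snapshot {
		if mf.Name != name {
			continue // early continue keeps the inner loop one level shallower
		}
		for _, m := range mf.Metrics {
			if labelsMatch(m, tags) {
				return m
			}
		}
	}
	return nil
}

func main() {
	snapshot := []*MetricFamily{{
		Name: "rodriguez",
		Metrics: []*Metric{
			{Labels: []Label{{"x", "y"}}, Value: 1},
			{Labels: []Label{{"x", "z"}}, Value: 2},
		},
	}}
	m := findMetric(snapshot, "rodriguez", map[string]string{"x": "z"})
	fmt.Println(m != nil && m.Value == 2)
}
```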
opts := prometheus.CounterOpts{
	Name: name,
	Help: name,
}
@yurishkuro It seems like you have to either (1) include an equivalent of the sorted tag names in Name, Namespace, or Subsystem (although it won't be semantically correct), or (2) include tag names as ConstLabels here to differentiate the equally named metrics (e.g. spans with group=lifecycle,state=started vs. group=lifecycle,state=finished).
Otherwise you still end up with panics.
I've reproduced panics at least in combination with jaeger-client-go v2.7.0 + jaeger-lib v1.2.1 + prometheus/client_golang v0.8.0.
Update: I've checked up to jaeger-client-go v2.10.0 and the metric names and tags are still there unchanged. So this issue does not seem to be "fixed" yet on the jaeger-client-go side anyway.
Here's how it seemed to happen:
- InitGlobalTracer
- config.New
- jaeger-lib/metrics.Init
- jaeger-lib/metrics.initMetrics
- jaeger-lib/metrics/prometheus.Counter
- prometheus/client_golang/prometheus.NewCounterVec gets no ConstLabels passed
- NewCounterVec creates a Desc with empty Namespace, Subsystem and ConstLabels
- Register, called from MustRegister, results in an error due to a duplicate hash, caused by a duplicate set of Name + ConstLabels values (= empty)
- Hashes are computed here. In a nutshell, only the metric name and the values of const labels (but not variable labels) are taken into account.
Ref: ConstLabels
FYI we've used this code snippet for reproduction.
cc @anzaitetsu
I'm not sure of the exact intentions behind all the metrics defined as of today.
Perhaps we don't want to change Subsystem and Namespace, but rather change the Names of the metrics?
- SpansStarted could be metric:"spans-started" or "started-spans", and SpansFinished could be metric:"spans-finished" or "finished-spans", with the group=lifecycle tag but without the state tag. This way, we can expose both metrics directly without panics while still allowing you to differentiate both from other spans metrics not related to lifecycle.
- Similarly, SpansSampled could be sampled-spans or spans-sampled, and SpansNotSampled could be not-sampled-spans or spans-not-sampled, respectively.
@yurishkuro Sorry, it seems like I was late to the party!
Required by jaegertracing/jaeger#360 and jaegertracing/jaeger#273
Prometheus requires pre-declaring all metrics and their labels, and for a given metric name only one set of labels is allowed; otherwise a panic occurs.
The approach here is to cache the metric vectors registered with Prometheus, using the metric's name and a sorted list of its labels as the cache key. This allows creating Jaeger metrics objects multiple times, even with the same name and set of tags.
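The caching idea can be illustrated with a minimal, standard-library-only sketch. It is not the PR's implementation. The vec type stands in for a Prometheus metric vector, vecCache and getOrMake are hypothetical names, and the registers counter only simulates where a real factory would register the vector with Prometheus.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// vec is a hypothetical stand-in for a Prometheus metric vector.
type vec struct {
	name       string
	labelNames []string
}

// vecCache hands out one vec per (name, sorted label names) pair.
type vecCache struct {
	lock      sync.Mutex
	vecs      map[string]*vec
	registers int // counts simulated registrations
}

func newVecCache() *vecCache {
	return &vecCache{vecs: make(map[string]*vec)}
}

// getOrMake returns the cached vec for this name and label set, creating
// (and "registering") it only on first use.
func (c *vecCache) getOrMake(name string, tags map[string]string) *vec {
	labelNames := make([]string, 0, len(tags))
	for k := range tags {
		labelNames = append(labelNames, k)
	}
	sort.Strings(labelNames) // map iteration order is random, so sort for a stable key
	key := strings.Join(append([]string{name}, labelNames...), "||")

	c.lock.Lock()
	defer c.lock.Unlock()
	if v, ok := c.vecs[key]; ok {
		return v
	}
	v := &vec{name: name, labelNames: labelNames}
	c.registers++ // a real factory would register the vector with Prometheus here
	c.vecs[key] = v
	return v
}

func main() {
	c := newVecCache()
	a := c.getOrMake("rodriguez", map[string]string{"x": "y"})
	b := c.getOrMake("rodriguez", map[string]string{"x": "z"})
	// Same name and label names, different label values: one shared vector.
	fmt.Println(a == b, c.registers) // prints "true 1"
}
```

Label values are deliberately left out of the key: they select a child of the vector rather than a new vector, so only the name and the label names decide whether a registration is needed.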