Support Prometheus metrics directly #29

Merged: yurishkuro merged 7 commits into master from prometheus-direct on Nov 12, 2017
Conversation

@yurishkuro (Member) commented Nov 6, 2017

Required by jaegertracing/jaeger#360 and jaegertracing/jaeger#273

Prometheus requires pre-declaring all metrics and their labels, and for a given metric name only one set of labels is allowed; otherwise a panic occurs.

The approach here is to cache the metric vectors registered with Prometheus, using the metric's name and a sorted list of labels as the cache key. This allows creating Jaeger metrics objects multiple times, even with the same name and set of tags.
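
A minimal sketch of that caching idea, assuming a counter-only factory and illustrative identifiers (not necessarily the exact code merged in this PR):

package promfactory // illustrative package name, not the one used in jaeger-lib

import (
	"sort"
	"strings"
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// Factory caches the vectors it has already registered, so that asking twice
// for the same metric name + label names returns the existing vector instead
// of triggering a duplicate-registration panic.
type Factory struct {
	registerer prometheus.Registerer
	lock       sync.Mutex
	counters   map[string]*prometheus.CounterVec // cache key -> registered vector
}

func (f *Factory) counterVec(name string, labelNames []string) *prometheus.CounterVec {
	f.lock.Lock()
	defer f.lock.Unlock()

	sort.Strings(labelNames)
	cacheKey := strings.Join(append([]string{name}, labelNames...), "||")
	if cv, ok := f.counters[cacheKey]; ok {
		return cv
	}
	cv := prometheus.NewCounterVec(prometheus.CounterOpts{Name: name, Help: name}, labelNames)
	f.registerer.MustRegister(cv)
	f.counters[cacheKey] = cv
	return cv
}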

Signed-off-by: Yuri Shkuro <ys@uber.com>
@coveralls

Coverage Status

Coverage decreased (-0.9%) to 99.065% when pulling bd425f1 on prometheus-direct into bc381f8 on master.

Signed-off-by: Yuri Shkuro <ys@uber.com>
@coveralls

Coverage Status

Coverage decreased (-0.6%) to 99.402% when pulling c7a5c6f on prometheus-direct into bc381f8 on master.

Yuri Shkuro added 2 commits November 6, 2017 23:43
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
@yurishkuro changed the title from "[WIP] Support Prometheus metrics directly" to "Support Prometheus metrics directly" on Nov 7, 2017
@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 7e61af6 on prometheus-direct into bc381f8 on master.

Signed-off-by: Yuri Shkuro <ys@uber.com>
@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 7c1739c on prometheus-direct into bc381f8 on master.

@yurishkuro (Member Author) commented:

TODO - sanitize metric names, e.g. the agent is failing with:

panic: descriptor Desc{fqName: "tc-reporter:zipkin:batches.submitted", help: "tc-reporter:zipkin:batches.submitted", constLabels: {}, variableLabels: []} is invalid: "tc-reporter:zipkin:batches.submitted" is not a valid metric name
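
A hedged sketch of what such sanitization could look like (hypothetical helper, not the fix that was eventually committed): every character outside the legal Prometheus name set, here the dashes and dots, gets replaced with an underscore.

import "regexp"

// invalidMetricChars matches characters that are not legal in a Prometheus
// metric name; letters, digits, underscores and colons are allowed.
var invalidMetricChars = regexp.MustCompile(`[^a-zA-Z0-9_:]`)

// sanitizeName is a hypothetical helper: "tc-reporter:zipkin:batches.submitted"
// becomes "tc_reporter:zipkin:batches_submitted".
func sanitizeName(name string) string {
	return invalidMetricChars.ReplaceAllString(name, "_")
}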

@yurishkuro removed the request for review from black-adder on November 7, 2017 05:05
@yurishkuro changed the title from "Support Prometheus metrics directly" to "[WIP] Support Prometheus metrics directly" on Nov 7, 2017
Signed-off-by: Yuri Shkuro <ys@uber.com>
@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 7ccca79 on prometheus-direct into bc381f8 on master.

@yurishkuro changed the title from "[WIP] Support Prometheus metrics directly" to "Support Prometheus metrics directly" on Nov 7, 2017
@yurishkuro (Member Author) commented:

I was able to make this work in the agent, but it still required changes to the metric naming (which are actually reasonable - jaegertracing/jaeger#516).

c.lock.Lock()
defer c.lock.Unlock()

cacheKey := strings.Join(append([]string{opts.Name}, labelNames...), "||")
Contributor: any value in extracting this out into its own function?
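
One shape that extraction could take (hypothetical helper name, using the strings package already imported here):

// cacheKeyFor joins the metric name and its (already sorted) label names into
// the key used to look up previously registered vectors.
func cacheKeyFor(name string, labelNames []string) string {
	return strings.Join(append([]string{name}, labelNames...), "||")
}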

"github.com/uber/jaeger-lib/metrics"
)

// Factory implements metrics.Factory backed my Prometheus registry.
Contributor: s/my/by?

f3 := f2.Namespace("", map[string]string{"a": "b"}) // essentially same as f2
t1 := f2.Timer("rodriguez", map[string]string{"x": "y"})
t2 := f2.Timer("rodriguez", map[string]string{"x": "z"})
t3 := f3.Timer("rodriguez", map[string]string{"x": "z"}) // same as g2
Contributor: same as t2

	for _, mf := range snapshot {
		if mf.GetName() == name {
			for _, m := range mf.GetMetric() {
				if len(m.GetLabel()) != len(tags) {
Contributor: can't a Prometheus metric with the same name have different labels? I guess this is a testing function and I shouldn't read too much into it?

Member Author: the function findMetric only returns an exact match. Also, as a general point, the whole reason for this PR is that once you define a metric with a certain set of labels (tags), Prometheus will not allow another one with the same name but a different set of labels.
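
A standalone illustration of that constraint (a sketch using the standard github.com/prometheus/client_golang/prometheus package, not code from this PR): the second registration below panics because it reuses the name with a different label set.

reg := prometheus.NewRegistry()
reg.MustRegister(prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "requests_total", Help: "requests"},
	[]string{"method"},
))
// Same name, different label names: this MustRegister call panics because the
// new descriptor conflicts with the one already registered above.
reg.MustRegister(prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "requests_total", Help: "requests"},
	[]string{"method", "status"},
))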


func findMetric(t *testing.T, snapshot []*promModel.MetricFamily, name string, tags map[string]string) *promModel.Metric {
	for _, mf := range snapshot {
		if mf.GetName() == name {
Contributor: can you reduce the level of nesting in this function? It's kind of hard to read.
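
One possible way to flatten it (a sketch reusing the imports of the original snippet; the shape that landed in the PR may differ), using an early continue and a small label-matching helper:

func findMetric(t *testing.T, snapshot []*promModel.MetricFamily, name string, tags map[string]string) *promModel.Metric {
	for _, mf := range snapshot {
		if mf.GetName() != name {
			continue
		}
		for _, m := range mf.GetMetric() {
			if labelsMatch(m.GetLabel(), tags) {
				return m
			}
		}
	}
	t.Fatalf("metric %q with tags %v not found", name, tags)
	return nil
}

// labelsMatch reports whether the metric's label pairs are exactly the given tags.
func labelsMatch(labels []*promModel.LabelPair, tags map[string]string) bool {
	if len(labels) != len(tags) {
		return false
	}
	for _, l := range labels {
		if tags[l.GetName()] != l.GetValue() {
			return false
		}
	}
	return true
}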

Signed-off-by: Yuri Shkuro <ys@uber.com>
@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling be91209 on prometheus-direct into bc381f8 on master.

@yurishkuro merged commit 73e7680 into master on Nov 12, 2017
@yurishkuro deleted the prometheus-direct branch on November 12, 2017 22:37
	opts := prometheus.CounterOpts{
		Name: name,
		Help: name,
	}
@mumoshu commented Nov 20, 2017

@yurishkuro It seems like you have to either (1) include equivalents of the sorted tag names in Name, Namespace, or Subsystem (although it wouldn't be semantically correct), or (2) include the tag names as ConstLabels here to differentiate the equally named metrics (e.g. spans with group=lifecycle,state=started vs group=lifecycle,state=finished). Otherwise you still end up with panics.
I've reproduced the panics at least in combination with jaeger-client-go v2.7.0 + jaeger-lib v1.2.1 + prometheus client_golang v0.8.0.

Update: I've checked up to jaeger-client-go v2.10.0 and the metric names and tags are still there unchanged, so this issue does not seem to be "fixed" on the jaeger-client-go side yet anyway.

Here's how it seemed to happen:

Ref: ConstLabels
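
A minimal sketch of option (2), assuming illustrative counter names and the standard client_golang API: the tags become ConstLabels, so the two spans counters share the same label names and differ only in label values, which the registry accepts.

spansStarted := prometheus.NewCounter(prometheus.CounterOpts{
	Name:        "spans",
	Help:        "spans",
	ConstLabels: prometheus.Labels{"group": "lifecycle", "state": "started"},
})
spansFinished := prometheus.NewCounter(prometheus.CounterOpts{
	Name:        "spans",
	Help:        "spans",
	ConstLabels: prometheus.Labels{"group": "lifecycle", "state": "finished"},
})
// Same fully-qualified name, same label names, different const label values:
// both collectors can be registered without a panic.
prometheus.MustRegister(spansStarted, spansFinished)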


FYI we've used this code snippet for reproduction.

cc @anzaitetsu

I'm not sure of the exact intentions behind all the metrics defined as of today.
Perhaps we wouldn't want to change Subsystem and Namespace but rather change the Names of the metrics?

  • SpansStarted could be metric:"spans-started" or "started-spans", and SpansFinished could be metric:"spans-finished" or "finished-spans", with the group=lifecycle tag but without the state tag. This way, we can expose both metrics directly without panics while still allowing you to differentiate them from other spans metrics not related to lifecycle.
  • Similarly, SpansSampled could be sampled-spans or spans-sampled, and SpansNotSampled could be not-sampled-spans or spans-not-sampled, respectively.


@yurishkuro Sorry, it seems like I was late to the party!
