Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential bug in text parser #253

Open
vikramraman opened this issue Aug 27, 2020 · 2 comments
Open

Potential bug in text parser #253

vikramraman opened this issue Aug 27, 2020 · 2 comments

Comments

@vikramraman
Copy link

We use the text parser within our application to scrape prometheus metrics and are facing an issue related to one described here: cadence-workflow/cadence#3120

cadence exposes two histogram metrics named signal_info and signal_info_count.

When parsing the HELP line for the signal_info_count the parser appears to strip the _count suffix and sets the p.currentMF property:

if p.currentMF = p.metricFamiliesByName[histogramName]; p.currentMF != nil {

This ends up setting the property to the earlier signal_info histogram leading to the parser error here:

p.parseError(fmt.Sprintf("second HELP line for metric name %q", p.currentMF.GetName()))

@brian-brazil
Copy link
Contributor

I'm not sure we ever formally defined this, however overlapping metric names like this would not be recommended. As this is the official parser, if it rejects it then this is a true positive.

In this case I'd suggest generally improving the metrics. Info as part of the name could be confused with the info metric pattern, and the count could be confused with a summary/histogram itself - which is why you're getting an error. There's also no indication in the name exactly what is being counted (entries in some struct?) nor what the unit of the size is (bytes?).

@beorn7
Copy link
Member

beorn7 commented Sep 7, 2020

This is indeed intended behavior, at least "kind of"…

The code normalizes names of Summaries and Histograms (and their components) by pruning any of the "magic" suffixes. Then it reports the collision. On the part of the parser, the error message is confusing because the normalization is not reported.

However, if you use prometheus/client_golang to expose the metrics, the exposition will already fail. I assume the code in uber/cadence is rendering the exposition by its own means.

I totally understand your confusion because your two histograms actually don't create any direct name collisions in the text exposition. The reason to ban them nevertheless is that it would create a collision if you switched the type from Histogram to Summary (because the pre-calculated quantiles in the signal_info_count Summary would be called signal_info_count, colliding with the count component of the signal_info Summary (or even Histogram, FWIW).

On a more general note, I'm not a fan of the "magic" suffixes because handling collisions is as confusing and hairy as we see it right here. At least, the exact collision handling behavior needed to be spelled out. I guess/hope OpenMetrics will do exactly that. I would prefer to solve the problem for good, but apparently, that's not going to happen.

alanprot pushed a commit to alanprot/common that referenced this issue Mar 15, 2023
…o-kit-kit-log-import

Fix incorrect import of obsolete github.com/go-kit/kit/log package
Ran 'go mod tidy' which changed whitespace.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants