Ignore prometheus metrics when their values are NaN or Inf #12084
metricbeat/module/prometheus/collector/_meta/testdata/histogram.plain-expected.json
Changelog will be added later.
I'm guessing the Prometheus helper is also affected: https://github.com/elastic/beats/blob/master/metricbeat/helper/prometheus/prometheus.go
metricbeat/module/prometheus/collector/_meta/testdata/histogram.plain-expected.json
@exekias Yes, you are right. I just pushed a new commit with changes in the prometheus helper. Thanks!
Thanks @kaiyan-sheng for investigating this issue.
If I understand these changes correctly, they only handle NaN/Inf values in histograms and summaries, but most of the values there are integers, for which these values are in principle not possible. The prometheus library seems to parse them as floats and then convert them to integers, which is why you see them as the value of uint64(math.NaN()). This value, even if incorrect, is a number that libbeat outputs and ES can handle.
The real problem is with floats: they can be NaN/Inf, and libbeat outputs and ES cannot handle these values as numbers, so events containing them are dropped. Gauges, Counters, and the Sums in Histograms and Summaries are floats, so we should handle these values on these types of metrics.
@jsoriano I don't know how I got myself into this histogram hole and ignored the real problem 🤣 Thanks for the explanation (and for digging me out of the histogram hole)! I added the part for Gauge/Counter/Summary and removed the histogram change. Please let me know if this PR is going in the right direction now... 😃
Thanks, this is looking better now 🙂
We could still check that the values in the histograms are fine too, at least the sum. I see you do it in the module, could you do it too in the helper? And it'd be good to have test cases with all the types that can have this problem.
metricbeat/module/prometheus/collector/_meta/testdata/gauge-with-naninf.plain
I only wonder if we should also check for these values in the histogram sum, just in case. The rest LGTM.
…aN or Inf (#12084) (#12295) * Ignore prometheus metrics when their values are NaN or Inf (#12084) * Ignore prometheus metrics when their values are NaN or Inf * Avoid NaN/Inf in prometheus helper * Add checks on Gauge, Summary and Counter * Add NaN/Inf check on histogram values (cherry picked from commit 9244477) * Fix changelog
…aN or Inf (#12084) (#12286) * Ignore prometheus metrics when their values are NaN or Inf (#12084) * Ignore prometheus metrics when their values are NaN or Inf * Avoid NaN/Inf in prometheus helper * Add checks on Gauge, Summary and Counter * Add NaN/Inf check on histogram values (cherry picked from commit 9244477) * Fix changelog
With version v7.2.0, I still get an error: "2019-07-17T09:26:44.695Z ERROR elasticsearch/client.go:394 Failed to encode event: unsupported float value: NaN"
@kaiyan-sheng any suggestions?
@kaiyan-sheng, same here - v 7.3.0 and am getting
in the prometheus collector metricset when scraping
…aN or Inf (elastic#12084) (elastic#12295) * Ignore prometheus metrics when their values are NaN or Inf (elastic#12084) * Ignore prometheus metrics when their values are NaN or Inf * Avoid NaN/Inf in prometheus helper * Add checks on Gauge, Summary and Counter * Add NaN/Inf check on histogram values (cherry picked from commit 6810e31) * Fix changelog
…2084) (elastic#12296) * Ignore prometheus metrics when their values are NaN or Inf * Avoid NaN/Inf in prometheus helper * Add checks on Gauge, Summary and Counter * Add NaN/Inf check on histogram values (cherry picked from commit 6810e31)
…aN or Inf (elastic#12084) (elastic#12286) * Ignore prometheus metrics when their values are NaN or Inf (elastic#12084) * Ignore prometheus metrics when their values are NaN or Inf * Avoid NaN/Inf in prometheus helper * Add checks on Gauge, Summary and Counter * Add NaN/Inf check on histogram values (cherry picked from commit 6810e31) * Fix changelog
When prometheus reports metrics with a value of NaN, +Inf or -Inf, metricbeat fails with an error like "Failed to serialize the event: unsupported float value: NaN". This PR adds logic to ignore prometheus metrics with NaN/Inf values in the collector metricset.
This PR also adds metrics-with-naninf.plain test data with NaN/Inf as metric values. As you can see from the output metrics-with-naninf.plain-expected.json, the metrics with NaN/Inf values are not there.
closes #10849
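As a rough illustration of what the test data exercises, the sketch below parses a tiny plain-text input and keeps only the finite values. It is a deliberately simplified stand-in, not the Prometheus text-format parser the collector uses: it assumes label-free `name value` lines, and `parseFinite` and the metric names are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseFinite reads "name value" lines and returns only the finite values.
// Comment lines starting with '#' and blank lines are skipped.
func parseFinite(text string) (map[string]float64, error) {
	metrics := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		// strconv.ParseFloat accepts "NaN", "+Inf" and "-Inf".
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, err
		}
		if math.IsNaN(v) || math.IsInf(v, 0) {
			continue // ignore values that cannot be sent as numbers
		}
		metrics[fields[0]] = v
	}
	return metrics, sc.Err()
}

func main() {
	input := `# TYPE example_gauge gauge
example_gauge 42
example_nan NaN
example_pos_inf +Inf
example_neg_inf -Inf
`
	metrics, err := parseFinite(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(metrics)
}
```

Only `example_gauge` survives, which matches the intent of the expected-output file: metrics with NaN/Inf values are simply absent.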