Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore prometheus metrics when their values are NaN or Inf #12084

Merged
merged 14 commits into from
May 24, 2019
Merged

Ignore prometheus metrics when their values are NaN or Inf #12084

merged 14 commits into from
May 24, 2019

Conversation

kaiyan-sheng
Copy link
Contributor

@kaiyan-sheng kaiyan-sheng commented May 6, 2019

When prometheus report metrics with value NaN or +Inf or -Inf, metricbeat will fail with error like "Failed to serialize the event: unsupported float value: NaN". This PR is to add the logic to ignore prometheus metrics with NaN/Inf value in collector metricset.

This PR also added a metrics-with-naninf.plain test data with NaN/Inf as metric value. From the output metrics-with-naninf.plain-expected.json you can see, the metrics with NaN/Inf value are not there.

closes #10849

@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner May 6, 2019 22:58
@kaiyan-sheng kaiyan-sheng self-assigned this May 6, 2019
@kaiyan-sheng kaiyan-sheng added the Team:Integrations Label for the Integrations team label May 6, 2019
@kaiyan-sheng
Copy link
Contributor Author

Changelog will be added later.

@kaiyan-sheng kaiyan-sheng added needs_backport PR is waiting to be backported to other branches. review labels May 7, 2019
Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner May 7, 2019 20:56
@kaiyan-sheng
Copy link
Contributor Author

I'm guessing Prometheus helper is also affected: https://github.com/elastic/beats/blob/master/metricbeat/helper/prometheus/prometheus.go

@exekias Yes you are right, I just pushed a new commit with changes in the prometheus helper. Thanks!

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kaiyan-sheng for investigating this issue.

If I understand well these changes, they are only handling NaN/Inf values in histograms and summaries, but there most of the values are integers, where these values are in principle not possible. The prometheus library seems to be parsing them as floats and then converting them to integers, this is why you see them as the value of uint64(math.NaN()). This value, even if incorrect, would be a number that can be handled by libbeat outputs and ES.

The real problem is with floats, they can be NaN/Inf, and these values cannot be handled as numbers by libbeat outputs or ES, so events containing them are dropped. Gauges, Counters and Sums in Histograms and Summaries are floats, we should handle these values on these types of metrics.

metricbeat/helper/prometheus/prometheus_test.go Outdated Show resolved Hide resolved
@kaiyan-sheng kaiyan-sheng added in progress Pull request is currently in progress. and removed review labels May 8, 2019
@kaiyan-sheng
Copy link
Contributor Author

@jsoriano I don't know how I got myself into this histogram hole and ignored all the other real problem 🤣 Thanks for the explanation (and digging me out of the histogram hole )! I added the part for Gauge/Counter/Summary and removed the histogram change. Please let me know if this PR is going the right direction now... 😃

@jsoriano jsoriano dismissed their stale review May 9, 2019 12:45

Reviewing again

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is looking better now 🙂

We could still check that the values in the histograms are fine too, at least the sum. I see you do it in the module, could you do it too in the helper? And it'd be good to have test cases with all the types that can have this problem.

metricbeat/helper/prometheus/prometheus.go Outdated Show resolved Hide resolved
metricbeat/helper/prometheus/prometheus.go Show resolved Hide resolved
metricbeat/helper/prometheus/prometheus.go Outdated Show resolved Hide resolved
metricbeat/module/prometheus/collector/data.go Outdated Show resolved Hide resolved
@kaiyan-sheng kaiyan-sheng added [zube]: In Review and removed in progress Pull request is currently in progress. labels May 10, 2019
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only wonder if we should also check for these values in the histogram sum just in case, for the rest it LGTM.

metricbeat/helper/prometheus/metric.go Show resolved Hide resolved
metricbeat/helper/prometheus/metric.go Show resolved Hide resolved
metricbeat/module/prometheus/collector/data.go Outdated Show resolved Hide resolved
@kaiyan-sheng kaiyan-sheng merged commit 9244477 into elastic:master May 24, 2019
@kaiyan-sheng kaiyan-sheng deleted the nan_prometheus branch May 24, 2019 19:54
kaiyan-sheng added a commit that referenced this pull request May 27, 2019
…aN or Inf (#12084) (#12295)

* Ignore prometheus metrics when their values are NaN or Inf (#12084)

* Ignore prometheus metrics when their values are NaN or Inf
* Avoid NaN/Inf in prometheus helper
* Add checks on Gauge, Summary and Counter
* Add NaN/Inf check on histogram values

(cherry picked from commit 9244477)

* Fix changelog
kaiyan-sheng added a commit that referenced this pull request May 27, 2019
…aN or Inf (#12084) (#12286)

* Ignore prometheus metrics when their values are NaN or Inf (#12084)

* Ignore prometheus metrics when their values are NaN or Inf
* Avoid NaN/Inf in prometheus helper
* Add checks on Gauge, Summary and Counter
* Add NaN/Inf check on histogram values

(cherry picked from commit 9244477)

* Fix changelog
kaiyan-sheng added a commit that referenced this pull request May 28, 2019
…12296)

* Ignore prometheus metrics when their values are NaN or Inf
* Avoid NaN/Inf in prometheus helper
* Add checks on Gauge, Summary and Counter
* Add NaN/Inf check on histogram values

(cherry picked from commit 9244477)
@kaiyan-sheng kaiyan-sheng removed the needs_backport PR is waiting to be backported to other branches. label May 28, 2019
@yxyx520
Copy link

yxyx520 commented Jul 17, 2019

with the version v7.2.0, I still get an error "2019-07-17T09:26:44.695Z ERROR elasticsearch/client.go:394 Failed to encode event: unsupported float value: NaN"

@yxyx520
Copy link

yxyx520 commented Jul 17, 2019

@kaiyan-sheng any suggestions ?

@charkadiusz
Copy link

@kaiyan-sheng, same here - v 7.3.0 and am getting

ERROR	elasticsearch/client.go:402	Failed to encode event: unsupported float value: NaN

in the prometheus collector metricset when scraping /federate endpoint on prometheus in a kubernetes cluster.
This is quite problematic as I'm getting flooded with those error logs. Any advice?

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…aN or Inf (elastic#12084) (elastic#12295)

* Ignore prometheus metrics when their values are NaN or Inf (elastic#12084)

* Ignore prometheus metrics when their values are NaN or Inf
* Avoid NaN/Inf in prometheus helper
* Add checks on Gauge, Summary and Counter
* Add NaN/Inf check on histogram values

(cherry picked from commit 6810e31)

* Fix changelog
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…2084) (elastic#12296)

* Ignore prometheus metrics when their values are NaN or Inf
* Avoid NaN/Inf in prometheus helper
* Add checks on Gauge, Summary and Counter
* Add NaN/Inf check on histogram values

(cherry picked from commit 6810e31)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…aN or Inf (elastic#12084) (elastic#12286)

* Ignore prometheus metrics when their values are NaN or Inf (elastic#12084)

* Ignore prometheus metrics when their values are NaN or Inf
* Avoid NaN/Inf in prometheus helper
* Add checks on Gauge, Summary and Counter
* Add NaN/Inf check on histogram values

(cherry picked from commit 6810e31)

* Fix changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Integrations Label for the Integrations team v7.0.2 v7.1.2 v7.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle NaNs in prometheus metrics
6 participants