Describe the bug
I’m running the OpenTelemetry demo application with the latest (v1.30.0) Java agent. My metrics backend complains about receiving a lot of NaN-valued metrics.
It looks like the Fraud Detection service (the Kafka consumer in the demo app) creates these NaN-valued metrics. These metrics are also missing their unit.
Steps to reproduce
1. Download the OpenTelemetry demo application (use main/latest, not the latest released version, to get the latest Java agent).
2. Configure a detailed logging exporter in the Collector so you can see the full metric printouts (see the pipeline sketch after these steps):

   exporters:
     logging/detailed:
       verbosity: detailed

3. Start the demo app.
4. After around 2 minutes, search the Collector logs for "NaN" values:

   $ docker compose logs otelcol | grep -B 11 NaN
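For reference, here is a minimal sketch of how the detailed logging exporter might be wired into the Collector's metrics pipeline. The demo's actual config defines more receivers, processors, and exporters, so the receiver name below is only a placeholder; keep the existing entries and just add logging/detailed:

exporters:
  logging/detailed:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]               # keep whatever receivers the demo already defines
      exporters: [logging/detailed]   # add alongside the demo's existing exporters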
Expected behavior
I'd expect all the exported metrics to have a value and a unit. NaN value metrics shouldn't be propagated.
Actual behavior
Lots of metrics with NaN values and no unit:
otel-col | Metric #72
otel-col | Descriptor:
otel-col | -> Name: kafka.consumer.partition_assigned_latency_max
otel-col | -> Description: The max time taken for a partition-assigned rebalance listener callback
otel-col | -> Unit:
otel-col | -> DataType: Gauge
otel-col | NumberDataPoints #0
otel-col | Data point attributes:
otel-col | -> client-id: Str(consumer-frauddetectionservice-1)
otel-col | StartTimestamp: 2023-09-19 06:14:58.534111254 +0000 UTC
otel-col | Timestamp: 2023-09-19 06:15:58.53179417 +0000 UTC
otel-col | Value: NaN
--
otel-col | Metric #77
otel-col | Descriptor:
otel-col | -> Name: kafka.consumer.join_time_avg
otel-col | -> Description: The average time taken for a group rejoin
otel-col | -> Unit:
otel-col | -> DataType: Gauge
otel-col | NumberDataPoints #0
otel-col | Data point attributes:
otel-col | -> client-id: Str(consumer-frauddetectionservice-1)
otel-col | StartTimestamp: 2023-09-19 06:14:58.534111254 +0000 UTC
otel-col | Timestamp: 2023-09-19 06:15:58.53179417 +0000 UTC
otel-col | Value: NaN
--
otel-col | Metric #84
otel-col | Descriptor:
otel-col | -> Name: kafka.consumer.rebalance_latency_max
otel-col | -> Description: The max time taken for a group to complete a successful rebalance, which may be composed of several failed re-trials until it succeeded
otel-col | -> Unit:
otel-col | -> DataType: Gauge
otel-col | NumberDataPoints #0
otel-col | Data point attributes:
otel-col | -> client-id: Str(consumer-frauddetectionservice-1)
otel-col | StartTimestamp: 2023-09-19 06:14:58.534111254 +0000 UTC
otel-col | Timestamp: 2023-09-19 06:15:58.53179417 +0000 UTC
otel-col | Value: NaN
--
otel-col | Metric #85
otel-col | Descriptor:
otel-col | -> Name: kafka.consumer.reauthentication_latency_avg
otel-col | -> Description: The average latency observed due to re-authentication
otel-col | -> Unit:
otel-col | -> DataType: Gauge
otel-col | NumberDataPoints #0
otel-col | Data point attributes:
otel-col | -> client-id: Str(consumer-frauddetectionservice-1)
otel-col | StartTimestamp: 2023-09-19 06:14:58.534111254 +0000 UTC
otel-col | Timestamp: 2023-09-19 06:15:58.53179417 +0000 UTC
otel-col | Value: NaN
The Kafka metrics that you see here are just a simple bridge to Kafka's own metrics system; see the OpenTelemetryMetricsReporter for more info. Because Kafka does not report units along with its metrics, neither do we.
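For context, here is a minimal sketch (not the instrumentation's actual code) of how Kafka's own metrics library behaves, assuming a recent kafka-clients version: max/avg stats measure as NaN when nothing has been recorded yet, which is where these values come from before any rebalance or re-authentication has happened.

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Max;

public class KafkaNaNSketch {
    public static void main(String[] args) {
        // Kafka's metrics registry; the consumer builds something equivalent internally.
        Metrics metrics = new Metrics();
        Sensor sensor = metrics.sensor("rebalance-latency");
        sensor.add(metrics.metricName("rebalance-latency-max", "consumer-coordinator-metrics"),
                   new Max());

        // Nothing has been recorded on the sensor, so the max stat measures as NaN
        // (recent kafka-clients versions report NaN rather than -Infinity here).
        metrics.metrics().forEach((name, metric) ->
            System.out.println(name.name() + " = " + metric.metricValue()));
        metrics.close();
    }
}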
NaN value metrics shouldn't be propagated.
That's a fair point. @jack-berg do you think we should filter out NaN values at some level?
The metrics API operations are expected to record "numeric values". NaN is not numeric and cannot be aggregated, so it seems correct for the SDK to ignore such values.
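For illustration, a sketch of the kind of guard being discussed, written against the OpenTelemetry Java metrics API. This is not the agent's implementation; readKafkaMetric() is a hypothetical accessor standing in for the bridge's lookup of the underlying Kafka metric value.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.Meter;

public class NaNFilteringBridgeSketch {
    public static void main(String[] args) {
        Meter meter = GlobalOpenTelemetry.getMeter("kafka-metrics-bridge");
        meter.gaugeBuilder("kafka.consumer.join_time_avg")
             .buildWithCallback(measurement -> {
                 double value = readKafkaMetric(); // hypothetical: read the bridged Kafka metric
                 if (!Double.isNaN(value)) {       // skip NaN so it never reaches the exporter
                     measurement.record(value);
                 }
             });
    }

    // Hypothetical stand-in for the actual Kafka metric lookup.
    private static double readKafkaMetric() {
        return Double.NaN; // e.g. no samples recorded yet
    }
}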
Javaagent or library instrumentation version
v1.30.0
Environment
Docker Desktop on macOS
Additional context
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#numerical-limits-handling