Disable Cardinality limit / pre-aggregation? #5618
Comments
No. Cardinality limit docs are here, which describe an experimental feature to reclaim unused points. Yes, the part about the upfront memory allocation is not very explicit in the doc, good callout. Feel free to send a PR if you are up for it, else we'll do it. (Note: 1 metric point is less than 100 bytes, so even with 100,000 extra metric points, it's ~10 MB of extra memory. Do your own benchmarks and see if this is acceptable.) We don't yet have a mechanism to monitor the utilization; once that lands, it'll be easy to monitor how much is actually utilized vs. wasted.
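A minimal benchmark sketch along those lines, assuming the `OpenTelemetry` SDK package plus `System.Diagnostics.Metrics`; the meter name, tag key, and iteration count are illustrative, and `GC.GetTotalMemory` only gives a rough managed-heap estimate:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using OpenTelemetry;
using OpenTelemetry.Metrics;

class CardinalityMemoryCheck
{
    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        // Building the provider and creating the instrument is what triggers
        // the SDK's up-front MetricPoint allocation.
        using var meterProvider = Sdk.CreateMeterProviderBuilder()
            .AddMeter("MemoryCheck")
            .Build();

        using var meter = new Meter("MemoryCheck");
        var counter = meter.CreateCounter<long>("requests");

        // Record with many distinct tag values so metric points get used,
        // up to whatever cardinality limit is in effect.
        for (int i = 0; i < 100_000; i++)
        {
            counter.Add(1, new KeyValuePair<string, object?>("user.id", i));
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"Approx. extra managed memory: {(after - before) / (1024.0 * 1024.0):F1} MB");
    }
}
```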
Thank you for your response. Just to expand: I previously used Datadog’s “metrics without limits”, where you essentially let the apps send whatever they can and then configure in the central system which dimensions you care about, without aggregating on those you don’t. I feel a bit constrained with this solution, and I’m concerned with the burden of maintaining max-cardinality settings in the app and the potential memory risk. That being said, we’ll start experimenting, thanks again.
From what I read in https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#cardinality-limits, it seems like the cardinality limit is tweakable but uniformly enforced across all metrics. How would the SDK know about and pre-allocate all the needed objects for all the metrics, if these are unknown at the beginning of the program? If we are to estimate a worst case for the most complex metric, which will then dictate the memory allocation for all the other metrics, wouldn't it be more prudent to consider metric-specific cardinality? The "one size fits all" approach feels a bit lacking... Furthermore, with the current approach, we now must maintain this cardinality limit... Code changes will be needed if suddenly your cluster can fit double or triple the users. So in summary, it would be great to either set the cardinality per metric and/or emit the metrics raw.
Not at the beginning of the program; rather, whenever an instrument is created, the SDK pre-allocates 2000 MetricPoints for it by default.
You are right! The ability to set the cardinality per metric is already supported as an experimental feature, available in pre-release builds.
This (emitting the metrics raw) is not something we plan to offer until the spec allows it!
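For context, a hedged sketch of the two knobs as they stood around this time (API names are from the then-current SDK and may have changed since: `SetMaxMetricPointsPerMetricStream` raises the default limit for every instrument, while the experimental `MetricStreamConfiguration.CardinalityLimit`, available in pre-release builds only, overrides it per View; the meter and instrument names are illustrative):

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Sketch only: raise the default per-instrument limit, and override it for
// one known high-cardinality instrument via a View.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyApp")                   // illustrative meter name
    .SetMaxMetricPointsPerMetricStream(10_000)     // new default for all instruments
    .AddView(
        "user.sessions.active",                    // illustrative instrument name
        new MetricStreamConfiguration { CardinalityLimit = 50_000 })
    .Build();
```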
With the current implementation, shouldn't there at least be a mechanism for us to know if metrics are being dropped silently (due to a low cardinality limit)?
There is an internal log emitted when the limit is hit for the first time; this is the current state. (It is not ideal. The overflow attribute will go a long way toward making this experience smoother, and once we expose a utilization metric, that'd make things much better than today.)
Is there any guidance on how to expose such a metric? That way we could at least know if we're lowballing the max cardinality.
#3880 is the tracking issue! There were a few attempts in the past, but nothing got shipped. If you are passionate about this space, consider contributing and we can guide you through the process!
@hugo-brito https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#cardinality-limits captures some useful links. Note that there are lots of moving pieces, and the specification is still Experimental: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits
Is there more info on where the 100 bytes comes from? I'm wondering if a metric point could be >100 bytes. Like, what if there were many large key-value pairs (say 50 keys, each with a 50-character name and a 50-character string value) stored in the MetricPoint's Tags?
It's the size of the MetricPoint struct. (Of course, the things it points to could be very large, as that depends on the size of the keys/values etc., but MetricPoint itself is a fixed size.)
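If you want to verify the number on your own build, a quick sketch (assuming `MetricPoint` from `OpenTelemetry.Metrics` and `Unsafe.SizeOf` from `System.Runtime.CompilerServices`):

```csharp
using System;
using System.Runtime.CompilerServices;
using OpenTelemetry.Metrics;

// Prints the fixed size of the MetricPoint struct itself; tag keys/values,
// histogram buckets, etc. are referenced objects allocated separately.
Console.WriteLine($"MetricPoint size: {Unsafe.SizeOf<MetricPoint>()} bytes");
```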
It appears there's an environment variable, OTEL_DOTNET_EXPERIMENTAL_METRICS_EMIT_OVERFLOW_ATTRIBUTE, you can flip on to help with that: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#cardinality-limits. In my testing, the tag that shows up on the offending metrics is otel.metric.overflow = true.
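A small sketch of how one might turn that on from code for local testing, assuming the variable is read when the MeterProvider is built (setting it in the process environment before startup works the same way):

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Set the experimental flag before the SDK is initialized so it is picked up.
// Once enabled, measurements dropped by the cardinality limit are folded into
// a single overflow metric point carrying the overflow attribute.
Environment.SetEnvironmentVariable(
    "OTEL_DOTNET_EXPERIMENTAL_METRICS_EMIT_OVERFLOW_ATTRIBUTE", "true");

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyApp") // illustrative meter name
    .Build();
```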
What is the question?
Hello,
Is there a way to disable the pre-aggregation and cardinality limit and let the system that receives the metrics handle the throttling problem?
An alternative would be to heavily overestimate the cardinality limit, but then there’s a concern with the initial memory allocation (which could also be better explained in the docs).
Our product has a backing system that can handle the cardinality we need, but we are concerned about putting fixed numbers into the apps reporting to it, and about the memory consumption, since we have some huge cardinalities.
The documentation doesn’t have a solution for this scenario; do you have any advice? Thanks