Disable Cardinality limit / pre-aggregation? #5618
Comments
No. Cardinality limit docs are here, which describe an experimental feature to reclaim unused points. Yes, the part about the upfront memory allocation is not very explicit in the doc, good callout. Feel free to send a PR if you are up for it, else we'll do it. (Note: 1 metric point is less than 100 bytes, so even with 100,000 extra metric points, it's ~10 MB of extra memory. Do your own benchmarks and see if this is acceptable.) We don't yet have a mechanism to monitor the utilization; once that lands, it'll be easy to monitor how much is actually utilized vs. wasted.
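A minimal benchmark sketch along those lines, assuming the `OpenTelemetry` SDK package plus `System.Diagnostics.Metrics`; the meter name, tag key, and iteration count are illustrative, and `GC.GetTotalMemory` only gives a rough managed-heap estimate:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using OpenTelemetry;
using OpenTelemetry.Metrics;

class CardinalityMemoryCheck
{
    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        // Building the provider and creating the instrument is what triggers
        // the SDK's up-front MetricPoint allocation.
        using var meterProvider = Sdk.CreateMeterProviderBuilder()
            .AddMeter("MemoryCheck")
            .Build();

        using var meter = new Meter("MemoryCheck");
        var counter = meter.CreateCounter<long>("requests");

        // Record with many distinct tag values so metric points get used,
        // up to whatever cardinality limit is in effect.
        for (int i = 0; i < 100_000; i++)
        {
            counter.Add(1, new KeyValuePair<string, object?>("user.id", i));
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"Approx. extra managed memory: {(after - before) / (1024.0 * 1024.0):F1} MB");
    }
}
```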
Thank you for your response. Just to expand: I previously used Datadog’s “metrics without limits”, where you essentially let the apps send whatever they can and then configure in the central system which dimensions you care about, without aggregating on those you don’t. I feel a bit constrained with this solution, and I’m concerned with the burden of maintaining max-cardinality settings in the app and the potential memory risk. That being said, we’ll start experimenting, thanks again.
From what I read in https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#cardinality-limits, it seems like the cardinality limit is tweakable but uniformly enforced across all metrics. How would the SDK know about and pre-allocate all the needed objects for all the metrics, if these are unknown at the beginning of the program? If we are to estimate a worst case for the most complex metric, which will then dictate the memory allocation for all the other metrics, wouldn't it be more prudent to consider metric-specific cardinality? The "one size fits all" approach feels a bit lacking... Furthermore, with the current approach, we now must maintain this cardinality limit... Code changes will be needed if suddenly your cluster can fit double or triple the users. So in summary, it would be great to either set the cardinality per metric and/or emit the metrics raw.
Not at the beginning of the program; rather, whenever an instrument is created, the SDK pre-allocates 2000 MetricPoints for it by default.
You are right! The ability to set the cardinality per metric is already supported as an experimental feature, available in pre-release builds.
This (emitting the metrics raw) is not something we plan to offer until the spec allows it!
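For context, a hedged sketch of the two knobs as they stood around this time (API names are from the then-current SDK and may have changed since: `SetMaxMetricPointsPerMetricStream` raises the default limit for every instrument, while the experimental `MetricStreamConfiguration.CardinalityLimit`, available in pre-release builds only, overrides it per View; the meter and instrument names are illustrative):

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Sketch only: raise the default per-instrument limit, and override it for
// one known high-cardinality instrument via a View.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyApp")                   // illustrative meter name
    .SetMaxMetricPointsPerMetricStream(10_000)     // new default for all instruments
    .AddView(
        "user.sessions.active",                    // illustrative instrument name
        new MetricStreamConfiguration { CardinalityLimit = 50_000 })
    .Build();
```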
With the current implementation, shouldn't there at least be a mechanism for us to know if metrics are being dropped silently (due to a low cardinality limit)?
There is an internal log emitted when the limit is hit for the first time; this is the current state. (It is not ideal. The overflow attribute will go a long way toward making this experience smoother, and once we expose a utilization metric, that'd make things much better than today.)
Is there any guidance on how to expose such a metric? That way we could at least know if we're lowballing the max cardinality.
#3880 is the tracking issue! There were a few attempts in the past, but nothing got shipped. If you are passionate about this space, consider contributing and we can guide you through the process!
@hugo-brito https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#cardinality-limits captures some useful links. Note that there are lots of moving pieces, and the specification is still Experimental: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits
Is there more info on where the 100 bytes comes from? I'm wondering if a metric point could be >100 bytes. Like, what if there were many large key-value pairs (say 50 keys, each with a 50-character name and a 50-character string value) stored in the MetricPoint's Tags?
It's the size of the MetricPoint struct. (Of course, the things it points to could be very large, as that depends on the size of the keys/values etc., but MetricPoint itself is a fixed size.)
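If you want to verify the number on your own build, a quick sketch (assuming `MetricPoint` from `OpenTelemetry.Metrics` and `Unsafe.SizeOf` from `System.Runtime.CompilerServices`):

```csharp
using System;
using System.Runtime.CompilerServices;
using OpenTelemetry.Metrics;

// Prints the fixed size of the MetricPoint struct itself; tag keys/values,
// histogram buckets, etc. are referenced objects allocated separately.
Console.WriteLine($"MetricPoint size: {Unsafe.SizeOf<MetricPoint>()} bytes");
```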
It appears there's an environment variable, OTEL_DOTNET_EXPERIMENTAL_METRICS_EMIT_OVERFLOW_ATTRIBUTE, you can flip on to help with that: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#cardinality-limits. In my testing, the tag that shows up on the offending metrics is otel.metric.overflow = true.
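A small sketch of how one might turn that on from code for local testing, assuming the variable is read when the MeterProvider is built (setting it in the process environment before startup works the same way):

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Set the experimental flag before the SDK is initialized so it is picked up.
// Once enabled, measurements dropped by the cardinality limit are folded into
// a single overflow metric point carrying the overflow attribute.
Environment.SetEnvironmentVariable(
    "OTEL_DOTNET_EXPERIMENTAL_METRICS_EMIT_OVERFLOW_ATTRIBUTE", "true");

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyApp") // illustrative meter name
    .Build();
```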
What is the question?
Hello,
Is there a way to disable the pre-aggregation and cardinality limit and let the system that receives the metrics handle the throttling problem?
An alternative would be to heavily overestimate the cardinality limit, but then there’s a concern with the initial memory allocation (which could also be better explained in the docs).
Our product has a backing system that can handle the cardinality we need, but we are concerned about putting fixed numbers into the apps reporting to it, and about the memory consumption, since we have some huge cardinalities.
The documentation doesn’t have a solution for this scenario; do you have any advice? Thanks