[connector/spanmetrics] Metrics keep being produced for spans that are no longer being received #30559
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
@albertteoh seems like we need to implement an expiration mechanism. Even if it's `last_touched < Time.now - 5m`, etc.
This suggestion sounds quite reasonable to me. I agree, the spanmetrics connector should stop producing metrics after its source stops producing spans for some configurable period of time.
Great, thanks for the feedback folks. If there's agreement on the feature, I'd be happy to give it a try.
Hello everyone, just to let you know: after reading through the comments, I do believe this is the same behavior causing this issue.
@matej-g are you able to have a go at the feature?
@portertech Hey, sorry for the delay, it's on my plate right now. I'll come back with a PR soon.
I have the exact same issue with an identical config. Would love to see the PR merged soon :) I have an application pushing spans and metrics via opentelemetry-java-agent to an otel collector. When I stop the application, or when the application is running but not doing anything, span metrics are still being pushed to Prometheus. When I restart the otel collector, the span metrics connector stops pushing metrics to Prometheus. Using otel/opentelemetry-collector-contrib:0.95.0
**Description:** Adds a new feature to expire metrics that are considered stale. If no new spans are received within a given time frame, then on the next export cycle the metrics are considered expired and will no longer be exported by the `spanmetricsconnector`. This is intended to solve the situation where a service is no longer producing spans (e.g. because it was removed), but the metrics for such spans keep being produced indefinitely. See the linked issue for more details. The feature can be configured by setting the `metrics_expiration` option. The current behavior (metrics never expire) is kept as the default.

**Link to tracking Issue:** #30559

**Testing:** Added unit tests and tested manually as well.

**Documentation:** Updated in-code documentation and README.

Signed-off-by: Matej Gera <matejgera@gmail.com>
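For reference, a minimal configuration sketch of the option described above; the `15m` value is only an example, and per the PR description leaving the option unset keeps the current behavior (metrics never expire):

```yaml
connectors:
  spanmetrics:
    # Metrics for spans not seen within this window are treated as
    # expired on the next export cycle and are no longer exported.
    # The value below is an example; unset means metrics never expire.
    metrics_expiration: 15m
```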
@jaskarnshergillCE this change has been merged already, it's now just a matter of it being released.
Now that 0.97.0 is out, can we close this issue as complete?
Yes, looks like this was left open accidentally, closing. Thanks everyone!
I am using version 0.103 of the collector and still see this problem. I have set `metrics_expiration: 15m`, however even after the K8s pod emitting the data has been removed, I continue to see the aggregated metric series (`_total`, `_sum`, `_count` and `_bucket`) hours later. They get cleaned up only after a collector restart.
Hi everyone! Version: 0.110.0. I'm still experiencing behavior like this and would like to know whether it's intended or not. Steps to Reproduce
Issue: This behavior creates issues for services running in ephemeral containers, where new `host.name` labels are frequently generated as old ones are no longer used. Consequently, the `host.name` label continues to grow in cardinality as long as the service remains active.
Component(s)
connector/spanmetrics
What happened?
Description
I have an application that is sending spans to the collector, which are subsequently run through the connector. However, once that application is shut down, I keep seeing metrics for the spans previously generated by the app being produced indefinitely. This is despite the fact that no new traces are being emitted by the application (since the application has already been shut down, as stated above). This is particularly problematic for applications with a large number of operations (spans), since I keep receiving tons of data indefinitely (i.e. until I restart the collector).
Steps to Reproduce
Easiest is to reproduce with telemetrygen. For example:
- `spanmetrics` connector -> receives metrics -> exports metrics to `debug` exporter
- send spans from `telemetrygen` to the collector
- stop `telemetrygen` (remove the `telemetrygen` pod)

Expected Result
The metrics should stop being produced eventually.
Actual Result
The metrics keep getting exported indefinitely (until I restart the collector).
Collector version
v0.91.0
Environment information
Environment
Local `kind` cluster

OpenTelemetry Collector configuration
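The configuration block from the original report did not survive extraction into this page. As a stand-in, here is a hedged sketch of the kind of pipeline described in the reproduction steps (an OTLP receiver feeding the `spanmetrics` connector, whose metrics go to the `debug` exporter); the exact values are illustrative, not the reporter's actual configuration:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  spanmetrics:

exporters:
  debug:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [debug]
```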
Log output
No response
Additional context
There have been a couple of similar issues floating around (e.g. #29604, #17306), although it's not 100% clear that those users are describing the same issue as here, since previously there were also related reports of memory leaks.
Some users have been advised to adjust the config (e.g. this suggestion #17306 (comment)), but these suggestions unfortunately do not address the cause of the issue. As a side note, even when trying to decrease the size of the cache, this does not affect the number of metrics that keep being produced, according to my tests. At least for the cumulative temporality, the cache eviction does not actually seem to take place, though this is only my deduction after glancing at the connector code.
I would imagine that ideally this could be solved by implementing logic along the lines of: if span X is not seen for Y amount of time, stop producing metrics for this span.