-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attributes on SpanData: should they be a dictionary or enumerable. #92
Comments
Same for labels on Resource |
I do not agree that Span attributes should be readable by the process. A streaming implementation will have sent them downstream immediately and have no state to read back. |
While I understand that streaming implementations may be desirable in some cases, I'd like to point out that they have their own drawbacks, particularly with what they impose on Agents/Collectors that may be present in the delivery chain from the application to the backend. Collectors usually provide capabilities to filter or modify spans based on their attribute values. If streaming implementation ships individual attributes values immediately and Collector receives each attribute value as an independent event the Collector will be forced to reconstruct the full state of the Span to be able to evaluate filtering condition. This can have dramatic performance effects on the Collector. The Collector has to keep all open Spans in memory, and unlike individual application Collectors typically deal with orders of magnitude larger number of spans. What may be a minor amount of memory necessary on application side to keep the Span state can become gigabytes of data in the Collector. |
@tigrannajaryan Agreed. An intermediate process is needed to reconstruct the SpanData from the event stream. The nice thing about this arrangement is that it can be done in a more efficient language, it avoids deadlock and self-diagnostic interactions, it can use riskier code, and so on. The Agent/Collector infrastructure is designed for the current SpanData exporters. A streaming tracer would depend on a lower-level exporter, and the external process would reconstruct SpanData and export them through your infrastructure. It should be possible for the SDK to manage state however they want. If this is only an SDK issue, I'm not worried, but if it's possible for a user to access span attributes for a live span, we've done something which makes it impossible to have a compatible streaming implementation. |
SpanData is only used in the API to report out-of-band spans, so we do not expose this for in-flight spans or "active" spans. Independent of keeping or not the SpanData in the API to report out-of-band spans we have no intention to expose a get API (except getSpanContext) on active in-progress spans (I think we said that from the beginning, so please don't be worried about that). Coming back to the issue, The main question for me is "are attributes a map or a multimap?" |
@jmacd yes, makes sense. |
@bogdandrutu I think "enumerable" can be understood as a multimap or simply as a list of key/value pairs, so that there is no way to query efficiently by given key, all you can do is enumerate all attributes in a SpanData sequentially. I think the decision how exactly the attributes should be exposed should be dictated by use-cases which require reading them from SpanData. I don't know what are such use-cases that are typical for applications, so I am not of much help here. |
I wonder why we're not sticking with the OpenCensus model for span attributes. In that model, there were Insert and Delete operations, which definitely imply that span attributes are intended to be thought of as dictionaries. I support the dictionary interpretation mainly because when these spans are presented as structured data, I'd like an efficient query unique path for filtering. Note that OpenTracing didn't actually specify an interpretation for SetAttribute. LightStep's collector applies last-value-wins to spans when there are multiple values set for repeated keys. (@yurishkuro how does Jaeger handle this?) For example, if we see SetAttribute("error", true) followed by SetAttribute("error", false), we interpret this as "error not set". |
In Jaeger it varies by language, in some cases both tags would be kept. The reason we didn't want the API to guarantee map semantics is because doing so precludes streaming implementation, since you need to keep the whole map to guarantee last-wins. Instead, that could be done in the tracing backend. |
SpanData has been removed, and attribute structure is being discussed at the protocol level, so I am closing this for now. |
* Propose OpenTelemetry Logs Vision This change proposes high-level items that define our long-term vision for Logs support in OpenTelemetry project, what we aspire to achieve. This is a starting point for discussion. I believe it is important for us to collectively come up with such vision. Comments, thoughts and opinions are highly welcome! Propose OpenTelemetry Logs Vision This change proposes igh-level items that define our long-term vision for Logs support in OpenTelemetry project, what we aspire to achieve. This a vision document that reflects our current desires. It is not a commitment to implement everything precisely as listed. The primary purpose of this document is to ensure all contributors work in alignment. As our vision changes over time maintainers reserve the right to add, modify, and remove items from this document. * Address PR comments
* Propose OpenTelemetry Logs Vision This change proposes high-level items that define our long-term vision for Logs support in OpenTelemetry project, what we aspire to achieve. This is a starting point for discussion. I believe it is important for us to collectively come up with such vision. Comments, thoughts and opinions are highly welcome! Propose OpenTelemetry Logs Vision This change proposes igh-level items that define our long-term vision for Logs support in OpenTelemetry project, what we aspire to achieve. This a vision document that reflects our current desires. It is not a commitment to implement everything precisely as listed. The primary purpose of this document is to ensure all contributors work in alignment. As our vision changes over time maintainers reserve the right to add, modify, and remove items from this document. * Address PR comments
See discussion: #78 (comment)
@tigrannajaryan:
Is it true that typical use is enumeration? This rules out use cases which need to augment / modify specific attributes efficiently.
@SergeyKanzhelev
augmentation is not supported as returned collection is immutable. What scenario do you see when you need an attribute value with the specific name from the SpanData?
@tigrannajaryan
Right. I was thinking about a scenario where you may want to implement telemetry data forwarder, which may also support augmenting the data, but you are right. If we require SpanData to be immutable then this use case is simply not possible. And I agree that we should require SpanData to be immutable since it likely will allow implementations to be more efficient.
@tigrannajaryan
Actually, there is a use case that does not require SpanData modification but requires fast inspection of data. For example a telemetry data forwarder that can also sample the data based on some attribute value.
However, I understand that this may not be the typical use case and we may not want to support it in the API that is aimed at instrumentation. Forwarders can have specialized implementations and do not need to use this API at all.
@SergeyKanzhelev
@tigrannajaryan good point. I was also thinking something like metric extraction from the SpanData may require dictionary.
Let me file an issue. It is worth to be discussed and documented explicitly to address future questions.
The text was updated successfully, but these errors were encountered: