Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attributes on SpanData: should they be a dictionary or enumerable. #92

Closed
SergeyKanzhelev opened this issue Jun 10, 2019 · 10 comments
Closed
Labels
area:api Cross language API specification issue
Milestone

Comments

@SergeyKanzhelev
Copy link
Member

See discussion: #78 (comment)

@tigrannajaryan:

Is it true that typical use is enumeration? This rules out use cases which need to augment / modify specific attributes efficiently.

@SergeyKanzhelev

augmentation is not supported as returned collection is immutable. What scenario do you see when you need an attribute value with the specific name from the SpanData?

@tigrannajaryan

Right. I was thinking about a scenario where you may want to implement telemetry data forwarder, which may also support augmenting the data, but you are right. If we require SpanData to be immutable then this use case is simply not possible. And I agree that we should require SpanData to be immutable since it likely will allow implementations to be more efficient.

@tigrannajaryan

Actually, there is a use case that does not require SpanData modification but requires fast inspection of data. For example a telemetry data forwarder that can also sample the data based on some attribute value.
However, I understand that this may not be the typical use case and we may not want to support it in the API that is aimed at instrumentation. Forwarders can have specialized implementations and do not need to use this API at all.

@SergeyKanzhelev

@tigrannajaryan good point. I was also thinking something like metric extraction from the SpanData may require dictionary.

Let me file an issue. It is worth to be discussed and documented explicitly to address future questions.

@SergeyKanzhelev SergeyKanzhelev added this to the API revision: 07-2019 milestone Jun 10, 2019
@SergeyKanzhelev SergeyKanzhelev added the area:api Cross language API specification issue label Jun 10, 2019
@SergeyKanzhelev
Copy link
Member Author

Same for labels on Resource

@jmacd
Copy link
Contributor

jmacd commented Jun 26, 2019

I do not agree that Span attributes should be readable by the process. A streaming implementation will have sent them downstream immediately and have no state to read back.

@tigrannajaryan
Copy link
Member

While I understand that streaming implementations may be desirable in some cases, I'd like to point out that they have their own drawbacks, particularly with what they impose on Agents/Collectors that may be present in the delivery chain from the application to the backend.

Collectors usually provide capabilities to filter or modify spans based on their attribute values. If streaming implementation ships individual attributes values immediately and Collector receives each attribute value as an independent event the Collector will be forced to reconstruct the full state of the Span to be able to evaluate filtering condition. This can have dramatic performance effects on the Collector. The Collector has to keep all open Spans in memory, and unlike individual application Collectors typically deal with orders of magnitude larger number of spans. What may be a minor amount of memory necessary on application side to keep the Span state can become gigabytes of data in the Collector.
Furthermore the need to reconstruct the state will result in Collector being unable to stream out the data although the input is a stream because the Collector has to wait for complete Span state before making a decision. This negates some of the original benefits of streaming implementation (low latency until events reach the backend).

@jmacd
Copy link
Contributor

jmacd commented Jun 26, 2019

@tigrannajaryan Agreed. An intermediate process is needed to reconstruct the SpanData from the event stream. The nice thing about this arrangement is that it can be done in a more efficient language, it avoids deadlock and self-diagnostic interactions, it can use riskier code, and so on.

The Agent/Collector infrastructure is designed for the current SpanData exporters. A streaming tracer would depend on a lower-level exporter, and the external process would reconstruct SpanData and export them through your infrastructure.

It should be possible for the SDK to manage state however they want. If this is only an SDK issue, I'm not worried, but if it's possible for a user to access span attributes for a live span, we've done something which makes it impossible to have a compatible streaming implementation.

@bogdandrutu
Copy link
Member

SpanData is only used in the API to report out-of-band spans, so we do not expose this for in-flight spans or "active" spans. Independent of keeping or not the SpanData in the API to report out-of-band spans we have no intention to expose a get API (except getSpanContext) on active in-progress spans (I think we said that from the beginning, so please don't be worried about that).

Coming back to the issue, The main question for me is "are attributes a map or a multimap?"

@tigrannajaryan
Copy link
Member

@jmacd yes, makes sense.

@tigrannajaryan
Copy link
Member

Coming back to the issue, The main question for me is "are attributes a map or a multimap?"

@bogdandrutu I think "enumerable" can be understood as a multimap or simply as a list of key/value pairs, so that there is no way to query efficiently by given key, all you can do is enumerate all attributes in a SpanData sequentially.

I think the decision how exactly the attributes should be exposed should be dictated by use-cases which require reading them from SpanData. I don't know what are such use-cases that are typical for applications, so I am not of much help here.

@jmacd
Copy link
Contributor

jmacd commented Jun 26, 2019

I wonder why we're not sticking with the OpenCensus model for span attributes. In that model, there were Insert and Delete operations, which definitely imply that span attributes are intended to be thought of as dictionaries.

I support the dictionary interpretation mainly because when these spans are presented as structured data, I'd like an efficient query unique path for filtering.

Note that OpenTracing didn't actually specify an interpretation for SetAttribute. LightStep's collector applies last-value-wins to spans when there are multiple values set for repeated keys. (@yurishkuro how does Jaeger handle this?) For example, if we see SetAttribute("error", true) followed by SetAttribute("error", false), we interpret this as "error not set".

@yurishkuro
Copy link
Member

In Jaeger it varies by language, in some cases both tags would be kept.

The reason we didn't want the API to guarantee map semantics is because doing so precludes streaming implementation, since you need to keep the whole map to guarantee last-wins. Instead, that could be done in the tracing backend.

@SergeyKanzhelev SergeyKanzhelev modified the milestones: API revision: 07-2019, Alpha v0.3 Sep 27, 2019
@tedsuo
Copy link
Contributor

tedsuo commented Dec 4, 2019

SpanData has been removed, and attribute structure is being discussed at the protocol level, so I am closing this for now.

@tedsuo tedsuo closed this as completed Dec 4, 2019
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 21, 2024
* Propose OpenTelemetry Logs Vision

This change proposes high-level items that define our long-term vision for
Logs support in OpenTelemetry project, what we aspire to achieve.

This is a starting point for discussion. I believe it is important for us
to collectively come up with such vision.

Comments, thoughts and opinions are highly welcome!
Propose OpenTelemetry Logs Vision

This change proposes igh-level items that define our long-term vision for
Logs support in OpenTelemetry project, what we aspire to achieve.

This a vision document that reflects our current desires. It is not a commitment
to implement everything precisely as listed. The primary purpose of this
document is to ensure all contributors work in alignment. As our vision changes
over time maintainers reserve the right to add, modify, and remove items from
this document.

* Address PR comments
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 23, 2024
* Propose OpenTelemetry Logs Vision

This change proposes high-level items that define our long-term vision for
Logs support in OpenTelemetry project, what we aspire to achieve.

This is a starting point for discussion. I believe it is important for us
to collectively come up with such vision.

Comments, thoughts and opinions are highly welcome!
Propose OpenTelemetry Logs Vision

This change proposes igh-level items that define our long-term vision for
Logs support in OpenTelemetry project, what we aspire to achieve.

This a vision document that reflects our current desires. It is not a commitment
to implement everything precisely as listed. The primary purpose of this
document is to ensure all contributors work in alignment. As our vision changes
over time maintainers reserve the right to add, modify, and remove items from
this document.

* Address PR comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue
Projects
None yet
Development

No branches or pull requests

7 participants