Attributes on SpanData: should they be a dictionary or enumerable. #92

SergeyKanzhelev · 2019-06-10T17:32:05Z

Is it true that typical use is enumeration? This rules out use cases which need to augment / modify specific attributes efficiently.

@SergeyKanzhelev

augmentation is not supported as returned collection is immutable. What scenario do you see when you need an attribute value with the specific name from the SpanData?

@tigrannajaryan

Right. I was thinking about a scenario where you may want to implement telemetry data forwarder, which may also support augmenting the data, but you are right. If we require SpanData to be immutable then this use case is simply not possible. And I agree that we should require SpanData to be immutable since it likely will allow implementations to be more efficient.

@tigrannajaryan

Actually, there is a use case that does not require SpanData modification but requires fast inspection of data. For example a telemetry data forwarder that can also sample the data based on some attribute value.
However, I understand that this may not be the typical use case and we may not want to support it in the API that is aimed at instrumentation. Forwarders can have specialized implementations and do not need to use this API at all.

@SergeyKanzhelev

@tigrannajaryan good point. I was also thinking something like metric extraction from the SpanData may require dictionary.

Let me file an issue. It is worth to be discussed and documented explicitly to address future questions.

SergeyKanzhelev · 2019-06-10T17:32:54Z

Same for labels on Resource

jmacd · 2019-06-26T16:46:12Z

I do not agree that Span attributes should be readable by the process. A streaming implementation will have sent them downstream immediately and have no state to read back.

tigrannajaryan · 2019-06-26T18:14:48Z

While I understand that streaming implementations may be desirable in some cases, I'd like to point out that they have their own drawbacks, particularly with what they impose on Agents/Collectors that may be present in the delivery chain from the application to the backend.

Collectors usually provide capabilities to filter or modify spans based on their attribute values. If streaming implementation ships individual attributes values immediately and Collector receives each attribute value as an independent event the Collector will be forced to reconstruct the full state of the Span to be able to evaluate filtering condition. This can have dramatic performance effects on the Collector. The Collector has to keep all open Spans in memory, and unlike individual application Collectors typically deal with orders of magnitude larger number of spans. What may be a minor amount of memory necessary on application side to keep the Span state can become gigabytes of data in the Collector.
Furthermore the need to reconstruct the state will result in Collector being unable to stream out the data although the input is a stream because the Collector has to wait for complete Span state before making a decision. This negates some of the original benefits of streaming implementation (low latency until events reach the backend).

jmacd · 2019-06-26T19:21:29Z

@tigrannajaryan Agreed. An intermediate process is needed to reconstruct the SpanData from the event stream. The nice thing about this arrangement is that it can be done in a more efficient language, it avoids deadlock and self-diagnostic interactions, it can use riskier code, and so on.

The Agent/Collector infrastructure is designed for the current SpanData exporters. A streaming tracer would depend on a lower-level exporter, and the external process would reconstruct SpanData and export them through your infrastructure.

It should be possible for the SDK to manage state however they want. If this is only an SDK issue, I'm not worried, but if it's possible for a user to access span attributes for a live span, we've done something which makes it impossible to have a compatible streaming implementation.

bogdandrutu · 2019-06-26T19:30:57Z

SpanData is only used in the API to report out-of-band spans, so we do not expose this for in-flight spans or "active" spans. Independent of keeping or not the SpanData in the API to report out-of-band spans we have no intention to expose a get API (except getSpanContext) on active in-progress spans (I think we said that from the beginning, so please don't be worried about that).

Coming back to the issue, The main question for me is "are attributes a map or a multimap?"

tigrannajaryan · 2019-06-26T19:32:36Z

@jmacd yes, makes sense.

tigrannajaryan · 2019-06-26T19:39:20Z

Coming back to the issue, The main question for me is "are attributes a map or a multimap?"

@bogdandrutu I think "enumerable" can be understood as a multimap or simply as a list of key/value pairs, so that there is no way to query efficiently by given key, all you can do is enumerate all attributes in a SpanData sequentially.

I think the decision how exactly the attributes should be exposed should be dictated by use-cases which require reading them from SpanData. I don't know what are such use-cases that are typical for applications, so I am not of much help here.

jmacd · 2019-06-26T21:37:53Z

I wonder why we're not sticking with the OpenCensus model for span attributes. In that model, there were Insert and Delete operations, which definitely imply that span attributes are intended to be thought of as dictionaries.

I support the dictionary interpretation mainly because when these spans are presented as structured data, I'd like an efficient query unique path for filtering.

Note that OpenTracing didn't actually specify an interpretation for SetAttribute. LightStep's collector applies last-value-wins to spans when there are multiple values set for repeated keys. (@yurishkuro how does Jaeger handle this?) For example, if we see SetAttribute("error", true) followed by SetAttribute("error", false), we interpret this as "error not set".

yurishkuro · 2019-06-27T13:12:57Z

In Jaeger it varies by language, in some cases both tags would be kept.

The reason we didn't want the API to guarantee map semantics is because doing so precludes streaming implementation, since you need to keep the whole map to guarantee last-wins. Instead, that could be done in the tracing backend.

tedsuo · 2019-12-04T16:35:27Z

SpanData has been removed, and attribute structure is being discussed at the protocol level, so I am closing this for now.

* Propose OpenTelemetry Logs Vision This change proposes high-level items that define our long-term vision for Logs support in OpenTelemetry project, what we aspire to achieve. This is a starting point for discussion. I believe it is important for us to collectively come up with such vision. Comments, thoughts and opinions are highly welcome! Propose OpenTelemetry Logs Vision This change proposes igh-level items that define our long-term vision for Logs support in OpenTelemetry project, what we aspire to achieve. This a vision document that reflects our current desires. It is not a commitment to implement everything precisely as listed. The primary purpose of this document is to ensure all contributors work in alignment. As our vision changes over time maintainers reserve the right to add, modify, and remove items from this document. * Address PR comments

SergeyKanzhelev mentioned this issue Jun 10, 2019

Spandata #78

Merged

SergeyKanzhelev added this to the API revision: 07-2019 milestone Jun 10, 2019

SergeyKanzhelev added the area:api Cross language API specification issue label Jun 10, 2019

fbogsany mentioned this issue Jul 11, 2019

API prototype open-telemetry/opentelemetry-ruby#11

Merged

iredelmeier added the spandata label Jul 30, 2019

SergeyKanzhelev modified the milestones: API revision: 07-2019, Alpha v0.3 Sep 27, 2019

tedsuo closed this as completed Dec 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attributes on SpanData: should they be a dictionary or enumerable. #92

Attributes on SpanData: should they be a dictionary or enumerable. #92

SergeyKanzhelev commented Jun 10, 2019

SergeyKanzhelev commented Jun 10, 2019

jmacd commented Jun 26, 2019

tigrannajaryan commented Jun 26, 2019

jmacd commented Jun 26, 2019

bogdandrutu commented Jun 26, 2019

tigrannajaryan commented Jun 26, 2019

tigrannajaryan commented Jun 26, 2019

jmacd commented Jun 26, 2019

yurishkuro commented Jun 27, 2019

tedsuo commented Dec 4, 2019