Ephemeral Resource Attributes #208

tedsuo · 2022-06-22T15:02:15Z

This OTEP is part of the RUM/Client initiative.

Currently, we are missing a place to put important client information which applies to all telemetry emitted by an SDK. This information includes attributes such as session ID, language preference, locality/timezone, and other types of user data.

Normally, these attributes would be recorded as resources. However, on client processes, there are times when this information changes without the SDK re-initializing. For example:

The browser is idle for 15 minutes, ending a session.
The process is put to sleep, and awakened at a later date.
The user logs in or out, or some other state change occurs that affects both the behavior and the reporting needs of the application.

In all of these cases, the application/SDK is not restarted. Currently, the resource associated with the SDK cannot be changed after it is started. This makes it very difficult to record these needed attributes.

This OTEP proposes a mechanism for updating the SDK with a new resource, which will be applied to all future telemetry created by the SDK. The proposal attempts to do this while preserving important characteristics already defined for resources:

The resource object itself remains immutable, and accessing the resource object when creating telemetry does not introduce a lock. The proposed ResourceProvider concept preserves these characteristics.
Existing resource attributes are required to be present at SDK start time. They are not allowed to be updated or added to the resource once the SDK has started. This OTEP proposes that resource attributes be labeled as either "permanent" or "ephemeral" in the semantic conventions. Permanent attributes may not be updated after the SDK freezes the ResourceProvider.
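A minimal sketch of the two properties above, assuming a JavaScript SDK and a plain frozen object standing in for a Resource. The class name, `freeze()`, and the `permanentKeys` option are hypothetical, not an existing OpenTelemetry API. Reads return the current immutable snapshot without locking; writes build a new snapshot and swap the reference. In a multi-threaded SDK the swap would presumably be an atomic reference store rather than a plain assignment.

```javascript
// Hypothetical ResourceProvider sketch; not part of any OpenTelemetry SDK.
class ResourceProvider {
  constructor(initialAttributes = {}, { permanentKeys = [] } = {}) {
    this._permanentKeys = new Set(permanentKeys);
    this._frozen = false;
    // Simplification: the "resource" is just a frozen attribute map.
    this._resource = Object.freeze({ ...initialAttributes });
  }

  // Called whenever telemetry is created: returns the current immutable snapshot.
  getResource() {
    return this._resource;
  }

  // Called once the SDK has started; permanent keys become read-only from here on.
  freeze() {
    this._frozen = true;
  }

  // Never mutates the existing snapshot; it publishes a new one instead.
  setAttribute(key, value) {
    if (this._frozen && this._permanentKeys.has(key)) {
      throw new Error(`permanent attribute "${key}" cannot change after freeze()`);
    }
    this._resource = Object.freeze({ ...this._resource, [key]: value });
  }
}
```

Telemetry that captured an earlier snapshot keeps it unchanged after `setAttribute`, while anything created afterwards sees the new one. Setting a permanent key after `freeze()` throws here, whether or not the key was already present, which covers both "updated" and "added".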

If there are other backwards compatibility requirements for resources that I have missed, please let me know.

Cheers,
-Ted

scheler · 2022-06-22T17:58:37Z

text/0208-ephemeral-resources.md

+
+There are two types of resource attributes, **permanent** and **ephemeral**. Attributes which are labeled as permanent in the semantic conventions must be present when the SDK is initialized. They cannot be added or updated at a later date.
+
+Resources are managed via a ResourceProvider. Setting an attribute on a ResourceProvider will cause that attribute value to be included in the resource attached to any signal generated in the future. Spans which have already been started, along with any telemetry which has already been passed to the export pipeline, will not have the new attribute value. Optionally, a check can be added to ensure that permanent resources are not modified after the SDK has started.


Optionally, a check can be added to ensure that permanent resources are not modified after the SDK has started

If the nested attributes proposal is accepted, then one way to simplify ephemeral resources validation is to have just one attribute called ephemeral - the ResourceProvider then allows any modification to the value of this attribute and does not need to look up which attributes are permanent. This also avoids the need to mark the resource attributes permanent in the semantic conventions yaml files.

I don't see how this would simplify things? You then still have an attribute that needs special handling. Whether it is by name or with an explicit label would not make things more/less simple, would it?

The nested attributes proposal also does not require SDKs to implement them. If we want ephemeral attributes to depend on that, it would mean that SDKs which do not implement nested attributes could not implement ephemeral attributes either.

If I understand the proposal correctly, it requires that the permanent attributes be marked so in the semantic conventions. This is the part that will not be required if we limit the special handling to only one attribute with a known name.

Consider the following resource. The ResourceProvider can allow anytime modifications to the key-value pairs within the ephemeral attribute.

{ service.name: foo, service.instance.id: 123, browser.user_agent: bar, ephemeral: { session.id: 456 } }
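One way to read the single-`ephemeral`-attribute idea, as a hedged sketch: permanent attributes are fixed at construction, and only the nested `ephemeral` value can be swapped. It assumes nested attribute values are supported and, per the replace-not-merge semantics suggested later in this thread, that a new `ephemeral` value replaces the old one wholesale. The class and method names are hypothetical.

```javascript
// Hypothetical provider where only the nested "ephemeral" attribute may change.
class EphemeralKeyResourceProvider {
  constructor(permanentAttributes, ephemeralAttributes = {}) {
    this._permanent = Object.freeze({ ...permanentAttributes });
    this._publish(ephemeralAttributes);
  }

  // Build a fresh immutable snapshot; earlier snapshots are never touched.
  // A key literally named "ephemeral" in the permanent set is overridden.
  _publish(ephemeralAttributes) {
    this._resource = Object.freeze({
      ...this._permanent,
      ephemeral: Object.freeze({ ...ephemeralAttributes }),
    });
  }

  getResource() {
    return this._resource;
  }

  // Replaces (does not merge) the whole nested value, so no per-key
  // permanent/ephemeral lookup is needed.
  setEphemeral(ephemeralAttributes) {
    this._publish(ephemeralAttributes);
  }
}
```

With replace semantics, a caller that only wants to rotate `session.id` has to pass every ephemeral key it wants to keep.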

Anyway, this is an optimization step. Let's ignore this initially until the larger proposal gets acceptance.

Marking something in the semantic conventions is just that: A convention. If we want something to be conventionally ephemeral, we still need to have a note about that in the semantic conventions one way or another.

I agree that it would simplify things if ephemeral resources were kept separate from other resources.

A validator is also something which can be run in development, but disabled in production, which would work as an optimization.

Also, one aside on nested attributes: my assumption is that attribute values wouldn't be merged, they would be replaced.

In other words, there is still only a single string key per attribute, but with the option of storing an object, map, or array as the value for that attribute. If you set a new value for the key, it would throw the old value away.

tigrannajaryan · 2022-06-22T20:19:37Z

text/0208-ephemeral-resources.md

+
+An alternative to ephemeral resources would be to create span, metrics, and log processors which attach these ephemeral attributes to every instance of every signal. This would not require a modification to the specification.
+
+There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms on the browser.


It would be great to quantify this. How inefficient is it? A benchmark demonstrating this would be a strong argument in favour of the proposed approach.

If processors can change scope attributes, they might be a good candidate to solve this as well.

lack of support for gzip and other compression algorithms on the browser

I'm not an expert on browser stuff but can you expand on this? On its surface it seems wrong since gzipped static resources show up everywhere on the internet and there are js implementations of gzip (like this). This stackoverflow post suggests that a part of it is because a browser client can't know if the server can accept gzipped data, but OTLP requires gzip support.

Yes, it's uncommon because clients may not know if the server can accept compressed data. It is not clear to me if the gzip support in the OTLP spec refers to responses only (common for web services to provide) or requests as well (uncommon).

I think there is also a danger of an attack on the server: compressed data could expand to a very large payload. And lastly, gzip compression is not native to browsers, so there is CPU overhead, which is important to consider for its impact on user experience, especially when sending data while the page is unloading.

Aside from that, I think that session ID specifically does not belong on individual signals. The session is a context for many signals in a given time period; it does not vary from signal to signal.

Never mind on the OTLP gzip support, I see it says that clients MAY gzip the content.

@tigrannajaryan Regarding the limited network bandwidth, the sendBeacon() API has a payload limit of 64KB. Assume a session.id attribute that looks like this when sent over the wire:

{"key":"session.id","value":{"stringValue":"8fded6726f630a327ee3be41174a8a91"}}

It adds 79 bytes to each signal. The number of spans/events per export will depend on the type of application and which instrumentations are present, but assuming that 100 is plausible, this adds almost 8kB to the payload.
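The arithmetic above can be reproduced with a short sketch. The attribute and the figure of 100 signals per export are taken from the comment; treating "64KB" as 64 KiB is an assumption, and the sketch ignores the extra comma separating array elements in a real OTLP/JSON payload.

```javascript
// Estimate the per-export cost of stamping one attribute on every signal.
function stampingOverhead(attribute, signalsPerExport) {
  // OTLP/JSON encoding of a single KeyValue, measured in UTF-8 bytes.
  const bytesPerSignal = new TextEncoder().encode(JSON.stringify(attribute)).length;
  return { bytesPerSignal, bytesPerExport: bytesPerSignal * signalsPerExport };
}

const sessionAttr = {
  key: "session.id",
  value: { stringValue: "8fded6726f630a327ee3be41174a8a91" },
};
const { bytesPerSignal, bytesPerExport } = stampingOverhead(sessionAttr, 100);
const beaconLimit = 64 * 1024; // assumption: "64KB" read as 64 KiB
console.log(bytesPerSignal, bytesPerExport, `${((100 * bytesPerExport) / beaconLimit).toFixed(1)}%`);
// prints: 79 7900 12.1%
```

So a single session ID stamped on 100 signals uses roughly an eighth of the beacon budget, before any further context attributes are added.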

This will further increase if we add additional context attributes (user attributes, URL etc.).

While decompressing is common in browsers (be it gzip or brotli), none of the current request APIs expose a way to have the browser compress the request (MDN: XHR, fetch).

This is incorrect. The CompressionStreams API provides a native solution for this and is supported in Chromium-based browsers already.
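For reference, a sketch of using the native Compression Streams API (`CompressionStream`) to gzip an OTLP/JSON body, falling back to an uncompressed body where the API is missing. Whether a particular collector endpoint accepts `Content-Encoding: gzip` on requests is something to verify; the function name `encodeBody` is illustrative.

```javascript
// Gzip a JSON payload with the native Compression Streams API when available.
// Returns { body, encoding } so the caller can set Content-Encoding accordingly.
async function encodeBody(payload) {
  const json = JSON.stringify(payload);
  if (typeof CompressionStream === "undefined") {
    return { body: json, encoding: null }; // no native support: send as-is
  }
  const gzipped = new Blob([json]).stream().pipeThrough(new CompressionStream("gzip"));
  const body = new Uint8Array(await new Response(gzipped).arrayBuffer());
  return { body, encoding: "gzip" };
}
```

A caller would then POST `body` with `fetch`, adding a `Content-Encoding: gzip` header only when `encoding` is `"gzip"`. The feature check keeps older browsers on the uncompressed path instead of shipping a compression library.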

This does mean that indeed you have to bring your own compression methods. More code = larger bundle that the browser needs to download, parse and execute. First phase is network bound (but does benefit from compression itself), while second and third are CPU bound.

Also benefits from caching. In a network-constrained situation the cost of retrieving the additional code is paid once and the result cached. Conditional requests and etags are your friend.

Also, in most cases instrumentation is required to be loaded ASAP (sometimes even before the rest of the content on the page), causing site loading to be blocked until the code is downloaded (should it be in the <head> of the page).

The additional code for compression is only needed to export telemetry and does not need to be loaded at the same time as the code enabling instrumentation. Deferring until an export is required can increase the time-to-export but would not impact time-to-interaction or any other user-focused timing.

I propose adding an entirely new field called ephemeral_resource as a sibling to resource in ScopeSpans and ScopeLogs - this way, the original resource remains immutable and the new field can be used for the ephemeral attributes of the resource.

This is inverted. ResourceSpans contain ScopeSpans, not the other way around. Ephemeral resource attributes could be added as Scope* attributes on each Scope* produced during the time when the ephemeral resource attributes are active, but I'm not sure I see how changing the OTLP data structure advances the conversation in a safe way. That is the most invasive way of going about this that I could think of.

@Aneurysm9 sorry my bad, I meant ResourceSpans and ResourceLogs and not ScopeSpans and ScopeLogs. I corrected this in my previous comment, can you check if it makes sense this time?

The cost to generate, serialize, and compress that many spans is also not one synchronous process that takes X milliseconds, but many small processes which each take a small fraction of X. It is most important to ensure that each individual step doesn't impact user experience. With the given example of 100 spans going otlp -> protobuf -> pako on the Pixel 4a, the whole process takes 4.393ms, but you have 2 chances to yield to the event loop to ensure user experience is not affected.
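The step-by-step shape described above might look like this, as a sketch: each stage runs synchronously, and the exporter yields to the event loop between stages so input handling and rendering can run in between. The stage functions (`toOtlp`, `encode`, `compress`, `send`) are hypothetical placeholders for an OTLP transform, a protobuf encoder, a compressor such as pako, and the transport; `setTimeout(resolve, 0)` is used as a portable way to yield.

```javascript
// Resolve on a later task so the browser can handle input/rendering in between.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

// Run each export stage as its own short task instead of one long synchronous block.
async function exportInStages(spans, { toOtlp, encode, compress, send }) {
  const otlp = toOtlp(spans);
  await yieldToEventLoop(); // yield point 1: between OTLP transform and encoding
  const encoded = encode(otlp);
  await yieldToEventLoop(); // yield point 2: between encoding and compression
  return send(compress(encoded));
}
```

As the follow-up comments note, this does not help while the page is unloading, where the export has to finish synchronously, so an SDK taking this approach would likely keep a synchronous path as well.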

This is incorrect. The CompressionStreams API provides a native solution for this and is supported in Chromium-based browsers already.

I had missed it, but I generally don't consider new browser features a solution unless usage is above 90% (and, well, Safari has a monopoly on iOS...). Also, 90% is probably low, considering how often RUM products are asked for IE11 support; those users already have a miserable experience from using IE in the current year, so making it optional is worth considering.

but you have 2 chances to yield to the event loop to ensure user experience is not affected.

There is one catch: not when the user is leaving the page. Though generally you don't have 100 spans then, so it becomes a question of whether you want to maintain 2 different code paths (a sync one and an "async" one).

@tigrannajaryan i just want to emphasize what @scheler said, that the purpose of this OTEP is not to avoid compression or gain efficiency, but to extend our data model in a way that correctly represents these attributes.

If we don't want to extend the current Resource concept, we could add a new concept, call it ProcessScope or something similar, and have it work in effectively the same manner.

Personally, I'd prefer we extend resources over adding a new scope. But I prefer both over an approach that makes it impossible to cleanly implement RUM using OpenTelemetry.

In other words, I'm against "just tack on the process scope as span/event attributes" the same way I'd be opposed to "just tack on the instrumentation scope as span/event attributes." In both cases, yes it would "work." But it would create a headache for implementers and confusion for users.

We should strive for a clean data model, where everything is explained just by looking at the data structure.

Oberon00 · 2022-06-23T07:46:31Z

text/0208-ephemeral-resources.md

+
+There are two types of resource attributes, **permanent** and **ephemeral**. Attributes which are labeled as permanent in the semantic conventions must be present when the SDK is initialized. They cannot be added or updated at a later date.
+
+Resources are managed via a ResourceProvider. Setting an attribute on a ResourceProvider will cause that attribute value to be included in the resource attached to any signal generated in the future. Spans which have already been started, along with any telemetry which has already been passed to the export pipeline, will not have the new attribute value. Optionally, a check can be added to ensure that permanent resources are not modified after the SDK has started.


I don't see how this would simplify things? You then still have an attribute that needs special handling. Whether it is by name or with an explicit label would not make things more/less simple, would it?

The nested attributes proposal also does not require SDKs to implement them. If we want ephemeral attributes to depend on that, it would mean that SDKs which do not implement nested attributes could not implement ephemeral attributes either.

Oberon00 · 2022-06-23T08:01:11Z

text/0208-ephemeral-resources.md

+
+## Trade-offs and mitigations
+
+This change should be fully backwards compatible, with one potential exception: fingerprinting. It is possible that an analysis tool which accepts OTLP may identify individual services by creating an identifier by hashing all of the resource attributes. 


There is another issue: Exporters right now may be implemented to assume they only ever deal with spans with the same resource. With this proposal, they could receive a batch of mixed spans.
Such an exporter may then misbehave and e.g. use the resource from the first/last span for everything.
Implementing sorting of spans by resource can be a bit costly.
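If spans in a batch can carry different resources, an exporter does not necessarily need to sort them. A hedged sketch, assuming the SDK hands out the same immutable resource object until it changes (as in the ResourceProvider idea in this OTEP), so object identity can serve as the grouping key in a single O(n) pass. `span.resource` mirrors the field on OpenTelemetry JS `ReadableSpan`; the function name is illustrative.

```javascript
// Group a mixed batch into one bucket per resource, preserving arrival order.
function groupByResource(spans) {
  const groups = new Map(); // key: resource object (by identity), value: its spans
  for (const span of spans) {
    const bucket = groups.get(span.resource);
    if (bucket) {
      bucket.push(span);
    } else {
      groups.set(span.resource, [span]);
    }
  }
  // One entry per resource, e.g. one ResourceSpans message each in OTLP.
  return Array.from(groups, ([resource, grouped]) => ({ resource, spans: grouped }));
}
```

Two distinct but equal resource objects end up in separate groups. That is still valid output, just less compact, which seems an acceptable trade for avoiding deep comparison.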

Also there may be exporters for protocols that only support a single resource per connected agent. They would then probably need to stamp the ephemeral attributes on every single telemetry item.

Similar issues may apply to span processors.

(And possibly samplers that receive a resource in their constructor, but I don't think that will be a problem in practice open-telemetry/opentelemetry-specification#1658)

Actually, exporters must deal with more than one resource already, which is what made this change so simple!

There is an issue open for that: open-telemetry/opentelemetry-specification#1690
Right now, I don't think it's clear, and Dynatrace exporters take a shortcut here and always use the resource of the first item, assuming it will be the same for every item in the batch (everything else is an absolute edge case today)

Ok, I agree this should be clarified. My understanding is that a BatchSpanProcessor may be shared across multiple SDKs within the same process, and that is done in order to have different sets of resources for different sub-processes. So there is no guarantee that all spans in a batch have the same resource. I know that @MSNev has examples of this pattern.

But, I think that this pattern is extremely rare, so it doesn't surprise me that Dynatrace and other exporters could take a shortcut without anyone noticing.

Our examples are (currently) used using our internal (not OpenTelemetry) SDK's on clients where multiple teams provide different components to the same "view" (page etc) and need / want to report telemetry to their own backends.

And in some runtimes we have a single batching system which is shared, rather than having each component on the view create its own SDK instance with all of the overhead and batching mechanisms. This reduces the runtime impact on the client's resources (CPU, memory, etc.).

Oberon00 · 2022-06-23T08:08:12Z

text/0208-ephemeral-resources.md

+
+This change should be fully backwards compatible, with one potential exception: fingerprinting. It is possible that an analysis tool which accepts OTLP may identify individual services by creating an identifier by hashing all of the resource attributes. 
+
+In this case, it is recommended that these systems modify their behavior, and choose a subset of permanent resources to use as a hash identifier.


That might be a pretty big deal for some, if they only allow storing one set of resource attributes per hash.

@open-telemetry/specs-approvers Please take a look - I suspect we may need a lot of eyes, in case somebody relies on this right now.

It seems crazy to me to use a resource hash as an identifier, given that there is no requirement that the items within it would uniquely identify a service...

But I'm throwing it out there as a possibility, just to cover all the bases.

You should be using something that doesn't exist yet, instead of hashing the whole resource: open-telemetry/opentelemetry-specification#1034 (EDIT: To clarify: We don't do/need this at Dynatrace, I don't know anybody who does. Just a side note)

Yes I agree! There are various attributes which could count as a unique identifier. We could clarify in the spec which ones are currently defined.

One possibility: by default, the SDK could generate a unique ID every time it starts, which would be a reliable identifier because we generate it ourselves. However, this identifier would not be stable across restarts. So there are limits to what can be provided without user input.
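As an illustration of the OTEP's recommendation, a backend could derive its identifier from a fixed list of identifying attributes rather than from the whole resource. The key list below (`service.namespace`, `service.name`, `service.instance.id`) is an assumption about which attributes a given backend treats as permanent, and 32-bit FNV-1a is used only to keep the sketch dependency-free; a real backend would likely prefer a stronger hash.

```javascript
// Assumed identifying (permanent) keys; a given backend would choose its own.
const IDENTIFYING_KEYS = ["service.namespace", "service.name", "service.instance.id"];

// Fingerprint only the identifying subset, so ephemeral attributes do not affect it.
function resourceFingerprint(attributes, keys = IDENTIFYING_KEYS) {
  // Canonical form: fixed key order, explicit null for missing keys.
  const canonical = JSON.stringify(keys.map((k) => [k, attributes[k] ?? null]));
  // 32-bit FNV-1a over the UTF-8 bytes of the canonical string.
  let hash = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(canonical)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}
```

A change to `session.id` or `enduser.id` leaves the fingerprint untouched, while a new `service.instance.id` (for example, one generated at each SDK start) produces a new one.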

Oberon00 · 2022-06-23T08:12:49Z

text/0208-ephemeral-resources.md

+
+There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms on the browser.
+
+The second problem is that it becomes difficult to distinguish between ephemeral resources and other types of attributes. 


Is it needed to distinguish them by type? Usually the attribute keys should be all you need. E.g. if you have a session.id attribute, would you care whether it is an ephemeral resource or a span/event attribute?

In the browser, the overhead of applying the session.id as an attribute on every span and event would be untenable.

As far as the need to differentiate, putting data in the proper envelope helps backend systems use it more effectively.

You might ask, why have resources at all in OTLP? Why not simply apply resources as attributes on every span and event? Besides the inefficiency, it would make life very difficult for backend systems which want to apply different analysis to resources and span attributes.

In the browser, the overhead of applying the session.id as an attribute on every span and event would be untenable.

Citation needed 😃
Would these arguments also apply against #207?

Please see this thread (#208 (comment)) for a lengthy discussion on data limitations in the browser.

I don't think these arguments apply to #207, that proposal would be helpful imho. Just not a solution for ephemeral resources, since many of the events which need these resources happen when there is no trace present.

Oberon00 · 2022-06-23T08:15:14Z

text/0208-ephemeral-resources.md

+
+An alternative to ephemeral resources would be to create span, metrics, and log processors which attach these ephemeral attributes to every instance of every signal. This would not require a modification to the specification.
+
+There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms on the browser.


In situations where at least one of the ephemeral attributes changes very often, only a few telemetry items are created between the changes, and there are lots of permanent attributes, attaching the ephemeral attributes to the telemetry items ("signal instances") could even be more efficient.
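A rough way to see where the crossover lies, as a back-of-the-envelope model rather than a measurement: stamping sends the permanent resource once per batch and the ephemeral attributes on every item, while ephemeral resources send permanent plus ephemeral attributes once per distinct resource per batch. It assumes, as a worst case, that every change of an ephemeral value adds one extra resource entry, and it ignores compression and per-entry framing. All parameter names and numbers are illustrative.

```javascript
// Bytes spent on resource/ephemeral attributes when stamping every item.
function stampedBytes({ batches, items, permanentBytes, ephemeralBytes }) {
  return batches * permanentBytes + items * ephemeralBytes;
}

// Bytes spent when ephemeral attributes live on the resource instead.
// Worst case: each change splits a batch, adding one more resource entry.
function ephemeralResourceBytes({ batches, changes, permanentBytes, ephemeralBytes }) {
  return (batches + changes) * (permanentBytes + ephemeralBytes);
}

const rare = { batches: 10, items: 1000, changes: 2, permanentBytes: 500, ephemeralBytes: 80 };
const frequent = { ...rare, changes: 500 }; // only ~2 items between changes
console.log(stampedBytes(rare), ephemeralResourceBytes(rare)); // 85000 6960
console.log(stampedBytes(frequent), ephemeralResourceBytes(frequent)); // 85000 295800
```

Under these assumptions stamping only wins when changes are nearly as frequent as items, which matches the condition described above; the reply below expects the opposite regime.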

Generally, I wonder how many ephemeral attributes we expect relative to permanent ones.

We are not expecting large numbers of ephemeral attributes, nor are we expecting them to change with great frequency.

The expectation is that there would be between 1 and 10 ephemeral attributes set on a client, which may update after 15 minutes of inactivity, after the application reawakens, or in response to a change in user or user settings.

Oberon00 · 2022-06-23T08:17:42Z

text/0208-ephemeral-resources.md

@@ -0,0 +1,78 @@
+# Ephemeral Resource Attributes
+
+Define a new type of resource attribute, ephemeral resources, which are allowed to change over the lifetime of the process. Existing resources are redefined as permanent resources, which must be present at SDK initialization and cannot be changed.


I have proposed a somewhat similar OTEP #207

If #207 was implemented you could store your ephemeral resource attributes on the Context, and replace the active context when they change. Please check if #207 would also cover your use case.

Thanks! That looks like a good proposal, but the context scope still presumes a transactional scope within a server handling many independent transactions.

For clients, all telemetry emitted, including logs which are not bounded by a span, is related, which is why the resource scope appears to be the correct one for things like this.

I think there is a continuum of use cases here, where some are better addressed by this OTEP and others better by #207. If one added the possibility to set a new context as root context (where the default is the empty context), we could have something that applies to everything.
Though the browser usually only has one thread of execution of which everything is a child context (I believe), so you probably would only need to set the attributes you want as active before starting your root spans, and it would stick.

That might work... but it might be better to keep the concept of a "process scope" and a "context scope" separate. I see these attributes as more similar to resources and instrumentation scopes - they represent the environment the transaction is occurring within.

Because contexts are immutable, and there are no rules as to when child contexts may be created, there would be synchronization issues between when ephemeral resources are updated and when they would be applied, if they only changed the root context and thus only affected transactions which start from a new root context.

carlosalberto · 2022-06-27T13:50:52Z

@tedsuo Thanks - I feel like some examples would be great, as it seems the Validator is the one separating Resources/Attributes into permanent and ephemeral?

tedsuo · 2022-06-30T19:02:10Z

Sure, no problem @carlosalberto. Would you want an example implementation? Or an example use case?

tedsuo · 2022-07-07T16:10:40Z

Added an example implementation and example use case.

tedsuo · 2022-07-14T15:32:57Z

Yes? No? What should we do here? Based on these requirements, it would be good to understand how the TC would like to move forward.

tigrannajaryan · 2022-07-14T16:01:15Z

@tedsuo the spec defines Resource like this:

A Resource is an immutable representation of the entity producing telemetry as Attributes.

This text is in a Stable spec document. How do we reconcile this OTEP with the spec's stance on immutability of the Resource? Are you suggesting that we break a Stable spec document? Or you do not think this is a breaking change?

t2t2 · 2022-07-14T16:41:59Z

How do we reconcile this OTEP with the spec's stance on immutability of the Resource?

This doesn't change anything about current resource immutability - an update on the resource provider would end up in a new resource instance. To speak in code:

const assert = require('assert');
// ResourceProvider and the `resourceProvider` option are the APIs proposed in this OTEP (hypothetical here).
const resourceProvider = new ResourceProvider({
    // Initial set of attributes; internally does new Resource(attrs) and stores it as the current value
    'session.id': '1',
});
const tracerProvider = new TracerProvider({ resourceProvider });
const tracer = tracerProvider.getTracer(/* irrelevant */);

const span1 = tracer.startSpan(/* ... */);
// internally span.resource = tracer.tracerProvider.resourceProvider.getResource()

// Some time later, the user logs in and their identity is known
resourceProvider.setAttribute('enduser.id', 'superadmin');
// internally currentResource = currentResource.merge(new Resource(newAttrs)), which as per the current spec
// returns a new Resource with merged attrs.
// That new Resource is set as the current value in ResourceProvider

// Or the session expires and a new one is set
resourceProvider.setAttribute('session.id', '2');

const span2 = tracer.startSpan(/* ... */);

// Each span keeps the immutable Resource instance that was current when it started
assert.notStrictEqual(span1.resource, span2.resource);

assert.deepStrictEqual(span1.resource.attributes, { 'session.id': '1' });
assert.deepStrictEqual(span2.resource.attributes, { 'session.id': '2', 'enduser.id': 'superadmin' });

tigrannajaryan · 2022-07-14T17:08:20Z

This doesn't change anything about current resource immutability - an update on the resource provider would end up in a new resource instance.

I disagree. This is not just about a Resource instance in memory. It is about the Resource that is emitted by the instrumented application. The recipients of telemetry expect that the resource is immutable, i.e. its attributes do not change over time.

The OTEP talks about this in the "Trade-offs and mitigations" section. I think this is a breaking change. It breaks the contract between OTel sources and telemetry destinations. The OTEP text even recommends this:

In this case, it is recommended that these systems modify their behavior

I don't think this is acceptable. We are saying that "yes, we broke the contract, deal with it". IMO, we cannot do that.

tigrannajaryan · 2022-07-14T22:03:58Z

I thought a bit more about this, I want to find a solution.

I don't think we can delete the requirement which says the Resource is immutable. I think this needs to stay otherwise we are breaking the contract. Additionally, unfortunately the spec says we are not allowed to change the association of the Resource and TracerProvider once that association is established:

a resource can be associated with the TracerProvider when the TracerProvider is created.

However, let's step back for a moment. I don't think recipients of telemetry care about the association inside the SDK. The recipients care about the data model and data model certainly allows the SDK to emit telemetry associated with different Resources. A new TracerProvider can be created with a new Resource and can be used to emit telemetry that was previously emitted using a different TracerProvider and this is completely legal.

Given the above, I do not see any clause in the spec that directly prohibits us from introducing a new way for TracerProvider to be associated with some proxy object which itself is associated with a Resource and allow that association to change over time. Yes, this is in a sense cheating, but it allows to introduce this new way such that it is not a breaking change for the SDK. That's what the proxy ResourceProvider here does.

To me the following questions remain:

Is it right that session id is part of the Resource? It doesn't feel right but I can't put my finger on it, so I will refrain from objecting to this for now.
Why do we need to introduce anything called "ephemeral attributes"? I think this is not needed. They are regular attributes just like any other. Nothing ephemeral here. We only introduce a new way to specify the Resource that must be associated with the produced telemetry. That's all it is. It is a regular Resource, an immutable one. Attributes are all regular.
Is it really possible to introduce ResourceProvider with the ability to attach it to TracerProvider in a way that does not break any existing code? We need to see prototypes that demonstrate this.
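On the last question, one possible shape for a non-breaking change, sketched under assumptions: the existing `resource` option keeps working by being wrapped in a provider that always returns the same Resource, and only callers who opt in pass a `resourceProvider`. `SketchTracerProvider`, `StaticResourceProvider`, and the `resourceProvider` option are hypothetical and are not the OpenTelemetry JS API; whether a real SDK can adopt this without breaking existing code is exactly what a prototype would need to show.

```javascript
// Hypothetical: wraps a fixed resource so existing configurations keep working.
class StaticResourceProvider {
  constructor(resource) {
    this._resource = Object.freeze({ ...resource });
  }
  getResource() {
    return this._resource;
  }
}

// Hypothetical tracer provider accepting either the existing `resource` option
// or the proposed `resourceProvider` option.
class SketchTracerProvider {
  constructor({ resource = {}, resourceProvider } = {}) {
    this._resourceProvider = resourceProvider ?? new StaticResourceProvider(resource);
  }

  // Each span captures whichever immutable resource is current when it starts.
  startSpan(name) {
    return { name, resource: this._resourceProvider.getResource() };
  }
}
```

Code that only ever passes `resource` sees one Resource for the provider's lifetime, exactly as today; the behavior change is confined to callers that opt in.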

martinkuba · 2022-07-14T23:03:32Z

Is it right that session id is part of the Resource?

It is an attribute that applies to all telemetry coming out of the application. It does not change from signal to signal, nor is it scoped to a specific instrumentation. I don't think there is any other place it could go than the resource level (given the current data model).

Why do we need to introduce anything called "ephemeral attributes"?

I think this is an attempt to address concerns about the contract between OTel sources and destinations. If there is a real reason that backends need to have an immutable set of resource attributes per application instance, then this would make it possible by defining in the semantic conventions which attributes are permanent and which can change.

We assumed that the only reason backends would be relying on this contract is if they were doing something like hashing all the resource attributes (e.g. to identify the instance). Yes, this would force these backends to be updated, but it would provide them with a way to continue using the hashing. Also, since the TracerProvider can be recreated within the same application instance, defining which attributes are permanent or ephemeral is just making it explicit.

Aneurysm9 · 2022-07-20T15:27:29Z

Is it right that session id is part of the Resource?

It is an attribute that applies to all telemetry coming out of the application. It does not change from signal to signal, nor is it scoped to a specific instrumentation. I don't think there is any other place it could go than the resource level (given the current data model).

I'm not sure I see it the same way. Does it truly apply to all telemetry coming out of the application? Is it not possible for the same application instance to have two sessions active? Doesn't the fact that it can change while the application is running necessarily mean that it does not apply to all telemetry? Yes, the "session ID" attribute as a concept does, but not any given value. That is different from all other resource attributes.

As for not being scoped to a specific instrumentation, it is akin to the trace ID in that it can be used for correlation of signals. How would it be useful with distribution metrics? Do I really care to have a timeseries for every user session to track load times, or do I want to have a more general metric that has exemplars pointing at potentially interesting sessions?

As for where else it could go, it could certainly be added as a scope attribute. This would require a bit more bookkeeping on the part of the instrumentor to keep a map of sessions to tracers, etc., or to store them in session-scoped storage, but is feasible. More appropriate, perhaps, would be in the context where it would be available to all signals. Propagation across process boundaries to allow for correlation (I assume a session can be serviced by application elements that are outside of the immediately user-facing process) is still an issue. I think, though, that this all reinforces my belief that session ID and trace ID are synonymous and that sessions are simply long traces. Do we really need a new concept, and to contort ourselves to find ways to claim that we're not breaking compatibility with a stable specification, to handle something that the existing concepts can already handle?

t2t2 · 2022-09-16T19:43:59Z

Note: I originally started this as part of response to open-telemetry/opentelemetry-specification#2500 (comment) but the first section ended up being more related to this otep being stuck, so here it is!

Let's eliminate the confusion of what a session means for a bit. There are some other attributes that are

good candidates for resource level
value can change over time
as a concept is more familiar to backend service / APM kind of usage that current otel contributors are a lot more familiar with

Let's bring in enduser.id

Currently it's defined as a span-level identifying attribute. Which, hey, makes total sense in a server-side environment: you've got a server-side service where one server can serve all of the users of the application. If you'd want the enduser that caused a request set on all of the child spans, then yes, context makes a lot of sense, since the entire server isn't dedicated to just one enduser. Anyway, that gives me my 3rd condition.

Let's jump to the client side. I open up a local food delivery app, and it's instrumented to generate telemetry. Alright, what are the resource attributes? Well, you've got

the ones describing the app (service.name = app name, service.version = app version, probably also including installation source, build type, ...)
but also the runtime (OS, OS version, ...; or, when running in a browser, user-agent-related info)
but also I'm logged in and interacting with the app, so most of the telemetry comes out of my interactions with the app or from server-side updates pushed because I'm logged in. I'd consider this a reason to have enduser.id on the resource, fulfilling condition 1.

I try to order something, but suddenly the app runs into a bug and crashes. Smash cut: a support person is messaging the devops team, "hey, got this guy going crazy over not being able to order, can you figure out what's going on there, why his app keeps crashing". Devops looks up data based on my name and sees, in the tracing spans, an attempt to order 100 kebabs that caused a KebabStackOverflow in the logs. Really, this paragraph is only here to be referenced later while keeping a linear timeline for the domain-knowledge story.

Somehow you're next to me and mention you uninstalled the app a while ago due to constant crashing and now have a "please come back" discount on your account. I hand my phone to you; you log my account out, log in with your account, and manage to order successfully with a more reasonable order size.

Now the logged-in account has changed, so if the logged-in account is in the resource, the above telemetry should be spread over 3 resources: data from me, data from an anonymous user, and data from you (so now we've fulfilled condition 2).

Other than local food delivery app, some other examples:

The user of a self-service kiosk at an airport
The logged in cashier at a POS system
A rental scooter user

But there are other potential attributes too:

Session ID (the logic may be defined in OTel, but it is made to change by external factors: time, user (in)activity)
The IP or ISP of the device the data is coming from (roaming over the border of countries, changing the internet provider, or switching from Wi-Fi to mobile data)
A case could also be made for the document location, or the active screen in a mobile app (see opentelemetry-js#3222, "Scope attribute for browser page url in all spans and events emitted from browser")
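To make the "changed by external factors" point concrete for session IDs, here is a minimal sketch of a client-side tracker that rotates the ID after a period of inactivity. The class and method names are hypothetical, not an OpenTelemetry API; the 15-minute timeout mirrors the idle example from the OTEP description, and the clock is injectable only so the behaviour can be demonstrated deterministically.

```typescript
// Hypothetical session tracker. The class name, method names, and the
// inactivity timeout are illustrative assumptions, not an OpenTelemetry API.
type Clock = () => number;

class SessionTracker {
  private readonly timeoutMs: number;
  private readonly now: Clock;
  private sessionId: string;
  private lastActivity: number;
  private counter = 0;

  constructor(timeoutMs: number, now: Clock = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastActivity = now();
    this.sessionId = this.newId();
  }

  // Call on every user interaction. If the user was idle for longer than the
  // timeout, a new session ID is minted before the activity is recorded.
  onUserActivity(): string {
    const t = this.now();
    if (t - this.lastActivity > this.timeoutMs) {
      this.sessionId = this.newId();
    }
    this.lastActivity = t;
    return this.sessionId;
  }

  private newId(): string {
    this.counter += 1;
    // Not cryptographically random; a real implementation would use a proper ID generator.
    return `session-${this.counter}-${Math.random().toString(36).slice(2, 10)}`;
  }
}

// Usage with a fake clock.
let fakeNow = 0;
const tracker = new SessionTracker(15 * 60 * 1000, () => fakeNow);
const first = tracker.onUserActivity();
fakeNow += 5 * 60 * 1000; // 5 idle minutes: same session
const second = tracker.onUserActivity();
fakeNow += 20 * 60 * 1000; // 20 idle minutes: the session rotates
const third = tracker.onUserActivity();
console.log(first === second, first === third); // true false
```

Nothing in the SDK restarts when `third` is minted, which is exactly the situation where a value set once at SDK start would go stale.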

So I think something people who haven't built a RUM solution need to consider is that a major difference between backend services and client-side apps is that apps have (a lot more) state. A lot of this state is global (not scoped to parts of the app, like a single request whose state is forgotten once the request is fulfilled), it changes over time (due to time, user interactions, or completely external actions), and in a lot of cases it's useful or needed when debugging with the gathered telemetry (who, what device, what screen/URL, what ISP, geolocation).

A lot of these attributes are also what you'd want to query data by. I already mentioned looking up data based on app user info, but let's consider some of the RUM use cases:

Viewing the flow of a user during the session (querying spans/logs based on session.id)
Which page URL errors occur on the most (querying logs based on document.location / whatever it will get spec-ed to)
How's the webvitals score for people in a specific country (metrics(?) with geodata)
Comparing request spans based on ISP

These add considerations for efficient data ingest and storage. Every vendor will probably have different opinions on this based on how they use and store data. In July I got some knowledge from @mdubbyap on the Splunk/SignalFx ingest side about our use cases (and I probably should have written it up earlier so I don't accidentally misremember it, but oh well, Ted's been on vacation anyway, so it wouldn't have helped move this forward):

For ingest, the best case is to minimise the number of bytes that need to be read to determine where to pipe the data (be it partitioning, buckets, or whatever optimises your infra). The worst case is having to read deep enough to get into each span/log/metric and check its attributes for the value. If the value is on the resource, ingest only needs to read the resource's bytes before deciding where to send the data, without parsing the rest of the payload. (Since we focus on showing session experience, for us the session.id attribute is obviously 👀👀)
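A minimal sketch of the routing decision described above, assuming simplified in-memory shapes loosely modeled on an OTLP batch (these are not the generated OTLP types, and the 8-partition setup is arbitrary). It only shows the decision logic: the partition is chosen from the resource alone, and the spans are never inspected. In a real wire-format decoder, the saving would come from stopping after the resource field instead of decoding every span.

```typescript
// Simplified, hypothetical shapes loosely modeled on an OTLP export batch
// (one resource plus many spans per entry). These are NOT the generated
// OTLP protobuf types.
type Attributes = Record<string, string>;
interface SpanRecord { name: string; attributes: Attributes }
interface ResourceSpans { resource: Attributes; spans: SpanRecord[] }

// 32-bit FNV-1a hash of a string's UTF-16 code units.
function hashString(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Routes each ResourceSpans entry using only its resource attributes;
// the spans inside are never looked at.
function partitionByResource(
  batch: ResourceSpans[],
  key: string,
  partitions: number,
): Map<number, ResourceSpans[]> {
  const routed = new Map<number, ResourceSpans[]>();
  for (const entry of batch) {
    const p = hashString(entry.resource[key] ?? "") % partitions;
    const bucket = routed.get(p) ?? [];
    bucket.push(entry);
    routed.set(p, bucket);
  }
  return routed;
}

const batch: ResourceSpans[] = [
  { resource: { "service.name": "food-app", "session.id": "abc" }, spans: [{ name: "checkout", attributes: {} }] },
  { resource: { "service.name": "food-app", "session.id": "xyz" }, spans: [] },
  { resource: { "service.name": "food-app", "session.id": "abc" }, spans: [] },
];
const routed = partitionByResource(batch, "session.id", 8);
```

If session.id lived on each span instead, the same function would have to loop over `entry.spans` (and possibly split an entry across partitions), which is the "worst case" described above.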

Fulfilling legal requirements can also be easier if these attributes are more easily readable, e.g. indexing data by end-user info makes deleting data on user request (such as under GDPR) easier.

Also linking open-telemetry/opentelemetry-specification#2775, as it has gone into the topic of descriptive vs. identifying attributes, which has been one of the reasons against this OTEP so far.

scheler · 2023-05-31T20:17:29Z

Hi, I wanted to give an update on this topic, since some of us from the client-side telemetry SIG have asked a few TC members to help us further with it. Copying the message that @jack-berg posted on Slack:

@jsuereth and I discussed the issue of session id and the other attributes you want to attach to all telemetry in this week’s TC meeting. Here are some of the take aways:
We don’t think it’s appropriate to include these as resource attributes. However, we recognize the specification needs more clarity about what represents an entity in the client instrumentation space where different architectures (web application, native application, SPA) can result in SDK lifecycles that are quite different.
If not resource attributes, then the other option is to include them on individual records. We think you should pursue a strategy where these attributes are set in context, and lifted out of context onto the individual records in a custom SpanProcessor / LogRecordProcessor. It should be possible to do this today given the APIs that are available, but ideally it would be easier. We can use this use case to help steer an OTEP where shared attributes propagated via context are included on all signals.
Naturally, this is going to cause bigger payloads over the network in environments where compression isn’t available. We think this problem needs solving, but is orthogonal to where the attributes live from a data modeling perspective. We’ll separately pursue adjusting the OTLP specification to try to optimize for these types of scenarios. One potential solution is to extend the protocol with the notion of dictionaries of shared attributes, which individual records could reference instead of duplicating.
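To illustrate the "dictionaries of shared attributes" idea from the last takeaway, here is a minimal sketch of dictionary encoding. It is purely hypothetical: it is not an OTLP message or any proposed wire format, just the general technique of storing each distinct key/value pair once and having records reference it by index.

```typescript
// Hypothetical dictionary encoding for shared attributes. This is NOT an
// existing OTLP message; it only illustrates the "reference instead of
// duplicate" idea.
type Attributes = Record<string, string>;

interface EncodedBatch {
  dictionary: Array<[string, string]>; // each distinct key/value pair, stored once
  records: number[][]; // per record: indices into dictionary
}

function encode(records: Attributes[]): EncodedBatch {
  const dictionary: Array<[string, string]> = [];
  const indexOf = new Map<string, number>();
  const encoded = records.map((attrs) =>
    Object.entries(attrs).map(([key, value]) => {
      const id = JSON.stringify([key, value]); // unambiguous lookup key
      let i = indexOf.get(id);
      if (i === undefined) {
        i = dictionary.length;
        dictionary.push([key, value]);
        indexOf.set(id, i);
      }
      return i;
    }),
  );
  return { dictionary, records: encoded };
}

function decode(batch: EncodedBatch): Attributes[] {
  return batch.records.map((ids) => Object.fromEntries(ids.map((i) => batch.dictionary[i])));
}

// Three spans from one session share session.id and enduser.id.
const spans: Attributes[] = [
  { "session.id": "abc", "enduser.id": "me", "http.route": "/menu" },
  { "session.id": "abc", "enduser.id": "me", "http.route": "/cart" },
  { "session.id": "abc", "enduser.id": "me", "http.route": "/order" },
];
const batch = encode(spans);
console.log(batch.dictionary.length); // 5: two shared pairs plus three routes
console.log(JSON.stringify(batch.records)); // [[0,1,2],[0,1,3],[0,1,4]]
```

The shared pairs are written once no matter how many records use them, which is the payload saving the takeaway is after; whether and how OTLP adopts something like this is a separate design question.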

Oberon00 · 2023-05-31T20:28:04Z

@scheler

We think you should pursue a strategy where these attributes are set in context, and lifted out of context onto the individual records in a custom SpanProcessor / LogRecordProcessor

This is what I proposed in OTEP #207 to be a blessed concept with its own API, by the way. But as you said, it is in principle implementable today.
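A minimal sketch of the "set in context, lift onto each record in a processor" strategy. To keep it runnable without dependencies, it uses small stand-ins that only loosely mirror an immutable context and a span processor's `onStart(span, parentContext)` hook; they are not the real `@opentelemetry/api` or SDK types, and the context key and helper names are hypothetical.

```typescript
// Minimal stand-ins so the sketch runs without dependencies. They loosely
// mirror an immutable context and a span processor's onStart hook; they are
// NOT the real @opentelemetry/api or SDK types.
type Attributes = Record<string, string>;

class Context {
  private readonly values: ReadonlyMap<symbol, unknown>;
  constructor(values: ReadonlyMap<symbol, unknown> = new Map()) {
    this.values = values;
  }
  getValue(key: symbol): unknown {
    return this.values.get(key);
  }
  // Returns a new Context; the receiver is never mutated.
  setValue(key: symbol, value: unknown): Context {
    const next = new Map(this.values);
    next.set(key, value);
    return new Context(next);
  }
}

interface MutableSpan { name: string; attributes: Attributes }

// Hypothetical key under which the application stores its shared attributes.
const SHARED_ATTRIBUTES = Symbol("shared-attributes");

// Returns a child context whose shared attributes include attrs.
function withSharedAttributes(ctx: Context, attrs: Attributes): Context {
  const existing = (ctx.getValue(SHARED_ATTRIBUTES) as Attributes | undefined) ?? {};
  return ctx.setValue(SHARED_ATTRIBUTES, { ...existing, ...attrs });
}

// Copies the shared attributes from the parent context onto each span as it
// starts. Attributes already set on the span win on conflict (a design choice).
class SharedAttributesSpanProcessor {
  onStart(span: MutableSpan, parentContext: Context): void {
    const shared = parentContext.getValue(SHARED_ATTRIBUTES) as Attributes | undefined;
    if (shared) {
      span.attributes = { ...shared, ...span.attributes };
    }
  }
}

const root = new Context();
const loggedIn = withSharedAttributes(root, { "session.id": "abc", "enduser.id": "me" });
const span: MutableSpan = { name: "checkout", attributes: { "http.route": "/order" } };
new SharedAttributesSpanProcessor().onStart(span, loggedIn);
// span.attributes now carries session.id, enduser.id and http.route.
```

A log record processor would follow the same pattern. Note the trade-off raised earlier in the thread: every record now carries its own copy of the shared attributes.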

tedsuo · 2023-07-31T16:25:47Z

Closing this in favor of a new proposal coming from the RUM/Client group.

This is a proposal to address Resource and Entity data model interactions, including a path forward to address immediate friction and issues in the current resource specification. The proposal includes all links and context needed to justify it, but duplicating a snapshot here:

## Motivation

This proposal attempts to focus on the following problems within OpenTelemetry to unblock multiple working groups:

- Allowing mutating attributes to participate in Resource ([OTEP 208](#208)).
- Allow Resource to handle entities whose lifetimes don't match the SDK's lifetime ([OTEP 208](#208)).
- Provide support for async resource lookup ([spec#952](open-telemetry/opentelemetry-specification#952)).
- Fix current Resource merge rules in the specification, which most implementations violate ([oteps#208](#208), [spec#3382](open-telemetry/opentelemetry-specification#3382), [spec#3710](open-telemetry/opentelemetry-specification#3710)).
- Allow semantic convention resource modeling to progress ([spec#605](open-telemetry/opentelemetry-specification#605), [spec#559](open-telemetry/opentelemetry-specification#559), etc).

---------

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: jack-berg <34418638+jack-berg@users.noreply.github.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: David Ashpole <dashpole@google.com>


OTEP for ephemeral resource attributes

3e6a2be

tedsuo requested a review from a team June 22, 2022 15:02

add issue number to filename

e06b96c

scheler reviewed Jun 22, 2022

tigrannajaryan reviewed Jun 22, 2022

Oberon00 reviewed Jun 23, 2022

tigrannajaryan mentioned this pull request Jul 4, 2022

Consider adding short keys for JSON encoding open-telemetry/opentelemetry-proto#412

Closed

Add example usage

381fb82

Oberon00 mentioned this pull request Jul 6, 2022

Use short keys for OTLP/JSON open-telemetry/opentelemetry-proto#413

Closed

tedsuo added 2 commits July 7, 2022 17:56

Added example implementation of ResourceProvider

7852cf3

Moved examples to bottom of document

25e17a5

scheler mentioned this pull request Aug 5, 2022

Add feature flagging semantic conventions open-telemetry/opentelemetry-specification#2529

Merged

martinkuba mentioned this pull request Aug 16, 2022

Project Tracking: Client Instrumentation open-telemetry/opentelemetry-specification#2734

Closed

scheler mentioned this pull request Sep 2, 2022

Scope attribute for browser page url in all spans and events emitted from browser open-telemetry/opentelemetry-js#3222

Closed

Oberon00 mentioned this pull request Sep 9, 2022

Scope attributes as part of identity is a breaking change open-telemetry/opentelemetry-specification#2762

Closed

t2t2 mentioned this pull request Sep 16, 2022

Storing session data on Resource open-telemetry/opentelemetry-specification#2500

Closed

This was referenced Sep 26, 2022

Contributing Geo fields from ECS open-telemetry/opentelemetry-specification#2835

Closed

Add Geo fields from Elastic Common Schema open-telemetry/semantic-conventions#1033

Closed

scheler mentioned this pull request Nov 30, 2022

Refine which attributes of Resource contribute to Metric Identity. open-telemetry/opentelemetry-specification#2775

Open

tedsuo added area:client priority:p0 labels Jan 9, 2023

tedsuo added the triaged label Jan 30, 2023

MSNev mentioned this pull request Mar 14, 2023

Additional resource attributes for browser open-telemetry/opentelemetry-specification#3287

Closed

AlexanderWert mentioned this pull request May 3, 2023

BREAKING: Rename remaining network attributes from net.* to network.* and align definitions with ECS open-telemetry/opentelemetry-specification#3426

Merged

tedsuo closed this Jul 31, 2023

martinkuba mentioned this pull request Aug 8, 2023

Shared attributes prototype open-telemetry/opentelemetry-js#4045

Closed

martinkuba mentioned this pull request Nov 9, 2023

How to handle common / global attributes? open-telemetry/opentelemetry-js#4274

Closed

jsuereth mentioned this pull request Aug 13, 2024

Initial Entity and Resource proposal. #264

Merged


		There are two types of resource attributes, permanent and ephemeral. Attributes which are labeled as permanent in the semantic conventions must be present when the SDK is initialized. They cannot be added or updated at a later date.

		Resources are managed via a ResourceProvider. Setting an attribute on a ResourceProvider will cause that attribute value to be included in the resource attached to any signal generated in the future. Spans which have already been started, along with any telemetry which has already been passed to the export pipeline, will not have the new attribute value. Optionally, a check can be added to ensure that permanent resources are not modified after the SDK has started.
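A minimal sketch of how the ResourceProvider described above could behave, including the optional check on permanent attributes. The class and method names (`ResourceProvider`, `freeze`, `getResource`, `setAttribute`) are illustrative assumptions, not an existing SDK API. Each update builds a new frozen Resource and swaps the reference, so the Resource itself stays immutable and telemetry that already captured the old one keeps it. JavaScript has no concurrent mutation to guard against here; a multi-threaded SDK would presumably use an atomic reference swap (for example Java's `AtomicReference`) to keep reads lock-free.

```typescript
// Hypothetical sketch of the ResourceProvider concept. Class and method
// names are illustrative; they are not an existing SDK API.
class Resource {
  readonly attributes: Readonly<Record<string, string>>;
  constructor(attributes: Record<string, string>) {
    // Copy and freeze so a Resource can never change after creation.
    this.attributes = Object.freeze({ ...attributes });
  }
}

class ResourceProvider {
  private current: Resource;
  private started = false;
  private readonly permanentKeys: ReadonlySet<string>;

  constructor(initial: Record<string, string>, permanentKeys: Iterable<string>) {
    this.current = new Resource(initial);
    this.permanentKeys = new Set(permanentKeys);
  }

  // Marks SDK start; permanent attributes are locked from here on.
  freeze(): void {
    this.started = true;
  }

  // Read when telemetry is created. This is a plain reference read; the
  // returned Resource is immutable, so callers can keep it without copying.
  getResource(): Resource {
    return this.current;
  }

  // Builds a new Resource with the updated value and swaps it in. Telemetry
  // that already captured the previous Resource is unaffected.
  setAttribute(key: string, value: string): void {
    if (this.started && this.permanentKeys.has(key)) {
      throw new Error(`permanent resource attribute "${key}" cannot change after SDK start`);
    }
    this.current = new Resource({ ...this.current.attributes, [key]: value });
  }
}

// Usage, following the phone hand-over story told earlier in this thread.
const provider = new ResourceProvider(
  { "service.name": "food-app", "service.version": "1.2.3" },
  ["service.name", "service.version"],
);
provider.freeze();
provider.setAttribute("enduser.id", "me");
const mine = provider.getResource(); // captured by telemetry started now
provider.setAttribute("enduser.id", "you");
const yours = provider.getResource();
console.log(mine.attributes["enduser.id"], yours.attributes["enduser.id"]); // me you
```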


		An alternative to ephemeral resources would be to create span, metrics, and log processors which attach these ephemeral attributes to every instance of every signal. This would not require a modification to the specification.

		There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms in the browser.


		## Trade-offs and mitigations

		This change should be fully backwards compatible, with one potential exception: fingerprinting. It is possible that an analysis tool which accepts OTLP may identify individual services by creating an identifier by hashing all of the resource attributes.



		In this case, it is recommended that these systems modify their behavior, and choose a subset of permanent resource attributes to use as a hash identifier.
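A minimal sketch of what "choose a subset of permanent resource attributes to use as a hash identifier" could look like on the backend side. The identity keys chosen here and the FNV-1a hash are assumptions for illustration; a real system would pick its own subset and hash function. The point is that ephemeral attributes such as session.id or enduser.id no longer change the fingerprint.

```typescript
// Hypothetical backend-side helper. The choice of identity keys is an
// assumption; a real system would pick its own permanent subset.
type Attributes = Record<string, string>;

const IDENTITY_KEYS: readonly string[] = ["service.namespace", "service.name", "service.instance.id"];

// 32-bit FNV-1a, returned as 8 lowercase hex digits.
function fnv1aHex(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hash only the permanent identity attributes, in sorted key order, so that
// ephemeral attributes such as session.id do not change the fingerprint.
function serviceFingerprint(resource: Attributes, keys: readonly string[] = IDENTITY_KEYS): string {
  const pairs = [...keys]
    .sort()
    .filter((k) => Object.prototype.hasOwnProperty.call(resource, k))
    .map((k) => [k, resource[k]]);
  // JSON encoding keeps the canonical form unambiguous (no delimiter clashes).
  return fnv1aHex(JSON.stringify(pairs));
}

const base = { "service.name": "food-app", "service.instance.id": "device-42" };
const a = serviceFingerprint({ ...base, "session.id": "abc" });
const b = serviceFingerprint({ ...base, "session.id": "xyz", "enduser.id": "you" });
console.log(a === b); // true
```

Hashing all resource attributes instead would give `a !== b` as soon as the session rotates, which is the fingerprinting break this section warns about.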


		There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms in the browser.

		The second problem is that it becomes difficult to distinguish between ephemeral resources and other types of attributes.

		@@ -0,0 +1,78 @@
		# Ephemeral Resource Attributes

		Define a new type of resource attribute, ephemeral resources, which are allowed to change over the lifetime of the process. Existing resources are redefined as permanent resources, which must be present at SDK initialization and cannot be changed.

Ephemeral Resource Attributes #208



tedsuo commented Jun 22, 2022


Oberon00 Jun 28, 2022


Oberon00 Jun 28, 2022


carlosalberto commented Jun 27, 2022

tedsuo commented Jun 30, 2022

tedsuo commented Jul 7, 2022

tedsuo commented Jul 14, 2022

tigrannajaryan commented Jul 14, 2022

t2t2 commented Jul 14, 2022

tigrannajaryan commented Jul 14, 2022

tigrannajaryan commented Jul 14, 2022

martinkuba commented Jul 14, 2022

Aneurysm9 commented Jul 20, 2022

t2t2 commented Sep 16, 2022

scheler commented May 31, 2023

Oberon00 commented May 31, 2023

tedsuo commented Jul 31, 2023
