Cleaning up context on reusable carriers before or after injection #1811

lmolkova · 2021-07-09T21:04:19Z

What are you trying to achieve?

Context: #1767 #1783 open-telemetry/opentelemetry-java-instrumentation#440 open-telemetry/opentelemetry-java-instrumentation#903

I'd like to create some clarity around context propagation (for HTTP for now). It also helps when multiple multiple instrumentations in the process for an HTTP request. I think it makes more sense (in terms of supportability and least surprise to users) to always pick outer instrumentation, i.e manual user instrumentation always wins.

However, propagator spec mentions that context should be stripped from the request if it can be reused between tries, which means that inner instrumentation always wins.

The predefined propagation fields. If your carrier is reused, you should delete the fields here
before calling inject.

I'd like to get community opinion on if it is really a desired behavior

it seems it was written under the assumption that there is always single instrumentation for the request - which is not true - Client libraries: Manual/programmatic and auto instrumentation duplication problem #1767, Should aws-sdk instrumentation emit CLIENT spans and suppress downstream http client library instrumentation? opentelemetry-java-instrumentation#440, Prevent duplicate telemetry when using both library and auto instrumentation opentelemetry-java-instrumentation#903
it's an optimization to leave context optimistically hanging on the request so retries can clean it up.

It was introduced in #147 and there is no reasoning/explanation, so above are my assumptions.

Additional context.
It not very common to reuse HTTP requests between retries, but e.g. golang http client allows that and there are likey other clients/cases where it happens.

The text was updated successfully, but these errors were encountered:

iNikem · 2021-07-12T06:27:47Z

I think it makes more sense (in terms of supportability and least surprise to users) to always pick outer instrumentation, i.e manual user instrumentation always wins.

Totally agreed.

However, propagator spec mentions that context should be stripped from the request if it can be reused between tries, which means that inner instrumentation always wins.

The predefined propagation fields. If your carrier is reused, you should delete the fields here

before calling inject.

If we agree that only outmost instrumentation should inject its CLIENT SpanContext into the carrier, then the requirement above does not actually apply to the nested instrumentations. Because they should not call inject anyway.

lmolkova · 2021-07-12T16:21:04Z

I think we have to discuss approaches for context injection on the nested spans.
My logic here is that redirects/retries have to have different contexts (because they are sometimes (for redirects) and usually (for retries) part of app logic and would have different contexts in real life. If sometimes we are able to wrap them under CLIENT span (and they share context) and sometimes not (and have different context), brings inconsistent experience to users . So I suggest always have new context on inner spans that actually perform HTTP requests.

I'm writing this under the assumption we're creating specs that can be implemented with auto-instrumentation.

More here: #1783 (comment)

lmolkova · 2021-07-12T16:27:15Z

Maybe things that don't need new context like circuit-breaking should not even need a span and can be expressed as an event?

And maybe this way can we stay consistent:

rpc call - new span - new context injected
no rpc call - no span (event on the current span instead) - no context injection.

Logical CLIENT calls in this case don't really need to follow HTTP instrumentation - they wrap multiple underlying HTTP calls instead. And they don't inject context. Also, they are not always feasible to the instrument and in most cases (no retries/redirects) replicate what happens on
HTTP level under them.

trask · 2021-07-13T00:48:15Z

Additional context.
It not very common to reuse HTTP requests between retries, but e.g. golang http client allows that and there are likey other clients/cases where it happens.

[just as a tangent: I was surprised to find recently in https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/2727 that reusing HTTP requests is supported by most of the 20+ Java http client libraries that we instrument]

I agree that we need a way to identify if we are in a nested CLIENT span, for a couple of reasons:

to avoid overwriting traceparent from the outer CLIENT span
to conditionally suppress nested CLIENT spans (based on some kind of user verbosity preference)

It sounds like there are at least a couple of options to accomplish this:

use the presence of traceparent to know that we are in a nested CLIENT span (this requires clearing traceparent at the end of each request if http request reuse is supported by the instrumented library)
use some kind of marker in the Context to know that we are in a nested CLIENT span (this requires setting the marker in the context and propagating it to any downstream libraries that may also be instrumented)

Maybe we can focus initially on

Getting agreement that we really do need a way to identify if we are in a nested CLIENT span
Listing options to accomplish that

tedsuo · 2021-07-13T15:49:44Z

It sounds like unless there is a bug in instrumentation, last write should always win.

If at all possible, we should go with last write wins, and write OpenTelemetry instrumentation with that in mind. I am worried that other solutions add more state and conditional rules, which make it difficult for end users to reason about what their system is doing.

Is there any situation, outside of instrumentation being written or applied in an incoherent manner, where last write wins would not be the correct solution?

lmolkova · 2021-07-13T17:53:26Z

It sounds like unless there is a bug in instrumentation, last write should always win.

Please take a look at #1767, open-telemetry/opentelemetry-java-instrumentation#440 - there could be legit cases when it happens and there is double-instrumentation (native library + auto).

I think we all have different ideas of what's HTTP (gRPC, etc) instrumentation and we first need to start with clarifications.

I believe here are some ideas I heard:

HTTP span is actually a logical call that wraps retries/redirects/etc. Retries have the same context and maybe separate spans or events on a logical span.
HTTP span is a low-level RPC call and all retries have individual context

Then there are different ideas of what's more useful to users

higher level logical call
lower level rpc call

And then there are different problems

How to suppress a certain level of instrumentation (logical/rpc)
How to suppress duplicate instrumentation that happen

So I think we need to build clarity around this within semantic convention WG and try to come up with a proposal that is

feasible to do for auto and native instrumentation
gives consistent tracing experience across usage patterns/http clients/languages
allows preventing double-instrumentation (same type of it) issues
allows suppressing certain instrumentations (e.g. logical calls or rpc calls) to helps users manage costs and reduce the noise they don't need

I will publish my way of thinking around it based on my experience with native instrumentation of .NET HTTP client in pre-OTel era which is up and running for years.

tedsuo · 2021-07-13T18:43:41Z

I agree that we need to define the conventions for having layered client and server spans. I look forwards to your thoughts. I think there is a lack of clarity around how we should be instrumenting these types of layered systems.

In general, I am concerned about adding complexity. In particular, I'm concerned about control mechanisms that allow instrumentation authors to suppress instrumentation coming from other libraries. I'd rather we extend the semantic conventions to provide a consistent language for configuring instrumentation, and allow the end user control over what information is emitted, rather than give instrumentation authors tools for messing around with instrumentation they did not write.

Ideally, our solution for managing configuration can be implemented in languages which do not use the agent approach. And agents are aware of the configuration mechanism that the end user has access to, so there aren't multiple types of control mechanisms fighting with each other.

Related note: I think I need a better understanding about the cases where "double instrumentation" is unavoidable. It feels like higher level instrumentation should defer to lower level instrumentation rather than re-implement it, and if the lower-level client is pluggable then every type of client which is plugged in should provide instrumentation.

I get that reality has edge cases, but it feels like we should try our best to avoid ladling on any complexity we can avoid.

iNikem · 2021-07-14T06:32:16Z

@tedsuo For me the simplest case of "unavoidable double instrumentation" is this:

Application developer manually creates a "logical" span with kind=CLIENT around a remote call
That remote call is implemented by http client which is integrated with Otel, either natively or by the means of instrumentation agent.

Both parties (developer and library) want to create a CLIENT span and propagate it across the wire. I am not alone in the opinion that in this case library (meaning lower level or nested span) should back off.

Oberon00 · 2021-07-14T07:55:21Z

The discussion here doesn't really have much to do with the original issue topic/title anymore. Maybe you could copy to #1767 or create a new issue so that others can find this discussion?

lmolkova · 2021-07-18T18:03:57Z

@tedsuo @iNikem
I wrote rather a long read trying to clarify concerns about HTTP instrumentation. We can use it as a starting point for the discussion.
Let's continue the double instrumentation discussion in #1767.

@Oberon00, good point, I'll close this issue and link to corresponding issues (#1747 too), it seems inevitable that we have to clear context before making calls for the majority of cases.

lmolkova added the spec:context Related to the specification/context directory label Jul 9, 2021

github-actions bot assigned carlosalberto Jul 9, 2021

lmolkova mentioned this issue Jul 9, 2021

Add context propagation requirements to HTTP conventions #1783

Closed

lmolkova closed this as completed Jul 18, 2021

lmolkova mentioned this issue Jul 18, 2021

Client libraries: Manual/programmatic and auto instrumentation duplication problem #1767

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleaning up context on reusable carriers before or after injection #1811

Cleaning up context on reusable carriers before or after injection #1811

lmolkova commented Jul 9, 2021

iNikem commented Jul 12, 2021

lmolkova commented Jul 12, 2021 •

edited

Loading

lmolkova commented Jul 12, 2021 •

edited

Loading

trask commented Jul 13, 2021

tedsuo commented Jul 13, 2021

lmolkova commented Jul 13, 2021 •

edited

Loading

tedsuo commented Jul 13, 2021

iNikem commented Jul 14, 2021

Oberon00 commented Jul 14, 2021

lmolkova commented Jul 18, 2021

Cleaning up context on reusable carriers before or after injection #1811

Cleaning up context on reusable carriers before or after injection #1811

Comments

lmolkova commented Jul 9, 2021

iNikem commented Jul 12, 2021

lmolkova commented Jul 12, 2021 • edited Loading

lmolkova commented Jul 12, 2021 • edited Loading

trask commented Jul 13, 2021

tedsuo commented Jul 13, 2021

lmolkova commented Jul 13, 2021 • edited Loading

tedsuo commented Jul 13, 2021

iNikem commented Jul 14, 2021

Oberon00 commented Jul 14, 2021

lmolkova commented Jul 18, 2021

lmolkova commented Jul 12, 2021 •

edited

Loading

lmolkova commented Jul 12, 2021 •

edited

Loading

lmolkova commented Jul 13, 2021 •

edited

Loading