Proposal to separate context propagation from observability #42

tedsuo · 2019-09-10T01:07:41Z

This proposal provides a high-level description of OpenTelemetry, showing how context propagation, observability, and baggage could be cleanly decoupled.

A couple of notes on terminology:

- The terms "distributed application" and "application layer" used in this document refer to systems that leverage the context propagation layer, such as observability and baggage. They do not refer to the "user applications" these systems are embedded in.
- In this doc, I refer to TagMap as Correlations and labels. There can be a separate discussion about what to name this, but since Tag has repeatedly come up as a contentious name for this feature, I tried to use something "similar but different" to avoid that debate. I actually feel that the term Correlation carries a more accurate meaning than Tag, so it may be an improvement.
- TTL is referred to as hoplimit, to reflect the clearer name of this feature in IPv6. It is represented as an enum rather than an integer, since only two values are defined.

The goal of this spec proposal is to describe these public APIs as simply as possible, but no simpler. This required touching on most of the public APIs for OpenTelemetry. The description is meant to be high-level, but not inaccurate. If a detail appears to be missing, that is intentional: presumably it is handled at the SDK layer. Possibly, though, it should be present. Please review this document with this in mind.

tedsuo · 2019-09-10T01:15:15Z

This proposal relates to #37 and #36.

Oberon00 · 2019-09-10T08:48:03Z

text/0000-separate-context-propagation.md

+
+Because the application and context propagation layers are separated, it is possible to create new distributed applications which do not depend on either the Observability or Baggage APIs.
+
+**GetPropagator(type) inject, extract**


Given the text that follows, I expected a RegisterPropagator API here. Somehow this should reference the actual registration function described later.

So, the propagation layer provides the RegisterPropagator function, and applications provide the Propagators to be registered.

Is there a better way to explain this in the GetPropagator description? Right now it says "To register with the propagation system, the [BLANK] API provides a set of propagation functions for every propagation type."

I've removed the Registry concept, hopefully this is clearer.

Oberon00 · 2019-09-10T09:01:57Z

text/0000-separate-context-propagation.md

+
+## What about complex propagation behavior?
+
+Some OpenTelemetry proposals have called for more complex propagation behavior. For example, having a fallback to extracting B3 headers if Trace-Context headers are not found. Chained propagators and other complex behavior can be modeled as implementation details behind the Propagator interface. Therefore, the propagation system itself does not need to provide chained propagators or other additional facilities.


the propagation system itself does not need to provide chained propagators

So what happens when multiple propagators are registered for the same type? I think chained propagators may be cumbersome to provide on top of an API that allows only one propagator, since it requires cooperation between everyone who wants to add a propagator to the chain.

I have seen code that does this by using a stack-like Propagator that tries to inject/extract from a list of propagators (order is important): for extraction, it returns on the first successful attempt, and for injection it simply injects all formats. And yes, we should have a test case for this scenario, so we know it works.

So, here is the nuance: there is basic chaining provided at the API level by RegisterPropagator. All propagators are run for every type, in the order in which they are registered. This is where the cooperation happens between independent applications.

More complex propagation behavior usually ends up being specific to each application. So, the OTel SDK can provide the kind of fallback W3C -> B3 chaining for observability, described here. That sort of behavior is presented as a single, complex propagator, from the point of view of the Propagation API, since the fallback behavior is internal to a single application. Does that make sense?

carlosalberto · 2019-09-10T13:09:55Z

One thing to keep in mind is that in some languages there's an implicitly current Context object. In Java you can fetch/copy the current Context, whereas in Python (both through contextvars and the current OpenTelemetry Python implementation) you can only either set values on it (like a thread-local) or copy the entire thing.

For this reason, I'd imagine some languages would make Context an optional parameter, falling back to the current one.

Co-Authored-By: Christian Neumüller <christian+github@neumueller.me>

reyang · 2019-09-10T16:09:20Z

text/0000-separate-context-propagation.md

+Some OpenTelemetry proposals have called for more complex propagation behavior. For example, having a fallback to extracting B3 headers if Trace-Context headers are not found. Chained propagators and other complex behavior can be modeled as implementation details behind the Propagator interface. Therefore, the propagation system itself does not need to provide chained propagators or other additional facilities.
+
+
+## Did you add a context parameter to every API call because Go has infected your brain?


Curious, is this just for fun?

Haha, well I wanted to address what I perceive to be a common question, along the lines of "why is this context parameter everywhere? Is it because this is a golang project? Is it required that every language must pass context this way?"

But if the humor gets in the way of learning, I can change it.

I like it. You can also stress that it's not just a Go thing: in older versions of Node there was no CLS (continuation-local storage), or it was extremely inefficient, and explicitly passing the context was how propagation was achieved.

Good call, I will emphasize that this issue exists in multiple languages.

Co-Authored-By: Christian Neumüller <christian+github@neumueller.me>

Co-Authored-By: Reiley Yang <reyang@microsoft.com>

Co-Authored-By: Christian Neumüller <christian+github@neumueller.me>

tedsuo · 2019-10-16T16:01:24Z

So, I am generally happy with where this is landing. But there are two major issues which must be worked out.

Named Tracers and Metrics

With the advent of named tracers, propagation now depends on having a handle to the "current" tracer. For example, propagation may be disabled, or the headers used may be changed, depending on the named tracer being used. At least, that is my read on how named tracers will be used.

Since propagation behavior now depends on which tracer instance is used, we need to think more about how a single inject call can run multiple independent propagators. Named tracers must be addressed as part of this context propagation change. From talking with @Oberon00, I think this is a big issue.

Active Span / Context Switching

Right now, context management occurs at the span level, by setting a current span active in a thread. If there is more than one kind of context which must track the execution flow, then context switching must be managed at the context level, not the span level. When a span is moved from one thread to another, or the current thread is swapping the active span because there is an async layer on top of it (Python gevent, for example), the whole context must be moved, not just the span.

This makes plenty of logical sense, but I think it may be a big change in practice. I don't want to hand wave about it; I think we need to spike this in code in order to get a handle on what this would mean in practice.

hodgesds · 2019-10-18T02:52:38Z

text/0042-separate-context-propagation.md

+OpenTelemetry is separated into an **application layer** and a **context propagation layer**. In this architecture, multiple distributed applications - including the observability and baggage systems provided by OpenTelemetry - share the same underlying context propagation system.
+
+
+# Application Layer


This RFC doesn't really explain how different applications would work; rather, it goes into the implementation details of the metrics and tracing "observability systems". It would be nice to have a better definition of what an application is in the application layer. Are all applications required to share a common interface?

yurishkuro · 2019-10-18T03:33:23Z

text/0042-separate-context-propagation.md

+
+Distributed tracing is an example of a cross-cutting concern, which requires non-local, transaction-level context propagation in order to execute correctly. Transaction-level context propagation can also be useful for other cross-cutting concerns, e.g., for security, versioning, and network switching. We refer to these types of cross-cutting concerns as **distributed applications**.
+
+OpenTelemetry is separated into an **application layer** and a **context propagation layer**. In this architecture, multiple distributed applications - including the observability and baggage systems provided by OpenTelemetry - share the same underlying context propagation system.


From https://docs.google.com/document/d/1UxrEYOaQlF_E4gtiPoFmcZ4YKKe1GxohvCvQDuwvD1I/edit#

yurishkuro · 2019-10-18T03:45:55Z

text/0042-separate-context-propagation.md

+**GetHTTPExtractor() -> extractor**  
+To deserialize the state of the system sent from the prior upstream process, the Baggage API provides a function which returns an HTTPExtract function. 
+
+**GetHTTPInjector() -> injector**  


I think it's misleading to combine injectors/extractors with the API for manipulating the baggage. I am not even sure there should be "get" methods for those; who would call them? The inter-process propagation layer (see my diagram above) sits below specific contexts, so they can register with it, but it shouldn't need to "call up".

yurishkuro · 2019-10-18T03:47:46Z

text/0042-separate-context-propagation.md

+To receive data injected by prior upstream processes, the Propagation API provides a function which takes a context and an HTTP request, and returns context which represents the state of the upstream system.
+
+**ChainHTTPInjector(injector, injector) -> injector**  
+To allow multiple distributed applications to inject their context into the same request, the Propagation API provides a function which takes two injectors, and returns a single injector which calls the two original injectors in order.


This is irrelevant: the application does not need to know whether context systems are chained or not, it only needs to say "I want to inject context". This is a low-level implementation detail of the propagation layer.

yurishkuro · 2019-10-18T03:53:14Z

text/0042-separate-context-propagation.md

+
+Baggage values, on the other hand, are explicitly added in order to be accessed downstream by other application code. Therefore, Baggage Context must be readable, and reliably propagated in-band in order to accomplish this goal.
+
+There may be cases where a key-value pair is propagated as a Correlation for observability and as a Baggage item for application-specific use. A/B testing is one example of such a use case. This would result in extra overhead, as the same key-value pair would be present in two separate headers.  


I am not clear why we're making this point. Can't we allow telemetry sub-systems to access baggage? The metrics exporter could be configured to read A/B testing labels from baggage, not just from correlations, which seems preferable to transmitting the same data twice.

yurishkuro · 2019-10-18T03:54:56Z

text/0042-separate-context-propagation.md

+
+There may be cases where a key-value pair is propagated as a Correlation for observability and as a Baggage item for application-specific use. A/B testing is one example of such a use case. This would result in extra overhead, as the same key-value pair would be present in two separate headers.  
+
+Solving this issue is not worth the semantic confusion of giving one mechanism a dual purpose. However, because all observability functions take the complete context as input (and baggage is not sampled), it may still be possible to use baggage values as labels for observability.


This is not the first reference to sampling, and I find it highly confusing. Context is never sampled. Telemetry context may be dropped due to bandwidth limitations (which is not mentioned here), but that's completely different from sampling.

tsloughter · 2019-11-01T14:52:05Z

The name hopLimit makes it sound like an integer.

Why is it not just a boolean named propagate?

Edit: Never mind, the hopLimit will be removed completely since only baggage propagates.

tedsuo · 2019-11-16T21:35:48Z

@yurishkuro @Oberon00 I've created a second draft of this proposal, based on all of this great feedback. Please find it here: #66

Proposal to separate context propagation from observability

dff8df9

tedsuo requested review from AloisReitbauer, bogdandrutu, c24t, carlosalberto, iredelmeier, reyang, SergeyKanzhelev, songy23 and yurishkuro as code owners September 10, 2019 01:07

carlosalberto mentioned this pull request Sep 10, 2019

unifying propagator formatters open-telemetry/opentelemetry-python#89

Closed

fbogsany reviewed Sep 10, 2019

fbogsany reviewed Sep 10, 2019

cleanup description for Extract

5ad7d1c

Oberon00 reviewed Sep 10, 2019

Oberon00 mentioned this pull request Sep 10, 2019

error handling proposal open-telemetry/opentelemetry-specification#153

Merged

commas

1dc3c7b

Co-Authored-By: Christian Neumüller <christian+github@neumueller.me>

lizthegrey reviewed Sep 10, 2019

reyang reviewed Sep 10, 2019

reyang reviewed Sep 10, 2019

reyang reviewed Sep 10, 2019

reyang reviewed Sep 10, 2019

mwear mentioned this pull request Sep 10, 2019

API: Propagate tracestate open-telemetry/opentelemetry-ruby#73

Closed

tedsuo and others added 4 commits September 10, 2019 13:39

Update text/0000-separate-context-propagation.md

58248e6

Co-Authored-By: Christian Neumüller <christian+github@neumueller.me>

RFC proposal: A layered approach to data formats

68cb0ba

whitespace

3dc6a76

Co-Authored-By: Reiley Yang <reyang@microsoft.com>

Capitalization

459435e

Co-Authored-By: Reiley Yang <reyang@microsoft.com>

tedsuo added this to the Alpha v0.3 milestone Oct 8, 2019

arminru mentioned this pull request Oct 11, 2019

Implement the text format so that w3c trace context headers are propagated. open-telemetry/opentelemetry-java#601

Closed

freeformz mentioned this pull request Oct 15, 2019

Implement W3C Correlation Context propagator open-telemetry/opentelemetry-go#179

Merged

tedsuo and others added 11 commits October 14, 2019 19:33

Clean up motivation

7ea1834

Clean up explanbation intro

7317747

Clarify context types

43ba8fd

Fix ChainHTTPInjector and ChainHTTPExtractor

d7d6f1c

typo

3a817a2

Reference Trace-Context, not just traceparent

3381e0f

Bagge context cleanup

c15a107

stronger language around context access

310e8d5

Update text/0042-separate-context-propagation.md

f59fc27

Co-Authored-By: Christian Neumüller <christian+github@neumueller.me>

clean up tradeoffs

153b9aa

Update text/0042-separate-context-propagation.md

f70855a

Co-Authored-By: Christian Neumüller <christian+github@neumueller.me>

hodgesds reviewed Oct 18, 2019

yurishkuro reviewed Oct 18, 2019

carlosalberto mentioned this pull request Oct 30, 2019

[Do not merge] Initial prototype for context-prop overhaul. open-telemetry/opentelemetry-java#655

Closed

tsloughter mentioned this pull request Nov 6, 2019

Add dependency on ctx in app file tsloughter/grpcbox#23

Merged

krnowak mentioned this pull request Nov 7, 2019

[WIP] Context propagation open-telemetry/opentelemetry-go#297

Closed

toumorokoshi mentioned this pull request Nov 7, 2019

[WIP] PR to start the discussion on context propagation. open-telemetry/opentelemetry-python#278

Closed

tedsuo mentioned this pull request Nov 15, 2019

Proposal: Separate Layer for Context Propagation #66

Merged

tedsuo closed this Nov 16, 2019

codeboten mentioned this pull request Dec 10, 2019

[WIP] Context Prop open-telemetry/opentelemetry-python#325

Closed
