Add Suppress Tracing context key #1653

Conversation

@dyladan (Member) commented Apr 27, 2021

Fixes #530

As discussed in the specification SIG today, this adds a predefined context key for suppressing instrumentation.

@dyladan dyladan requested review from a team April 27, 2021 16:39
@dyladan dyladan force-pushed the suppress-instrumentation branch 2 times, most recently from 4a162f3 to 4f3c671 on April 27, 2021 16:44
@carlosalberto (Contributor)

@open-telemetry/specs-approvers Take a look at this please.

@Oberon00 previously approved these changes Apr 28, 2021
Two review comment threads on specification/context/context.md (outdated, resolved)
@yurishkuro (Member) left a comment

I am -1 on this change in its current form. It feels rushed to me (it lacks extensibility), and there is no clearly documented use case.

@Oberon00 (Member)

@yurishkuro

it feels rushed to me

The issue and general solution design here are quite old, see #530 (@dyladan, probably you should add "Fixes #530" to the PR description)

@yurishkuro (Member)

The issue and general solution design here are quite old, see #530

#530 is mostly about stopping tracing, not stopping all instrumentation. This change would make a lot more sense to me if it was about stopping tracing only, but even then a single boolean is a crude design that cannot be extended further.

@dyladan (Member Author) commented Apr 28, 2021

The issue and general solution design here are quite old, see #530

#530 is mostly about stopping tracing, not stopping all instrumentation. This change would make a lot more sense to me if it was about stopping tracing only, but even then a single boolean is a crude design that cannot be extended further.

I don't understand the need to be able to extend this key in the future. The whole point of having context with opaque keys was to be able to create keys that mean different things. We can make one for tracing and in the future if we want another signal disabled (or all signals) we can just create a new key.
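For illustration, a minimal sketch of what per-signal keys could look like in JS using the existing Context API. The key names and the suppressTracing/isTracingSuppressed helpers here are hypothetical, not part of this PR's text:

```ts
import { context, createContextKey, Context } from '@opentelemetry/api';

// Hypothetical per-signal keys; the names are illustrative only.
const SUPPRESS_TRACING_KEY = createContextKey('OpenTelemetry Context Key SUPPRESS_TRACING');
const SUPPRESS_METRICS_KEY = createContextKey('OpenTelemetry Context Key SUPPRESS_METRICS');

export function suppressTracing(ctx: Context): Context {
  return ctx.setValue(SUPPRESS_TRACING_KEY, true);
}

export function isTracingSuppressed(ctx: Context): boolean {
  return ctx.getValue(SUPPRESS_TRACING_KEY) === true;
}

// Usage: run a block with tracing suppressed; the flag is scoped to this context.
context.with(suppressTracing(context.active()), () => {
  // any instrumentation that checks the key skips span creation here
});
```

Adding another signal later would just mean adding another key and helper pair, without touching the existing ones.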

@dyladan dyladan changed the title Add Suppress Instrumentation context key Add Suppress Tracing context key Apr 28, 2021
@dyladan (Member Author) commented Apr 28, 2021

@yurishkuro I modified it to only suppress tracing, by creating non-recording spans and preventing injection. Extract and metrics were removed from the wording.
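A rough sketch of the tracer-side behavior, assuming the hypothetical isTracingSuppressed helper from the earlier sketch (the spec wording only requires returning a non-recording Span; the function and module names here are illustrative):

```ts
import { Context, INVALID_SPAN_CONTEXT, Span, trace } from '@opentelemetry/api';
import { isTracingSuppressed } from './suppress'; // hypothetical helper from the earlier sketch

// Sketch: the check an SDK tracer could run before its normal sampling logic.
function startSpanRespectingSuppression(name: string, ctx: Context): Span {
  if (isTracingSuppressed(ctx)) {
    // Return a non-recording Span that only carries the parent SpanContext
    // (or an invalid one), the same mechanism used when no SDK is installed.
    const parentSpanContext = trace.getSpanContext(ctx) ?? INVALID_SPAN_CONTEXT;
    return trace.wrapSpanContext(parentSpanContext);
  }
  // Otherwise defer to the real tracer (sampling, attribute collection, etc.).
  return trace.getTracer('example').startSpan(name, {}, ctx);
}
```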

@blumamir (Member)

I want to add that I use this mechanism in a few instrumentation libraries and it has proven to be very handy.

It is useful for reducing the visual and performance overhead of collecting uninteresting low-level implementation details of high-level libraries, such as:

  • the HTTP RPC part of aws-sdk requests;
  • the HTTP part of Elasticsearch requests;
  • the database driver spans where an ORM is used (suppress mongodb driver spans for operations generated by mongoose, suppress mysql driver spans for operations generated by typeorm, etc.).

You can read more about it here

@iNikem (Contributor) commented Apr 28, 2021

I'm probably missing something fundamental here, but why can't #530 be solved by using Samplers? Especially point 1, about health checks and such.

@yurishkuro (Member)

I'm probably missing something fundamental here, but why can't #530 be solved by using Samplers? Especially point 1, about health checks and such.

+1

I don't understand the need to be able to extend this key in the future. The whole point of having context with opaque keys was to be able to create keys that mean different things. We can make one for tracing and in the future if we want another signal disabled (or all signals) we can just create a new key.

My problem is not with the opaque key, but with growing the surface of the API. For comparison, the OpenTracing API had 6 methods in total: inject, extract, start, finish, setTag, addLog. Here we're proposing two new methods to the public API surface to account for a relatively obscure use case. And we're not even making it extensible, so that the cognitive overhead of a wider API surface could be amortized over more use cases in the future (back to my original #1653 (comment)).

@dyladan (Member Author) commented Apr 28, 2021

I'm probably missing something fundamental here, but why can't #530 be solved by using Samplers? Especially point 1, about health checks and such.

Haha, I think I might also be missing something, because I can't see how samplers could fix this issue in an automatic way without using context. How does the exporter/span processor signal to the sampler that this is an export span that shouldn't be sampled?

@dyladan (Member Author) commented Apr 28, 2021

Sorry for the accidental close and reopen while trying to comment.

My problem is not with the opaque key, but with growing the surface of the API. For comparison, the OpenTracing API had 6 methods in total: inject, extract, start, finish, setTag, addLog.

This is not really a new API; it simply leverages an existing mechanism. And even if it were a new API, I don't think we can use "growing the API" as a counterargument, because then no new APIs would ever be added.

Here we're proposing two new methods to the public API surface to account for a relatively obscure use case.

I'm not sure I agree that the use case is obscure. Suppressing tracing is used by at least every exporter that uses an instrumented transmission channel like HTTP. Additionally, @blumamir already cited several examples where he found this useful.

And we're not even making it extensible, so that the cognitive overhead of a wider API surface could be amortized over more use cases in the future (back to my original #1653 (comment)).

I must not be understanding what you mean by extensibility. To me it seems like it would be easy to add another key to the list of predefined keys in the future. If you're asking for this one key to be able to disable multiple signals in a configurable way, I would argue that is more confusing than just having multiple keys.

@dyladan dyladan closed this Apr 28, 2021
@dyladan dyladan reopened this Apr 28, 2021
@blumamir (Member)

I am not sure that suppressing underlying spans is the right thing to do. E.g. take a look at #1360. I am pretty sure that in most cases we still want information from the underlying transport libraries.

In my opinion, the transport spans can be interesting in some cases (debugging low-level issues, performance issues, etc.), but most of the time they just add noise to the trace and distract attention from the logical operation. Allowing the user to turn this instrumentation feature on and off via a config option can be very handy for supporting a wide range of users with different needs.

Before this mechanism was introduced, we had to "clean" the trace in the backend, which is possible but more tedious.

@@ -183,6 +183,8 @@ When asked to create a Span, the SDK MUST act as if doing the following in order
A non-recording span MAY be implemented using the same mechanism as when a
`Span` is created without an SDK installed or as described in
[wrapping a SpanContext in a Span](api.md#wrapping-a-spancontext-in-a-span).
If the [`SuppressTracing`](./api.md#suppress-tracing) `Context` flag is set,
the newly created `Span` should be a non-recording `Span`.
(Contributor)

Can there be a situation where it might be desired to suppress tracing but then re-enable it later in the same trace? For example, a chain of 5 middlewares (all instrumented by a common instrumentation) where a user suppresses tracing in middleware 2 and then enables it again in middleware 4. If we generate non-recording spans, the spans from middlewares 4 and 5 will not be children of the span generated by middleware 2. Not recording any spans when tracing is suppressed would instead allow this case to generate correct traces.

This could also cause issues with context propagation. If tracing is suppressed on a code path that ends up making a network call, and there is user/instrumentation code that injects trace context into the outgoing call, it'll end up either not injecting context or injecting the non-recording span's span ID, depending on the implementation of the non-recording span.

Maybe there are other such cases.

If we were to actually suppress tracing, i.e., not record any spans at all instead of creating non-recording spans, both of these cases would work. Neither solution is perfect, but not recording anything would probably result in traces that are "less broken".

@dyladan (Member Author)

Can there be a situation where it might be desired to suppress tracing but then re-enable it later in the same trace?

If we generate non-recording spans, the spans from middlewares 4 and 5 will not be children of the span generated by middleware 2. Not recording any spans when tracing is suppressed would instead allow this case to generate correct traces.

Sure, I think that's a valid use case. It's definitely not solved by this, though, for the reasons you stated. It's not easy to solve either, because this puts the burden on the caller to not create spans. If an instrumentation doesn't check the context key before calling startSpan, then it can't be suppressed.

This could also cause issues with context propagation. If tracing is suppressed on a code path that ends up making a network call, and there is user/instrumentation code that injects trace context into the outgoing call, it'll end up either not injecting context or injecting the non-recording span's span ID, depending on the implementation of the non-recording span.

Yes, it was my intention to suppress injection. For the use cases described, I think this is the desired behavior.
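A sketch of the injection side under the same assumption (the guard function and the isTracingSuppressed helper are hypothetical; only propagation.inject is real API):

```ts
import { Context, propagation } from '@opentelemetry/api';
import { isTracingSuppressed } from './suppress'; // hypothetical helper from the earlier sketch

// Sketch: skip injecting trace context into outgoing requests when the
// suppression flag is present on the context.
function injectUnlessSuppressed(ctx: Context, carrier: Record<string, string>): void {
  if (isTracingSuppressed(ctx)) {
    return; // leave the carrier untouched: no traceparent header is added
  }
  propagation.inject(ctx, carrier);
}
```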

If we were to actually suppress tracing, i.e., not record any spans at all instead of creating non-recording spans, both of these cases would work. Neither solution is perfect, but not recording anything would probably result in traces that are "less broken".

The spec states startSpan MUST return something implementing the Span interface, so I don't really see any option other than returning a non-recording span. Open to ideas if you have them.

(Contributor)

I wonder if the context API could act like a stack in this case. Suppressing instrumentation could store a reference to the active span, and undoing it later would re-activate that span again. That'd solve both cases, but it would probably bring up more issues and complicate the implementation too much.

(Member)

The context API already works like a stack: since context is immutable, you need a new copy for each nested scope.
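For example (a small sketch with the JS API; the function and variable names are illustrative):

```ts
import { context, trace, Span } from '@opentelemetry/api';

// Nested context.with() calls behave like a stack: each call derives a new
// immutable Context, and the previous active Context is restored on return.
function demo(outerSpan: Span, innerSpan: Span) {
  context.with(trace.setSpan(context.active(), outerSpan), () => {
    // outerSpan is active here
    context.with(trace.setSpan(context.active(), innerSpan), () => {
      // innerSpan is active here
    });
    // back to outerSpan automatically, nothing to "unset"
  });
}
```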

@owais (Contributor) commented Apr 30, 2021

Sure, I think that's a valid use case. It's definitely not solved by this, though, for the reasons you stated. It's not easy to solve either, because this puts the burden on the caller to not create spans. If an instrumentation doesn't check the context key before calling startSpan, then it can't be suppressed.

True, but if we make it part of the spec or semantic conventions, then it's very likely that they will.

@dyladan (Member Author)

True, but if we make it part of the spec or semantic conventions, then it's very likely that they will.

In my experience this is not true. Instrumentations created by SIGs will, but the community is very unlikely to read the spec that closely.

@weyert commented Apr 30, 2021

Interesting issue. This might help solve a long-standing issue I have at work, where a lot of spans are created for RPC and HTTP by instrumentation libraries. I don't really care about those in some situations (when calling a specific function/action), but I do care about them in all other cases.

For example, I am using Datastore, which uses RPC for its calls, which means I get spans for DNS (for the host), RPC connect, my database client span, my database SQL builder span, etc. In this case I want to suppress the RPC connect, NET connect, and DNS connect spans, but only in one case where I am making many calls to Datastore or calling many underlying REST services in one class/function, while keeping the usual behaviour everywhere else.

E.g.

  RequestController()
       handleGetRequest()
            - fetchData1()
            - triggerFetch()
                   - getDataFromOtherService()
                             - getDataFromOtherService2()
                                 // SUPPRESS SPAN ALL RPC/NET INSTRUMENTATION
                                  - processDataFromService2()
                                               - moreProcessDataFromService2()
                                // UNSUPPRESS SPAN
                   - getDataFromOtherService3()

@Oberon00 Oberon00 dismissed their stale review April 30, 2021 11:16

This seems to be less trivial than I initially thought.

@Oberon00 (Member) commented Apr 30, 2021

After reading all the feedback on samplers, I think this could indeed also work: we provide, in the SDK, a sampler SuppressingSampler(innerSampler) that checks the context key, and a corresponding SDK method to set the flag on the context. Then we increase only the SDK's surface, not the API's surface. The major disadvantage, though, is that performance will be worse because initial span attributes will still be collected, but that is #620.

EDIT: The other difference with the sampler solution is that the current PR suggests not injecting at all if tracing is suppressed, while with the current sampling decisions we would still inject a span context, just with the sampled flag set to zero. I created #1663 because I think this is an idea worth thinking about.
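A sketch of what such a wrapper could look like, assuming the Sampler interface exported by the JS SDK and the hypothetical isTracingSuppressed helper from earlier:

```ts
import { Attributes, Context, Link, SpanKind } from '@opentelemetry/api';
import { Sampler, SamplingDecision, SamplingResult } from '@opentelemetry/sdk-trace-base';
import { isTracingSuppressed } from './suppress'; // hypothetical helper

// Wraps any sampler and forces a NOT_RECORD decision when the suppression
// flag is present on the context; otherwise defers to the inner sampler.
export class SuppressingSampler implements Sampler {
  constructor(private readonly inner: Sampler) {}

  shouldSample(
    context: Context,
    traceId: string,
    spanName: string,
    spanKind: SpanKind,
    attributes: Attributes,
    links: Link[]
  ): SamplingResult {
    if (isTracingSuppressed(context)) {
      return { decision: SamplingDecision.NOT_RECORD };
    }
    return this.inner.shouldSample(context, traceId, spanName, spanKind, attributes, links);
  }

  toString(): string {
    return `SuppressingSampler(${this.inner.toString()})`;
  }
}
```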

@Oberon00 (Member) commented Apr 30, 2021

@weyert

Interesting issue. This might help solve a long-standing issue I have at work, where a lot of spans are created for RPC and HTTP by instrumentation libraries.

That use case might be better solved by providing samplers access to a readable parent span (I think this already works in some languages with some casting, since the sampler already gets the full parent context).

@cijothomas (Member)

We provide, in the SDK, a sampler SuppressingSampler(innerSampler) that checks the context key, and a corresponding SDK method to set the flag on the context. Then we increase only the SDK's surface, not the API's surface.

If the method to set the flag on the context is in the SDK, then this cannot be leveraged by instrumentations to suppress underlying instrumentation, right? (Unless the instrumentation takes an SDK dependency.)

@dyladan (Member Author) commented Apr 30, 2021

I would like to avoid instrumentations taking on the SDK as a dependency, since this is essentially the primary reason for the strict API/SDK split. If a library author builds OpenTelemetry into their library, they should only have to depend on the API.

@pauldraper commented May 2, 2021

Here we're proposing two new methods to the public API surface to account for a relatively obscure use case

To quote #530

There are circumstances for instrumented actions where:

  1. The action is frequent and of low interest: a healthcheck, polling a message queue, etc.
  2. An OpenTracing exporter uses libraries that themselves may be instrumented (risking infinite tracing).
  3. If the current layer (e.g. RPC) happens to offer sufficiently detailed tracing, lower HTTP/DNS/TCP/UDP layers do not need to be traced when invoked with this library. This need is heightened by the use of server/client spans, which AFAIK require removing the spans in between them. (SpanKind with layers #526)

(1) has come up in #173

(2) has been a problem in opentelemetry-js HTTP-based exporters (open-telemetry/opentelemetry-js#332), since the Node.js stdlib is instrumented globally. The solution there was to add a special HTTP header, x-opentelemetry-outgoing-request, that the http instrumentation ignores. In essence it uses HTTP headers as a poor man's context API. (This may not scale to other APIs.)

Please address alternative solutions to these issues.

The problem is so non-obscure and non-rushed that the Python and Ruby implementations independently created this solution over a year ago.

@dyladan (Member Author) commented May 3, 2021

@pauldraper Are you asking me to address alternatives?

@pauldraper

Sorry, @yurishkuro

@dyladan (Member Author) commented May 4, 2021

Discussed this in the spec SIG today. This PR boils down to 3 use cases:

  1. Prevent infinite loop with exporters
  2. Allow instrumentations to suppress lower level instrumentations
  3. Suppress frequent noise like polling and health checks

Prevent Exporter Loops

This can be solved in the SDK only with no API change and would not require this API to exist.
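For example, the SDK could wrap the export call itself in a suppressed context; a sketch assuming the hypothetical suppressTracing helper from earlier (the function name here is illustrative):

```ts
import { context } from '@opentelemetry/api';
import { ReadableSpan, SpanExporter } from '@opentelemetry/sdk-trace-base';
import { suppressTracing } from './suppress'; // hypothetical helper

// Sketch: the SDK runs the export call in a suppressed context so that the
// instrumented HTTP/gRPC client used by the exporter does not create new
// spans, which would otherwise be exported again (the infinite loop).
function exportWithoutSelfTracing(exporter: SpanExporter, spans: ReadableSpan[]): void {
  context.with(suppressTracing(context.active()), () => {
    exporter.export(spans, (result) => {
      // handle the ExportResult (success / failure) here
    });
  });
}
```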

Instrumentations Suppress Lower Levels

This is a special case of a larger question: how should we handle cases where an instrumentation wants to produce the "final" span? This PR suggests that lower-level spans be suppressed and not traced, but there may be other solutions to this problem.

Suppress Noisy Spans

In some ways this is an extension of (2), but it was not my original target in this PR, and it is possibly better solved with other mechanisms like smarter sampling.

Takeaway

For now, the specification SIG suggested that the JS SIG should move this functionality out of the API into some sort of API extensions package. Instrumentations can rely on this package, which our SDK will respect, but third-party SDKs will be under no official obligation to respect it.

I am going to leave this open because I still believe the mechanism has value, and I have not yet seen a proposal for solving (2) that is obviously better than this. The linked issue is currently one of our oldest open issues, and the fact that multiple languages have already implemented this at the SDK level seems to me to show the value of this mechanism.

@pauldraper commented May 5, 2021

This can be solved in the SDK only with no API change and would not require this API to exist.

How?

Example situation:

gRPC module uses the http module.

The gRPC module is instrumented. The http module is instrumented.

The gRPC instrumentation creates a client span. The http module creates a client span.

Client spans are now doubled up, which AFAIK is a problem for systems with two-sided spans like Zipkin.

To prevent doubled-up client spans, the gRPC instrumentation (which uses only the API) needs a way to tell the http instrumentation (which uses only the API) to not trace.


This is a very common situation.

Am I missing something?
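To make that concrete, this is roughly what a gRPC instrumentation could do with the proposed key, using only the API (suppressTracing is the hypothetical helper sketched earlier; the function names are illustrative):

```ts
import { context, trace, SpanKind } from '@opentelemetry/api';
import { suppressTracing } from './suppress'; // hypothetical helper

const tracer = trace.getTracer('example-grpc-instrumentation');

// Sketch: the gRPC instrumentation records its own CLIENT span, then runs the
// underlying HTTP call with tracing suppressed so the http instrumentation
// (which also only uses the API) does not create a second CLIENT span.
function tracedGrpcCall<T>(method: string, doHttpRequest: () => Promise<T>): Promise<T> {
  return tracer.startActiveSpan(method, { kind: SpanKind.CLIENT }, async (span) => {
    try {
      return await context.with(suppressTracing(context.active()), doHttpRequest);
    } finally {
      span.end();
    }
  });
}
```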

@jkwatson (Contributor) commented May 5, 2021

This can be solved in the SDK only with no API change and would not require this API to exist.

How?

Example situation:

gRPC module uses the http module.

The gRPC module is instrumented. The http module is instrumented.

The gRPC instrumentation creates a client span. The http module creates a client span.

Client spans are now doubled up, which AFAIK is a problem for systems with two-sided spans like Zipkin.

To prevent doubled-up client spans, the gRPC instrumentation (which uses only the API) needs a way to tell the http instrumentation (which uses only the API) to not trace.

This is a very common situation.

Am I missing something?

Your example isn't an exporter loop, which is the context you snipped that quote from.

@Oberon00 (Member) commented May 5, 2021

@pauldraper

To prevent doubled-up client spans, the gRPC instrumentation (which uses only the API) needs a way to tell the http instrumentation (which uses only the API) to not trace.

This is only one possible solution. Another theoretically (though not currently) possible solution is that the SDK user / application owner configures the SDK to suppress any local child spans of spans with type=CLIENT that have an rpc.system=grpc attribute. Or, any CLIENT spans that are children of another CLIENT span could be suppressed.

IMHO, having the grpc instrumentation decide whether or not the child spans should be suppressed is wrong. The HTTP child spans are potentially useful, e.g., if you have a specific HTTP-related issue you want to debug.

@pauldraper commented May 5, 2021

Your example isn't an exporter loop, which is the context you snipped that quote from.

Ah, yes. My bad. Regardless, I would like to understand the group's answer to that.

IMHO, having the grpc instrumentation decide whether or not the child spans should be suppressed is wrong. The HTTP child spans are potentially useful, e.g., if you have a specific HTTP-related issue you want to debug.

FWIW I agree; I think client/server spans are terrible. Zipkin two-sided spans are terrible. Protocols are inherently nested and OTel seems to ignore that (#526).

But under the current OpenTelemetry approach, it seems that doubled-up client spans are bad, yet there is no good solution to prevent them.

@github-actions (bot)

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added Stale and removed Stale labels May 13, 2021
@carlosalberto (Contributor)

@dyladan As the related issue will still exist, are you OK with closing this PR?

@dyladan (Member Author) commented May 17, 2021

I don't mind if we close this PR as long as we eventually solve this use case. The issue has been around for over a year with no movement.

@carlosalberto (Contributor)

Oh definitely, let's keep #530 open until we finally solve it for all SIGs in a uniform way. Thanks!


Successfully merging this pull request may close: Use Context to stop tracing (#530)