Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampling based on URL #1597

Closed
pavolloffay opened this issue Apr 6, 2021 · 7 comments
Closed

Sampling based on URL #1597

pavolloffay opened this issue Apr 6, 2021 · 7 comments
Labels
area:sampling Related to trace sampling area:semantic-conventions Related to semantic conventions spec:trace Related to the specification/trace directory

Comments

@pavolloffay
Copy link
Member

What are you trying to achieve?

The Hypertrace agents collect all request and response data (headers and payloads). This data is not needed for every request, hence I would like to implement a sampler that would make a decision based on the URL/endpoint pattern when this data should be collected (e.g. one in 100 requests). The decision should be propagated to downstream services (e.g. via tracestate) to make the decision coherent across the transaction.

The custom sampler would make a decision whether to collect payload and headers, the span sampling would still go through the normal sampling process (e.g. my custom sampler would delegate to it).

To fully implement this use-case these parts are needed:

  • a custom sampler that has access to the URL
  • do not add payload and headers if the span is not sampled - instrumentation can check the tracestate (supported by the API) or drop the header and payload attributes in SpanProcessor#onEnd (not supported by the API).

What is missing in the spec to solve my use-case

The sampler does not have access to the URL. The sampler interface (at least in Java) accepts attributes, however the instrumentation (e.g. servlet in otel-java-instrumentation) is not passing URL attribute there nor spec mandates it to do it.

@pavolloffay pavolloffay added the spec:trace Related to the specification/trace directory label Apr 6, 2021
@Oberon00
Copy link
Member

Oberon00 commented Apr 6, 2021

I think this was fixed recently in Java: open-telemetry/opentelemetry-java-instrumentation#388.

The spec currently says in https://github.com/open-telemetry/opentelemetry-specification/blob/v1.1.0/specification/trace/api.md#span-creation:

Whenever possible, users SHOULD set any already known attributes at span creation instead of calling SetAttribute later.

which would include HTTP headers too (makes sense, what if you have an "X-My-Company-Loadtest" header?). Of course this means that sampling will not prevent initial attribute collection, for this problem see #620.

@pavolloffay
Copy link
Member Author

pavolloffay commented Apr 6, 2021

Thanks @Oberon00,

could we extend the spec and explicitly mention that the URL should be present at the sampling time (if it is known)?

@Oberon00 Oberon00 added area:sampling Related to trace sampling area:api Cross language API specification issue labels Apr 6, 2021
@reyang reyang added area:semantic-conventions Related to semantic conventions and removed area:api Cross language API specification issue labels Apr 9, 2021
@reyang
Copy link
Member

reyang commented Apr 9, 2021

The spec does leave room for the sampler to access attributes that are provided during span creation (not after the span creation), so it is not an API issue.
One possible approach is to encourage or enforce the URL to be provided as the attribute that is available at span creation, as part of the semantic convention.

@Oberon00
Copy link
Member

The quoted spec already says that:

Whenever possible, users SHOULD set any already known attributes at span creation instead of calling SetAttribute later.

Of course this also applies to the URL. If we need to discriminate attributes that are more or less important to sampling (original semantic conventions generator PR had a sampling_relevant: bool attribute), we need to somehow update this statement, otherwise saying that URL is required before StartSpan is redundant.

@cep21
Copy link

cep21 commented Feb 18, 2023

There hasn't been much recent activity here. Is there a reasonable workaround that people agree on: specifically I'm asking for the go sdk. Should I ask there instead?

@bmxpiku
Copy link

bmxpiku commented Apr 7, 2023

Well I would ask the same for js/node sdk, does anyone figured out anything worth sharing?

@austinlparker
Copy link
Member

We have determined that this issue has been completed due to updates in the semantic conventions that require URL to be present at span creation for consumption by samplers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sampling Related to trace sampling area:semantic-conventions Related to semantic conventions spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests

6 participants