Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Composite Sampler #1844

Open
iNikem opened this issue Aug 3, 2021 · 7 comments
Open

Composite Sampler #1844

iNikem opened this issue Aug 3, 2021 · 7 comments
Assignees
Labels
area:sdk Related to the SDK help wanted Extra attention is needed spec:trace Related to the specification/trace directory

Comments

@iNikem
Copy link
Contributor

iNikem commented Aug 3, 2021

Let's imagine for a second that #1597 is resolved and we have a sampler that drops server spans based on the url. It is safe to assume that for any particular configured Otel SDK we still will need one more sampler, to decide the fate of all other spans. This is already a second example of "composite samplers", the first being existing ParentBased sampler.

This raises several questions:

  1. Should we formalise a notion of composite samplers in the spec?
  2. If yes, should we specify rules of composition? Delegation or list of samplers? Should they be combined as logical "and" or "or"? Short-circuit?
  3. Should env variable OTEL_TRACES_SAMPLER be expanded to allow for configuration of sampler composition?
@iNikem iNikem added the spec:trace Related to the specification/trace directory label Aug 3, 2021
@iNikem
Copy link
Contributor Author

iNikem commented Aug 3, 2021

/cc @carlosalberto

@jmacd
Copy link
Contributor

jmacd commented Aug 5, 2021

I feel this is connected with the topic of "View configuration" in the metrics SDK:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#view

@iNikem
Copy link
Contributor Author

iNikem commented Aug 6, 2021

I feel this is connected with the topic of "View configuration" in the metrics SDK:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#view

I probably know too little about metrics, but I cannot see any connection between Sampler and View :(

@jmacd
Copy link
Contributor

jmacd commented Sep 15, 2021

People use "Sampler" to describe a standard mechanism for selecting which spans will be reported. People use "View" to describe a standard mechanism for selecting which metrics will be reported.
These usages are nearly identical.

@jmacd
Copy link
Contributor

jmacd commented Sep 15, 2021

After reading @iNikem's comment on open-telemetry/oteps#175, I will take a stronger position here.

The term "non-probability sampler" is an oxymoron. Outside of OpenTelemetry, "Sampling" refers to a statistical process used to estimate something, and if there aren't probabilities involved, what you have may be called "Subjective analysis".

We are stuck with the term "Sampling" in OTel being used to mean both filtering and excluding spans from the population (what I call "View construction") and also the statistical process of counting the non-excluded non-filtered spans (what I call "Probability sampling", a form of span aggregation). OTEPs 170 and 168 are trying to finish this second aspect of what "Sampling" means, and you are asking to refine the first aspect of what Sampling means. In that sense, your comment open-telemetry/oteps#175 (comment) makes sense. It's confusing that the document talks about composition but doesn't answer to the need for View construction.

@iNikem asked above:

logical "and" or "or"? Short-circuit?

OTEP 170 specifies how to compose Samplers using logical-OR, which is useful in the sense that it allows composition of a probability sampler with a non-probability sampler (i.e., probability sampling plus subjective analysis). The concept of "zero adjusted count" is critical to understanding how we compose probability samplers and non-probability samplers.

Combining probability sampling with AND- and short-circuiting goes against the goals of probability sampling. Any span that is excluded before a probability sampler sees it is effectively uncounted, so placing filters in front of a Sampler distorts the result. I'm not convinced there's a real need to specify AND- and short-circuit types of composition, that doesn't seem like a natural way to configure what (IMO) users want to configure.

The form of sampler configuration we've seen from e.g., Jaeger remote sampling and AWS xRay look like a single Sampler with a list of rules that are internally followed, with short-circuiting. In pseudo-code you have a probability lookup like:

if ignoreSomeSpans(span) {
  return alwaysOff // These spans are UNCOUNTED, as if they never existed.
} else if firstConditionMatch(span) {
  return firstSamplingProbability
} else if secondConditionMatch(span) {
  return secondSamplingProbability
} else {
  return defaultSamplingProbability
}

My understanding is that users would like a way to configure this sort of logic, but it won't be through composition of multiple samplers, it will be through configuring one sophisticated sampler. The reason we talk about composing a probability and non-probability sampler is that we are aware of adaptive schemes that combine the two (open-telemetry/oteps#168 (comment)). Please treat the review of open-telemetry/oteps#168 and open-telemetry/oteps#175 as an independent topic from more general forms of View construction for spans.

It's possible in the distant future that OTel will evolve support for computing multiple samples in parallel with multiple independent head and tail probabilities. That will offer meaningfully more ways to sample and track down rare conditions in large-scale systems, but I don't think users will be asking for that soon. Besides, all of this discussion points to a larger problem with configuration in general; for now we do not expect to configure sophisticated samplers through the environment, we expect to use "jaeger_remote" or similar to select a module of code that performs the above logic and returns one of the built-in samplers. Please approve the two OTEPs so that we can proceed with necessary work on the built-in samplers.

@iNikem
Copy link
Contributor Author

iNikem commented Sep 18, 2021

Thank you @jmacd for this explanation, I now understand current landscape much better.

I can agree with your suggestion that non-probabilistic samplers don't need to be composed and my problem of span "filtering" can be solved with one exhaustive Sampler. This approach will resolve this current issue. @anuraaga, taking this into account, should we change https://github.com/open-telemetry/opentelemetry-java-contrib/blob/53e8a363afe1d5ec6cd7afdf4285e4bf8dbc63b3/contrib-samplers/src/main/java/io/opentelemetry/contrib/samplers/RuleBasedRoutingSampler.java#L38 to accept fallback decision instead of fallback sampler?

@jmacd Do you think that we need to introduce a new entity, SpanFilter, to separate "View construction" from proper sampling?

@jmacd
Copy link
Contributor

jmacd commented Jan 12, 2022

#2047 specifies the rules for composing samplers that I believe we need. However, we discovered a problem in the way the sampler APIs interact, discussed in #2179.

I would like to see #2047 merge, after that point I believe we can implement a composite sampler based on the rules there. Please review and approve #2047 -- we are blocked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK help wanted Extra attention is needed spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests

4 participants