Skip to content
This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

A Sampling Configuration proposal #240

Closed
wants to merge 13 commits into from
213 changes: 213 additions & 0 deletions text/0240-Sampling_Configuration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
# Sampling Configuration
PeterF778 marked this conversation as resolved.
Show resolved Hide resolved

An attempt to provide a framework for defining sampling configurations.

Calling this proposal half-baked would be too generous. At this time it just a vision, with many questions unanswered and without any working prototype. Its purpose is to start a discussion on the needs and wants the Open Telemetry community might have on the subject of sampling configuration and a possible way to accomplish them.

The focus is on head-based sampling, but a similar approach might be used for later sampling stages as well. Most of the technical details presented here assume Java as the platform, but should be general enough to have corresponding concepts and solutions available for other platformstoo.

## Motivation

The need for sampling configuration has been explicitly or implicitly indicated in several discussions, some of them going back a number of years, see for example
- issue [173](https://github.com/open-telemetry/opentelemetry-specification/issues/173): Way to ignore healthcheck traces when using automatic tracer across all languages?
- issue [679](https://github.com/open-telemetry/opentelemetry-specification/issues/679): Configuring a sampler from an environment variable
- issue [1060](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1060): Exclude URLs from Tracing
- issue [1844](https://github.com/open-telemetry/opentelemetry-specification/issues/1844): Composite Sampler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is very relevant to my comment here. If we had a spec'd component for a composable rule bases sampler, we could define its configuration across languages in opentelemetry-configuration, and allow users to specify complex rule-based sampling configuration in a YAML syntax with file configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way this OTEP enables us to experiment with sampler configuration generally, because (as I think other commenters, including @spencerwilson have shown) this is complicated. After we have integrated the new configuration model, with the new tracestate information in #235 and demonstrated prototypes compatible with head sampling, tail sampling, and Jaeger (for example), then I think we can work on specifying a standard rule-based sampler configuration for OTel.

- issue [2085](https://github.com/open-telemetry/opentelemetry-specification/issues/2085): Remote controlled sampling rules using attributes
- issue [3205](https://github.com/open-telemetry/opentelemetry-specification/issues/3205): Equivalent of "logLevel" for spans
- discussion [3725](https://github.com/open-telemetry/opentelemetry-specification/discussions/3725): Overriding Decisions of the Built-in Trace Samplers via a "sample.priority" Like Span Attribute


A number of custom samplers are already available as indpendent contributions
([RuleBasedRoutingSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java),
[LinksBasedSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/LinksBasedSampler.java),
and, of course, the latest and greatest [Consistent Probability Sampling](https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/consistent-sampling)) or just as ideas. They all share the same pain point, which is lack of easy to use configuration mechanism.

Even when the code for these samplers is available, their use is not very simple. In case of Java, they require writing a custom agent [extension](https://opentelemetry.io/docs/instrumentation/java/automatic/extensions/). This can become a hurdle, especially for the Open Telemetry users with no hands-on coding experience.

## The Goal

The goal of this proposal is to create an open-ended configuration schema which supports not only the current set of SDK standard samplers, but also non-standard ones, and even samplers that will be added in the future. Furthermore the samplers should be composable together, as it might be required by the users.

In contrast, the existing configuration schemas, such as [Jaeger sampling](https://www.jaegertracing.io/docs/1.50/sampling/#file-based-sampling-configuration), or Agent [OTEP 225 - Configuration proposal](https://github.com/open-telemetry/oteps/pull/225) address sampling configuration with a limited known set of samplers only.
jmacd marked this conversation as resolved.
Show resolved Hide resolved

## Use cases

- I want to use one of the samplers from the `opentelemetry-java-contrib` repository, but I do not want to build my own agent extension. I prefer to download one or more jarfiles containing the samplers and configure their use without writing any additional code.
- I want to apply a sampling strategy that combines different samplers depending on the span attributes, such as the URL of the incoming request, and I expect to update the configuration frequently, so I prefer that it is file-based (rather than hardcoded), and better yet, applied dynamically
- I want to write my own sampler with some unique logic, but I want to focus entirely on the sampling algorithm and avoid writing any boilerplate code for instantiating, configuring, and wrapping it up as an agent extension

## The basics

It is assumed that the sampling configuration will be a YAML document, in most cases available as a file. Remote configuration remains an option, as well as dynamic changes to the configuration.

The configuration file will contain a definition of the sampler to use:

```yaml
---
sampler:
SAMPLER
```
Additional information could be placed there as well. For example, for Java, there could be a location of a jarfile containing any non-standard samplers which have been configured.

A SAMPLER is described by a structure with two fields:
```yaml
samplerType: TYPE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a scalar value, it may be valuable to support this property being a map type, acting as a tagged union: https://jsontypedef.com/docs/jtd-in-5-minutes/#discriminator-schemas

Parseability and extensibility can be greatly enhanced by tagged unions. If everything's a string, then you must invent some kind of URI-type format to refer to things.

parameters: # an optional list of parameters for the sampler
- PARAM1
- PARAM2
...
- PARAMn
```
The mechanism to map the sampler TYPE to the sampler implementation will be platform specific. For Java, the TYPE can be a simple class name, or a part of class name, and the implementing class can be found using reflection. Specifying a fully qualified class name should be an option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

semantics of TYPE are platform specific

It's worth emphasizing this. If these configs aren't portable ("Here's a sampler config for a JVM workload; there's one for Node.js workload", e.g.), then that limits the ability of a central team to author and manage a small set of configs that many downstream workloads adopt.

Perhaps if we at least define behavior for when a sampler is not usable by a given platform, that'd be ok? E.g., if a Node.js workload has sampler config that directs it to use a sampler with type like

- kind: jvm.classpath
  package: org.something.or.other
  class: CoolSampler

then the Node.js OTel SDK could say "Ah, I don't support jvm.classpath samplers. In this case the sampler is inactive". TBD precisely what "inactive" means for consistent probability sampling.


There will be a need for each supported sampler to have a documented canonical way of instantiation or initialization, which takes a known list of parameters. The order of the parameters specified in the configuration file will have to match exactly that list.

A sampler can be passed as an argument for another sampler:
```yaml
samplerType: ParentBased
parameters:
- samplerType: TraceIdRatioBased # for root spans
parameters:
- 0.75
- samplerType: AlwaysOn # remote parent sampled
- samplerType: AlwaysOff # remote parent not sampled
- samplerType: AlwaysOn # local parent sampled
- samplerType: AlwaysOff # local parent not sampled
```
yurishkuro marked this conversation as resolved.
Show resolved Hide resolved

There's no limit on the depth of nesting samplers, which hopefully allows to create complex configurations addressing most of the sampling needs.

## Composite Samplers

New composite samplers are proposed to make group sampling decisions. They always ask the child samplers for their decisions, but eventually make the final call.

### Logical-Or Sampling
```yaml
samplerType: AnyOf
parameters:
- - SAMPLER1
- SAMPLER2
...
- SAMPLERn
```
The AnyOf sampler takes a list of Samplers as the argument. When making a sampling decision, it goes through the list to find a sampler that decides to sample. If found, this sampling decision is final. If none of the samplers from the list wants to sample the span, the AnyOf sampler drops the span.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we have such sampler type currently in the spec. The challenge with it is not introducing a configuration, but defining statistically meaningful behavior and sample weight calculation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't. This proposal does not call for adding such sampler to the specs, it can remain an internal gadget, but adding it to the specs is also a possibility.
I admit I do not understand your second sentence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I do not understand your second sentence.

there was a lot of effort put into the spec to develop sampling strategies that allow calculating summary statistics by capturing weight of each sample (simplest example: probabilistic 1-in-10 sampler gives each sample/span a weight=10). When you have a sequential composite sampler like you have here, how is the weight of a sample will be calculated?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistent Probability Sampling users have their own composite samplers, which properly handle logical-or and logical-and operations on consistent probability samplers, so they do not need AnyOf or AllOf samplers proposed here, and they should not use them.
Generally, composing consistent probability samplers with non-probabilistic samplers leads to undefined (or incorrect) adjusted counts anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paraphrasing @yurishkuro in another thread:

The challenge with [an AnyOf / OR sampler] is not introducing [its configuration schema], but defining statistically meaningful behavior and sample weight calculation.

I spent a lot of time thinking about this last year. The following is adapted from notes that I never quite edited enough to put up in a PR, but I did present to Sampling SIG in some biweekly meeting (though I can't find any notes/YouTube recording, unfortunately).

Also, I'll also note that I focused entirely on consistent probability samplers since I'm so enthusiastic about validly estimating span counts that I limited my analysis to a world where every sampler is a consistent probability sampler. For the OTel-instrumented projects I work on, I will always lobby hard for keeping our spans countable in a statistically valid manner.


Evaluating a tree of consistent probability samplers

When I was thinking about how to compute the weight/p-value/etc. remains well-defined in a tree of consistent probability samplers, I arrived at the following design:

Definitions:

  • Sampler: Any object that, given a span, produces a p in (0, 1] indicating the probability with which to sample the span.
  • A sampler that considers the p-values of other samplers is called a composite sampler. A non-composite sampler may be called a leaf sampler.

First, start with a sampling policy: config specifying a tree of (predicate, sampler) pairs. Then, to make a sample/drop decision for a span you evaluate the tree:

  1. Evaluate every predicate. If a sampler's predicate is true, call that sampler 'active'.
  2. Disregard any samplers whose condition is false. Also discard any composite samplers with 0 active children. Call the resulting subtree the 'tree of active samplers'.
  3. Evaluate active sampler tree:
    1. Ask the root sampler for its p-value. If the sampler is composite, it will calculate its p-value using the p-values of its children, and so on, recursively.
    2. Finally, perform a single Bernoulli trial with the single resulting p. SAMPLE or DROP as appropriate.

(update p-value -> r-value or your preferred 'weight' notion as necessary)

Complication: Samplers with side effects

The above works well for stateless, no-side-effects samplers. But some samplers have side effects. For example, a sampler that collaborates with samplers in other processes in order to stay under a system-wide throughput limit does I/O to communicate with those processes. Or even simpler: a sampler that maintains some internal statistics and adapts its selectivity based only on what spans it has "seen". In what circumstances should it write updates to those statistics?

A policy containing a single effectful sampler that's activated on all traces has clear semantics: do all effects as normal. The semantics are less obvious when an effectful sampler only activates for some traces, or is composed with other samplers. In the 'tree-of-conditional-samplers' model, in what circumstances should a sampler perform its effects (vs. electing not to perform them at all)?

A sampler with side effects might have unconditional effects—effects triggered for every trace that is submitted to the policy—but it may want to also have effects that are conditional on things like the following:

  • Am I in the active sampler tree? If not, was it my own predicate or that of an ancestor (or both?) that caused me to be excluded?
  • If the sampler did make it to the active sampler tree,
    • What decision was made using the p-value I provided?
    • Was my decision "used"? This concept is akin to short-circuiting in logical operators in programming languages.
      • A sampler is not used if its parent (if it has one) is not used.
      • OR's children 2..n are not used if its first child decides to SAMPLE
      • AND's children 2..n are not used if its first child decides to DROP
      • FIRST's children 2..n are never used
    • If my decision was used, was it also "pivotal" to the overall decision? That is, if my decision were inverted (and all other decisions stayed the same), would the final decision have been inverted?
  • What was the final decision for the trace?

You could imagine that samplers interested in knowing these facts could provide a callback which, if defined, the tree evaluator calls after a final decision has been made. It could invoke the callback with 'receipt' data that conveys 'Here are a bunch of facts about a decision just made, and your relation to it / impact on it.' The sampler could then do with that information what it will.

What systems confront and answer this question today?

  • Refinery's rule-based policy calls to an instance of the secondary sampler only if it's used. It then unconditionally updates its state.

Hopefully the above inspires some ideas! I'm not sure how much of this complexity must be addressed at this stage in order to avoid boxing ourselves in. I suspect much of it is 'severable', able to be punted on. I figured I'd raise it now though in case someone smarter than me sees something critical.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are challenges with stateful (or with other side effects) samplers, as well as with samplers that modify the tracestate. We've made some effort to provide a clear framework for these samplers to operate. We want to see how far we can go with useful enhancements to the samplers library without having to change the specs for the head samplers API, but that means we cannot provide support for your "used" and "pivotal" concepts.

If the first child which decided to sample modified the trace state, the effect of the modification remains in effect.

### Logical-And Sampling
```yaml
samplerType: AllOf
parameters:
- - SAMPLER1
- SAMPLER2
...
- SAMPLERn
```
The AllOf sampler takes a list of SAMPLERs as the argument. When making a sampling decision, it goes through the list to find a sampler that decides not to sample. If found, thie final sampling decision is not to sample. If all of the samplers from the list want to sample the span, the AllOf sampler samples the span.

If all of the child samplers agreed to sample, and some of them modified the trace state, the modifications are cumulative as performed in the given order. If the final decision is not to sample, the trace state remains unchanged.

### Rule based sampling

For rule-based sampling (e.g. decision depends on Span attributes), we need a RuleBasedSampler, which will take a list of Rules as an argument, an optional Span Kind and a fallback Sampler. Each rule will consist of a Predicate and a Sampler. For a sampling decision, if the Span kind matches the optionally specified kind, the list will be worked through in the declared order. If a Predicate holds, the corresponding Sampler will be called to make the final sampling decision. If the Span Kind does not match the final decision is not to sample. If the Span Kind matches, but none of the Predicates evaluates to True, the fallback sampler makes the final decision.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuleBased here reminds me of the consistent probability sampler I defined last year called first:

# 'first' is a composite sampler, which takes a sequence of child
# samplers and uses the first active child.
kind: first
children:
  - condition: 'trace.service == "A"'
    kind: probability
    adjusted_count: 20
    
  - condition: 'trace.service == "B"'
    kind: probability
    adjusted_count: 50
    
  - kind: probability
    adjusted_count: 1000

The root sampler's p-value is that of the first active child sampler (child whose predicate is true). This is the evaluation model that both X-Ray and Refinery use, and expressiveness-wise it's a superset of Jaeger's config (a mapping from the pair (service name, span name) -> sampler). So I think it's a strong starting place for this conversation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerwilson Do you have any references for it? I'll be happy to include them as prior art.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I never pushed any of the actual config speccing to the internet (prior to these comments, anyway).

For X-Ray and Refinery, their documentation are the primary sources. I probably linked to the relevant parts in my notes in #213.


Note: The `opentelemetry-java-contrib` repository contains [RuleBasedRoutingSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java), with similar functionality.
```yaml
samplerType: RuleBased
parameters:
- spanKind: SERVER | CONSUMER | ... # optional
- - RULE1
- RULE2
...
- RULEn
- FALLBACK_SAMPLER
jmacd marked this conversation as resolved.
Show resolved Hide resolved
```
where each RULE is
```yaml
predicate: PREDICATE
sampler: SAMPLER
```
The Predicates represent logical expressions which can access Span Attributes (or anything else available when the sampling decision is to be taken), and perform tests on the accessible values.
For example, one can test if the target URL for a SERVER span matches a given pattern.

## Example

Let's assume that a user wants to configure head sampling as follows:
- for root spans:
- drop all `/healthcheck` requests
- capture all `/checkout` requests
- capture 25% of all other requests
- for non-root spans
- follow the parent sampling decision
- however, capture all calls to service `/foo` (even if the trace will be incomplete)
- in any case, do not exceed 1000 spans/second

Such sampling requirements can be expressed as:
```yaml
samplerType: AllOf
parameters:
- - samplerType: AnyOf
PeterF778 marked this conversation as resolved.
Show resolved Hide resolved
parameters:
- - samplerType: ParentBased
parameters:
- samplerType: RuleBased # for root spans
parameters:
- spanKind: SERVER
- - predicate: http.target == /healtcheck
sampler:
samplerType: AlwaysOff
- predicate: http.target == /checkout
sampler:
samplerType: AlwaysOn
- samplerType: TraceIdRatioBased # fallback sampler
parameters:
- 0.25
- samplerType: AlwaysOn # remote parent sampled
- samplerType: AlwaysOff # remote parent not sampled
- samplerType: AlwaysOn # local parent sampled
- samplerType: AlwaysOff # local parent not sampled
- samplerType: RuleBased
parameters:
- spanKind: CLIENT
- - predicate: http.url == /foo
sampler:
samplerType: AlwaysOn
- samplerType: AlwaysOff # fallback sampler
- samplerType: RateLimiting
parameters:
- 1000 # spans/second
```

The above example uses plain text representation of predicates. Actual representation for predicates is TBD. The example also assumes that a hypothetical `RateLimitingSampler` is available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual representation for predicates is TBD

Here are some possible directions, for reference:


## Strong and Weak Typing

Constructing a sampler instance may require using some data types which are not strings, numbers, samplers, or lists of those types. Obviously, the beforementioned rule-based sampler needs to get `Rules`, which, in turn, will reference `Predicates`. The knowledge of these types can be built-in to the YAML parser, to ensure proper support.

If there's a need to support other complex types, the supported parsers can take maps of (key, value) pairs, which will be provided directly by the YAML parser. The samplers will be responsible for converting these values into a suitable internal representation.

## Suggested deployment pattern

Using the file-based configuration for sampling as described in this proposal does not require any changes to the OpenTelemetry Java Agent. The YAML file parser and the code to instantiate and configure requested samplers can be provided as an Extension jarfile (`config_based_sampler.jar` below). All samplers from the `opentelemetry-java-contrib` repository can be also made available as a separate jarfile (`all_samplers.jar` below).

```bash
$ java -javaagent:path/to/opentelemetry-javaagent.jar \
-Dotel.javaagent.extensions=path/to/config_based_sampler.jar \
-Dotel.sampling.config.file=path/to/sampling_config.yaml \
-Dotel.sampling.classpath=path/to/all_samplers.jar \
... \
-jar myapp.jar
```

## Compatibility with existing standards and proposals

Generally, the standard SDK samplers as well as those from the `opentelemetry-java-contrib` repository, with few exceptions, are not prepared to be used directly by this proposal. Even the standard samplers do not have a uniform way of instantiation. For example `ParentBasedSampler` offers only a constructor, while `TraceIdRatioBasedSampler` is typically instantiated using static `create` method.

However, adding some uniformity there, as well as to the samplers from `opentelemetry-java-contrib` should be quite easy, and hopefully not very controversial. It is also possible to demand a uniform instantiation mechanism only for non-standard samplers; the knowledge about the standard samplers can be built-in.

Another point of contention is that the existing configuration practices and proposals (see [JSON Schema Definitions for OpenTelemetry File Configuration](https://github.com/open-telemetry/opentelemetry-configuration/blob/main/schema/tracer_provider.json)) expect very specific knowledge about the samplers, while in this proposal responsibility for matching the samplers' arguments with the samplers signature becomes user's responsibility. However, to decrease the risk of misconfiguration, this proposal can be extended by introducing another configuration section which would describe the number of arguments and their type for non-standard samplers, thus providing some level of consistency checking.

## Open Issues
- How to encode Predicates so they will be readable and yet powerfull and efficiently calculated?
- How to handle RecordOnly (_do not export_) sampling decisions?