open-telemetry · lmolkova · May 18, 2020 · May 28, 2020
diff --git a/text/trace/0107-sampling-score.md b/text/trace/0107-sampling-score.md
@@ -0,0 +1,223 @@
+# Associating sampling score with the trace
+
+Enable consistent sampling across a distributed application whose services use
+different sampling rates and probability calculation algorithms.
+
+## TL;DR
+
+**Score** is a floating-point number associated with a trace.
+It is calculated when the trace starts and is propagated in the `tracestate`.
+
+*Score* is independent of the sampling *probability* (aka *rate*), which is
+part of the sampler's configuration and is not specific to any trace.
+
+A sampler compares the *score* with its configured *probability* to make
+a sampling decision.
+
+The service that starts the trace calculates the score and adds it to the
+`tracestate`, so downstream services can reuse it to make their sampling
+decisions *instead of* re-calculating the score as a function of the trace-id
+(or trace-flags). This makes it possible to configure the sampling algorithm
+on the first service only and avoids coordinating algorithms when multiple
+tracing tools are involved.
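+
+A minimal Python sketch of the comparison described above. It assumes a trace
+is sampled in when its score is strictly smaller than the rate; the tie case
+(`score == rate`) is not specified here. Because every service compares the
+same score, a service with a higher rate samples in every trace that a
+lower-rate service samples in.
+
+```python
+def is_sampled_in(score: float, rate: float) -> bool:
+    """Sample in when the trace's score is below this sampler's configured rate.
+
+    Assumption: strict comparison; the tie case is left open by the proposal.
+    """
+    return score < rate
+
+
+# One score per trace, compared against each service's own rate.
+score = 0.17935003
+for rate in (0.1, 0.5, 0.6):
+    print(rate, is_sampled_in(score, rate))
+```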
+
+## Motivation
+
+The goal is to enable a mechanism for consistent (best effort) sampling
+between services with different sampling rates and different probability
+calculation algorithms (for interoperability with existing tracing tools).
+
+Today, consistency across multiple services is achieved by the following means:
+
+1. The same hashing algorithm is applied to the trace-id on each span.
+   Problems:
+   - **the same sampling algorithm must be used across multiple apps**: this
+     is not always possible, e.g. when existing components in a system use a
+     vendor-specific tracing tool (pre-OpenTelemetry, and a major upgrade is
+     hard to justify) while new components are instrumented with OpenTelemetry.
+   - **trace-id uniform distribution is not guaranteed**, so sampling
+     decisions could be biased.
+
+2. The sampling flag propagated from the head component/app is used by
+   downstream apps to sample in a given trace.
+   This requires trusting the upstream decision and does not allow different
+   sampling rates across components.
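+
+To illustrate the first problem, here is a Python sketch of two hypothetical
+trace-id hashing schemes (neither is taken from a particular tool): one reads
+the lower 8 bytes of the trace-id, the other the upper 8 bytes. Even with the
+same rate, the two services disagree on the same trace.
+
+```python
+def score_from_low_bytes(trace_id_hex: str) -> float:
+    """Hypothetical scheme A: lower 8 bytes of the trace-id as a fraction of 2**64."""
+    return int(trace_id_hex[16:], 16) / 2**64
+
+
+def score_from_high_bytes(trace_id_hex: str) -> float:
+    """Hypothetical scheme B: upper 8 bytes of the trace-id as a fraction of 2**64."""
+    return int(trace_id_hex[:16], 16) / 2**64
+
+
+trace_id = "ffffffffffffffff0000000000000000"  # 32 hex chars = 16 bytes
+rate = 0.5  # both services use the same rate
+
+a = score_from_low_bytes(trace_id) < rate   # True: the score is 0.0
+b = score_from_high_bytes(trace_id) < rate  # False: the score rounds to 1.0
+print(a, b)  # True False
+```
+
+The same trace-id also hints at the second problem: when trace-id bytes are not
+uniformly random, a scheme that reads the non-random part is biased.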
+
+## Explanation
+
+The sampling score is generated by the first service that makes a sampling
+decision. It is a random floating-point number (IEEE-754 32-bit, i.e. 6-9
+significant decimal digits of precision) in the [0, 1] range.
+The score is stamped on the span and also propagated downstream in the
+`tracestate`.
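+
+A Python sketch of score generation, assuming the score is drawn with the
+standard `random` module and rounded to 32-bit precision via `struct`. The
+8-decimal text encoding mirrors the example value `0.17935003` and is an
+assumption; 8 decimal places do not always round-trip a 32-bit float exactly.
+
+```python
+import random
+import struct
+
+
+def to_float32(value: float) -> float:
+    """Round a Python float (64-bit) to the nearest IEEE-754 32-bit value."""
+    return struct.unpack("<f", struct.pack("<f", value))[0]
+
+
+def generate_score(rng: random.Random) -> float:
+    """Draw a random score in [0, 1] with 32-bit float precision."""
+    return to_float32(rng.random())
+
+
+def format_score(score: float) -> str:
+    """Hypothetical tracestate encoding: 8 decimal places, e.g. '0.17935003'."""
+    return f"{score:.8f}"
+
+
+rng = random.Random(42)  # seeded only to make the sketch reproducible
+score = generate_score(rng)
+print(f"sampling.score={format_score(score)}")
+```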
+
+The next service reads the score from the `tracestate` (instead of calculating
+it from the trace-id) and compares it with its own sampling rate to make a
+sampling decision.
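+
+A Python sketch of how a downstream service might read the score from a W3C
+`tracestate` header value (comma-separated `key=value` list-members) and
+compare it with its rate. Treating malformed or out-of-range values as absent
+is an assumption, not part of the proposal.
+
+```python
+from typing import Optional
+
+SCORE_KEY = "sampling.score"  # key name as proposed in this document
+
+
+def read_score(tracestate: str) -> Optional[float]:
+    """Return the upstream score from a `tracestate` header value, or None."""
+    for member in tracestate.split(","):
+        key, sep, value = member.strip().partition("=")
+        if sep and key == SCORE_KEY:
+            try:
+                score = float(value)
+            except ValueError:
+                return None  # malformed value: treat as absent
+            return score if 0.0 <= score <= 1.0 else None
+    return None
+
+
+def decide(tracestate: str, rate: float) -> Optional[bool]:
+    """True/False is the decision; None means no score, so one must be generated."""
+    score = read_score(tracestate)
+    return None if score is None else score < rate
+
+
+print(decide("sampling.score=0.17935003,congo=t61rcWkgMzE", 0.1))  # False
+```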
+
+The score is exposed through span attributes. Vendors can leverage it to sort
+traces by completeness: the lower the score, the higher the chance that the
+trace was sampled in by every component.
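+
+A Python sketch of the completeness ordering, with hypothetical trace-ids and
+scores and the rates from the example below. It assumes every service uses
+score-based sampling; then a trace is sampled in by exactly the services whose
+rate is greater than its score.
+
+```python
+# Hypothetical trace-ids and the `sampling.score` attribute found on their spans.
+scores = {
+    "4bf92f3577b34da6a3ce929d0e0e4736": 0.55,
+    "0af7651916cd43dd8448eb211c80319c": 0.05,
+    "a3ce929d0e0e47364bf92f3577b34da6": 0.31,
+}
+# Rates from the example: Service-A 0.6, Service-B 0.1, Service-C 0.5.
+rates = {"Service-A": 0.6, "Service-B": 0.1, "Service-C": 0.5}
+
+
+def services_sampling_in(score: float, rates: dict) -> list:
+    """Services whose rate exceeds the score, i.e. those that recorded the trace."""
+    return [service for service, rate in rates.items() if score < rate]
+
+
+# Lowest score first: those traces were sampled in by the most services.
+for trace_id in sorted(scores, key=scores.get):
+    print(trace_id, services_sampling_in(scores[trace_id], rates))
+```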
+
+Vendors can enable interoperability (in terms of sampling) between legacy
+tools and OpenTelemetry: legacy libraries can be updated in a non-breaking way
+to support external score sampling. Updating the vendor-specific library on an
+existing service in a backward-compatible way is much easier than migrating
+that service to OpenTelemetry.
+
+### Example
+
+```
++----------------------+     +----------------------+     +----------------------+
+| Service-A (rate 0.6) | --> | Service-B (rate 0.1) | --> | Service-C (rate 0.5) |
++----------------------+     +----------------------+     +----------------------+
+```
+
+1. Service-A receives a request
+   - starts a new trace and generates a random trace-id
+   - generates the score `0.17935003`. It is **smaller** than the sampling rate
+     (`0.6`), so the decision is `RECORD_AND_SAMPLED`
+   - the span gets a new attribute `sampling.score = 0.17935003`
+   - `sampling.score=0.17935003` is added to the tracestate
+2. Service-B gets a request from A
+   - reads trace-context from headers and `sampling.score` from the
+     tracestate
+   - the decision is `NOT_RECORD`, as `0.17935003` is **bigger** than its
+     sampling rate (`0.1`)
+   - the tracestate is left untouched
+3. Service-C gets a request from B
+   - reads trace-context from headers and `sampling.score` from the
+     tracestate
+   - the decision is `RECORD_AND_SAMPLED`, as `0.17935003` is **smaller** than
+     its sampling rate (`0.5`)
+   - the span gets a new attribute `sampling.score = 0.17935003`
+   - the tracestate is left untouched
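+
+The walkthrough above can be replayed in a few lines of Python. `replay` is a
+hypothetical helper that reads the score once from the tracestate written by
+Service-A, which B and C forward unchanged, and applies each service's rate.
+
+```python
+def read_score(tracestate: str) -> float:
+    """Extract `sampling.score` from a tracestate value (assumes it is present)."""
+    for member in tracestate.split(","):
+        key, _, value = member.strip().partition("=")
+        if key == "sampling.score":
+            return float(value)
+    raise KeyError("sampling.score")
+
+
+def replay(tracestate: str, rates: dict) -> dict:
+    """Each service compares the same upstream score with its own rate."""
+    score = read_score(tracestate)
+    return {service: score < rate for service, rate in rates.items()}
+
+
+tracestate = "sampling.score=0.17935003"  # written by Service-A
+decisions = replay(tracestate, {"Service-A": 0.6, "Service-B": 0.1, "Service-C": 0.5})
+exported = [service for service, sampled in decisions.items() if sampled]
+print(exported)  # ['Service-A', 'Service-C']
+```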
+
+As a result, spans from Service-A and Service-C are exported.
+The relationship between A and C cannot be restored without B's span, so the
+trace is broken. However, Service-C can trace its own requests regardless of
+B's sampling rate, and B can have a smaller tracing budget regardless of A's
+decisions. All services can still debug integration issues using the common
+trace-id.
+
+Vendors can pick the most complete traces by sorting them by score.
+
+## Internal details
+
+- The service that starts a trace makes the sampling decision. The user
+  configures it to use `ExternalScoreSampler` (name TBD). Within the
+  `ShouldSample` callback, the sampler:
+  - generates a score in the [0, 1] interval using `SamplingScoreGenerator`,
+    which can run a random or a deterministic `hash(trace-id)` algorithm
+  - makes the sampling decision by comparing the generated score to the
+    configured rate
+  - if the decision is `RECORD` (or `RECORD_AND_SAMPLED`), adds the
+    `sampling.score` attribute to the attribute collection of the
+    to-be-created span
+  - regardless of the sampling decision, prepends the `sampling.score`
+    key-value pair to the tracestate of the to-be-created span
+- A downstream service continues the trace but has a different sampling rate
+  (it is also configured to use `ExternalScoreSampler`):
+  - `ExternalScoreSampler.ShouldSample` checks whether a score is provided in
+    the `tracestate`
+  - makes the sampling decision by comparing the upstream-generated score with
+    its own sampling rate
+  - if the span will be recorded, adds the `sampling.score` attribute to the
+    attribute collection of the to-be-created span
+- If a downstream service does not find a score in the tracestate, it falls
+  back to the configured score generation algorithm and updates the tracestate
+  and attributes.
+- Any service can be configured to use other samplers (e.g.
+  `TraceIdRatioBased`). In this case, the score in the tracestate does not
+  affect sampling decisions, and the score is re-calculated by the sampler.
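+
+A Python sketch of the flow above, under simplifying assumptions: the decision
+is reduced to a boolean (standing for `RECORD_AND_SAMPLED` vs `NOT_RECORD`),
+the score generator is any function of the trace-id, and the 8-decimal
+encoding is hypothetical. Class and helper names are illustrative, not an SDK
+API.
+
+```python
+from dataclasses import dataclass, field
+from typing import Callable, Dict, List, Optional, Tuple
+
+SCORE_KEY = "sampling.score"
+
+
+@dataclass
+class Result:
+    """Simplified stand-in for the SDK's sampling result."""
+    sampled: bool
+    attributes: Dict[str, float] = field(default_factory=dict)
+    tracestate: str = ""
+
+
+def parse_members(tracestate: str) -> List[Tuple[str, str]]:
+    """Split a tracestate value into (key, value) pairs, skipping empty members."""
+    pairs = []
+    for member in tracestate.split(","):
+        key, sep, value = member.strip().partition("=")
+        if sep:
+            pairs.append((key, value))
+    return pairs
+
+
+def parse_score(value: Optional[str]) -> Optional[float]:
+    """Return the score if it is a number in [0, 1], else None."""
+    try:
+        score = float(value) if value is not None else None
+    except ValueError:
+        return None
+    return score if score is not None and 0.0 <= score <= 1.0 else None
+
+
+class ExternalScoreSampler:
+    """Hypothetical sketch of the proposed sampler (not an existing SDK class)."""
+
+    def __init__(self, rate: float, generate_score: Callable[[str], float]):
+        self.rate = rate
+        self.generate_score = generate_score  # stands in for SamplingScoreGenerator
+
+    def should_sample(self, trace_id: str, tracestate: str) -> Result:
+        members = parse_members(tracestate)
+        upstream = next((v for k, v in members if k == SCORE_KEY), None)
+        score = parse_score(upstream)
+        if score is not None:
+            new_tracestate = tracestate  # reuse the upstream score, leave tracestate as-is
+        else:
+            # No (valid) score yet: generate one, encode it and prepend it.
+            encoded = f"{self.generate_score(trace_id):.8f}"
+            score = float(encoded)  # keep attribute and tracestate in sync
+            others = [(k, v) for k, v in members if k != SCORE_KEY]
+            new_tracestate = ",".join(f"{k}={v}" for k, v in [(SCORE_KEY, encoded)] + others)
+        sampled = score < self.rate
+        attributes = {SCORE_KEY: score} if sampled else {}
+        return Result(sampled, attributes, new_tracestate)
+```
+
+Keeping the generator as a plain callable mirrors the split between sampler
+and `SamplingScoreGenerator`: the sampler only decides where the score comes
+from and what to do with it.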
+
+`ExternalScoreSampler` is responsible for:
+
+- reading the score from and writing it to the `tracestate`
+- making the sampling decision based on the score, if one is set on the
+  tracestate
+- generating a score with `SamplingScoreGenerator` if none is present.
+
+`SamplingScoreGenerator` is responsible for:
+
+- calculating the score, randomly or deterministically, based on sampling
+  parameters.
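+
+A Python sketch of two generators. The names `RandomGenerator` and
+`TraceIdRatioGenerator` come from this proposal; the `generate(trace_id)`
+signature and the deterministic hash (lower 8 bytes of the trace-id divided by
+2**64) are assumptions for illustration, and real tools may hash differently.
+
+```python
+import random
+from typing import Optional, Protocol
+
+
+class SamplingScoreGenerator(Protocol):
+    """Assumed interface: map a trace-id (32 hex chars) to a score in [0, 1]."""
+
+    def generate(self, trace_id: str) -> float: ...
+
+
+class RandomGenerator:
+    """Random score, independent of the trace-id."""
+
+    def __init__(self, rng: Optional[random.Random] = None):
+        self._rng = rng if rng is not None else random.Random()
+
+    def generate(self, trace_id: str) -> float:
+        return self._rng.random()  # uniform in [0, 1)
+
+
+class TraceIdRatioGenerator:
+    """Deterministic score: hypothetical hash using the lower 8 bytes of the trace-id."""
+
+    def generate(self, trace_id: str) -> float:
+        return int(trace_id[-16:], 16) / 2**64
+
+
+def score_all(generator: SamplingScoreGenerator, trace_ids) -> list:
+    """Any generator can be plugged in where a SamplingScoreGenerator is expected."""
+    return [generator.generate(t) for t in trace_ids]
+```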
+
+Here is a [proof of concept](https://github.com/lmolkova/opentelemetry-dotnet/pull/1)
+in .NET.
+
+### Specification Delta
+
+1. Add a convention for the `sampling.score` span attribute (TBD). Check out
+   the [open questions](#open-questions) regarding attribute vs. special field.
+2. Add the notion of a `SamplingScoreGenerator` that is capable of calculating
+   a float score from sampling parameters.
+   It has `TraceIdRatioGenerator`, `RandomGenerator` and possibly other
+   implementations.
+   - Change the `TraceIdRatioBased` sampler to use the corresponding generator
+     and serve as a generic probability sampler with a configurable score
+     generation approach.
+3. Add an `ExternalScoreSampler` implementation of `Sampler`. It is created
+   with a probability value and an implementation of `SamplingScoreGenerator`.
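+
+A Python sketch of the second item: a hypothetical generic probability sampler
+that takes the score generator as a parameter, with `TraceIdRatioBased`
+expressed as one configuration of it. Unlike `ExternalScoreSampler`, it never
+reads a score from the tracestate. The class name and the lower-8-bytes hash
+are assumptions.
+
+```python
+from typing import Callable
+
+
+class ProbabilitySampler:
+    """Hypothetical generic probability sampler with a pluggable score generator."""
+
+    def __init__(self, rate: float, generate: Callable[[str], float]):
+        if not 0.0 <= rate <= 1.0:
+            raise ValueError("rate must be in [0, 1]")
+        self.rate = rate
+        self.generate = generate
+
+    def should_sample(self, trace_id: str) -> bool:
+        # The score always comes from the configured generator, never from tracestate.
+        return self.generate(trace_id) < self.rate
+
+
+def trace_id_ratio(trace_id: str) -> float:
+    """Hypothetical deterministic generator: lower 8 bytes of the trace-id / 2**64."""
+    return int(trace_id[-16:], 16) / 2**64
+
+
+# TraceIdRatioBased expressed as one configuration of the generic sampler.
+trace_id_ratio_based = ProbabilitySampler(0.25, trace_id_ratio)
+print(trace_id_ratio_based.should_sample("4bf92f3577b34da6a3ce929d0e0e4736"))
+```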
+
+### Trade-offs and mitigations
+
+This change would be the first (AFAIK) common use case of the `tracestate`.
+It comes with bandwidth and performance overhead, whereas otherwise the
+`tracestate` could just be propagated [blindly](https://github.com/open-telemetry/opentelemetry-specification/issues/478).
+The overhead is incurred before the sampling decision is made and therefore
+cannot be mitigated by sampling.
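+
+To make the overhead concrete, here is a Python sketch of the tracestate update
+the first service performs. It follows our reading of the W3C Trace Context
+rules (an updated entry moves to the left-most position; at most 32
+list-members, dropping the right-most one when the limit is exceeded). As far
+as we can tell, W3C keys only allow lowercase letters, digits, `_`, `-`, `*`
+and `/` (plus `@` in multi-tenant keys), so a compliant implementation may need
+a different spelling than `sampling.score`; the sketch keeps the proposal's key.
+
+```python
+MAX_MEMBERS = 32  # W3C Trace Context: at most 32 list-members in tracestate
+
+
+def upsert_member(tracestate: str, key: str, value: str) -> str:
+    """Return tracestate with key=value as the left-most member.
+
+    Any existing member with the same key is removed first; if the result has
+    more than MAX_MEMBERS members, the right-most ones are dropped.
+    """
+    members = [m.strip() for m in tracestate.split(",") if m.strip()]
+    members = [m for m in members if m.partition("=")[0] != key]
+    return ",".join([f"{key}={value}"] + members[: MAX_MEMBERS - 1])
+
+
+header = upsert_member("congo=t61rcWkgMzE,rojo=00f067aa0ba902b7", "sampling.score", "0.17935003")
+print(header)  # sampling.score=0.17935003,congo=t61rcWkgMzE,rojo=00f067aa0ba902b7
+```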
+
+Customers should enable it explicitly, so that the default case, where
+interoperability is not necessary, avoids the overhead.
+
+Vendors may gradually update their existing solutions to support the external
+score in order to interoperate with OpenTelemetry, and should recommend that
+customers configure such a sampler.
+
+Once migration to OpenTelemetry is complete, the need for `sampling.score` may
+decrease, and customers can remove `ExternalScoreSampler` from their
+configuration.
+
+## Prior art and alternatives
+
+[TraceIdRatioBased](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#traceidratiobased) sampler.
+
+Related discussions: [Probability sampler](https://github.com/open-telemetry/opentelemetry-specification/pull/570).
+
+### `sampling.score` is NOT priority
+
+Priority is used by [OpenTracing](https://github.com/opentracing/specification/blob/master/semantic_conventions.md)
+as an implementation-specific hint for sampler to prioritize recording a span.
+
+[OpenTelemetry collector](https://github.com/open-telemetry/opentelemetry-collector/blob/60b03d0d2d503351501291b30836d2126487a741/processor/samplingprocessor/probabilisticsamplerprocessor/testdata/config.yaml#L10)
+uses `sampling.priority` as a hint for the collector's sampling decision.
+
+To avoid conflicts with existing implementations, we do not reuse the priority term.
+
+## Open questions
+
+### Should we separate sampling from score generation?
+
+In this proposal, rate-based sampling is separated from score generation: a
+sampler can be configured to use any score generation algorithm based on
+sampling parameters, and different samplers may reuse the same algorithms.
+
+### Attribute vs field on the span to-be-created
+
+The attribute collection passed to the sampler is empty by default to minimize
+the performance impact. Propagating the score back from the sampler to the span
+requires initializing that collection.
+
+Adding a new float field on `SamplingDecision` could be an alternative.
+It would also require adding a similar property on `Span`/`SpanData`.
+
+There are other scenarios in which sampling information is useful for
+exporters: e.g. exporters can use the sampling rate (or its inverse value: the
+count of spans this span represents) to estimate metrics.
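+
+A Python sketch of such an estimate (inverse-probability weighting), assuming
+the exporter knows the rate each exported span was sampled with. The function
+name is hypothetical.
+
+```python
+def estimate_total(sampled_rates) -> float:
+    """Estimate how many spans occurred before sampling.
+
+    Each exported span sampled at rate r represents 1 / r spans (its 'count').
+    Rates are assumed to be > 0: with score < rate, nothing is sampled at rate 0.
+    """
+    return sum(1.0 / rate for rate in sampled_rates)
+
+
+# 3 spans exported at rate 0.5 and 2 spans exported at rate 0.1
+print(estimate_total([0.5, 0.5, 0.5, 0.1, 0.1]))  # 26.0
+```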
+
+Populating all sampling information on all spans may be inefficient in terms of
+event payload size and storage, while being useful only to a subset of vendors.
+
+An extensible solution may be a `SamplingInfo` struct that carries all the
+fields exporters may need:
+
+```
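+struct SamplingInfo {
+    Score       // sampling score in [0, 1]
+    Rate/Count  // sampling rate, or its inverse: count of spans this span represents
+    ...
+}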
+```
+
+`SamplingResult` would allow the sampler to fill it in for the span to be
+created. `Span` and its exportable representations would also need to be updated.
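+
+A Python sketch of that shape, with hypothetical names: a `SamplingInfo`
+carrying the score and rate (the count is derived as the inverse of the rate),
+and a simplified stand-in for `SamplingResult` that the sampler fills for the
+span to be created. This is an illustration, not the SDK's actual types.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass(frozen=True)
+class SamplingInfo:
+    """Hypothetical carrier for sampling details exporters may need."""
+
+    score: Optional[float] = None  # sampling score in [0, 1], if one was used
+    rate: Optional[float] = None   # rate the span was sampled with
+
+    @property
+    def count(self) -> Optional[float]:
+        """Number of spans this span represents: the inverse of the rate."""
+        return 1.0 / self.rate if self.rate else None
+
+
+@dataclass(frozen=True)
+class SamplingResultSketch:
+    """Simplified stand-in for `SamplingResult` extended with SamplingInfo."""
+
+    sampled: bool
+    info: SamplingInfo
+
+
+def should_sample(score: float, rate: float) -> SamplingResultSketch:
+    """The sampler fills SamplingInfo for the span to be created."""
+    return SamplingResultSketch(score < rate, SamplingInfo(score=score, rate=rate))
+
+
+result = should_sample(0.17935003, 0.5)
+print(result.sampled, result.info.count)  # True 2.0
+```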