[probabilistic sampling processor] encoded sampling probability (support OTEP 235) #31894

Merged: 99 commits, Jun 13, 2024

Commits
e822a9b
Add t-value sampler draft
jmacd May 12, 2023
1bc6017
copy/import tracestate parser package
jmacd May 15, 2023
d1fd891
test ot tracestate
jmacd May 16, 2023
85e4472
tidy
jmacd May 16, 2023
bb75f8a
renames
jmacd May 16, 2023
6a57b77
testing two parsers w/ generic code
jmacd May 17, 2023
7fa8130
integrated
jmacd May 17, 2023
36230e7
Comments
jmacd May 17, 2023
7bae35c
revert two files
jmacd May 17, 2023
9010a67
Update with r, s, and t-value. Now using regexps and strings.IndexBy…
jmacd Jun 1, 2023
0e27e40
fix sampler build
jmacd Jun 1, 2023
efcdc3d
add support for s-value for non-consistent mode
jmacd Jun 1, 2023
939c758
WIP
jmacd Jul 10, 2023
b9a1e56
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Aug 2, 2023
a31266c
use new proposed syntax see https://github.com/open-telemetry/opentel…
jmacd Aug 2, 2023
690cd64
update tracestate libs for new encoding
jmacd Aug 2, 2023
c8baf29
wip working on probabilistic sampler with two new modes: downsampler …
jmacd Aug 2, 2023
7f47e4a
unsigned implement split
jmacd Aug 3, 2023
422e0b2
two implementations
jmacd Aug 3, 2023
787b9fd
wip
jmacd Sep 5, 2023
ed36f03
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Sep 6, 2023
d795210
Updates for OTEP 235
jmacd Sep 6, 2023
09000f7
wip TODO
jmacd Sep 6, 2023
a4d467b
versions.yaml
jmacd Sep 6, 2023
e373b9b
Add proportional sampler mode; comment on TODOs; create SamplerMode t…
jmacd Sep 7, 2023
fe6a085
back from internal
jmacd Oct 4, 2023
396efb1
wip
jmacd Oct 4, 2023
36de5dd
fix existing tests
jmacd Oct 6, 2023
f1aa0ad
:wip:
jmacd Oct 12, 2023
700734e
Update for rejection threshold
jmacd Nov 15, 2023
ae50bdd
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Nov 15, 2023
a94b8e7
fix preexisting tests
jmacd Nov 16, 2023
4edcbcb
basic yes/no t-value sampling test
jmacd Nov 16, 2023
53bf119
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Nov 29, 2023
3cdb957
add version for sampling pkg
jmacd Nov 29, 2023
e506847
more testing
jmacd Dec 7, 2023
2cddfeb
add probability to threshold with precision option
jmacd Dec 8, 2023
f69d6ee
ProbabilityToThresholdWithPrecision
jmacd Dec 8, 2023
cc02934
test coverage for equalizing and proportional
jmacd Dec 8, 2023
1eecc4a
config test
jmacd Dec 8, 2023
2159107
comments and notes
jmacd Dec 8, 2023
e0898a6
update README
jmacd Dec 8, 2023
d0991ed
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 10, 2024
a002774
Remove sampling pkg, it is now upstream
jmacd Feb 14, 2024
3a49922
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Feb 28, 2024
fca0184
build w/ new sampling pkg
jmacd Feb 28, 2024
f11e0a4
more test coverage
jmacd Feb 29, 2024
3f495a6
more config tests
jmacd Feb 29, 2024
581095c
test precision underflow
jmacd Mar 1, 2024
7b8fd31
test warning logs
jmacd Mar 1, 2024
1a6be4f
hash_seed fixes
jmacd Mar 1, 2024
712cf17
wip
jmacd Mar 4, 2024
34c0d3b
aip
jmacd Mar 5, 2024
7742668
wip-refactoring
jmacd Mar 13, 2024
8d60168
refactor wip
jmacd Mar 14, 2024
3779caa
cleanup refactor
jmacd Mar 14, 2024
c261ac1
wip
jmacd Mar 14, 2024
34469e4
moving code
jmacd Mar 15, 2024
8dabf47
fix tests; round up small probs to avoid errors
jmacd Mar 15, 2024
d44afb5
preserve legacy behavior
jmacd Mar 15, 2024
1cf9991
logs handled sampling priority differently
jmacd Mar 15, 2024
365d35d
still two errors
jmacd Mar 18, 2024
12a3047
builds
jmacd Mar 19, 2024
8655f42
needs testing
jmacd Mar 19, 2024
468e6c6
fixing tests
jmacd Mar 21, 2024
23b4423
cleanup
jmacd Mar 21, 2024
07841e5
remove strict feature
jmacd Mar 21, 2024
6936bc4
tests fixed
jmacd Mar 21, 2024
c132f4c
wip
jmacd Mar 22, 2024
bd13ac9
typo
jmacd Mar 22, 2024
aa33b1c
more logs tests
jmacd Mar 22, 2024
06556dc
Add more comments
jmacd Mar 22, 2024
a4940e6
update README
jmacd Mar 22, 2024
4f616e9
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Mar 22, 2024
b4ca3aa
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Mar 25, 2024
fdd26ac
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Mar 25, 2024
794d1a1
wip update
jmacd May 30, 2024
a305a7f
undo comment changes
jmacd May 30, 2024
98433af
test all modes logs missing randomness
jmacd May 30, 2024
3aa4608
more missing rando
jmacd May 30, 2024
a0bc49e
smaller diff
jmacd May 30, 2024
d0aea21
comment carrier
jmacd May 30, 2024
7b81625
chlog
jmacd May 30, 2024
fe4dd37
simplify ctcom
jmacd May 30, 2024
a244866
lint
jmacd May 30, 2024
89331bc
combine README updates
jmacd May 30, 2024
04d65c4
tidy
jmacd May 30, 2024
9cb1586
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd May 30, 2024
a98db61
Add sampler mode use-cases
jmacd Jun 10, 2024
d33660b
rephrase tracestate; logs do not use tracestate
jmacd Jun 10, 2024
c67350d
explain sampling precision
jmacd Jun 10, 2024
b0a9516
move misplaced text
jmacd Jun 10, 2024
95ecbae
remove multierr
jmacd Jun 10, 2024
cbcc853
Apply suggestions from code review
jmacd Jun 11, 2024
ad32651
only debug and info
jmacd Jun 11, 2024
6b71ea8
adjust test for debug-level logs
jmacd Jun 11, 2024
61abf1f
revert change of default mode
jmacd Jun 11, 2024
0664ea1
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jun 12, 2024
1926afb
Merge branch 'main' into jmacd/tvaluesampler
jmacd Jun 12, 2024
27 changes: 27 additions & 0 deletions .chloggen/probabilisticsampler_modes.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: probabilisticsamplerprocessor

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add Proportional and Equalizing sampling modes

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [31918]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Both the existing hash_seed mode and the two new modes use OTEP 235 semantic conventions to encode sampling probability.

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
156 changes: 154 additions & 2 deletions processor/probabilisticsamplerprocessor/README.md
@@ -1,3 +1,4 @@

# Probabilistic Sampling Processor

<!-- status autogenerated section -->
@@ -115,7 +116,9 @@ interpreted as a percentage, with values >= 100 equal to 100%
sampling. The logs sampling priority attribute is configured via
`sampling_priority`.

-## Sampling algorithm
## Mode Selection

There are three sampling modes available. All modes are consistent.

### Hash seed

@@ -135,7 +138,154 @@ In order for hashing to be consistent, all collectors for a given tier
at different collector tiers to support additional sampling
requirements.

-This mode uses 14 bits of sampling precision.
This mode uses 14 bits of information in its sampling decision; the
default `sampling_precision`, which is 4 hexadecimal digits, exactly
encodes this information.

This mode is selected by default.

#### Hash seed: Use-cases

The hash seed mode is most useful in logs sampling, because it can be
applied to units of telemetry other than TraceID. For example, a
deployment consisting of 100 pods can be sampled according to the
`service.instance.id` resource attribute. In this case, 10% sampling
implies collecting log records from an expected value of 10 pods.
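
As an illustration of the hash seed decision described above, here is a minimal Go sketch. It assumes a 32-bit FNV-1a hash over the seed bytes followed by the key, reduced to 14 bits and compared against a cutoff scaled from the percentage. The names `hashBucket` and `keep`, the byte order of the seed, and the `pod-N` keys are hypothetical and are not taken from the processor's source.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const (
	numBuckets = 1 << 14        // 14 bits of hash output are used
	bucketMask = numBuckets - 1 // selects the low 14 bits
)

// hashBucket hashes the seed followed by the key with 32-bit FNV-1a
// and keeps the low 14 bits. The seed layout is an assumption.
func hashBucket(seed uint32, key string) uint32 {
	var seedBytes [4]byte
	binary.LittleEndian.PutUint32(seedBytes[:], seed)
	h := fnv.New32a()
	h.Write(seedBytes[:])
	h.Write([]byte(key))
	return h.Sum32() & bucketMask
}

// keep reports whether the key's bucket falls below the cutoff
// implied by the sampling percentage.
func keep(seed uint32, key string, percent float64) bool {
	if percent >= 100 {
		return true
	}
	if percent <= 0 {
		return false
	}
	cutoff := uint32(percent / 100 * numBuckets)
	return hashBucket(seed, key) < cutoff
}

func main() {
	kept := 0
	for i := 0; i < 100; i++ {
		// Hypothetical service.instance.id values.
		if keep(22, fmt.Sprintf("pod-%d", i), 10) {
			kept++
		}
	}
	fmt.Printf("kept %d of 100 pods\n", kept)
}
```

Because the decision depends only on the seed and the key, every collector configured with the same seed keeps the same pods, which is why the text asks for a shared `hash_seed` within a tier.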

### Proportional

OpenTelemetry specifies a consistent sampling mechanism using 56 bits
of randomness, which may be obtained from the Trace ID according to
the W3C Trace Context Level 2 specification. Randomness can also be
explicitly encoded in the OpenTelemetry `tracestate` field, where it is
known as the R-value.

This mode is so named because it reduces the number of items transmitted
proportionally, according to the sampling probability. In this mode,
items are selected for sampling without considering how much they were
already sampled by preceding samplers.

This mode uses 56 bits of information in its calculations. The
default `sampling_precision` (4) will cause thresholds to be rounded
in some cases when they contain more than 16 significant bits.

#### Proportional: Use-cases

The proportional mode is generally applicable in trace sampling,
because it is based on OpenTelemetry and W3C specifications. This
mode is selected by default, because it enforces a predictable
(probabilistic) ratio between incoming items and outgoing items of
telemetry. No matter how SDKs and other sources of telemetry have
been configured with respect to sampling, a collector configured with
25% proportional sampling will output (an expected value of) 1 item
for every 4 items input.
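
The proportional arithmetic can be sketched in Go as follows. This is an illustration under stated assumptions, not the processor's code: the incoming threshold is converted to a probability, multiplied by the configured ratio, and converted back. The helper names are hypothetical, probabilities are assumed to lie in (0, 1], and `float64` carries only 53 significant bits, so a real implementation working at 56 bits would need more care.

```go
package main

import (
	"fmt"
	"math"
)

// maxAdjusted is the number of distinct 56-bit randomness values.
const maxAdjusted = 1 << 56

// thresholdToProbability converts a 56-bit rejection threshold
// into the sampling probability it represents.
func thresholdToProbability(th uint64) float64 {
	return float64(maxAdjusted-th) / maxAdjusted
}

// probabilityToThreshold is the inverse conversion, for p in (0, 1].
func probabilityToThreshold(p float64) uint64 {
	return maxAdjusted - uint64(math.Round(p*maxAdjusted))
}

// proportional multiplies the incoming probability by ratio and
// returns the resulting threshold.
func proportional(incoming uint64, ratio float64) uint64 {
	return probabilityToThreshold(thresholdToProbability(incoming) * ratio)
}

func main() {
	// An unsampled item (threshold 0) sampled at 25%.
	fmt.Printf("%014x\n", proportional(0, 0.25))
	// An item already sampled at 50% and sampled again at 25%.
	fmt.Printf("%014x\n", proportional(0x80000000000000, 0.25))
}
```

In this sketch the output threshold depends on the incoming one, but the fraction of arriving items that pass is always the configured ratio, which matches the "1 item for every 4" description above.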

### Equalizing

This mode uses the same randomness mechanism as the proportional
sampling mode, in this case considering how much each item was already
sampled by preceding samplers. This mode can be used to lower
sampling probability to a minimum value across a whole pipeline,
making it possible to conditionally adjust sampling probabilities.

This mode compares the 56-bit threshold carried by each item against
the threshold implied by the configured sampling probability, and
updates the item's threshold when the configured one is larger. The
default `sampling_precision` (4) will cause updated thresholds to be
rounded in some cases when they contain more than 16 significant bits.

#### Equalizing: Use-cases

The equalizing mode is useful in collector deployments where client
SDKs have mixed sampling configuration and the user wants to apply a
uniform sampling probability across the system. For example, a user's
system consists of mostly components developed in-house, but also some
third-party software. Seeking to lower the overall cost of tracing,
the user configures 10% sampling in the samplers for all of their in-house
components. This leaves third-party software components unsampled,
making the savings less than desired. In this case, the user could
configure a 10% equalizing probabilistic sampler. Already-sampled
items of telemetry from the in-house components will pass through one
for one in this scenario, while items of telemetry from third-party
software will be sampled by the intended amount.
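
The equalizing behavior in this scenario can be sketched as below. This is a simplified illustration, assuming the item is kept when its randomness is at least the larger of the incoming and configured thresholds; the function name `equalize` is hypothetical, and 25% is used instead of 10% so that the thresholds are round hexadecimal values.

```go
package main

import "fmt"

// equalize returns the threshold an item carries after the sampler
// and whether the item is kept. The larger of the incoming and
// configured thresholds wins, so items that were already sampled at
// the configured probability (or lower) are not sampled again.
func equalize(incoming, configured, randomness uint64) (uint64, bool) {
	th := incoming
	if configured > th {
		th = configured
	}
	return th, randomness >= th
}

func main() {
	const configured = 0xc0000000000000 // 25% sampling

	// In-house item, already sampled at 25%: passes one for one.
	th, keep := equalize(0xc0000000000000, configured, 0xe05a99c8df8d32)
	fmt.Printf("in-house:    th=%014x keep=%v\n", th, keep)

	// Third-party item, unsampled (threshold 0): sampled at 25% here.
	th, keep = equalize(0, configured, 0x9b8233f7e3a151)
	fmt.Printf("third-party: th=%014x keep=%v\n", th, keep)
}
```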

## Sampling threshold information

In all modes, information about the effective sampling probability is
added into the item of telemetry. The random variable that was used
may also be recorded, in case it was not derived from the TraceID
using a standard algorithm.

For traces, threshold and optional randomness information are encoded
in the W3C Trace Context `tracestate` fields. The tracestate is
divided into sections according to a two-character vendor code;
OpenTelemetry uses "ot" as its section designator. Within the
OpenTelemetry section, the sampling threshold is encoded using "th"
and the optional random variable is encoded using "rv".

For example, 25% sampling is encoded in a tracing Span as:

```
tracestate: ot=th:c
```

Users can set randomness values in this way, independently, making it
possible to apply consistent sampling across traces, for example. If
the Trace was initialized with the pre-determined randomness value
`9b8233f7e3a151` and 100% sampling, it would read:

```
tracestate: ot=th:0;rv:9b8233f7e3a151
```

This component, using either proportional or equalizing modes, could
apply 50% sampling to the Span. This span with randomness value
`9b8233f7e3a151` is consistently sampled at 50% because the threshold,
when zero padded (i.e., `80000000000000`), is less than the randomness
value. The resulting span will have the following tracestate:

```
tracestate: ot=th:8;rv:9b8233f7e3a151
```
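
The comparison in this example can be reproduced with a short Go sketch. It assumes only what the text states: the `th` value is right-padded with zeros to 14 hexadecimal digits and the item is sampled when the `rv` value is not less than the padded threshold. The helper names are hypothetical and this is not the tracestate parser used by the processor.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// numHexDigits is the width of a full 56-bit threshold or randomness value.
const numHexDigits = 14

// parseThreshold right-pads a "th" value with zeros and parses it as hex.
func parseThreshold(th string) (uint64, error) {
	if len(th) == 0 || len(th) > numHexDigits {
		return 0, fmt.Errorf("threshold must have 1 to %d hex digits: %q", numHexDigits, th)
	}
	return strconv.ParseUint(th+strings.Repeat("0", numHexDigits-len(th)), 16, 64)
}

// shouldSample reports whether randomness rv meets or exceeds threshold th.
func shouldSample(th, rv string) (bool, error) {
	t, err := parseThreshold(th)
	if err != nil {
		return false, err
	}
	if len(rv) != numHexDigits {
		return false, fmt.Errorf("randomness must have %d hex digits: %q", numHexDigits, rv)
	}
	r, err := strconv.ParseUint(rv, 16, 64)
	if err != nil {
		return false, err
	}
	return r >= t, nil
}

func main() {
	// Thresholds for 100%, 50%, and 25% sampling against the example randomness.
	for _, th := range []string{"0", "8", "c"} {
		keep, err := shouldSample(th, "9b8233f7e3a151")
		fmt.Println(th, keep, err)
	}
}
```

With the randomness value from the example, the span is kept at thresholds `0` and `8` but would be rejected at `c` (25%), since `9b8233f7e3a151` is less than `c0000000000000`.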

For log records, threshold and randomness information are encoded in
the log record itself, using attributes. For example, 25% sampling
with an explicit randomness value is encoded as:

```
sampling.threshold: c
sampling.randomness: e05a99c8df8d32
```

### Sampling precision

When encoding sampling probability in the form of a threshold,
variable precision is permitted, making it possible for the user to
restrict sampling probabilities to rounded numbers of fixed width.

Because the threshold is encoded using hexadecimal digits, each digit
contributes 4 bits of information. One digit of sampling precision
can express exact sampling probabilities 1/16, 2/16, ... through
16/16. Two digits of sampling precision can express exact sampling
probabilities 1/256, 2/256, ... through 256/256. With N digits of
sampling precision, there are `16^N` exactly representable
probabilities.

Depending on the mode, there are different maximum reasonable settings
for this parameter.

- The `hash_seed` mode uses a 14-bit hash function, therefore
precision 4 completely captures the available information.
- The `equalizing` mode configures a sampling probability after
parsing a `float32` value, which contains 20 bits of precision,
therefore precision 5 completely captures the available information.
- The `proportional` mode configures its ratio using a `float32`
value, however it carries out the arithmetic using 56-bits of
precision. In this mode, increasing precision has the effect
of preserving precision applied by preceding samplers.

In cases where larger precision is configured than is actually
available, the added precision has no effect because trailing zeros
are eliminated by the encoding.
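
The effect of `sampling_precision` on the encoded threshold can be sketched as follows. This is a simplified illustration that truncates to the requested number of hexadecimal digits and then drops trailing zeros; the function name `encodeThreshold` is hypothetical, and the real encoder may round rather than truncate.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeThreshold renders a 56-bit threshold using at most precision
// hex digits and removes trailing zeros. It truncates instead of
// rounding, which is a simplification made for this sketch.
func encodeThreshold(th uint64, precision int) string {
	if precision < 1 {
		precision = 1
	}
	if precision > 14 {
		precision = 14
	}
	full := fmt.Sprintf("%014x", th&(1<<56-1))
	digits := strings.TrimRight(full[:precision], "0")
	if digits == "" {
		return "0"
	}
	return digits
}

func main() {
	// 25% sampling needs a single digit at any precision.
	fmt.Println(encodeThreshold(0xc0000000000000, 4))
	// A threshold near 2/3 of the range uses every digit it is given.
	fmt.Println(encodeThreshold(0xaaaaaaaaaaaaab, 4))
	fmt.Println(encodeThreshold(0xaaaaaaaaaaaaab, 14))
}
```

The first call shows why extra precision is harmless for round probabilities: the trailing zeros disappear, so `c` is emitted whether the precision is 1 or 14.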

### Error handling

@@ -153,9 +303,11 @@ false, in which case erroneous data will pass through the processor.

The following configuration options can be modified:

- `mode` (string, optional): One of "proportional", "equalizing", or "hash_seed"; the default is "proportional" unless either `hash_seed` is configured or `attribute_source` is set to `record`.
- `sampling_percentage` (32-bit floating point, required): Percentage at which items are sampled; >= 100 samples all items, 0 rejects all items.
- `hash_seed` (32-bit unsigned integer, optional, default = 0): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `fail_closed` (boolean, optional, default = true): Whether to reject items with sampling-related errors.
- `sampling_precision` (integer, optional, default = 4): Determines the number of hexadecimal digits used to encode the sampling threshold. Permitted values are 1..14.

### Logs-specific configuration

65 changes: 63 additions & 2 deletions processor/probabilisticsamplerprocessor/config.go
@@ -5,8 +5,11 @@ package probabilisticsamplerprocessor // import "github.com/open-telemetry/opent

import (
"fmt"
"math"

"go.opentelemetry.io/collector/component"

"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling"
)

type AttributeSource string
@@ -35,6 +38,33 @@ type Config struct {
// different sampling rates, configuring different seeds avoids that.
HashSeed uint32 `mapstructure:"hash_seed"`

// Mode selects the sampling behavior. Supported values:
//
// - "hash_seed": the legacy behavior of this processor.
// Using an FNV hash combined with the HashSeed value, this
// sampler performs a non-consistent probabilistic
// downsampling. The number of spans output is expected to
// equal SamplingPercentage (as a ratio) times the number of
// spans input, assuming good behavior from FNV and good
// entropy in the hashed attributes or TraceID.
//
// - "equalizing": Using an OTel-specified consistent sampling
// mechanism, this sampler selectively reduces the effective
// sampling probability of arriving spans. This can be
// useful to select a small fraction of complete traces from
// a stream with mixed sampling rates. The rate of spans
// passing through depends on how much sampling has already
// been applied. If an arriving span was head sampled at
// the same probability it passes through. If the span
// arrives with lower probability, a warning is logged
// because it means this sampler is configured with too
// large a sampling probability to ensure complete traces.
//
// - "proportional": Using an OTel-specified consistent sampling
// mechanism, this sampler reduces the effective sampling
// probability of each span by `SamplingProbability`.
Mode SamplerMode `mapstructure:"mode"`

// FailClosed indicates to not sample data (the processor will
// fail "closed") in case of error, such as failure to parse
// the tracestate field or missing the randomness attribute.
@@ -45,6 +75,14 @@ type Config struct {
// despite errors using priority.
FailClosed bool `mapstructure:"fail_closed"`

// SamplingPrecision is how many hex digits of sampling
// threshold will be encoded, from 1 up to 14. Default is 4.
// 0 is treated as full precision.
Review comment (Member): Not according to invalid_zero.yaml.

SamplingPrecision int `mapstructure:"sampling_precision"`

///////
// Logs only fields below.

// AttributeSource (logs only) defines where to look for the attribute in from_attribute. The allowed values are
// `traceID` or `record`. Default is `traceID`.
AttributeSource `mapstructure:"attribute_source"`
@@ -61,11 +99,34 @@ var _ component.Config = (*Config)(nil)

// Validate checks if the processor configuration is valid
func (cfg *Config) Validate() error {
-if cfg.SamplingPercentage < 0 {
-return fmt.Errorf("negative sampling rate: %.2f", cfg.SamplingPercentage)
pct := float64(cfg.SamplingPercentage)

if math.IsInf(pct, 0) || math.IsNaN(pct) {
return fmt.Errorf("sampling rate is invalid: %f%%", cfg.SamplingPercentage)
}
ratio := pct / 100.0

switch {
case ratio < 0:
return fmt.Errorf("sampling rate is negative: %f%%", cfg.SamplingPercentage)
case ratio == 0:
// Special case
case ratio < sampling.MinSamplingProbability:
// Too-small case
return fmt.Errorf("sampling rate is too small: %g%%", cfg.SamplingPercentage)
default:
// Note that ratio > 1 is specifically allowed by the README, taken to mean 100%
}

if cfg.AttributeSource != "" && !validAttributeSource[cfg.AttributeSource] {
return fmt.Errorf("invalid attribute source: %v. Expected: %v or %v", cfg.AttributeSource, traceIDAttributeSource, recordAttributeSource)
}

if cfg.SamplingPrecision == 0 {
return fmt.Errorf("invalid sampling precision: 0")
} else if cfg.SamplingPrecision > sampling.NumHexDigits {
return fmt.Errorf("sampling precision is too great, should be <= 14: %d", cfg.SamplingPrecision)
}

return nil
}
10 changes: 9 additions & 1 deletion processor/probabilisticsamplerprocessor/config_test.go
@@ -26,6 +26,8 @@ func TestLoadConfig(t *testing.T) {
id: component.NewIDWithName(metadata.Type, ""),
expected: &Config{
SamplingPercentage: 15.3,
SamplingPrecision: 4,
Mode: "proportional",
AttributeSource: "traceID",
FailClosed: true,
},
@@ -34,7 +36,9 @@ func TestLoadConfig(t *testing.T) {
id: component.NewIDWithName(metadata.Type, "logs"),
expected: &Config{
SamplingPercentage: 15.3,
SamplingPrecision: defaultPrecision,
HashSeed: 22,
Mode: "",
AttributeSource: "record",
FromAttribute: "foo",
SamplingPriority: "bar",
@@ -68,7 +72,11 @@ func TestLoadInvalidConfig(t *testing.T) {
file string
contains string
}{
-{"invalid_negative.yaml", "negative sampling rate"},
{"invalid_negative.yaml", "sampling rate is negative"},
{"invalid_small.yaml", "sampling rate is too small"},
{"invalid_inf.yaml", "sampling rate is invalid: +Inf%"},
{"invalid_prec.yaml", "sampling precision is too great"},
{"invalid_zero.yaml", "invalid sampling precision"},
} {
t.Run(test.file, func(t *testing.T) {
factories, err := otelcoltest.NopFactories()
6 changes: 4 additions & 2 deletions processor/probabilisticsamplerprocessor/factory.go
@@ -40,8 +40,10 @@ func NewFactory() processor.Factory {

func createDefaultConfig() component.Config {
return &Config{
-AttributeSource: defaultAttributeSource,
-FailClosed: true,
AttributeSource: defaultAttributeSource,
FailClosed: true,
Mode: modeUnset,
SamplingPrecision: defaultPrecision,
}
}
