Ignored transactions and consistent probability sampling #3307

PeterF778 · 2023-03-08T22:57:59Z

Let's consider the following hypothetical example.

An OpenTelemetry user deploys a distributed application with tiers A, B, and C. Customer requests hit tier A, which makes calls to tier B, and tier B makes calls to tier C. The OTel user wants to trace only some of the transactions; the others are to be ignored (for whatever reason). The distinction between ignored and traced transactions is made by a custom sampler at tier A, based on the URL of the incoming request.
When tiers B and C use ParentBased samplers with the default configuration, things work as expected, but the user may run into performance issues because the sampling decisions made by tier A propagate to tiers B and C without any adjustment (the fan-out problem).

Employing Consistent Probability Samplers may fix the performance issue because the sampling rates at tiers B and C can now differ from the rate at tier A. However, calls to tiers B and C may now be traced even if they belong to transactions that the user wanted to ignore. This is because Consistent Probability Samplers ignore the sampled flag received from the parent. This may not be what the OTel user wants: for the ignored transactions the traces will always be incomplete, missing the root span.

But is it a valid use case?

If it is, a possible fix could be to drop the requirement that a Consistent Probability Sampler always generates a new r-value for a non-root span (think tiers B and C) when the r-value is missing. It should generate one only when the parent's sampled flag is set. If the flag is not set and there is no r-value, it should simply decide not to sample and leave the TraceState as is.
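
To make the difference concrete, here is a minimal Python sketch of the decision logic described above. It is not the OpenTelemetry SDK API: `ParentInfo`, `generate_r_value`, `decide`, and the `proposed_fix` switch are hypothetical names used only for illustration. It assumes the rule from the experimental consistent probability sampling specification as I understand it: a span is sampled with probability 2^-p when p <= r.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ParentInfo:
    """Hypothetical view of what a non-root sampler receives from its parent."""
    sampled: bool           # W3C sampled flag set by the parent
    r_value: Optional[int]  # r-value from TraceState, None if missing


def generate_r_value(rng: random.Random) -> int:
    """Draw r so that P(r >= k) = 2**-k, capped at 62 (assumed range)."""
    r = 0
    while r < 62 and rng.getrandbits(1) == 0:
        r += 1
    return r


def decide(parent: ParentInfo, p: int, rng: random.Random,
           proposed_fix: bool = False) -> Tuple[bool, Optional[int]]:
    """Return (sampled, r_value to propagate) for a non-root span."""
    r = parent.r_value
    if r is None:
        if proposed_fix and not parent.sampled:
            # Proposed change: do not sample, leave TraceState untouched.
            return False, None
        # Behavior described in this issue: always generate a new r-value.
        r = generate_r_value(rng)
    return p <= r, r
```

With `p = 0` (sample everything) and a parent that neither sampled nor sent an r-value, `decide` samples the span under the behavior described in this issue and drops it when `proposed_fix=True`.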


PeterF778 · 2023-04-11T23:21:16Z

After giving the issue some thought, I now believe this is not a valid use case. By that I do not mean that it should be forbidden to completely ignore a class of requests from an observability perspective, but it should be discouraged, and my proposed "fix" would make things even worse.
A downstream service (C), which might have a different owner than the root service A, needs to be able to trace/sample all incoming requests, if it desires. Otherwise, it would not have any way to calculate basic metrics, such as incoming request rate.
Another reason not to apply my proposed "fix" is that having truly random trace-id bits (see randomness of trace id) opens up the opportunity to omit the r-value from the trace state. This would force each consistent probability sampler to (consistently) derive the r-value from the trace-id, but it also means that the absence of an r-value could not be used to assume anything about the trace.
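
A rough sketch of what such a derivation could look like. This is an assumption on my part, not something the specification defines: it treats the least significant 56 bits of the trace-id as random (which is what W3C Trace Context Level 2 asks for when the random flag is set) and takes the number of leading zero bits of that 56-bit value as the r-value. The function name `r_value_from_trace_id` is hypothetical.

```python
RANDOM_BITS = 56  # assumed: the rightmost 7 bytes of the trace-id are random


def r_value_from_trace_id(trace_id_hex: str) -> int:
    """Derive an r-value from a 32-character hex trace-id (illustrative only)."""
    trace_id = int(trace_id_hex, 16)
    rnd = trace_id & ((1 << RANDOM_BITS) - 1)
    # Leading zero bits of the 56-bit value, so P(r >= k) = 2**-k for random bits.
    return RANDOM_BITS - rnd.bit_length()
```

Every tier that sees the same trace-id computes the same r-value, so the decisions stay consistent without anything being carried in the trace state. That is exactly why a missing r-value would no longer say anything about whether the root chose to ignore the transaction.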

PeterF778 added the spec:trace label (Related to the specification/trace directory) Mar 8, 2023

github-actions bot assigned jack-berg Mar 8, 2023

PeterF778 mentioned this issue Mar 9, 2023

Non-power-of-two consistent tail probability sampling in TraceState open-telemetry/oteps#226

Closed

jmacd mentioned this issue Feb 27, 2024

Probability Samplers based on W3C Trace Context Level 2 #3910

Closed


This was referenced Jul 25, 2024

Randomness requirements following W3C Trace Context level 2 #4162

Draft

OpenTelemetry TraceIdRatioBased sampler requirements following OTEP 235 #4166

Draft
