Document tracer configuration options (#225)
* rough pass at beginning to document tracer configuration
* finish up configuration.md, add sampling.md
* fix link
* spelling typos
* brain damage
* slightly better wording
* address review comments
Showing 3 changed files with 382 additions and 0 deletions.
@@ -0,0 +1,255 @@
Configuration
=============
The Datadog tracer's configuration can be specified in multiple ways:

- programmatically in C++ code via the `TracerOptions` object defined in
  [datadog/opentracing.h][1],
- process-wide by setting values for certain [environment variables][4],
- [dynamically][3] by reading JSON-formatted text, as is done in the [nginx
  plugin][2].

Most options support all three methods of configuration.

Environment variables override any corresponding configuration in
`TracerOptions` or loaded from JSON.
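
The precedence rule above can be sketched as follows. This is a minimal illustration, not the tracer's code: `resolve_option` is a hypothetical helper, and it assumes that a variable which is set at all (even to an empty string) counts as an override.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the tracer's API): return the value of the
// environment variable `env_name` if it is set, otherwise the value configured
// via TracerOptions or JSON.
std::string resolve_option(const char* env_name, const std::string& configured) {
  assert(env_name != nullptr);
  if (const char* from_env = std::getenv(env_name)) {
    return from_env;  // the environment variable wins
  }
  return configured;  // fall back to TracerOptions / JSON
}
```

For example, `resolve_option("DD_AGENT_HOST", options_host)` yields the `DD_AGENT_HOST` value whenever that variable is exported, regardless of what `options_host` holds.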

Options
-------
### Agent Host
The name of the host at which the Datadog Agent can be contacted, or the host's
IP address.

- **TracerOptions member**: `std::string agent_host`
- **JSON property**: `"agent_host"` _(string)_
- **Environment variable**: `DD_AGENT_HOST`
- **Default value**: `"localhost"`

### Agent Port
The port on which the Datadog Agent is listening.

- **TracerOptions member**: `uint32_t agent_port`
- **JSON property**: `"agent_port"` _(integer)_
- **Environment variable**: `DD_TRACE_AGENT_PORT`
- **Default value**: `8126`

### Agent URL
As an alternative to specifying a host and port separately, a URL may be
specified indicating where the Datadog Agent can be contacted. Both TCP
and Unix domain sockets are supported. For more information about using a
Unix domain socket, see the [relevant example][5].

If the Agent URL is specified, then it overrides the Agent host and Agent port
settings.

The following forms are supported:

- `http://host` (TCP)
- `http://host:port` (TCP)
- `https://host` (TCP)
- `https://host:port` (TCP)
- `unix://path` (Unix domain socket)
- `path` (Unix domain socket)

- **TracerOptions member**: `std::string agent_url`
- **JSON property**: `"agent_url"` _(string)_
- **Environment variable**: `DD_TRACE_AGENT_URL`
- **Default value**: `""`
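
A minimal sketch of how the forms listed above map to a transport, and of a non-empty Agent URL overriding the host and port. `classify_agent_url`, `effective_endpoint`, `Transport`, and `AgentEndpoint` are hypothetical names invented for this illustration, and forming `http://host:port` from the separate host and port settings is an assumption about how those two are combined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical types for illustration only.
enum class Transport { kTcp, kUnixSocket };

struct AgentEndpoint {
  Transport transport;
  std::string address;  // full URL for TCP, filesystem path for a Unix socket
};

bool starts_with(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Classify an Agent URL according to the supported forms.
AgentEndpoint classify_agent_url(const std::string& url) {
  assert(!url.empty());
  if (starts_with(url, "http://") || starts_with(url, "https://")) {
    return {Transport::kTcp, url};
  }
  const std::string unix_prefix = "unix://";
  if (starts_with(url, unix_prefix)) {
    return {Transport::kUnixSocket, url.substr(unix_prefix.size())};
  }
  return {Transport::kUnixSocket, url};  // a bare path
}

// A non-empty Agent URL overrides the Agent host and port settings.
AgentEndpoint effective_endpoint(const std::string& agent_url,
                                 const std::string& agent_host,
                                 uint32_t agent_port) {
  if (!agent_url.empty()) return classify_agent_url(agent_url);
  return {Transport::kTcp, "http://" + agent_host + ":" + std::to_string(agent_port)};
}
```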

### Service Name
The default service name to associate with spans produced by the tracer.
Service name can be overridden programmatically on a per-span basis by setting
a value for the `datadog::tags::service_name` tag.

- **TracerOptions member**: `std::string service`
- **JSON property**: `"service"` _(string)_
- **Environment variable**: `DD_SERVICE`
- **Required**

### Service Type
The default "service type" to associate with spans produced by the tracer.

Service type is used in multiple places throughout Datadog to distinguish
different categories of instrumented service from each other. For example,
it is used in the following ways:

- to identify whether the service's spans need to be obfuscated
- to control display of the service in the Datadog UI

Example values for service type are `web`, `db`, and `lambda`.

- **TracerOptions member**: `std::string type`
- **JSON property**: `"type"` _(string)_
- **Default value**: `"web"`

### Environment
The default release environment in which the service is running, e.g. "prod,"
"dev," or "staging."

Environment is one of the core properties associated with a service, together
with its name and version. See [Unified Service Tagging][9].

- **TracerOptions member**: `std::string environment`
- **JSON property**: `"environment"` _(string)_
- **Environment variable**: `DD_ENV`
- **Default value**: `""`

### Sample Rate
The default probability that a trace beginning at this tracer will be sampled
for ingestion.

For more information about the configuration of trace sampling, see
[sampling.md][6].

- **TracerOptions member**: `double sample_rate`
- **JSON property**: `"sample_rate"` _(number)_
- **Environment variable**: `DD_TRACE_SAMPLE_RATE`

### Sampling Rules
Sampling rules allow for fine-grained control over the rate at which traces
beginning at this tracer will be sampled for ingestion. Sampling rules are
specified as a JSON array of objects.

For more information about the configuration of trace sampling, see
[sampling.md][6].

- **TracerOptions member**: `std::string sampling_rules`
- **JSON property**: `"sampling_rules"` _(array of objects)_
- **Environment variable**: `DD_TRACE_SAMPLING_RULES` _(JSON)_
- **Default value**: `[]`

### Trace Flushing Period
How often a batch of finished traces is sent to the Datadog Agent.

- **TracerOptions member**: `int64_t write_period_ms` _(milliseconds)_
- **Default value**: `1000` _(milliseconds)_

### Operation Name
The default operation name to associate with spans produced by the tracer.

A span's operation name (sometimes just called "name" or "operation") indicates
which of a service's functions the span represents.

Operation name is often fixed for a given service, e.g. the entry spans of the
"nginx" service might always have the operation name "handle.request".

Operation name is not to be confused with a span's associated resource, also
known as its endpoint. The resource (endpoint) contains information about the
particular request, whereas the operation name is more like a subcategory of
the service name.

- **TracerOptions member**: `std::string operation_name_override`
- **JSON property**: `"operation_name_override"` _(string)_
- **Default value**: `""`

### Trace Context Extraction Styles
When one service calls another along a distributed trace, information about the
trace must be propagated in the call: information such as the trace ID, the
parent span ID, and the sampling decision.

Different tracing systems have different standards for how trace context is
propagated, e.g. which HTTP request headers are used.

The Datadog C++ tracer supports two styles of trace context propagation. The
default style, `Datadog`, decodes trace information from multiple `X-Datadog-*`
request headers. For compatibility with [other tracing systems][7], another
style, `B3`, is also supported. The `B3` style decodes trace information from
multiple `X-B3-*` request headers.

The trace context extraction styles setting indicates which styles the tracer
will consider when extracting trace context from a request. At least one style
must be specified, and more than one may be. If multiple styles are specified,
then trace context must be successfully extractable in at least one of the
styles, and if trace context can be extracted in both styles, the two extracted
contexts must agree.

- **TracerOptions member**: `std::set<PropagationStyle> extract`
- **JSON property**: `"propagation_style_extract"` _(array of strings)_
- **Environment variable**: `DD_PROPAGATION_STYLE_EXTRACT` _(JSON)_
- **Default value**: `["Datadog"]`
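
The extraction rules above can be sketched as a function that combines whatever each enabled style decoded. `TraceContext`, `ExtractionResult`, and `combine_extracted` are hypothetical; a real trace context also carries the sampling decision and other fields. Following the description above, the sketch reports an error both when the two styles disagree and when no style yields a context.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical minimal trace context; the tracer's real context has more fields.
struct TraceContext {
  uint64_t trace_id;
  uint64_t parent_span_id;
  bool operator==(const TraceContext& other) const {
    return trace_id == other.trace_id && parent_span_id == other.parent_span_id;
  }
};

struct ExtractionResult {
  std::optional<TraceContext> context;
  std::string error;  // empty on success
};

// `datadog` and `b3` hold what each style decoded (std::nullopt if the style is
// not enabled or its headers were absent).
ExtractionResult combine_extracted(const std::optional<TraceContext>& datadog,
                                   const std::optional<TraceContext>& b3) {
  if (datadog && b3) {
    if (*datadog == *b3) return {datadog, ""};  // both styles agree
    return {std::nullopt, "Datadog and B3 trace contexts disagree"};
  }
  if (datadog) return {datadog, ""};
  if (b3) return {b3, ""};
  return {std::nullopt, "no trace context could be extracted"};
}
```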

### Trace Context Injection Styles
Trace context injection styles are analogous to trace context extraction styles
(see the previous section), except that rather than indicating which trace
context encodings are supported when _extracting_ trace context, trace context
injection styles indicate which trace context encoding(s) will be used when
_injecting_ context into a request to the next service along a trace.

Note that even if the `B3` injection style is used, the tracer may still inject
Datadog-specific trace context, such as in the `X-Datadog-Origin` request
header.

- **TracerOptions member**: `std::set<PropagationStyle> inject`
- **JSON property**: `"propagation_style_inject"` _(array of strings)_
- **Environment variable**: `DD_PROPAGATION_STYLE_INJECT` _(JSON)_
- **Default value**: `["Datadog"]`
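
A sketch of injection with one or both styles enabled. The `PropagationStyle` enum here is a local stand-in for the tracer's own type, and `inject_headers` is hypothetical. The header names follow the public Datadog (`X-Datadog-*`) and B3 (`X-B3-*`) conventions, written in lowercase because HTTP header names are case-insensitive. The sketch simplifies the sampling decision to a boolean and omits Datadog-specific extras such as `X-Datadog-Origin`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <set>
#include <string>

// Local stand-in for the tracer's PropagationStyle.
enum class PropagationStyle { Datadog, B3 };

// Format a 64-bit ID as 16 lowercase hex digits, as B3 headers expect.
std::string to_hex16(uint64_t value) {
  char buffer[17];
  std::snprintf(buffer, sizeof buffer, "%016llx", static_cast<unsigned long long>(value));
  return buffer;
}

// Emit one set of headers per configured injection style.
std::map<std::string, std::string> inject_headers(const std::set<PropagationStyle>& styles,
                                                  uint64_t trace_id, uint64_t span_id,
                                                  bool sampled) {
  assert(!styles.empty());  // at least one style must be configured
  std::map<std::string, std::string> headers;
  if (styles.count(PropagationStyle::Datadog)) {
    headers["x-datadog-trace-id"] = std::to_string(trace_id);  // decimal IDs
    headers["x-datadog-parent-id"] = std::to_string(span_id);
    headers["x-datadog-sampling-priority"] = sampled ? "1" : "0";
  }
  if (styles.count(PropagationStyle::B3)) {
    headers["x-b3-traceid"] = to_hex16(trace_id);  // hexadecimal IDs
    headers["x-b3-spanid"] = to_hex16(span_id);
    headers["x-b3-sampled"] = sampled ? "1" : "0";
  }
  return headers;
}
```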

### Host Name Reporting
If `true`, the tracer will look up the name of the host on which it is running
using the [gethostname][8] function and send it to the Datadog backend in a
reserved span tag.

- **TracerOptions member**: `bool report_hostname`
- **JSON property**: `"dd.trace.report-hostname"` _(boolean)_
- **Environment variable**: `DD_TRACE_REPORT_HOSTNAME`
- **Default value**: `false`
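
The lookup itself is a single POSIX call. This sketch shows only how a host name can be obtained with `gethostname`; `local_hostname` is a hypothetical helper, and the name of the reserved span tag the tracer writes is not shown here.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unistd.h>  // POSIX gethostname

// Return the local host name, or std::nullopt if gethostname fails.
std::optional<std::string> local_hostname() {
  char buffer[256] = {};  // zero-initialized so the result is always NUL-terminated
  // Pass one byte less than the buffer size: POSIX does not guarantee
  // termination when the name is truncated, so the final byte stays '\0'.
  if (gethostname(buffer, sizeof buffer - 1) != 0) {
    return std::nullopt;
  }
  return std::string(buffer);
}
```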

### Span Tags
Tags to add to every span produced by the tracer.

When specified as `std::map<std::string, std::string> tags`, each entry in the
map is a (key, value) pair, where the key is the name of the span tag and the
value is its (string) value.

When specified as the `DD_TAGS` environment variable, tags are formatted as a
comma-separated list of `key:value` pairs (the key and value are separated by a
colon).

- **TracerOptions member**: `std::map<std::string, std::string> tags`
- **JSON property**: `"tags"` _(object)_
- **Environment variable**: `DD_TAGS` _(format: `"name:value,name:value,..."`)_
- **Default value**: `{}`
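
A minimal parser for the `DD_TAGS` format, under stated assumptions: entries are separated by commas, each entry is split at its first colon (so values may themselves contain colons), entries without a colon or with an empty name are skipped, and a repeated name keeps its last value. `parse_dd_tags` is hypothetical, and the tracer's own parser may treat whitespace and malformed input differently.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for "name:value,name:value,...".
std::map<std::string, std::string> parse_dd_tags(const std::string& text) {
  std::map<std::string, std::string> tags;
  std::istringstream entries(text);
  std::string entry;
  while (std::getline(entries, entry, ',')) {
    const auto colon = entry.find(':');
    if (colon == std::string::npos || colon == 0) continue;  // skip malformed entries
    tags[entry.substr(0, colon)] = entry.substr(colon + 1);   // later duplicates win
  }
  return tags;
}
```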

### Application Version
The version of the application that is being instrumented.

If set, the application version is sent to the Datadog backend as the `version`
tag on the first span that the tracer produces in every trace.

- **TracerOptions member**: `std::string version`
- **JSON property**: `"version"` _(string)_
- **Environment variable**: `DD_VERSION`
- **Default value**: `""`

### Logging Function
The function used by the library to log diagnostics.

The provided function takes two arguments:

- `LogLevel level` is the severity of the diagnostic: `debug`, `info`, or
  `error`.
- `::opentracing::string_view message` is the diagnostic message itself.

- **TracerOptions member**: `std::function<void(LogLevel, ::opentracing::string_view)> log_func`
- **Default value**: _(prints to `std::cerr`)_
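
A sketch of a custom logging function that drops `debug` diagnostics and prefixes the rest before writing them to `std::cerr`. To keep it self-contained, `LogLevel` is a local stand-in for the tracer's enum and `std::string_view` stands in for `::opentracing::string_view`; with the real types, the lambda's parameter types would change accordingly. `format_diagnostic` and `quiet_stderr_logger` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <string_view>

// Local stand-ins for the tracer's types.
enum class LogLevel { debug, info, error };
using LogFunc = std::function<void(LogLevel, std::string_view)>;

// Format a diagnostic, or return std::nullopt to drop it (debug messages here).
std::optional<std::string> format_diagnostic(LogLevel level, std::string_view message) {
  if (level == LogLevel::debug) return std::nullopt;
  std::string line = (level == LogLevel::error) ? "[datadog error] " : "[datadog info] ";
  line.append(message.data(), message.size());
  return line;
}

// A function with the shape expected by a `log_func`-style member.
const LogFunc quiet_stderr_logger = [](LogLevel level, std::string_view message) {
  if (auto line = format_diagnostic(level, message)) {
    std::cerr << *line << '\n';
  }
};
```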

### Limit Traces Sampled Per Second
The maximum number of traces per second that may be sampled on account of
either sampling rules or `DD_TRACE_SAMPLE_RATE`.

For more information about the configuration of trace sampling, see
[sampling.md][6].

- **TracerOptions member**: `double sampling_limit_per_second`
- **JSON property**: `"sampling_limit_per_second"` _(number)_
- **Environment variable**: `DD_TRACE_RATE_LIMIT`
- **Default value**: `100`

[1]: /include/datadog/opentracing.h
[2]: https://docs.datadoghq.com/tracing/setup_overview/proxy_setup/?tab=nginx#nginx-configuration
[3]: https://docs.datadoghq.com/tracing/setup_overview/setup/cpp/?tab=containers#dynamic-loading
[4]: https://docs.datadoghq.com/tracing/setup_overview/setup/cpp/?tab=containers#environment-variables
[5]: /examples/cpp-tracing/unix-domain-socket
[6]: sampling.md
[7]: https://github.com/openzipkin/b3-propagation
[8]: https://pubs.opengroup.org/onlinepubs/9699919799/
[9]: https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging

@@ -0,0 +1,124 @@
Configuring Trace Sampling
==========================
If instrumented services are producing a higher volume of tracing data than is
desired, then the services can be configured to send tracing data for only a
subset of processed requests. This is called trace sampling.

By default, the rate at which instrumented services sample traces is governed by
the Datadog Agent, which dynamically adjusts the sampling rates of its clients
in order to reach a [configured target number][1] of traces per second.

For fine-grained control over trace sampling, instrumented services can be
configured with _sampling rules_. What follows is a description of how
trace sampling may be configured in the Datadog C++ tracing library.

Sampling Rules
--------------
It is the _first_ service in a trace (the "root service") that determines
whether the trace will be sent to Datadog. Subsequent services in the trace
follow whichever decision was made by the root service.

The root service may define rules that assign different sampling rates to
different kinds of traces. In these rules, traces are distinguished by the
"service" and "operation name" associated with the root span. Typically, the
root span of a service is always associated with the same "service" and
"operation name." However, services acting as hosts to multiple services may
produce root spans with different service names for different requests.

For example, consider the following array of rules:
```json
[
  {"service": "usersvc", "name": "healthcheck", "sample_rate": 0.0},
  {"service": "usersvc", "sample_rate": 0.5},
  {"service": "authsvc", "sample_rate": 1.0},
  {"sample_rate": 0.1}
]
```
These rules stipulate the following trace sampling behavior:

- `usersvc` requests whose operation name is `healthcheck` are never sampled.
- Other `usersvc` requests are sampled 50% of the time.
- `authsvc` requests are sampled 100% of the time.
- All other requests are sampled 10% of the time.

`sample_rate` is a probability. Its minimum value is zero, indicating "never,"
and its maximum value is one, indicating "always."

Note that the sampling behavior stipulated by sampling rules is relevant only
if the tracer being configured is the _first_ in the trace.

When a trace is created, its root span is evaluated against each sampling rule
in order. The first rule that matches determines the probability that the
trace will be sampled. If no rule matches, then the trace is subject to the
sampling rates governed by the Datadog Agent, as explained above.

Sampling rules can be configured programmatically in
`std::string TracerOptions::sampling_rules` or via the environment variable
`DD_TRACE_SAMPLING_RULES`. In either case, the rules are expressed as a JSON
array of objects. Each object supports the following properties:
```
[{
  "service": <the root span's service name, or any if absent>,
  "name": <the root span's operation name, or any if absent>,
  "sample_rate": <the probability of sampling the trace, or 1.0 if absent>
}, ...]
```
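
The first-match evaluation described above can be sketched in C++ as follows. `SamplingRule` is a hypothetical in-memory form of one JSON object (parsing the JSON is not shown), and `match_rules` only finds the applicable rate; how the tracer turns that rate into a keep-or-drop decision for a particular trace is not covered here.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parsed form of one rule. An absent "service" or "name" matches
// any value; an absent "sample_rate" defaults to 1.0, per the schema above.
struct SamplingRule {
  std::optional<std::string> service;
  std::optional<std::string> name;
  double sample_rate = 1.0;
};

// Return the sample rate of the first rule matching the root span, or
// std::nullopt if no rule matches (the Agent-governed rates then apply).
std::optional<double> match_rules(const std::vector<SamplingRule>& rules,
                                  const std::string& root_service,
                                  const std::string& root_operation) {
  for (const SamplingRule& rule : rules) {
    assert(rule.sample_rate >= 0.0 && rule.sample_rate <= 1.0);
    const bool service_ok = !rule.service || *rule.service == root_service;
    const bool name_ok = !rule.name || *rule.name == root_operation;
    if (service_ok && name_ok) return rule.sample_rate;  // first match wins
  }
  return std::nullopt;
}
```

Applied to the example rules earlier in this document, a `usersvc` root span named `healthcheck` matches the first rule (rate 0.0), while a root span from an unlisted service falls through to the catch-all rule (rate 0.1).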

`DD_TRACE_SAMPLE_RATE`
----------------------
Setting a (numeric) value for the `DD_TRACE_SAMPLE_RATE` environment variable
effectively appends a sampling rule to the tracer's array of sampling rules:
```
[
  ...,
  {"sample_rate": $DD_TRACE_SAMPLE_RATE}
]
```
Now there is a sampling rule that matches _any_ trace, and so traces that do
not match an earlier sampling rule are subject to the configured sampling rate.

Note that using `DD_TRACE_SAMPLE_RATE` means that the Datadog Agent no longer
governs the sampling rate of any traces produced by the tracer. The implicit
"catch-all" rule, with the configured sampling rate, always takes precedence
over the Agent-based fallback.

`double TracerOptions::sample_rate`
-----------------------------------
This configuration option has the same meaning as the `DD_TRACE_SAMPLE_RATE`
environment variable. Note that the environment variable overrides the
`TracerOptions` field if both are specified.

`DD_TRACE_RATE_LIMIT`
---------------------
Sampling rules (and, by extension, `DD_TRACE_SAMPLE_RATE`) specify the
_probability_ that a trace will be sampled, but they do not specify the maximum
number of traces that may be produced by the tracer in a given time period.

`DD_TRACE_RATE_LIMIT` is the maximum number of traces, per second, that may be
sampled by the tracer on account of sampling rules or `DD_TRACE_SAMPLE_RATE`.
The limit applies globally across all applicable traces, i.e. there is not a
separate limit for each sampling rule.
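
One common way to enforce such a limit is a token bucket. The sketch below uses one, but that choice is an assumption for illustration, not a description of the tracer's own limiter. `TraceRateLimiter` is hypothetical; it allows bursts of up to `limit_per_second` traces and is meant to be shared by all rules, matching the single global limit described above. Time is passed in explicitly so the behavior is easy to test.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical token-bucket limiter allowing about `limit_per_second` traces/s.
class TraceRateLimiter {
 public:
  using Clock = std::chrono::steady_clock;

  TraceRateLimiter(double limit_per_second, Clock::time_point now)
      : capacity_(limit_per_second), tokens_(limit_per_second), last_refill_(now) {
    assert(limit_per_second > 0.0);
  }

  // Call for each trace that a sampling rule (or DD_TRACE_SAMPLE_RATE) would
  // keep. Returns true if the trace may be sampled without exceeding the limit.
  bool allow(Clock::time_point now) {
    const std::chrono::duration<double> elapsed = now - last_refill_;  // seconds
    tokens_ = std::min(capacity_, tokens_ + elapsed.count() * capacity_);
    last_refill_ = now;
    if (tokens_ >= 1.0) {
      tokens_ -= 1.0;
      return true;
    }
    return false;
  }

 private:
  double capacity_;  // maximum burst, equal to the per-second limit
  double tokens_;    // permits currently available
  Clock::time_point last_refill_;
};
```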

`DD_TRACE_RATE_LIMIT` is a floating point number, but is usually specified as an
integer, e.g.
```shell
export DD_TRACE_RATE_LIMIT=200
```
for a limit of 200 traces per second.

If this limit is not configured, its default value is 100 traces per second.

Note that this limit applies separately to each tracer. If the instrumented
service spawns multiple processes, then each process contains its own tracer,
and each tracer is separately subject to the configured rate limit. For
example, if [nginx][2] is configured with `DD_TRACE_RATE_LIMIT=200` and also
spawns eight worker processes, then the actual overall limit is
`200 * 8 = 1600` traces per second.

`double TracerOptions::sampling_limit_per_second`
-------------------------------------------------
This configuration option has the same meaning as the `DD_TRACE_RATE_LIMIT`
environment variable. Note that the environment variable overrides the
`TracerOptions` field if both are specified.

[1]: https://docs.datadoghq.com/tracing/trace_ingestion/mechanisms/?tab=environmentvariables#in-the-agent
[2]: https://docs.datadoghq.com/tracing/setup_overview/proxy_setup/?tab=nginx