Deadline exceeded in DataDog exporter #1409

Open
3miliano opened this issue Jun 29, 2024 · 2 comments
Labels
bug Something isn't working

Comments

3miliano commented Jun 29, 2024

Describe the bug
I am experiencing "context deadline exceeded" errors in the DataDog exporter, as evidenced by the logs below. The issue results in failed export attempts and subsequent retries.

Steps to reproduce
1. Build a custom Docker image containing a custom collector based on opentelemetry-lambda that includes the DataDog exporter.
2. Initiate data export (traces, logs, metrics).
3. Observe the logs for errors related to context deadlines being exceeded.

What did you expect to see?
I expected the data to be exported successfully to DataDog without any timeout errors.

What did you see instead?
The export requests failed with “context deadline exceeded” errors, resulting in retries and eventual dropping of the payloads. Here are some excerpts from the logs:

1719687286935 {"level":"warn","ts":1719687286.9350078,"caller":"batchprocessor@v0.103.0/batch_processor.go:263","msg":"Sender failed","kind":"processor","name":"batch","pipeline":"logs","error":"no more retries left: Post \"https://http-intake.logs.datadoghq.com/api/v2/logs?ddtags=service%3Akognitos.book.yaml%2Cenv%3Amain%2Cregion%3Aus-west-2%2Ccloud_provider%3Aaws%2Cos.type%3Alinux%2Cotel_source%3Adatadog_exporter\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
1719687286936 {"level":"error","ts":1719687286.9363096,"caller":"datadogexporter@v0.103.0/traces_exporter.go:181","msg":"Error posting hostname/tags series","kind":"exporter","data_type":"traces","name":"datadog","error":"max elapsed time expired Post \"https://api.datadoghq.com/api/v2/series\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","stacktrace":"github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*traceExporter).exportUsageMetrics\n\t/root/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.103.0/traces_exporter.go:181\ngh.neting.cc/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*traceExporter).consumeTraces\n\t/root/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.103.0/traces_exporter.go:139\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesRequest).Export\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/traces.go:59\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/timeout_sender.go:43\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/common.go:37\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/traces.go:159\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/common.go:37\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/common.go:37\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseExporter).send\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/common.go:294\ngo.opentelemetry.io/collector/exporter/exporterhelper.NewTracesRequestExporter.func1\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/traces.go:134\ngo.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/consumer@v0.103.0/traces.go:25\ngo.opentelemetry.io/collector/processor/batchprocessor.(*batchTraces).export\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.103.0/batch_processor.go:414\ngo.opentelemetry.io/collector/processor/batchprocessor.(*shard).sendItems\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.103.0/batch_processor.go:261\ngo.opentelemetry.io/collector/processor/batchprocessor.(*shard).startLoop\n\t/root/go/pkg/mod/go.opentelemetry.io/collector/processor/batchprocessor@v0.103.0/batch_processor.go:223"}

What version of the collector/language SDK did you use?
Version: Custom layer-collector/0.8.0 + datadogexporter from v0.103.0

What language layer did you use?
Config: None. It is a custom runtime that includes the binary in extensions.

Additional context
Here is my configuration file:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "127.0.0.1:4317"
  hostmetrics:
    collection_interval: 60s
    scrapers:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      load:
      memory:
      network:
      processes:

exporters:
  datadog:
    api:
      key: ${secretsmanager:infrastructure/datadog_api_key}
    sending_queue:
      enabled: false
    tls:
      insecure: true
      insecure_skip_verify: true

connectors:
  datadog/connector:
      
processors:
  resourcedetection:
    detectors: ["lambda", "system"]
    system:
      hostname_sources: ["os"]
  transform:
    log_statements:
      - context: resource
        statements:
          - delete_key(attributes, "service.version")
          - set(attributes["service"], attributes["service.name"])
          - delete_key(attributes, "service.name")
      - context: log
        statements:
          - set(body, attributes["exception.message"]) where attributes["exception.message"] != nil
          - set(attributes["error.stack"], attributes["exception.stacktrace"]) where attributes["exception.stacktrace"] != nil
          - set(attributes["error.message"], attributes["exception.message"]) where attributes["exception.message"] != nil
          - set(attributes["error.kind"], attributes["exception.kind"]) where attributes["exception.kind"] != nil
service:
  telemetry:
    logs:
      level: "debug"
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection]
      exporters: [datadog/connector]
    traces/2:
      receivers: [datadog/connector]
      exporters: [datadog]
    metrics:
      receivers: [hostmetrics, otlp]
      processors: [resourcedetection]
      exporters: [datadog]
    logs:
      receivers: [otlp]
      processors: [resourcedetection, transform]
      exporters: [datadog]

Enabling/disabling sending_queue does not seem to do anything to prevent the errors. I did notice that if I hit the service continuously, some traces do get sent, but only a few.

What I have ruled out as potential causes:

  1. Connectivity issues. The DataDog API key validation call succeeds, and if the service is hit constantly, some traces do get through.
3miliano added the bug label Jun 29, 2024
tylerbenson (Member) commented

Any reason you're not using the batch processor? That would probably help.

serkan-ozal (Contributor) commented

@3miliano I think it is because the container is frozen right after the invocation completes, and with the config you have shared, the collector is not aware of the Lambda lifecycle. So, as @tylerbenson suggested, adding the batch processor (which will activate the decouple processor by default) right before the Datadog exporter should resolve your problem.
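A minimal sketch of that change, assuming the custom collector build includes the batch processor and that the opentelemetry-lambda distribution inserts the decouple processor automatically when batch is present, as described above. Only the Datadog-bound pipelines are shown, and the batch timeout value is illustrative:

processors:
  batch:
    timeout: 1s                 # illustrative value; flush batches quickly so little is pending when the container freezes
  resourcedetection:
    detectors: ["lambda", "system"]
    system:
      hostname_sources: ["os"]

service:
  pipelines:
    traces/2:
      receivers: [datadog/connector]
      processors: [batch]
      exporters: [datadog]
    metrics:
      receivers: [hostmetrics, otlp]
      processors: [resourcedetection, batch]
      exporters: [datadog]
    logs:
      receivers: [otlp]
      processors: [resourcedetection, transform, batch]
      exporters: [datadog]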
