Continuous Messages of exporting failed, dropped data, sender failed in the aws-otel-collector.log #551

Closed
georges-git opened this issue Jun 23, 2021 · 16 comments

@georges-git

Hello @mxiamxia and the AWS team - I keep getting the following messages in the OTEL collector log. How can I fix this?

{2021-06-23 09:52:04.43621453 -0400 EDT m=+120.147512184, Level:error, Caller:go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173, Message:Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173
go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func2
go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/metrics.go:103
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeMetricsFunc.ConsumeMetrics
go.opentelemetry.io/collector@v0.27.0/consumer/consumerhelper/metrics.go:29
go.opentelemetry.io/collector/service/internal/fanoutconsumer.metricsConsumer.ConsumeMetrics
go.opentelemetry.io/collector@v0.27.0/service/internal/fanoutconsumer/consumer.go:51
go.opentelemetry.io/collector/processor/batchprocessor.(*batchMetrics).export
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:285
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).sendItems
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:183
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:144}
{2021-06-23 09:52:04.436239353 -0400 EDT m=+120.147537026, Level:warn, Caller:go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:184, Message:Sender failed, Stack:}
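For reference, the sending_queue and retry_on_failure options that this message refers to are generic exporterhelper settings. A minimal sketch of enabling them in the collector config, assuming an exporter that exposes these keys (the otlp exporter does; support in other exporters such as awsemf may differ), looks like this:

exporters:
  otlp:
    endpoint: example-backend:4317  # placeholder endpoint
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000

Note that these settings only buffer and retry around temporary failures; they do not address the underlying export error.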

@mxiamxia
Member

Could you attach more collector logs?

@georges-git
Author

Hello @mxiamxia - This is all the information in the collector logs. Let me know how to fix this.

go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeMetricsFunc.ConsumeMetrics
go.opentelemetry.io/collector@v0.27.0/consumer/consumerhelper/metrics.go:29
go.opentelemetry.io/collector/processor/batchprocessor.(*batchMetrics).export
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:285
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).sendItems
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:183
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:144}
{2021-06-25 17:47:42.118233801 -0400 EDT m=+18856.163846182, Level:error, Caller:go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173, Message:Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/queued_retry.go:173
go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func2
go.opentelemetry.io/collector@v0.27.0/exporter/exporterhelper/metrics.go:103
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeMetricsFunc.ConsumeMetrics
go.opentelemetry.io/collector@v0.27.0/consumer/consumerhelper/metrics.go:29
go.opentelemetry.io/collector/processor/batchprocessor.(*batchMetrics).export
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:285
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).sendItems
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:183
go.opentelemetry.io/collector/processor/batchprocessor.(*batchProcessor).startProcessingCycle
go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:144}
{2021-06-25 17:47:42.118273569 -0400 EDT m=+18856.163885957, Level:warn, Caller:go.opentelemetry.io/collector@v0.27.0/processor/batchprocessor/batch_processor.go:184, Message:Sender failed, Stack:}

@sethAmazon
Member

Can you turn on debug logs, please? If running on EC2, run echo "loggingLevel=DEBUG" | sudo tee -a /opt/aws/aws-otel-collector/etc/extracfg.txt and then restart the collector.

@georges-git
Author

Attaching the zipped log file again, this time with the DEBUG logging level. Could you escalate and help resolve this, as it has been going on for the last few weeks?
aws-otel-collector.zip

@sethAmazon
Member

sethAmazon commented Jun 30, 2021

{2021-06-28 12:12:53.950912958 -0400 EDT m=+60.073175088, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/cwlog_client.go:158, Message:cwlog_client: creating stream fail, Stack:}
{2021-06-28 12:12:53.950976095 -0400 EDT m=+60.073238214, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/cwlog_client.go:176, Message:CreateLogStream / CreateLogGroup has errors., Stack:}
{2021-06-28 12:12:53.950990885 -0400 EDT m=+60.073252994, Level:warn, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/pusher.go:280, Message:Failed to create stream token, Stack:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter.(*pusher).pushLogEventBatch
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter@v0.22.0/pusher.go:280

I saw a similar error when I ran on EC2 but not when running in a Docker container. I think there might be a problem with the way we are getting credentials. Can you try running this in a Docker container and passing the access key to the container manually with "-e AWS_ACCESS_KEY_ID={your access key here} -e AWS_SECRET_ACCESS_KEY={secret key here}"? A guide on how to run with Docker is https://github.com/aws-observability/aws-otel-collector/blob/main/docs/developers/docker-demo.md
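A minimal sketch of that docker run command, assuming the image and flags used in the linked docker-demo guide (the region, published port, and mounted config path here are illustrative placeholders):

docker run --rm -p 4317:4317 \
  -e AWS_REGION=us-east-1 \
  -e AWS_ACCESS_KEY_ID={your access key here} \
  -e AWS_SECRET_ACCESS_KEY={secret key here} \
  -v "${PWD}/config.yaml":/otel-local-config.yaml \
  public.ecr.aws/aws-observability/aws-otel-collector:latest \
  --config otel-local-config.yaml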

@georges-git
Author

Hello @sethAmazon - Can you expand on what you are saying above? My app is running in Docker. Are you asking me to run the OTEL daemon process in a Docker container?

@sethAmazon
Member

How are you passing in the credentials for otel?

@georges-git
Author

georges-git commented Jun 30, 2021

I am not passing any credentials. I just run this on my Linux host: sudo /opt/aws/aws-otel-collector/bin/aws-otel-collector-ctl -c /opt/aws/aws-otel-collector/etc/config.yaml -a start.

This "aws-otel-collector" is not running in Docker. Each of my EC2 Linux hosts has an AWS role attached to it.

How would you like me to pass the credentials to this "aws-otel-collector" process running on the Linux hosts?

@alolita added the bug label on Aug 4, 2021
@sahilsapolia

I am facing a similar problem while exporting spans to AWS X-Ray. Metrics work fine for me.

{2021-08-27 00:47:35.104061377 +0000 GMT m=+761.062074882, Level:error, Caller:go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:245, Message:Exporting failed. Try enabling retry_on_failure config option., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:245
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/traces.go:118
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:173
go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/traces.go:97
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/consumer/consumerhelper/traces.go:29
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/receiver/otlpreceiver/internal/trace/otlp.go:62
go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/otlpgrpc/traces.go:85
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/internal/data/protogen/collector/trace/v1/trace_service.pb.go:210
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/grpc@v1.38.0/server.go:1286
google.golang.org/grpc.(*Server).handleStream
google.golang.org/grpc@v1.38.0/server.go:1609
google.golang.org/grpc.(*Server).serveStreams.func1.2
google.golang.org/grpc@v1.38.0/server.go:934}
{2021-08-27 00:47:35.104247064 +0000 GMT m=+761.062260500, Level:error, Caller:go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:175, Message:Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures., Stack:go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/queued_retry.go:175
go.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/exporter/exporterhelper/traces.go:97
go.opentelemetry.io/collector/consumer/consumerhelper.ConsumeTracesFunc.ConsumeTraces
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/consumer/consumerhelper/traces.go:29
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/receiver/otlpreceiver/internal/trace/otlp.go:62
go.opentelemetry.io/collector/model/otlpgrpc.rawTracesServer.Export
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/otlpgrpc/traces.go:85
go.opentelemetry.io/collector/model/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler
go.opentelemetry.io/collector@v0.29.1-0.20210630003519-14d917479ef3/model/internal/data/protogen/collector/trace/v1/trace_service.pb.go:210
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/grpc@v1.38.0/server.go:1286
google.golang.org/grpc.(*Server).handleStream

@sahilsapolia

sahilsapolia commented Aug 27, 2021

I ran a debug and found that the exporter is sending the request with the trace segment but is getting back an error response with an empty body.
I am shortening the trace string below to make it more readable.

{2021-08-27 19:31:33.259092001 +0000 GMT m=+68199.217105694, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:54, Message:TracesExporter, Stack:}
{2021-08-27 19:31:33.259609948 +0000 GMT m=+68199.217624182, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:78, Message:request: {
TraceSegmentDocuments: ["{"name":"test- .......
......
.......}, Stack:}
{2021-08-27 19:31:33.380405084 +0000 GMT m=+68199.338418868, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:81, Message:response error, Stack:}
{2021-08-27 19:31:33.38056959 +0000 GMT m=+68199.338583498, Level:debug, Caller:github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsxrayexporter@v0.29.1-0.20210630203112-81d57601b1bc/awsxray.go:85, Message:response: {

}, Stack:}

@github-actions
Contributor

github-actions bot commented Jan 2, 2022

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@sethAmazon
Member

Hi @georges-git, can we do a live debug session on this sometime?

@aureliomarcoag

I had this problem on a Python Lambda function. After tweaking the code, I could reproduce the problem by simply using boto3.client("s3").download_file() inside a tempfile context manager, so something like this:

s3_client = boto3.client("s3")
with tempfile.NamedTemporaryFile() as tmp_file:
    boto3.client("s3").download_file(Bucket="mybucket", Key="key/object", Filename=tmp_file.name)

I eventually stumbled upon aws-observability/aws-otel-lambda#10, so I assume this might be related. I switched from download_file to get_object and the error went away. I also tested with upload_file, and that caused the same error.
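For illustration, a sketch of that switch (the bucket and key are placeholders, matching the snippet above); get_object streams the object body directly instead of going through download_file's managed transfer:

import boto3
import tempfile

s3_client = boto3.client("s3")
with tempfile.NamedTemporaryFile() as tmp_file:
    # Read the object body via get_object rather than the managed download_file transfer
    response = s3_client.get_object(Bucket="mybucket", Key="key/object")
    tmp_file.write(response["Body"].read())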

@github-actions
Contributor

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions bot added the stale label on Mar 20, 2022
@alolita removed the stale label on Apr 4, 2022
@github-actions
Contributor

github-actions bot commented Jun 5, 2022

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions bot added the stale label on Jun 5, 2022
@github-actions
Contributor

This issue was closed because it has been marked as stale for 30 days with no activity.
