boto3 S3 upload_file method does not have the correct trace context #298

wangzlei · 2021-01-27T05:01:23Z

opentelemetry-instrumentation-botocore bug:
X-Ray spans generated by requests from the AWS boto3 S3 upload_file method do not have the same trace ID as their parent and sibling spans, and their parent_id is set to null, which results in broken traces.

Steps to reproduce

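import boto3
from opentelemetry import trace
from opentelemetry.instrumentation.botocore import BotocoreInstrumentor

BotocoreInstrumentor().instrument(tracer_provider=trace.get_tracer_provider())
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("foo"):
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file('fromFile', 'myS3Bucket', 'toFile')    # upload a file to S3
    list(s3.buckets.all())    # list buckets (the collection is lazy, so iterate it)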

What is the expected behavior?
The upload_file span should be a child of foo, giving a single trace: (foo -> (s3.upload_file, s3.list))

What is the actual behavior?
Two traces are produced, and upload_file is not under foo:
trace1 (foo -> s3.list)
trace2 (s3.upload_file)

Additional context
The boto3 S3 upload_file method uses multipart uploads to increase throughput, and the botocore instrumentation does not cover this special case. Refer to an analysis.


github-actions · 2021-04-06T03:59:50Z

This issue was marked stale due to lack of activity. It will be closed in 30 days.

s4ke · 2022-01-05T22:23:29Z

Is there any information on why this happens? We are currently facing this issue.

ericksoen · 2022-10-14T00:59:35Z

Had to debug a similar issue today with the download_file method and wanted to share some of my analysis. I've previously encountered the type of broken trace issue that @wangzlei describes in this issue when working with apps that use multi-threading.

In that scenario, the auto-instrumentation libraries correctly instrument outgoing requests but can't associate spans created in worker threads with the trace context of the main thread. We were able to solve those issues using code similar to the following:

import time
import threading

from opentelemetry import context, trace

tracer = trace.get_tracer(__name__)

def thread_worker(thread_id, parent_context):
  # Start the span under the context captured in the main thread.
  with tracer.start_as_current_span('thread-worker', context=parent_context) as span:
    span.set_attribute('thread_id', thread_id)
    time.sleep(0.5)

threads = list()
# Capture the current trace context before handing work to other threads.
trace_context = context.get_current()

for i in range(4):
  thread = threading.Thread(target=thread_worker, args=(i, trace_context))
  threads.append(thread)
  thread.start()

for t in threads:
  t.join()

Based on the similar broken-trace behavior and the observation that s3:GetObject calls made via download_file return a 206 Partial Content status code, it seems plausible that download_file/upload_file make similar use of background threads to reduce request latency.

Since the spans in question come from auto-instrumented methods, there's no way to explicitly pass the trace context the way I did in my sample code.
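The thread behavior described above can be illustrated with the standard library alone. This is a minimal sketch, not the instrumentation's actual code: it assumes (as is the case for the default OpenTelemetry Python context, to my understanding) that the "current span" lives in a contextvars-based slot, and it uses a hypothetical `current_span` variable as a stand-in. On a standard CPython build, a pool worker thread starts with an empty context, so it sees no parent unless the caller's context is passed along explicitly.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the tracer's "current span" slot.
current_span = contextvars.ContextVar("current_span", default=None)


def read_span():
    # Runs in a pool worker thread and reports what that thread sees.
    return current_span.get()


def run_in_pool(fn):
    # Execute fn on a worker thread and return its result.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn).result()


# The main thread "starts a span".
current_span.set("foo")

# Submitted directly: the worker thread does not see the caller's value.
seen_without_propagation = run_in_pool(read_span)

# Submitted through a copy of the caller's context: the value is visible.
caller_context = contextvars.copy_context()
seen_with_propagation = run_in_pool(lambda: caller_context.run(read_span))

print(seen_without_propagation, seen_with_propagation)
```

If download_file/upload_file do hand work to a thread pool internally, the first case is what the auto-instrumented requests would experience, which would match the orphaned spans reported in this issue.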

What we did to try to work around this issue:

Add a request/response hook when configuring BotocoreInstrumentor to drop specific spans, e.g., s3:GetObject (issue: outside of the sampler, the opentelemetry API/SDK doesn't provide a method to drop spans)
Add a request/response hook to mark the span as not recording, since non-recording spans are not transmitted (issue: the span class provides a read-only is_recording() method but no way to change the value)
Add a request/response hook to associate the span with the correct parent thread (issue: trace context is immutable)

What we did that actually worked:

Call BotocoreInstrumentor().uninstrument() before invoking s3.download_file, followed by BotocoreInstrumentor().instrument() after the download completes.
Note: for our use case, removing extraneous, loud, and difficult-to-understand data was more desirable than seeing the broken traces for 900+ s3:GetObject calls.
Note: we are still able to observe performance and error codes by wrapping the s3.download_file call in its own span, as shown in the next code sample.

with tracer.start_as_current_span('download-file-from-s3') as download_span:
    # Note: see GitHub issue https://github.com/open-telemetry/opentelemetry-python-contrib/issues/298
    # Temporarily disable botocore instrumentation so the per-part requests emit no orphaned spans.
    BotocoreInstrumentor().uninstrument()
    try:
        s3.download_file(bucket_name, key_name, output_file_name)
    finally:
        # Re-enable instrumentation even if the download raises.
        BotocoreInstrumentor().instrument()
    download_span.set_attributes({
        "s3.bucket_name": bucket_name,
        "s3.key_name": key_name,
    })
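The uninstrument/instrument pairing above can also be packaged as a small context manager so that instrumentation is restored even when the download fails. This is only a sketch: `instrumentation_paused` is a hypothetical helper name, and `FakeInstrumentor` is a stand-in used here so the example runs without OpenTelemetry installed. In real use, an object such as BotocoreInstrumentor() would be passed instead, along with whatever keyword arguments were used for the original instrument() call.

```python
from contextlib import contextmanager


@contextmanager
def instrumentation_paused(instrumentor, **instrument_kwargs):
    """Temporarily uninstrument, restoring instrumentation even on error."""
    instrumentor.uninstrument()
    try:
        yield
    finally:
        # Re-apply instrumentation with the caller-supplied settings.
        instrumentor.instrument(**instrument_kwargs)


class FakeInstrumentor:
    """Hypothetical stand-in exposing only instrument()/uninstrument()."""

    def __init__(self):
        self.active = True

    def instrument(self, **kwargs):
        self.active = True

    def uninstrument(self, **kwargs):
        self.active = False


fake = FakeInstrumentor()
states = []
try:
    with instrumentation_paused(fake):
        states.append(fake.active)  # instrumentation is off inside the block
        raise RuntimeError("simulated download failure")
except RuntimeError:
    pass
states.append(fake.active)  # instrumentation is back on afterwards
print(states)
```

One caveat with this design: the pause applies process-wide, so any other botocore calls made concurrently while the block is active would presumably go untraced as well.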

s4ke · 2022-10-14T07:06:43Z

Thanks @ericksoen, this looks like a practicable workaround to put into a library function. Tyvm for this investigation!

wangzlei added the bug label · Jan 27, 2021

wangzlei mentioned this issue · Jan 27, 2021: python S3 upload_file method aws-observability/aws-otel-lambda#10 (Closed)

wangzlei mentioned this issue · Mar 15, 2021: REQUEST: New membership for wangzlei open-telemetry/community#679 (Closed, 6 tasks)

marcellp mentioned this issue · Apr 1, 2021: Worker to support AWS X-Ray hms-dbmi-cellenics/worker#93 (Merged, 9 tasks)

github-actions bot added the backlog label · Apr 6, 2021

srikanthccv added the help wanted and triaged labels and removed the backlog label · May 30, 2021

bobthemighty mentioned this issue · Nov 27, 2023: Add instrumentation for threading module #1582 (Closed, 8 tasks)
