Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

boto3 S3 upload_file method does not have the correct trace context #298

Open
wangzlei opened this issue Jan 27, 2021 · 4 comments
Open
Labels
bug Something isn't working help wanted Extra attention is needed triaged

Comments

@wangzlei
Copy link

opentelemetry-instrumentation-botocore bug:
X-Ray spans generated by AWS boto3 S3 upload_file method requests do not have the same trace id as their parent and sibling spans and the parent_id is set to null which causes broken traces.

Steps to reproduce

BotocoreInstrumentor().instrument(tracer_provider=trace.get_tracer_provider())

with tracer.start_as_current_span("foo"):
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file('fromFile', 'myS3Bucket', 'toFile')    # upload a file to s3
    s3.bucket.all()    # list bucket

What is the expected behavior?
upload_file span should be the child of foo, get one trace: (foo -> (s3.upload_file, s3.list))

What is the actual behavior?
Get 2 traces, upload_file is not under foo:
trace1 (foo -> s3.list)
trace2 (s3.upload_file)

Additional context
boto3 s3 upload_file uses multi-part to increase throughput, botocore instrumentation does not cover this special case. Refer an analysis.

@github-actions
Copy link

github-actions bot commented Apr 6, 2021

This issue was marked stale due to lack of activity. It will be closed in 30 days.

@srikanthccv srikanthccv added help wanted Extra attention is needed triaged and removed backlog labels May 30, 2021
@s4ke
Copy link

s4ke commented Jan 5, 2022

Is there any information why this happens? We are currently facing this issue.

@ericksoen
Copy link

Had to debug a similar issue today with the download_file method and wanted to share some of my analysis. I've previously encountered the type of broken trace issue that @wangzlei describes in this issue when working with apps that use multi-threading.

In that scenario, the auto-instrumentation libraries correctly instrument outgoing requests but can't correctly associate the trace context to the main thread. We were able to solve those issues using code similar to the following:

import time
import threading

from opentelemetry import context 

def thread_worker(thread_id, context):
  with tracer.start_as_current_span('thread-worker', context) as span:
    span.set_attribute('thread_id', thread_id)
    time.sleep(0.5)

threads = list()
trace_context = context.get_context()

for i in range(4):
  thread = threading.Thread(target=thread_worker, args = (i, trace_context))
  threads.append(thread)
  thread.start()

for t in threads:
  t.join()

Based on the similar broken trace behavior and the observation that the status code for s3:GetObject executed via download_file returns a 206/Partial Content status code, it seems plausible that the download_file/upload_file makes similar use of background threads to reduce request latency.

Since the spans in question are from auto-instrumented method, there's no way to explicitly pass the trace context the way I did in my sample code.

What we did to try to work around this issue:

  • Add a request/response hook when configuring BotocoreInstrumentor to drop specific spans, e.g., s3:GetObject (issue: outside of the sampler, the opentelemetry API/SDK doesn't provide a method to drop spans)
  • Add a request/response hook to mark the span as not recording since these are not transmitted (issue: the span class provides a read-only is_recording() property but no way to change the value
  • Add a request/response hook to associate the span with the correct parent thread (issue: trace context is immutable)

What we did that actually worked:

  • BotocoreInstrumentor().uninstrument() before invoking s3.download_file followed by BotocoreInstrumentor().instrument() after download completes.
    ** Note: for our use case, removing extraneous, loud, and difficult to understand data was more desirable than seeing the broken traces for 900+ s3:getObject calls
    ** Note: we are still able to observe performance and error codes by wrapping the s3.download_file in its own trace as shown in the next code sample
with tracer.start_as_current_span('download-file-from-s3') as download_span:
    # Note: see GitHub issue https://github.com/open-telemetry/opentelemetry-python-contrib/issues/298
    BotocoreInstrumentor().uninstrument()
    s3.download_file(bucket_name, key_name, output_file_name)
    BotocoreInstrumentor().instrument()
    download_span.set_attributes({
      "s3.bucket_name": bucket_name,
      "s3.key_name": key_name,
    })

@s4ke
Copy link

s4ke commented Oct 14, 2022

Thanks @ericksoen this looks like a practicable work around to put into a library function. Tyvm for this investigation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed triaged
Projects
None yet
Development

No branches or pull requests

4 participants