Custom emitter based on boto3 creates an infinite loop in the SDK #379

Open
nspsngh opened this issue Jan 23, 2023 · 3 comments

Comments

nspsngh commented Jan 23, 2023

Hello,

I need to instrument a Python application (specifically, an AWS Glue ETL application) with the AWS X-Ray SDK, but I cannot run the X-Ray daemon. Consequently, I must provide a custom emitter that stands in for the default UDPEmitter. I can go into more detail on the implementation as required; the gist is that it works as intended until patch_all is called. The patcher patches botocore and other lower-level libraries, most pertinently httplib.
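For readers following along, here is a minimal sketch of what such an emitter might look like. It is not the author's implementation, which is not shown in this thread. It assumes the recorder only needs send_entity and set_daemon_address on the emitter, that the entity's serialize() returns the segment document as a JSON string, and that the injected client exposes put_trace_segments as a boto3 X-Ray client does.

```python
class HttpEmitter:
    """Hypothetical stand-in for the default UDPEmitter (illustration only)."""

    def __init__(self, xray_client):
        # Assumed to be a boto3 X-Ray client, e.g. boto3.client("xray"),
        # or any object with put_trace_segments(TraceSegmentDocuments=[...]).
        self._client = xray_client

    def send_entity(self, entity):
        # Assumption: entity.serialize() yields the segment as a JSON string.
        document = entity.serialize()
        self._client.put_trace_segments(TraceSegmentDocuments=[document])

    def set_daemon_address(self, address):
        # There is no daemon in this setup, so the address is ignored.
        pass
```

Such an emitter would presumably be handed to the recorder with xray_recorder.configure(emitter=HttpEmitter(boto3.client("xray"))). Because send_entity itself goes through boto3, every segment sent is also a botocore call that patch_all can trace.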

After the patcher has run, an attempted PutTraceSegments API call sends the application into an infinite loop when SSO is enabled. More specifically, the loop starts after a call to the GetRoleCredentials API. Although I have not tested it, I suspect the same condition occurs when SSO is not enabled and other types of credentials are used.

The call chain looks roughly like this (my emitter is called HttpEmitter):

AWSXRayRecorder.capture() -> AWSXRayRecorder.record_subsegment() -> HttpEmitter.send_entity() -> GetRoleCredentials (API) -> AWSXRayRecorder.capture() -> AWSXRayRecorder.record_subsegment() -> HttpEmitter.send_entity() ...
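The cycle in this chain can be illustrated with a toy model. This is a sketch under simplifying assumptions, not SDK code: ToyRecorder, api_call, and MAX_DEPTH are hypothetical names, and the depth cap exists only so the example terminates. It assumes that sending a segment needs credentials, and that a traced credential call produces another segment to send.

```python
MAX_DEPTH = 5  # cap so the toy model stops instead of recursing forever


class ToyRecorder:
    """Toy model of a patched client whose emitter uses the same client."""

    def __init__(self, trace_credentials):
        self.trace_credentials = trace_credentials
        self.calls = []  # operations in the order they were attempted

    def api_call(self, operation, depth=0):
        self.calls.append(operation)
        if depth >= MAX_DEPTH:
            return "loop"
        if operation == "PutTraceSegments":
            # Sending a segment first needs credentials.
            return self.api_call("GetRoleCredentials", depth + 1)
        if operation == "GetRoleCredentials" and self.trace_credentials:
            # A traced call records a subsegment, which the emitter sends.
            return self.api_call("PutTraceSegments", depth + 1)
        return "done"
```

With trace_credentials=True the two operations alternate until the cap is hit; with trace_credentials=False the chain ends after a single credential call.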

In the botocore patcher module at aws_xray_sdk/ext/botocore/patch.py, excluding GetRoleCredentials from tracing resolves the issue: with that operation left untraced, the custom emitter based on boto3 works as expected. The modified wrapper:

def _xray_traced_botocore(wrapped, instance, args, kwargs):
    service = instance._service_model.metadata["endpointPrefix"]
    if service == 'xray':
        # skip tracing for SDK built-in sampling pollers
        if ('GetSamplingRules' in args or
            'GetSamplingTargets' in args or
                'PutTraceSegments' in args):
            return wrapped(*args, **kwargs)

    # Added: pass GetRoleCredentials through untraced to break the loop
    if 'GetRoleCredentials' in args:
        return wrapped(*args, **kwargs)

    return xray_recorder.record_subsegment(
        wrapped, instance, args, kwargs,
        name=service,
        namespace='aws',
        meta_processor=aws_meta_processor,
    )

Specifically, this if statement is added:

    if 'GetRoleCredentials' in args:
        return wrapped(*args, **kwargs)

However, I am not certain whether this is in fact the correct solution or a symptom of a more fundamental problem elsewhere, hence the ticket. I say this because the infinite loop can also be reproduced by patching httplib alone, although that may well be for the same reason. To move forward, I have for the moment replaced the botocore patcher with a custom implementation that includes the exclusion shown above.

Further guidance is appreciated.

srprash (Contributor) commented Feb 6, 2023

Hi @nsp-aws
I think your solution of excluding GetRoleCredentials is correct in this case. On the SDK side, it may be difficult to identify every AWS operation that might be called in a custom emitter. One possible solution is to let users provide a set of operations to ignore when botocore is patched, similar to what has been done for the httplib patch.
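A rough sketch of what a user-configurable ignore set could look like follows. None of these names exist in the SDK: _IGNORED_OPERATIONS, add_ignored_operations, and should_trace are hypothetical, and only the three built-in X-Ray exclusions are taken from the wrapper quoted earlier in this thread.

```python
# Operations the existing wrapper already skips for the 'xray' service.
_XRAY_INTERNAL = frozenset(
    {"GetSamplingRules", "GetSamplingTargets", "PutTraceSegments"}
)

# Hypothetical user-supplied registry of operations to leave untraced.
_IGNORED_OPERATIONS = set()


def add_ignored_operations(*operation_names):
    """Register operation names that the botocore wrapper should not trace."""
    _IGNORED_OPERATIONS.update(operation_names)


def should_trace(service, operation_name):
    """Return False when the call should be passed through untraced."""
    if service == "xray" and operation_name in _XRAY_INTERNAL:
        return False
    return operation_name not in _IGNORED_OPERATIONS
```

The wrapper would then call should_trace(service, args[0]) and return wrapped(*args, **kwargs) directly when it is False. A user with a boto3-based emitter could call add_ignored_operations("GetRoleCredentials") before patching.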

Also, I am not sure how feasible it would be for you to use OpenTelemetry, but you could try writing your own SpanExporter (refer to the ConsoleSpanExporter) if that works for you.

nspsngh (Author) commented Feb 9, 2023

I appreciate the response. Yes, the ability to configure which operations are excluded would be adequate. OpenTelemetry is not used on the current project, so writing a custom exporter is not quite feasible. The core idea was to avoid external compute hosting either the collector or the X-Ray daemon; we need to run the entire tracing system in-process.

Are you willing to accept a PR for configuring the operations for exclusion when patching boto?

srprash (Contributor) commented Feb 9, 2023

> Are you willing to accept a PR for configuring the operations for exclusion when patching boto?

Absolutely! I will be happy to review such a PR.
