No retries for ServerTimeoutError #876

Closed
4 of 6 tasks
mariaines opened this issue Aug 19, 2021 · 7 comments · Fixed by #877

Comments

@mariaines

Describe the bug
Hi, we seem to be experiencing no retries for ServerTimeoutErrors, despite passing in a config with standard retry mode and total_max_attempts=5. We see occasional ServerTimeoutErrors across various clients (dynamodb, sqs, apigateway). We have a separate production app that uses just boto3 and it never sees these ServerTimeoutErrors (using the default botocore config which I understand has standard retry mode and total_max_attempts=3).

Including a screenshot of a Sentry error that shows attempts = 1 by the time we raise the exception, indicating it was not retried. We also have timing logs that show the dynamo call fails after 10s, which is our configured connect_timeout.
[Screenshot: Sentry error showing attempts = 1]

sample code:

from contextlib import AsyncExitStack
import aiobotocore
import asyncio

session = aiobotocore.get_session()
_exit_stack = AsyncExitStack()
config = aiobotocore.config.AioConfig(
    connect_timeout=10,
    read_timeout=10,
    retries={
        "mode": "standard",
        "total_max_attempts": 5,
    },
)

dynamodb_client = None

async def get_dynamodb_client():
    global dynamodb_client
    if dynamodb_client is None:
        dynamodb_client = await _exit_stack.enter_async_context(session.create_client("dynamodb", config=config, region_name="us-west-2"))
    return dynamodb_client


def lambda_handler(event, context):
    result = asyncio.get_event_loop().run_until_complete(_lambda_handler(event, context))
    return result

async def _lambda_handler(event, context):
    dynamodb_client = await get_dynamodb_client()
    await dynamodb_client.get_item(...)
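
One way to check whether a call was retried, without relying on Sentry locals, is to read the `RetryAttempts` counter that botocore records under `ResponseMetadata` and to log how long each call takes. The following is a minimal sketch, assuming aiobotocore passes the same metadata through. `retry_attempts` and `timed` are hypothetical helper names, not part of either library. The counter is only present when a response dict exists, so a call that ends in a raised timeout is visible through the timing log alone.

```python
import asyncio
import logging
import time
from typing import Any, Awaitable, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def retry_attempts(response: Dict[str, Any]) -> Optional[int]:
    """Return the retry count recorded in a response dict, or None if absent."""
    return response.get("ResponseMetadata", {}).get("RetryAttempts")


async def timed(label: str, call: Callable[[], Awaitable[Any]]) -> Any:
    """Await call() and log its duration, whether it succeeds or raises."""
    start = time.monotonic()
    try:
        return await call()
    finally:
        # Runs on success and on failure, so timeouts are logged too.
        logger.info("%s took %.2fs", label, time.monotonic() - start)
```

With a 10s `connect_timeout`, a logged duration of roughly 10s followed by an exception would be consistent with a single attempt, while a multiple of that would suggest retries took place.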

Checklist

  • I have reproduced in environment where pip check passes without errors
  • I have provided pip freeze results
  • I have provided sample code or detailed way to reproduce
  • I have tried the same code in botocore to ensure this is an aiobotocore specific issue
  • I have tried similar code in aiohttp to ensure this is an aiobotocore specific issue
    • Retry logic happens at the botocore layer, not aiohttp
  • I have checked the latest and older versions of aiobotocore/aiohttp/python to see if this is a regression / injection
    • I'm not able to upgrade (or downgrade) in production without some significant testing, and since this only happens occasionally I don't think I'll be able to reproduce in a test environment. Will try though.

pip freeze results

aiobotocore==1.2.1
aiodataloader==0.2.0
aiohttp==3.7.3
aioitertools==0.7.1
aioredis==1.3.1
aioredis-cluster==1.5.2
ariadne==0.12.0
async-timeout==3.0.1
attrs==20.3.0
boto3==1.16.52
botocore==1.19.52
certifi==2020.12.5
chardet==3.0.4
elasticsearch==6.8.1
elasticsearch-dsl==6.3.1
expiringdict==1.1.4
graphql-core==3.0.5
hiredis==1.1.0
idna==2.10
itsdangerous==0.24
Jinja2==3.0.0
jmespath==0.10.0
launchdarkly-server-sdk==7.0.1
mangum==0.10.0
MarkupSafe==2.0.1
mmh3==2.5.1
multidict==5.1.0
pycryptodome==3.10.1
PyJWT==1.7.1
pyRFC3339==1.1
python-dateutil==2.8.1
pytz==2021.1
requests==2.25.1
s3transfer==0.3.4
semver==2.13.0
sentry-sdk==1.3.1
six==1.15.0
starlette==0.13.8
twilio==6.57.0
typing-extensions==3.7.4.3
ujson==4.0.2
ulid-py==1.1.0
urllib3==1.26.3
wrapt==1.12.1
yarl==1.6.3

Environment:

  • Python Version: 3.8
  • OS name and version: AWS Lambda


@thehesiod
Collaborator

taking a look

@thehesiod
Collaborator

interesting, it will retry if you don't specify retries in the config
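
That observation suggests a temporary workaround: leave the `retries` key out of the config so the client falls back to its default retry behavior. The following is a minimal sketch, assuming the timeouts from the sample code and the `total_max_attempts=5` described in the report. `build_config_kwargs` is a hypothetical helper, and the resulting dict would be passed as `AioConfig(**kwargs)`.

```python
from typing import Any, Dict


def build_config_kwargs(omit_retries: bool) -> Dict[str, Any]:
    """Build keyword arguments for AioConfig, optionally leaving out `retries`."""
    kwargs: Dict[str, Any] = {"connect_timeout": 10, "read_timeout": 10}
    if not omit_retries:
        # Explicit retries config: the case reported here as not retrying.
        kwargs["retries"] = {"mode": "standard", "total_max_attempts": 5}
    return kwargs


# Workaround: no `retries` key, so the library defaults apply.
workaround_kwargs = build_config_kwargs(omit_retries=True)
```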

@thehesiod
Collaborator

I'm going to refactor aiohttp to be behind httpsession.py like in botocore for more parallelism and to have the exception wrappers behave similarly

@mariaines
Author

mariaines commented Aug 20, 2021

Ooh, interesting. Thanks for taking a look! Sounds like a workaround for now would be to not specify retries - we'll get the default 3 instead of 5 for total_max_attempts, but 2 retries is better than 0 :)

Thanks again, will look forward to the fix.
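
If the extra attempts matter while `retries` is left unset, one option is an application-level retry around individual calls. The following is a minimal sketch under stated assumptions: `call_with_retries` is a hypothetical helper, not an aiobotocore or botocore API; the exception types to retry on are supplied by the caller; and these attempts multiply with any retries the client performs itself.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Tuple, Type


async def call_with_retries(
    call: Callable[[], Awaitable[Any]],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
) -> Any:
    """Await call() up to max_attempts times, backing off between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
```

A call site might look like `await call_with_retries(lambda: dynamodb_client.get_item(...), retry_on=(SomeTimeoutError,))`, where `SomeTimeoutError` stands in for whichever timeout exception the application actually observes.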

@thehesiod thehesiod linked a pull request Aug 20, 2021 that will close this issue
@thehesiod
Collaborator

@mariaines if you wouldn't mind testing the fix that would be great, thanks! I added an integration test so it won't happen again

@mariaines
Author

mariaines commented Aug 20, 2021

Sure, happy to, but will need to figure out how to install from github via aws sam... investigating

Edit: can install from git in requirements.txt :), deploying...

@thehesiod
Collaborator

to all those following this, you now have to set the AIOBOTOCORE_DEPRECATED_1_4_0_APIS env var for the retries fix to take effect
