S3Storage causing slowdowns due to frequent reinitializations of connections #1472

Open
jleclanche opened this issue Nov 15, 2024 · 0 comments

I've been debugging an issue in my app that was causing severe slowdowns in a django-ninja based API that exposes django-storages s3 urls for various fields.

The bottleneck I found was here:

self._connections.connection = session.resource(

Basically the ._connections.connection is being recreated quite frequently. This is super super slow because in boto3, initializing the connection "does things" with a ton of different aws services. Optimizing boto3 is out of my depth but I found that the real culprit is this:

self._connections = threading.local()
self._unsigned_connections = threading.local()

Those threading.local() objects were kinda suspicious to me initially. I did some digging: they were added back in 2017, in commit 142e822 by @tomkins. The commit message nicely documents the reasoning, and the documentation still backs it up:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html#multithreading-multiprocessing

I did some testing and making these thread-unsafe removes the issue. I wondered also if this were related to the setstate/getstate added in commit 82e18aa but I couldn't validate either way.
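
To illustrate the mechanism, here is a minimal sketch of a lazily created connection stored in a threading.local(). FakeStorage, _make_connection and count_setups are hypothetical names for this example only, not django-storages code, and the "connection" is a plain object standing in for the expensive session.resource(...) call. It shows that the setup runs once per thread that touches the storage, not once per process:

```python
import threading


class FakeStorage:
    """Hypothetical stand-in for a storage class (not the real S3Storage)."""

    def __init__(self):
        self._connections = threading.local()
        self._lock = threading.Lock()
        self.created = 0  # how many times the "expensive" setup ran

    def _make_connection(self):
        # Placeholder for an expensive call such as session.resource(...).
        with self._lock:
            self.created += 1
        return object()

    @property
    def connection(self):
        # Each thread sees its own attributes on the threading.local object,
        # so a thread that has not built a connection yet builds a new one.
        conn = getattr(self._connections, "connection", None)
        if conn is None:
            conn = self._make_connection()
            self._connections.connection = conn
        return conn


def count_setups(num_threads):
    """Return how many connections one storage builds across num_threads."""
    storage = FakeStorage()

    def worker():
        # Two accesses in the same thread reuse that thread's connection.
        storage.connection
        storage.connection

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return storage.created
```

With this sketch, count_setups(5) builds five connections for a single storage object. If requests are served by short-lived or frequently rotated threads, that cost would be paid again and again, which is one way the slowdown described here could arise.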

Now to be clear, this is being reset multiple times per request. I am not sure why this was not caught before. I'm using Django 5.2 -- Maybe Django's underlying threading structure changed in recent years with the addition of async support and it's surfacing now. But either way it's massively slowing down some queries.

I've managed to reproduce egregious instances of it with e.g. an OrganizationMember model which has an avatar, and displaying 10 org members in a django-ninja API. This makes the class instantiate 10 times, and each instantiation took ~500ms on a t2.micro.

A super nasty bug. Gut feelings:

  1. It's possible this is slower than normal for some other reason, which would explain why it hasn't been caught until now.
  2. Regardless of whether it's reset multiple times per connection, this adds a crazy high amount of latency to each request.
  3. It may be worth re-testing in recent versions of boto3 whether this issue still happens and, if it does, figuring out what makes it happen. I don't have my hopes up on fixing boto3 to be thread-safe but it could at least help surface whether the way the threadlocal state is being stored is overkill or not.
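
One possible direction, sketched under assumptions: if the repeated setup comes from many storage instances each owning their own threading.local(), a module-level thread-local cache keyed by configuration would let instances in the same thread share one connection while still keeping connections separate per thread. get_connection, CachedStorage and the ("s3", bucket) key are hypothetical names for illustration, not part of django-storages or boto3, and whether this fits the real code path is untested:

```python
import threading

# One cache per thread, shared by every storage instance in that thread.
_thread_cache = threading.local()


def get_connection(key, factory):
    """Return this thread's connection for key, building it on first use."""
    cache = getattr(_thread_cache, "connections", None)
    if cache is None:
        cache = _thread_cache.connections = {}
    if key not in cache:
        cache[key] = factory()
    return cache[key]


class CachedStorage:
    """Hypothetical storage whose instances share a per-thread connection."""

    setups = 0  # counts expensive setups; only for this demonstration

    def __init__(self, bucket):
        self.bucket = bucket

    def _make_connection(self):
        # Placeholder for an expensive call such as session.resource(...).
        CachedStorage.setups += 1
        return object()

    @property
    def connection(self):
        # Assumption: the bucket name is enough to identify a connection.
        return get_connection(("s3", self.bucket), self._make_connection)
```

In this sketch, ten CachedStorage("avatars") instances created in one thread run the setup once between them, while a second thread still gets its own connection, so the per-thread isolation that the boto3 documentation recommends is preserved.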