Threaded parallel uploads on S3 #1337
Thanks, clearly my answer in the other thread is wrong. The correct ...
Would definitely take documentation (or a feature!) for this one. It's session config vs transfer config.
Thanks! The best way to override the config:

```python
class MyS3Storage(S3Storage):
    def __init__(self, **settings):
        # I don't have easy access to settings before the init
        super().__init__(**settings)
        # Is there any side effect if I change the config after its first definition?
        self.config = Config(
            s3={"addressing_style": self.addressing_style},
            signature_version=self.signature_version,
            proxies=self.proxies,
            max_pool_connections=70,
        )
```

or:

```python
class MyS3Storage(S3Storage):
    # No easy access to settings here
    config = Config(
        s3={"addressing_style": "virtual"},
        signature_version=5,
        proxies=None,
        max_pool_connections=70,
    )
```

What do you think about something like:

```python
class S3Storage(CompressStorageMixin, BaseStorage):
    """
    Amazon Simple Storage Service using Boto3.

    This storage backend supports opening files in read or write
    mode and supports streaming (buffering) data in chunks to S3
    when writing.
    """

    def __init__(self, **settings):
        omitted = object()
        super().__init__(**settings)
        ...
        self.config = self.get_config()
        ...

    def get_config(self):
        # If config is provided in a subclass, the signature_version and
        # addressing_style settings/args are ignored.
        if hasattr(self, "config"):
            # Keep backward compatibility
            return self.config
        return Config(
            s3={"addressing_style": self.addressing_style},
            signature_version=self.signature_version,
            proxies=self.proxies,
        )
```
Or the same API as ...
Am I right in understanding that using a thread pool via TransferConfig benefits uploads of large files (they become concurrent multipart uploads), but not really the "lots of small files" case? It seems the underlying Django storage implementation uploads each file one by one, though there is a workaround for threading multiple file uploads outlined in the comments on this old issue: https://code.djangoproject.com/ticket/23517
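The "thread the individual files yourself" workaround mentioned above can be sketched with the standard-library executor. Here `upload_one` is a hypothetical stand-in for a real per-file upload (in a Django project that would be something like `default_storage.save(name, ContentFile(data))`); the point is only the fan-out pattern:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_one(name, data):
    # Placeholder: replace the body with the real storage call,
    # e.g. default_storage.save(name, ContentFile(data)).
    return name

def upload_many(files, max_workers=10):
    """Upload many small files concurrently.

    files: iterable of (name, data) pairs.
    Returns the uploaded names in completion order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_one, name, data) for name, data in files]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```

Note that `max_workers` above is exactly the number that has to stay at or below the boto3 client's `max_pool_connections`, otherwise you hit the "Connection pool is full" warning from this issue.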
I'm uploading a lot of small files. I can thread the uploads using `ThreadPool` and it works great. If I use more than 10 threads, I get the warning:

```
Connection pool is full, discarding connection: my.s3.server.url. Connection pool size: 10
```

I found two issues about this problem (#1106 and #610). I tried to use `TransferConfig`, without success, with both `use_threads=False` and `use_threads=True`. What should I do to increase the maximum number of threads?