Retry settings not defined on s3 filesystem #2788
@jmurray-clarify This is a good callout! I don't think we propagate the retry settings to the pyarrow S3FileSystem for writes. Do you want to take a stab at it and make a contribution?
@samster25, I created a PR for this. Thanks.
samster25 pushed a commit that referenced this issue on Sep 6, 2024:
Addresses: #2788 Propagates the S3Config.num_tries config to the pyarrow S3 filesystem. Note that the other relevant parameters on S3Config, `retry_mode` and `retry_initial_backoff_ms`, are ignored, as pyarrow's [S3RetryStrategy](https://github.com/apache/arrow/blob/ab0a40ee34217070f14027776682074c55d0b507/python/pyarrow/_s3fs.pyx#L112) only has one parameter, `max_attempts`. Note that this only addresses S3. GCSConfig and AzureConfig do not have retry settings.
Describe the bug
When writing a high-cardinality (16k) partitioned table, we frequently encounter throttling errors from S3.
Looking at the S3Config doc, we have some retry settings: retry_mode, num_tries, retry_initial_backoff_ms. However, these are not used for this call. This uses the S3 filesystem created here:
Daft/daft/filesystem.py
Lines 214 to 227 in 60ebf82
We should be defining a retry_strategy here.

Expected behavior
Respect retry settings from S3Config.