
Retry settings not defined on s3 filesystem #2788

Closed
jmurray-clarify opened this issue Sep 4, 2024 · 2 comments

@jmurray-clarify
Contributor

Describe the bug
When writing a high-cardinality (16k) partitioned table, we frequently encounter throttling errors from S3:

ray.exceptions.RayTaskError(OSError): ray::WriteFile [Stage:16]() (pid=15273, ip=10.1.71.222)
  File "/mnt/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/runners/ray_runner.py", line 356, in single_partition_pipeline
    return build_partitions(instruction_stack, partial_metadatas, *inputs)
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/runners/ray_runner.py", line 334, in build_partitions
    partitions = instruction.run(partitions)
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/execution/execution_step.py", line 343, in run
    return self._write_file(inputs)
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/execution/execution_step.py", line 347, in _write_file
    partition = self._handle_file_write(
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/execution/execution_step.py", line 362, in _handle_file_write
    return table_io.write_tabular(
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/table/table_io.py", line 490, in write_tabular
    _write_tabular_arrow_table(
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/table/table_io.py", line 886, in _write_tabular_arrow_table
    _retry_with_backoff(
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/table/table_io.py", line 819, in _retry_with_backoff
    return func()
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/daft/table/table_io.py", line 867, in write_dataset
    pads.write_dataset(
  File "/tmp/ray/session_2024-09-04_14-28-17_937685_23627/runtime_resources/pip/17241f6ccdf1c37bf076255dab7d5e63265f7e4d/virtualenv/lib64/python3.9/site-packages/pyarrow/dataset.py", line 1030, in write_dataset
    _filesystemdataset_write(
  File "pyarrow/_dataset.pyx", line 4010, in pyarrow._dataset._filesystemdataset_write
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: When completing multiple part upload for key '<REDACTED>' in bucket '<REDACTED>': AWS Error SLOW_DOWN during CompleteMultipartUpload operation: Please reduce your request rate.

Looking at the S3Config docs, we have some retry settings: `retry_mode`, `num_tries`, `retry_initial_backoff_ms`. However, these are not used for this call.
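
For context, these settings live on the `S3Config` that is passed through an `IOConfig`. A minimal sketch of how a user would set them today (bucket, partition column, values, and the `write_parquet` call are placeholders/assumptions, not from this issue):

```python
from daft.io import IOConfig, S3Config

# Retry settings on S3Config; per this issue they are not forwarded to the
# pyarrow S3FileSystem that performs the actual partitioned write.
io_config = IOConfig(
    s3=S3Config(
        retry_mode="adaptive",          # illustrative value
        num_tries=10,
        retry_initial_backoff_ms=1000,
    )
)

# df is assumed to be a daft.DataFrame with a "part_key" column.
df.write_parquet("s3://my-bucket/out/", partition_cols=["part_key"], io_config=io_config)
```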

This uses the S3 filesystem created here:

Daft/daft/filesystem.py

Lines 214 to 227 in 60ebf82

if protocol == "s3":
    translated_kwargs = {}
    if io_config is not None and io_config.s3 is not None:
        s3_config = io_config.s3
        _set_if_not_none(translated_kwargs, "endpoint_override", s3_config.endpoint_url)
        _set_if_not_none(translated_kwargs, "access_key", s3_config.key_id)
        _set_if_not_none(translated_kwargs, "secret_key", s3_config.access_key)
        _set_if_not_none(translated_kwargs, "session_token", s3_config.session_token)
        _set_if_not_none(translated_kwargs, "region", s3_config.region_name)
        _set_if_not_none(translated_kwargs, "anonymous", s3_config.anonymous)
    resolved_filesystem = S3FileSystem(**translated_kwargs)
    resolved_path = resolved_filesystem.normalize_path(_unwrap_protocol(path))
    return resolved_path, resolved_filesystem

We should be defining a retry_strategy here.
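
A minimal sketch of what that wiring could look like, extending the snippet above. This assumes `S3Config.num_tries` maps onto pyarrow's `AwsStandardS3RetryStrategy`; it is hypothetical wiring, not the actual patch:

```python
from pyarrow.fs import AwsStandardS3RetryStrategy, S3FileSystem

# ... translated_kwargs built from s3_config as in the snippet above ...
# Hypothetical: forward S3Config.num_tries as pyarrow's max_attempts.
if s3_config.num_tries is not None:
    translated_kwargs["retry_strategy"] = AwsStandardS3RetryStrategy(
        max_attempts=s3_config.num_tries
    )

resolved_filesystem = S3FileSystem(**translated_kwargs)
```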

Expected behavior
Respect retry settings from S3Config.

@samster25
Member

@jmurray-clarify this is a good callout! I don't think we propagate the retry settings to the pyarrow S3FileSystem for writes. Do you want to take a stab at it and make a contribution?

@jmurray-clarify
Contributor Author

@samster25, I created a PR for this. Thanks.

#2800

samster25 pushed a commit that referenced this issue Sep 6, 2024
Addresses: #2788

Propagates the S3Config.num_tries config to the pyarrow S3 filesystem.

Note that the other relevant parameters on S3Config, `retry_mode` and
`retry_initial_backoff_ms`, are ignored because pyarrow's
[S3RetryStrategy](https://github.com/apache/arrow/blob/ab0a40ee34217070f14027776682074c55d0b507/python/pyarrow/_s3fs.pyx#L112)
only has one parameter, `max_attempts`.

Note that this only addresses S3. GCSConfig and AzureConfig do not have
retry settings.
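
For reference, both retry strategies pyarrow exposes accept only `max_attempts`, which is why `num_tries` is the only setting that can be carried over (a small illustrative snippet, not part of the patch):

```python
from pyarrow.fs import AwsDefaultS3RetryStrategy, AwsStandardS3RetryStrategy

# Neither strategy takes a retry mode or an initial backoff; max_attempts is
# the only knob, so S3Config.num_tries is the only setting that maps cleanly.
standard = AwsStandardS3RetryStrategy(max_attempts=10)
default = AwsDefaultS3RetryStrategy(max_attempts=10)
```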