[Python] AWS SDK 1.11 support in pyarrow wheels? #42154

Open
johnkerl opened this issue Jun 14, 2024 · 14 comments
johnkerl commented Jun 14, 2024

Describe the bug, including details regarding any error messages, version, and platform.

With regard to #40262, is there a plan to update pyarrow's AWS SDK dependency from 1.10 to 1.11? We believe from the following line that pyarrow is currently using 1.10:

ARROW_AWSSDK_BUILD_VERSION=1.10.55

It appears that a mitigation for #40262 is in AWS SDK 1.11:
aws/aws-sdk-cpp#2710

(There's significant backstory on single-cell-data/TileDB-SOMA#2692 and on TileDB-Inc/tiledbsoma-feedstock#171, if backstory is desired. A repro is here: single-cell-data/TileDB-SOMA#2692 (comment).)

cc @pitrou

Component(s)

Python

@johnkerl
Author

johnkerl commented Jun 14, 2024

Also cc @ihnorton @ivirshup @h-vetinari

@h-vetinari
Contributor

Conda-forge has been building against aws 1.11 for a long time already, and this also got synched back to the conda tests in arrow itself (which have bitrotted in the meantime, but there are efforts to revive them):

In any case, we run the full test suite on the Python side (not the C++ side yet, cf. #35587) in every feedstock build, and it passes on osx. So I don't see the immediate incompatibility, which I assume is restricted to some corner cases.

You should provide a stacktrace (or ideally, a reproducer) of what fails.

PS. In the past there was once something that kept arrow stuck on aws 1.8 for a long time (which might help for context): aws/aws-sdk-cpp#1809

kou changed the title from "AWS SDK 1.11 support?" to "[C++] AWS SDK 1.11 support?" on Jun 15, 2024
@h-vetinari
Contributor

Looks like TileDB-Inc/TileDB-Py#1990 is relevant, but again, you should really provide an example where arrow crashes or does something wrong, not another downstream project. The fact that the import order seems to matter is already grounds for suspecting that there's something else going on here.

@ihnorton

ihnorton commented Jun 17, 2024

The specific question here is if/when will arrow wheels update to AWS SDK 1.11? The reason for the question is to understand whether the mitigation for the issue described below will be available "soon", or we need to work around it (rename symbols, further patch the AWS SDK, etc?).

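For more background on the issue:

When pyarrow and tiledb-py are installed from PyPI, and imported in the same process, making S3 requests (which happens via the AWS SDK) causes an abort on some platforms.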

1   libarrow.1601.dylib                 0x000000010e08b8d0 aws_fatal_assert + 80
2   libarrow.1601.dylib                 0x000000010e08ab98 aws_mem_acquire + 64
3   libarrow.1601.dylib                 0x000000010e09dd68 aws_string_new_from_cursor + 76
4   libarrow.1601.dylib                 0x000000010e0975e4 aws_json_value_get_from_object + 44
5   libarrow.1601.dylib                 0x000000010e083970 aws_endpoints_ruleset_new_from_string + 120
6   libarrow.1601.dylib                 0x000000010e01d5a4 _ZN3Aws3Crt9Endpoints10RuleEngineC2ERK15aws_byte_cursorS5_P13aws_allocator + 48
7   libarrow.1601.dylib                 0x000000010ddf7180 _ZN3Aws8Endpoint23DefaultEndpointProviderINS_2S321S3ClientConfigurationENS2_8Endpoint19S3BuiltInParametersENS4_25S3ClientContextParametersEEC2EPKcm + 116
8   libtiledb.dylib                     0x0000000162bf367c _ZN3Aws2S38S3ClientC2ERKNS_6Client19ClientConfigurationENS2_15AWSAuthV4Signer20PayloadSigningPolicyEbNS0_34US_EAST_1_REGIONAL_ENDPOINT_OPTIONE + 980
9   libtiledb.dylib                     0x000000016241a4bc _ZN6tiledb6common11make_sharedINS_2sm14TileDBS3ClientELi66EJRKNS2_12S3ParametersERN3Aws6Client19ClientConfigurationENS8_15AWSAuthV4Signer20PayloadSigningPolicyERKbEEENSt3__110shared_ptrIT_EERAT0__KcDpOT1_ + 92
10  libtiledb.dylib                     0x0000000162403624 _ZNK6tiledb2sm2S311init_clientEv + 3428
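
The frames above can be read mechanically. The following is a minimal sketch (not part of the original report) that parses a few frames copied from the trace and reports where a call crosses from one library into another. The helper names `parse_frames` and `library_crossings` are made up for illustration. Read this way, the S3Client constructor in libtiledb.dylib ends up calling the DefaultEndpointProvider constructor in libarrow.1601.dylib, which is consistent with two bundled copies of the AWS SDK being mixed in one process.

```python
import re

# Frames 5-8, copied verbatim from the trace above.
TRACE = """\
5   libarrow.1601.dylib                 0x000000010e083970 aws_endpoints_ruleset_new_from_string + 120
6   libarrow.1601.dylib                 0x000000010e01d5a4 _ZN3Aws3Crt9Endpoints10RuleEngineC2ERK15aws_byte_cursorS5_P13aws_allocator + 48
7   libarrow.1601.dylib                 0x000000010ddf7180 _ZN3Aws8Endpoint23DefaultEndpointProviderINS_2S321S3ClientConfigurationENS2_8Endpoint19S3BuiltInParametersENS4_25S3ClientContextParametersEEC2EPKcm + 116
8   libtiledb.dylib                     0x0000000162bf367c _ZN3Aws2S38S3ClientC2ERKNS_6Client19ClientConfigurationENS2_15AWSAuthV4Signer20PayloadSigningPolicyEbNS0_34US_EAST_1_REGIONAL_ENDPOINT_OPTIONE + 980
"""

# index, library, address, symbol, " + offset"
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\S+) \+ (\d+)\s*$")


def parse_frames(trace):
    """Return a list of (index, library, symbol) tuples, innermost frame first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(4)))
    return frames


def library_crossings(frames):
    """Return (caller, callee) pairs where the call crosses a library boundary."""
    crossings = []
    # Frames are listed innermost first, so frames[i + 1] called frames[i].
    for callee, caller in zip(frames, frames[1:]):
        if callee[1] != caller[1]:
            crossings.append((caller, callee))
    return crossings


if __name__ == "__main__":
    for caller, callee in library_crossings(parse_frames(TRACE)):
        print(f"frame {caller[0]} in {caller[1]} calls frame {callee[0]} in {callee[1]}")
```

The sketch only looks at which library each frame belongs to; it does not demangle symbols or explain why the loader resolved them that way.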

Summarizing: AWS has released a mitigation for the abort, implemented here: aws/aws-sdk-cpp#2710. The mitigation is available in AWS SDK 1.11. TileDB wheels have updated to AWS SDK 1.11, but AFAICT all packages need to be updated for the mitigation to work.

This issue will likely impact any other library that bundles the AWS SDK in a wheel and is loaded at the same time as pyarrow.

johnkerl changed the title from "[C++] AWS SDK 1.11 support?" to "[C++] AWS SDK 1.11 support in pyarrow wheels?" on Jun 17, 2024
kou changed the title from "[C++] AWS SDK 1.11 support in pyarrow wheels?" to "[Python] AWS SDK 1.11 support in pyarrow wheels?" on Jun 18, 2024
@kou
Member

kou commented Jun 18, 2024

PyArrow wheels don't use the bundled AWS SDK for C++. They use the one from vcpkg:

https://github.com/ursacomputing/crossbow/actions/runs/9544500563/job/26303310398#step:7:559

-- Found AWS SDK for C++, Version: 1.11.201, Install Root:/opt/vcpkg/installed/amd64-linux-static-release, Platform Prefix:, Platform Dependent Libraries: pthread;crypto;ssl;z;curl

@jorisvandenbossche
Member

jorisvandenbossche commented Jun 18, 2024

The VCPKG version is currently pinned at 2023.11.20:

arrow/.env

Line 92 in eec6f17

VCPKG="a42af01b72c28a8e1d7b48107b33e4f286a55ef6" # 2023.11.20 Release

(last updated in #39622)

This could certainly use another update to a more recent vcpkg state (EDIT: this is currently being done in #42171), but so that release (as @kou also showed from the logs) already included AWS SDK 1.11 (https://github.com/microsoft/vcpkg/releases/tag/2023.11.20, it updated it from 1.11.169#2 to 1.11.201)

@jorisvandenbossche
Member

FWIW, this also means that the latest pyarrow wheels for 16.0.0 should actually already include AWS SDK 1.11

When pyarrow and tiledb-py are installed from PyPI, and imported in the same process, making S3 requests (which happens via the AWS SDK) causes an abort on some platforms.

@ihnorton the crashes you see, is that with the latest pyarrow release from PyPI?

@ihnorton

@ihnorton the crashes you see, is that with the latest pyarrow release from PyPI?

Yes:

pyarrow                   16.1.0                   pypi_0    pypi
python                    3.12.4          h30c5eda_0_cpython    conda-forge
readline                  8.2                  h92ec313_1    conda-forge
setuptools                70.0.0             pyhd8ed1ab_0    conda-forge
tiledb                    0.30.0                   pypi_0    pypi
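
For reports like this one, the versions of the PyPI packages can also be collected from inside the interpreter. This is a small sketch using the standard library's `importlib.metadata`. The function name `installed_versions` is hypothetical, and it only sees distributions installed in the current environment.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["pyarrow", "tiledb"]).items():
        print(f"{name}: {version or 'not installed'}")
```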

@ihnorton

This could certainly use another update to a more recent vcpkg state (EDIT: this is currently being done in #42171), but so that release (as @kou also showed from the logs) already included AWS SDK 1.11 (https://github.com/microsoft/vcpkg/releases/tag/2023.11.20, it updated it from 1.11.169#2 to 1.11.201)

@jorisvandenbossche thanks for the explanation. It looks like the commit I referenced didn't actually make it in to the SDK until 1.11.179: aws/aws-sdk-cpp@1f49f91

Thanks for the pointer! We'll sit tight and try this again after wheels are released with that update. Much appreciated.

@jorisvandenbossche
Member

It looks like the commit I referenced didn't actually make it in to the SDK until 1.11.179: aws/aws-sdk-cpp@1f49f91

That should still mean this is included in the pyarrow 16.0.0 wheels, AFAIK (because it should have used 1.11.201)
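
The version reasoning here is a plain ordered comparison. This is a minimal sketch, assuming dotted numeric versions with an optional vcpkg-style `#N` port suffix. The names `parse_sdk_version`, `includes_commit`, and `version_from_cmake_log` are illustrative. The check only says whether a build is at or past 1.11.179, not whether the abort is actually gone.

```python
import re

FIRST_VERSION_WITH_COMMIT = "1.11.179"  # per the comment quoted above


def parse_sdk_version(text):
    """Turn '1.11.201' or '1.11.169#2' into a comparable tuple of ints."""
    core = text.split("#", 1)[0]  # drop the vcpkg port-version suffix, if any
    return tuple(int(part) for part in core.split("."))


def includes_commit(sdk_version, first=FIRST_VERSION_WITH_COMMIT):
    """True if sdk_version is at or after the first version carrying the commit."""
    return parse_sdk_version(sdk_version) >= parse_sdk_version(first)


def version_from_cmake_log(line):
    """Extract X.Y.Z from a 'Found AWS SDK for C++, Version: X.Y.Z' log line."""
    match = re.search(r"Found AWS SDK for C\+\+, Version: (\d+(?:\.\d+)+)", line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for version in ("1.10.55", "1.11.169#2", "1.11.201"):
        print(version, includes_commit(version))
```

Comparing tuples of integers avoids the string-comparison trap where "1.11.99" would sort after "1.11.201".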

@johnkerl
Author

We have tested 16.0 and 17.0-rc and we still see the issue observed in #40262 -- which appears to be waiting for user confirmation. I'll comment there to indicate we believe the referenced AWS SDK commit does not fix the issue.

@teo-tsirpanis
Contributor

Is it possible that the multiple AWS SDK confusion would be resolved if the AWS SDKs inside the wheels were compiled with -fvisibility=hidden?

@teo-tsirpanis
Contributor

teo-tsirpanis commented Jul 30, 2024

Is it possible that the multiple AWS SDK confusion would be resolved if the AWS SDKs inside the wheels were compiled with -fvisibility=hidden?

The answer is almost definitely yes. Building a custom pyarrow wheel is quite hard for me, but I verified it with the opposite case by following these steps:

  • Build tiledb wheels with the AWS SDK compiled with -fvisibility=hidden.
  • pip install stock tiledb and stock pyarrow.
  • Run the script provided by @ihnorton.
  • The script aborts as expected.
  • pip uninstall tiledb and install the custom wheel built in the first step.
  • Run the script again.
  • The script no longer aborts.
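
A complementary check is to look at what a wheel's shared library exports. The sketch below is an illustration under stated assumptions rather than the procedure used in the steps above. It assumes an `nm` binary on PATH that accepts `-g`, and it uses a rough name heuristic (C symbols starting with `aws_`, C++ symbols in the `Aws` namespace). The function names are hypothetical. On Linux, a stripped library may need `nm -D` instead. A library whose AWS SDK was built with -fvisibility=hidden would be expected to list few or none of these symbols.

```python
import subprocess


def exported_aws_symbols(nm_output):
    """Pick externally visible, defined AWS SDK symbols out of `nm` output."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        kind, name = parts[-2], parts[-1]
        # Uppercase type letters are external; "U" means undefined (imported).
        if not kind.isupper() or kind == "U":
            continue
        bare = name.lstrip("_")  # macOS adds an extra leading underscore
        # Heuristic only: C symbols (aws_*) and C++ symbols in namespace Aws.
        if bare.startswith(("aws_", "ZN3Aws", "ZNK3Aws")):
            found.append(name)
    return found


def check_library(path):
    """Run `nm -g` on a shared library (assumes nm is installed) and filter it."""
    result = subprocess.run(
        ["nm", "-g", path], capture_output=True, text=True, check=True
    )
    return exported_aws_symbols(result.stdout)


# Hand-written sample in nm's "address type name" layout; addresses are made up.
SAMPLE = """\
0000000000001000 T aws_mem_acquire
0000000000002000 t aws_local_helper
                 U aws_string_new_from_cursor
0000000000003000 T _ZN3Aws3Crt9Endpoints10RuleEngineC2ERK15aws_byte_cursorS5_P13aws_allocator
0000000000004000 T PyInit_lib
"""

if __name__ == "__main__":
    for symbol in exported_aws_symbols(SAMPLE):
        print(symbol)
```

Running `check_library` on the stock and the custom library and comparing the two lists would show whether the hidden-visibility build actually removed the exports.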

@teo-tsirpanis
Contributor

Turns out just updating TileDB fixes this issue, but it would still be good to update Arrow to hide the symbols from the AWS SDK, to avoid potentially clashing with another library in the future. I am not planning to do that.
