[SPARK-48374][PYTHON] Support additional PyArrow Table column types #46688

ianmcook · 2024-05-21T13:36:29Z

What changes were proposed in this pull request?

This is a small follow-up to #46529. This adds support for some more Arrow data types:

fixed-size binary
fixed-size list
large list

Why are the changes needed?

Users who create Spark DataFrames from PyArrow Tables will expect this to work when their Tables contain columns of these types.

Does this PR introduce any user-facing change?

It will prevent an error in the case where the user has one of these types of columns in their PyArrow Table. There are no other user-facing changes.

How was this patch tested?

Tests are included.

Was this patch authored or co-authored using generative AI tooling?

No

ianmcook · 2024-06-02T14:42:09Z

cc @HyukjinKwon @xinrong-meng @zhengruifeng

This is a small improvement that I split out from #46529. Thank you.

HyukjinKwon · 2024-06-03T00:23:42Z

Merged to master.

HyukjinKwon · 2024-06-03T00:24:06Z

Thank you for fixing those.

### What changes were proposed in this pull request?

This is a small follow-up to apache#46529. This adds support for some more Arrow data types:
- fixed-size binary
- fixed-size list
- large list

### Why are the changes needed?

Users who are creating Spark DataFrames from PyArrow Tables will expect it to work if their Tables contain these types of columns

### Does this PR introduce _any_ user-facing change?

It will prevent an error in the case where the user has one of these types of columns in their PyArrow Table. There are no other user-facing changes.

### How was this patch tested?

Tests are included.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46688 from ianmcook/SPARK-48374.

Authored-by: Ian Cook <ianmcook@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

github-actions bot added SQL PYTHON labels May 21, 2024

ianmcook added a commit to ianmcook/spark that referenced this pull request May 21, 2024

Break out changes into separate PR apache#46688

77d169c

ianmcook mentioned this pull request May 21, 2024

[SPARK-48220][PYTHON] Allow passing PyArrow Table to createDataFrame() #46529

Closed

ianmcook added a commit to ianmcook/spark that referenced this pull request May 30, 2024

Break out changes into separate PR apache#46688

9b93e3b

ianmcook added 3 commits June 2, 2024 09:09

Add implementation

5f87f8b

Add tests

e1de848

Add comment

743245a

ianmcook force-pushed the SPARK-48374 branch from fceb1ec to 743245a Compare June 2, 2024 13:09

ianmcook marked this pull request as ready for review June 2, 2024 13:13

HyukjinKwon approved these changes Jun 3, 2024

HyukjinKwon closed this in e208b77 Jun 3, 2024
