Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python)!: allow from_arrow to take a generator of RecordBatches, change error type to TypeError #10529

Merged

Conversation

alexander-beedie
Copy link
Collaborator

@alexander-beedie alexander-beedie commented Aug 16, 2023

Slightly extends pl.from_arrow constructor to allow a generator of RecordBatch as input (which the underlying pa.Table.from_batches method accepts).


As well as being convenient, might offer some modest memory advantages if receiving multiple batches (started looking into allowing read_database to take connection objects, and snowflake has a fetch_arrow_batches method, for example) vs requiring up-front single-batch materialisation. Need to check the pyarrow source to see how it's handled there to confirm that though.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Aug 16, 2023
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just accept an Iterable and let pa.Table.from_batches do the work? I'm taking a closer look here.

py-polars/polars/convert.py Outdated Show resolved Hide resolved
@stinodego
Copy link
Member

I added a commit with what I think is a simpler implementation. What do you think? Feel free to reset / do things differently, of course, but I thought this would be the simplest way to show my thinking.

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Aug 17, 2023

Can't we just accept an Iterable and let pa.Table.from_batches do the work?

Not if we want to generate the same error when a generator containing an incorrect type is passed-in; if we don't mind being a bit inconsistent then yes.

Having said that, technically we're actually raising the wrong class of error here; should be a TypeError (as pa.Table.from_batches would raise), not a ValueError, but I was going to fix that as a breaking update afterwards 😅

How about I fix it up (making it a TypeError) and simplify as Iterable at the same time, and we push it out with 0.19? Two birds, one stone... The error text in this edge-case would be slightly different when given an incorrectly-typed generator, but at least the errors (raised by us or raised by pyarrow) will both be of the same class.

Update:
Ahh, I see you already changed the exception type in your commit - shall we tag as breaking? Otherwise the update/simplification looks A-Ok to me 👌

@stinodego stinodego changed the title feat(python): allow from_arrow frame constructor to take a generator of RecordBatches feat(python)!: allow from_arrow frame constructor to take a generator of RecordBatches, change error type to TypeError Aug 17, 2023
@stinodego stinodego changed the title feat(python)!: allow from_arrow frame constructor to take a generator of RecordBatches, change error type to TypeError feat(python)!: allow from_arrow to take a generator of RecordBatches, change error type to TypeError Aug 17, 2023
@github-actions github-actions bot added the breaking Change that breaks backwards compatibility label Aug 17, 2023
@stinodego
Copy link
Member

stinodego commented Aug 17, 2023

All right then, this can be merged! I'll let you do the honors :)

Related: I think we can improve error types in many areas, I'll make a round sometime to do some updates.

@alexander-beedie alexander-beedie merged commit a212b61 into pola-rs:main Aug 17, 2023
14 checks passed
@alexander-beedie alexander-beedie deleted the from-record-batch-generator branch August 19, 2023 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Change that breaks backwards compatibility enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants