feat(python)!: allow from_arrow to take a generator of RecordBatches, change error type to TypeError #10529
Can't we just accept an Iterable and let pa.Table.from_batches do the work? I'm taking a closer look here.
I added a commit with what I think is a simpler implementation. What do you think? Feel free to reset / do things differently, of course, but I thought this would be the simplest way to show my thinking.
77c7f58 to 19a4243
Not if we want to generate the same error when a generator containing an incorrect type is passed in; if we don't mind being a bit inconsistent then yes. Having said that, technically we're actually raising the wrong class of error here; should be a TypeError (as How about I fix it up (making it a TypeError) and simplify as Iterable at the same time, and we push it out with Update:
All right then, this can be merged! I'll let you do the honors :) Related: I think we can improve error types in many areas, I'll make a round sometime to do some updates.
Slightly extends the pl.from_arrow constructor to allow a generator of RecordBatch as input (which the underlying pa.Table.from_batches method accepts).

As well as being convenient, this might offer some modest memory advantages when receiving multiple batches (started looking into allowing read_database to take connection objects, and Snowflake has a fetch_arrow_batches method, for example) vs requiring up-front single-batch materialisation. Need to check the pyarrow source to see how it's handled there to confirm that though.