Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Iterators.partition #3212

Merged
merged 7 commits into from
Dec 2, 2022
Merged

add Iterators.partition #3212

merged 7 commits into from
Dec 2, 2022

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Oct 30, 2022

Having this function would be convenient in cases like https://discourse.julialang.org/t/arrow-stream-usage-clarification/89508.

With this PR adding new functionalities for 1.5 release starts. I will manage patches to 1.4.x releases in a separate branch if they are needed.

@bkamins bkamins added this to the 1.5 milestone Oct 30, 2022
@bkamins
Copy link
Member Author

bkamins commented Oct 30, 2022

CI failure is unrelated.

src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Nov 2, 2022

Thank you!

I will wait with merging this. First I will finalize #3213 and make a patch release (the problem #3213 fixes is pretty common in combination with CSV.jl).

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition!

@bkamins
Copy link
Member Author

bkamins commented Nov 2, 2022

@quinnj - as you probably know I want to add it to allow easy splitting of a data frame into record batches for Arrow.jl.

@quinnj
Copy link
Member

quinnj commented Nov 2, 2022

Yeah, I saw the discourse post. I wonder if we should add a note about Tables.partitions as well. I.e. doing Arrow.write(filename, Tables.partitions(Iterators.partition(df, n))) will work; is that the recommended invocation you have in mind?

@bkamins
Copy link
Member Author

bkamins commented Nov 2, 2022

Ah - you are right. I need to change the implementation so that Iterations.PartitionIterator is returned, as I see that Tables.partition does not recognize any Generator, but requires a specific one. Thank you!

@bkamins
Copy link
Member Author

bkamins commented Nov 2, 2022

After this change it will be enough to write Arrow.write(filename, Iterators.partition(df, n)), as Arrow.write calls Tables.partitions internally anyway.

@quinnj
Copy link
Member

quinnj commented Nov 2, 2022

After this change it will be enough to write Arrow.write(filename, Iterators.partition(df, n)), as Arrow.write calls Tables.partitions internally anyway.

Ah yes, you're right. Nice; that's very simple/clean.

@bkamins
Copy link
Member Author

bkamins commented Nov 2, 2022

OK - I have made the changes and tested that Arrow.write works as expected. Thank you!

@nalimilan - as usual (no rush) - it would be great if you had a look at the PR again.

src/abstractdataframe/abstractdataframe.jl Show resolved Hide resolved
src/abstractdataframe/abstractdataframe.jl Show resolved Hide resolved
src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Nov 26, 2022

Thank you! I will merge this after we make DataFrames.jl 1.4.4 release.

@bkamins bkamins merged commit 2bbcd57 into main Dec 2, 2022
@bkamins bkamins deleted the bk/partition branch December 2, 2022 10:34
@bkamins
Copy link
Member Author

bkamins commented Dec 2, 2022

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants