
[PERF] Use to_arrow_iter in to_arrow to avoid unnecessary array concats #2780

Merged: jaychia merged 6 commits into main from jay/use-iter-to_arrow on Sep 9, 2024

jaychia (Contributor) commented Sep 3, 2024

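Changes to_arrow() to build its result with Table.from_batches over the batches yielded by to_arrow_iter(), which avoids unnecessary array concatenations and improves performance.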

Driveby: fix the results_buffer_size argument documentation by using a num_cpus literal.
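One plausible reading of this driveby, sketched under assumptions: a default such as NUM_CPUS that is computed at import time (for example from os.cpu_count()) shows up in generated API docs as whatever core count the docs build machine happened to have. A string literal default like "num_cpus" renders literally in the signature and is resolved to the real CPU count only when the method is called. The BufferSize alias and the resolve_buffer_size helper are hypothetical names, not Daft's API.

```python
import os
from typing import Literal, Optional, Union

# Hypothetical alias: either an explicit buffer size, None, or the "num_cpus" sentinel.
BufferSize = Union[Optional[int], Literal["num_cpus"]]


def resolve_buffer_size(results_buffer_size: BufferSize = "num_cpus") -> Optional[int]:
    # Hypothetical helper: the literal default keeps the documented signature stable,
    # and the actual CPU count is looked up at call time rather than at import time.
    if results_buffer_size == "num_cpus":
        # os.cpu_count() can return None; falling back to 1 is an assumption of this sketch.
        return os.cpu_count() or 1
    return results_buffer_size
```

Explicit values such as 4 or None pass through unchanged; only the sentinel is resolved.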


codspeed-hq bot commented Sep 3, 2024

CodSpeed Performance Report

Merging #2780 will not alter performance

Comparing jay/use-iter-to_arrow (f741a64) with main (58d1856)

Summary

✅ 16 untouched benchmarks

@@ -286,7 +286,9 @@ def iter_rows(self, results_buffer_size: Optional[int] = NUM_CPUS) -> Iterator[D
 yield row
 
 @DataframePublicAPI
-def to_arrow_iter(self, results_buffer_size: Optional[int] = 1) -> Iterator["pyarrow.RecordBatch"]:
+def to_arrow_iter(
+    self, results_buffer_size: Optional[int] = 1, cast_tensors_to_ray_tensor_dtype: bool = False
Collaborator:

Should we document these arguments?

Contributor Author:

Yeah, good point... I kind of want to remove them at some point, though; they're not really useful and add a bunch of tech debt.

Contributor Author:

Actually, for this case I'll make them "private" by adding an underscore.

Contributor Author:

(Removed in #2802.)


github-actions bot commented Sep 9, 2024

github-actions bot temporarily deployed to pull request September 9, 2024 00:59 (inactive)
github-actions bot temporarily deployed to pull request September 9, 2024 01:35 (inactive)
github-actions bot temporarily deployed to pull request September 9, 2024 01:47 (inactive)
Eventual-Inc deleted a comment from netlify bot Sep 9, 2024
github-actions bot temporarily deployed to pull request September 9, 2024 02:11 (inactive)
jaychia enabled auto-merge (squash) September 9, 2024 02:35
jaychia merged commit 40d7fc4 into main Sep 9, 2024 (38 checks passed)
jaychia deleted the jay/use-iter-to_arrow branch September 9, 2024 02:44

2 participants