Add Parquet sink support #483

Merged
merged 15 commits into from
Apr 11, 2024


caufieldjh
Collaborator

@caufieldjh caufieldjh commented Apr 1, 2024

Related to #482

@caufieldjh
Collaborator Author

still a work in progress

@caufieldjh caufieldjh linked an issue Apr 3, 2024 that may be closed by this pull request
@caufieldjh caufieldjh changed the title from "Add Parquet sink/source support" to "Add Parquet sink support" Apr 10, 2024
@caufieldjh
Collaborator Author

Having the sink appears much more useful than having the source, so I'm going to restrict this to writing the sink for now.

@caufieldjh caufieldjh marked this pull request as ready for review April 10, 2024 21:01
@caufieldjh caufieldjh requested a review from sierra-moxon April 10, 2024 21:01
@caufieldjh
Collaborator Author

Parquet supports arranging whole datasets of files, each potentially with multiple row groups but containing at least one table:
https://arrow.apache.org/docs/python/parquet.html#partitioned-datasets-multiple-files
This PR doesn't do anything with that functionality, but it wouldn't be too much of a lift to implement faceted or filtered outputs.

@caufieldjh caufieldjh merged commit 95d471c into master Apr 11, 2024
4 checks passed
@caufieldjh caufieldjh deleted the parquet branch April 11, 2024 15:42
Development

Successfully merging this pull request may close these issues.

Provide source or sink as Parquet
2 participants