Implemented a ReadOptions trait for cleaner code. #5025

Merged (5 commits) on Jan 26, 2023

Conversation

saikrishna1-bidgely
Contributor

The read_csv, read_parquet, read_json and read_avro methods can now use a single method to read the locations.

Which issue does this PR close?

Closes #5024.

Rationale for this change

This will make for simpler code in #4908.

What changes are included in this PR?

Implemented a new trait, ReadOptions. The to_listing_options method has been moved from the respective option structs into this trait, and the code in read_csv/json/avro/parquet that resolves the schema has been moved into the trait's get_resolved_schema method.
With the trait implemented, the remaining code in read_csv/json/avro/parquet is identical across the four methods and has been moved to a single private method.
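For readers who want to see the shape of the refactor, here is a minimal, synchronous sketch of the idea. It is not the code from this PR: `Schema` and `ListingOptions` are simplified stand-in types, schema inference is stubbed, the helper name `read_with_options` and the hard-coded partition count are hypothetical, and the real trait is async (the diff shows `#[async_trait]`).

```rust
// Stand-in types: simplified placeholders, NOT the real DataFusion structs.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ListingOptions {
    pub file_extension: String,
    pub target_partitions: usize,
}

/// Behaviour shared by every reader-options struct (sketch of the idea).
pub trait ReadOptions {
    /// Convert the user-facing options into listing options.
    fn to_listing_options(&self, target_partitions: usize) -> ListingOptions;

    /// Return the user-supplied schema, or fall back to (stubbed) inference.
    fn get_resolved_schema(&self, path: &str) -> Result<Schema, String>;
}

pub struct CsvReadOptions {
    pub schema: Option<Schema>,
}

pub struct ParquetReadOptions;

impl ReadOptions for CsvReadOptions {
    fn to_listing_options(&self, target_partitions: usize) -> ListingOptions {
        ListingOptions { file_extension: ".csv".to_string(), target_partitions }
    }

    fn get_resolved_schema(&self, path: &str) -> Result<Schema, String> {
        match &self.schema {
            Some(schema) => Ok(schema.clone()),
            // Inference is stubbed out; a real reader would inspect the file.
            None => Err(format!("schema inference not implemented for {}", path)),
        }
    }
}

impl ReadOptions for ParquetReadOptions {
    fn to_listing_options(&self, target_partitions: usize) -> ListingOptions {
        ListingOptions { file_extension: ".parquet".to_string(), target_partitions }
    }

    fn get_resolved_schema(&self, _path: &str) -> Result<Schema, String> {
        // Stub: pretend the file metadata yielded a single column.
        Ok(Schema { fields: vec!["id".to_string()] })
    }
}

/// Hypothetical private helper holding the logic the `read_*` methods share.
fn read_with_options<O: ReadOptions>(
    path: &str,
    options: &O,
    target_partitions: usize,
) -> Result<(ListingOptions, Schema), String> {
    let listing = options.to_listing_options(target_partitions);
    let schema = options.get_resolved_schema(path)?;
    Ok((listing, schema))
}

// Each public method is now a one-line wrapper around the shared helper.
// The partition count 4 is an arbitrary value chosen for this sketch.
pub fn read_csv(path: &str, options: CsvReadOptions) -> Result<(ListingOptions, Schema), String> {
    read_with_options(path, &options, 4)
}

pub fn read_parquet(path: &str, options: ParquetReadOptions) -> Result<(ListingOptions, Schema), String> {
    read_with_options(path, &options, 4)
}
```

Under these assumptions, supporting another format only needs a new `ReadOptions` impl plus a one-line wrapper, which is the duplication the PR description says it removes.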

Are these changes tested?

This is only a refactor and the existing tests should cover the refactored code.

Are there any user-facing changes?

No

…read_csv` and other read functions can now read using the same method.
Contributor

@alamb alamb left a comment

Thank you @saikrishna1-bidgely -- I agree this looks like a nice API change. I had a few small comments but otherwise I think it is ready to go.


#[async_trait]
///
Contributor

Can you please fill in this docstring?

Contributor Author

Updated the docs.

/// Helper to convert these user facing options to `ListingTable` options
- pub fn to_listing_options(&self, target_partitions: usize) -> ListingOptions {
+ fn to_listing_options(&self, target_partitions: usize) -> ListingOptions;
Contributor

Suggested change
- fn to_listing_options(&self, target_partitions: usize) -> ListingOptions;
+ fn to_listing_options(&self, config: &SessionConfig) -> ListingOptions;

While I realize that target_partitions is the only thing actually used at the moment, since this is part of the public API I think providing &SessionConfig would be more future-proof (i.e., it avoids changes to this trait over time).
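To illustrate the reasoning behind this suggestion, here is a small sketch with stand-in types (not the real DataFusion `SessionConfig` or `ListingOptions`). The `collect_statistics` field and the `collect_stat` option are assumptions used only to show a second setting being consulted without touching the trait signature.

```rust
/// Stand-in for the session configuration (NOT the real DataFusion type).
#[derive(Debug, Clone)]
pub struct SessionConfig {
    pub target_partitions: usize,
    /// Hypothetical setting that a later change might need to consult.
    pub collect_statistics: bool,
}

/// Stand-in for the listing options produced from the reader options.
#[derive(Debug, Clone, PartialEq)]
pub struct ListingOptions {
    pub target_partitions: usize,
    pub collect_stat: bool,
}

pub trait ReadOptions {
    // Taking the whole config keeps this signature stable: an implementation
    // can start reading further settings without changing the public trait.
    fn to_listing_options(&self, config: &SessionConfig) -> ListingOptions;
}

pub struct CsvReadOptions;

impl ReadOptions for CsvReadOptions {
    fn to_listing_options(&self, config: &SessionConfig) -> ListingOptions {
        ListingOptions {
            target_partitions: config.target_partitions,
            // A second setting is used here with no change to the trait.
            collect_stat: config.collect_statistics,
        }
    }
}
```

Had the method taken `target_partitions: usize` instead, reading the second setting would have required a new parameter and therefore a breaking change for every implementor and caller.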

Contributor Author

Done!

Contributor

@alamb alamb left a comment

This looks really nice @saikrishna1-bidgely - thank you again ❤️

cc @tustvold / @andygrove / @thinkharderdev / @Dandandan

datafusion/core/src/execution/context.rs (outdated, resolved)
@alamb alamb added the api change Changes the API exposed to users of the crate label Jan 26, 2023
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb alamb merged commit 556fffb into apache:master Jan 26, 2023
Contributor

alamb commented Jan 26, 2023

Thanks again @saikrishna1-bidgely


ursabot commented Jan 26, 2023

Benchmark runs are scheduled for baseline = bc9b78d and contender = 556fffb. 556fffb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate
Development

Successfully merging this pull request may close these issues.

Cleaner code for Read Options in reader methdos.
3 participants