-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor how we create listing tables #4227
Conversation
@avantgardnerio Is it possible that "with_listing_schema_provider" implementation/test is missing something? |
I have updated the semantics of catalog.type such that it indicates the fileformat.. Only difference is now that by default we assume CSV does not have a header line. |
Nice work, ty @timvw ! |
Definitely possible... what makes you ask? |
That's what I was thinking... |
Just because it's not obvious, I'd like to leave a note here to say that the real functionality of the Factories is to allow other repos to register TableProviders we don't even know about at compile time: delta-io/delta-rs#892 |
It is still possible to register other/unknown tableproviderfactories. This happens in eg: test sql::create_drop::create_external_table_with_ddl. Because the filetype is passed in uppercase to CreateExternalTableCommand it's needed to register the factor with an uppercase name (example) There is a slight change in how ListingSchemaProvider (this is very new functionality anyway) behaves (as it now requires a fileformat and (optional) has_header values). (In a next step i'd like to move the has_header, delimiter etc values into options of CreateExternalTable command) |
I am looking forward to reviewing this tomorrow -- thank you @timvw |
Co-authored-by: Andy Grove <andygrove73@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great -- thank you so much for the cleanup @timvw
let object_store_provider = DatafusionCliObjectStoreProvider {}; | ||
let object_store_registry = | ||
ObjectStoreRegistry::new_with_provider(Some(Arc::new(object_store_provider))); | ||
let rn_config = RuntimeConfig::new() | ||
.with_object_store_registry(Arc::new(object_store_registry)) | ||
.with_table_factories(table_factories); | ||
.with_object_store_registry(Arc::new(object_store_registry)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -59,18 +61,24 @@ impl ListingSchemaProvider { | |||
/// `path`: The root path that contains subfolders which represent tables | |||
/// `factory`: The `TableProviderFactory` to use to instantiate tables for each subfolder | |||
/// `store`: The `ObjectStore` containing the table data | |||
/// `format`: The `FileFormat` of the tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it matters, but this says FileFormat
but the actual argument is a String
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @timvw
Benchmark runs are scheduled for baseline = a0581dc and contender = e18f7ba. e18f7ba is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
🚀 |
Which issue does this PR close?
Closes #.
Rationale for this change
Currently the creation of External tables is somewhat strange. Some types are hardcoded, others are only supported via TableProviderFactories..
This PR tries to simplify the logic by always using a TableProviderFactory.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?