-
Notifications
You must be signed in to change notification settings - Fork 996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add delta format to FileSource
, add support for it in ibis/duckdb
#4123
Conversation
Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
Signed-off-by: tokoko <togurg14@freeuni.edu.ge>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
I was wondering how we can use delta lake. One question, how is it possible to use the delta format with the spark offline store or spark materialization?
SparkSource already supports it afaik, SparkSource can also be used to basically just describe the source as a table name without actually caring about the format (I think). while that is probably fine for spark offline store, one downside is that it's tied to just a single implementation. What I'm hoping to achieve with FileSource is that we can have a single source type that can be read by multiple offline stores (dask, duckdb, spark and others) so you can define a set of sources in your feature repository that won't lock you to use a specific offline store only. |
Make sense. I guess for the pyspark, in order to use delta format, it needs to install a delta spark plugin? |
LGTM! |
Yeah, delta data source needs to be configured beforehand in some way. |
@tokoko nice PR! I kind of had put this on a back burner : P and focused my energy more on delta-rs. |
thanks, this wouldn't be possible without delta-rs. |
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24) ### Bug Fixes * Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6)) * Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff)) * Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084) * Changed the code the way mysql container is initialized. ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126) * Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594)) * Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669) * Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde)) * Get rid of empty string `name_alias` during feature view projection deserialization ([#4116](#4116)) ([65056ce](65056ce)) * Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164)) * Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0)) * Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae)) * Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7)) * Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7)) * Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb)) * Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7)) * Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4)) * Update doc ([#4153](#4153)) ([e873636](e873636)) * Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3)) * Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96)) * Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11)) ### Features * Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc)) * Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0)) * Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98)) * Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419)) * Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48)) * Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745)) * Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5)) * Adding support for dictionary writes to online store ([#4156](#4156)) ([abfac01](abfac01)) * Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640)) * Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef)) * Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562)) * Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621)) * Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9)) * Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4)) * Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150)) * Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f)) * Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea)) * Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716)) * Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d)) * Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114) ### Reverts * Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24) ### Bug Fixes * Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6)) * Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff)) * Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084) * Changed the code the way mysql container is initialized. ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126) * Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594)) * Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669) * Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde)) * Get rid of empty string `name_alias` during feature view projection deserialization ([#4116](#4116)) ([65056ce](65056ce)) * Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164)) * Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0)) * Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae)) * Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7)) * Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7)) * Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb)) * Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7)) * Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4)) * Update doc ([#4153](#4153)) ([e873636](e873636)) * Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3)) * Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96)) * Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11)) ### Features * Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc)) * Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0)) * Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98)) * Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419)) * Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48)) * Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745)) * Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5)) * Adding support for dictionary writes to online store ([#4156](#4156)) ([abfac01](abfac01)) * Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640)) * Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef)) * Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562)) * Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621)) * Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9)) * Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4)) * Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150)) * Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f)) * Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea)) * Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716)) * Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d)) * Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114) ### Reverts * Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
What this PR does / why we need it:
This PR adds delta format to
FileSource
(only parquet was supported before), also implements logic for handling delta sources in ibis offline store. It also adds integration tests for duckdb offline store with delta sources, dukdb tests are effectively run twice, once for parquet and then for delta.