feat: Rename OnDemandTransformations to Transformations #4038

Merged: 27 commits merged into master from rename-transformations on Mar 25, 2024

@franciscojavierarceo (Member) commented Mar 24, 2024

What this PR does / why we need it:

This PR renames:

Which issue(s) this PR fixes:

This PR is a follow up to #4018

Fixes #

```diff
@@ -662,10 +663,16 @@ def to_dict(self, project: str) -> Dict[str, List[Any]]:
key=lambda on_demand_feature_view: on_demand_feature_view.name,
):
odfv_dict = self._message_to_sorted_dict(on_demand_feature_view.to_proto())

odfv_dict["spec"]["userDefinedFunction"][
# We are logging a warning because the registry object may be read from a proto that is not updated
```
Collaborator:

This `to_dict` method really annoys me; it always complicates proto changes 😄 Do we actually need it anywhere? Let me look through the project to see if it's being used. Why can't we just convert the whole proto message to JSON and be done with it?

Member Author:

Services with a Python backend will likely use it. We do at Affirm.
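The warning comment in the diff (the registry may have been written before the proto change) points at a backward-compatible read that works whichever way the message is serialized. A minimal sketch, assuming plain Python dicts in place of real proto messages and hypothetical camelCase keys (`featureTransformation.userDefinedFunction` as the new location, a top-level `userDefinedFunction` as the deprecated one): prefer the new field, fall back to the old one with a warning, and dump the whole dict with sorted keys, which is roughly what "convert the whole proto message to JSON" would amount to.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical key names, mimicking a camelCase proto-to-dict conversion.
NEW_CONTAINER = "featureTransformation"  # assumed new wrapper field on the spec
UDF_KEY = "userDefinedFunction"  # UDF field name in both layouts


def extract_udf(spec: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the UDF section of an ODFV spec dict, preferring the new layout."""
    udf = (spec.get(NEW_CONTAINER) or {}).get(UDF_KEY)
    if udf is not None:
        return udf
    legacy = spec.get(UDF_KEY)
    if legacy is not None:
        # A registry written by an older version may only populate the
        # deprecated field, so warn instead of failing.
        logger.warning("ODFV spec only has the deprecated %s field", UDF_KEY)
    return legacy


def to_sorted_json(message_dict: Dict[str, Any]) -> str:
    """Serialize a whole message dict as deterministic JSON (keys sorted at every level)."""
    return json.dumps(message_dict, sort_keys=True)


if __name__ == "__main__":
    new_style = {"spec": {"featureTransformation": {"userDefinedFunction": {"name": "f"}}}}
    old_style = {"spec": {"userDefinedFunction": {"name": "f"}}}
    # Both layouts resolve to the same UDF section.
    print(extract_udf(new_style["spec"]) == extract_udf(old_style["spec"]))
    print(to_sorted_json(new_style))
```

A generic dump like `to_sorted_json` is simpler to maintain across proto changes, while a hand-written `to_dict` lets Python-backed services reshape sections such as the UDF for their own consumers; that is the trade-off the thread is weighing.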

Signed-off-by: Francisco Javier Arceo <franciscojavierarceo@users.noreply.github.com>
```diff
@@ -21,13 +21,12 @@ message UserDefinedFunctionV2 {

// A feature transformation executed as a user-defined function
message FeatureTransformationV2 {
// Note this Transformation starts at 5 for backwards compatibility
oneof transformation {
UserDefinedFunctionV2 user_defined_function = 1;
```
Collaborator:

I know this isn't ready, but let me suggest some names. What if we call this PythonTransformation instead of UserDefinedFunctionV2? We could reuse that message type for both the pandas_transformation and the upcoming python_transformation fields, and (I think) the V2 in the name would no longer be necessary. WDYT?

Member Author:

I tend to prefer stating V2 so that people understand it's a replacement for the deprecated proto. Are you thinking of making PythonTransformation an enum as well, with Pandas and Python as elements? Feel free to suggest what you're thinking to make it a little more concrete.

My guess is something like

```proto
message FeatureTransformationV2 {
    oneof PythonTransformation {
        NativePython native_python = 1;
        Pandas pandas = 2;
    }
    SubstraitTransformationV2 substrait_transformation = 3;
}
```

Or something else?

Collaborator:

No, that would allow both the Python and the Substrait fields to be set at the same time, so it's probably not the best approach. I was thinking of something more like this (I'll omit the V2 suffixes for brevity):

```proto
message FeatureTransformation {
    oneof transformation {
        PythonTransformation pandas_transformation  = 1;
        SubstraitTransformation substrait_transformation = 2;
        PythonTransformation python_transformation  = 3;
    }
}
```

Note that the pandas_transformation and python_transformation fields share a message type, but that's incidental: they just happen to need the same information. If that stops being the case in the future, we could introduce a PandasTransformation message as well, and the first field of transformation would become `PandasTransformation pandas_transformation = 1;`.
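The objection to keeping `substrait_transformation` outside the `oneof` comes down to proto3 `oneof` semantics: setting one member clears any other, so exactly one transformation can be active, whereas a field declared outside the `oneof` can be populated alongside it. The sketch below emulates that behavior in dependency-free Python; the classes, payload fields, and method names are hypothetical stand-ins rather than generated Feast code (a real generated message reports the active case via `WhichOneof("transformation")`). It also shows two cases sharing one payload type, as `pandas_transformation` and `python_transformation` do in the proposal above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type, Union


@dataclass
class PythonTransformation:
    """Hypothetical payload shared by the pandas and native-Python cases."""
    name: str
    body_text: str


@dataclass
class SubstraitTransformation:
    """Hypothetical payload holding a serialized Substrait plan."""
    substrait_plan: bytes


Payload = Union[PythonTransformation, SubstraitTransformation]


class FeatureTransformation:
    """Emulates a proto3 `oneof transformation`: at most one case is set at a time."""

    # Case name -> allowed payload type. Two cases deliberately share a type.
    CASES: Dict[str, Type] = {
        "pandas_transformation": PythonTransformation,
        "substrait_transformation": SubstraitTransformation,
        "python_transformation": PythonTransformation,
    }

    def __init__(self) -> None:
        self._case: Optional[str] = None
        self._value: Optional[Payload] = None

    def set(self, case: str, value: Payload) -> None:
        expected = self.CASES.get(case)
        if expected is None:
            raise ValueError(f"unknown oneof case: {case}")
        if not isinstance(value, expected):
            raise TypeError(f"{case} expects {expected.__name__}")
        # Assigning a new case replaces whichever case was set before.
        self._case, self._value = case, value

    def which_oneof(self) -> Optional[str]:
        return self._case

    def get(self, case: str) -> Optional[Payload]:
        return self._value if self._case == case else None


t = FeatureTransformation()
t.set("pandas_transformation", PythonTransformation("double_it", "..."))
t.set("substrait_transformation", SubstraitTransformation(b"plan"))
print(t.which_oneof())  # substrait_transformation
print(t.get("pandas_transformation"))  # None
```

Because the shared `PythonTransformation` type cannot tell pandas from native Python on its own, the case name is what a caller would dispatch on; splitting out a `PandasTransformation` message later, as suggested, would mainly tighten the type check for that one case.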

Member Author:

I like the UDF structure since it's a common industry convention, especially in Spark.

@HaoXuAI any thoughts?

Collaborator:

I'm not specifically against UDFs, but the way I think about it, all of these options are sort of UDFs anyway, so calling the message just UDF without any qualifier seems redundant; if it were called PythonUserDefinedFunction, that would be okay. I guess what I'm saying is that I'm equally okay with the trio (PythonTransformation, SubstraitTransformation, PandasTransformation) and with (PythonUDF, SubstraitUDF, PandasUDF).

Member Author:

@jeremyary @etirelli, any opinions here? I'm in favor of user_defined_function; otherwise, the code for this PR is ready.

Member Author:

Lazy consensus wins here. I'm going to merge as-is since everything's covered now.

@franciscojavierarceo marked this pull request as ready for review March 24, 2024 18:37

@franciscojavierarceo changed the title from "feat: Rename transformations" to "feat: Rename OnDemandTransformations to Transformations" Mar 24, 2024
@HaoXuAI (Collaborator) left a comment:

LGTM

@HaoXuAI (Collaborator) commented Mar 25, 2024:

We can also update the documentation now :)

@franciscojavierarceo merged commit 9b98eaf into master Mar 25, 2024 (15 checks passed)
franciscojavierarceo pushed a commit that referenced this pull request Apr 16, 2024
# [0.36.0](v0.35.0...v0.36.0) (2024-04-16)

### Bug Fixes

* Add __eq__, __hash__ to SparkSource for correct comparison ([#4028](#4028)) ([e703b40](e703b40))
* Add conn.commit() to Postgresonline_write_batch.online_write_batch ([#3904](#3904)) ([7d75fc5](7d75fc5))
* Add missing __init__.py to embedded_go ([#4051](#4051)) ([6bb4c73](6bb4c73))
* Add missing init files in infra utils ([#4067](#4067)) ([54910a1](54910a1))
* Added registryPath parameter documentation in WebUI reference ([#3983](#3983)) ([5e0af8f](5e0af8f)), closes [#3974](#3974)
* Adding missing init files in materialization modules ([#4052](#4052)) ([df05253](df05253))
* Allow truncated timestamps when converting ([#3861](#3861)) ([bdd7dfb](bdd7dfb))
* Azure blob storage support in Java feature server ([#2319](#2319)) ([#4014](#4014)) ([b9aabbd](b9aabbd))
* Bugfix for grabbing historical data from Snowflake with array type features. ([#3964](#3964)) ([1cc94f2](1cc94f2))
* Bytewax materialization engine fails when loading feature_store.yaml ([#3912](#3912)) ([987f0fd](987f0fd))
* CI unittest warnings ([#4006](#4006)) ([0441b8b](0441b8b))
* Correct the returning class proto type of StreamFeatureView to StreamFeatureViewProto instead of FeatureViewProto. ([#3843](#3843)) ([86d6221](86d6221))
* Create index only if not exists during MySQL online store update ([#3905](#3905)) ([2f99a61](2f99a61))
* Disable minio tests in workflows on master and nightly ([#4072](#4072)) ([c06dda8](c06dda8))
* Disable the Feast Usage feature by default. ([#4090](#4090)) ([b5a7013](b5a7013))
* Dump repo_config by alias ([#4063](#4063)) ([e4bef67](e4bef67))
* Extend SQL registry config with a sqlalchemy_config_kwargs key ([#3997](#3997)) ([21931d5](21931d5))
* Feature Server image startup in OpenShift clusters ([#4096](#4096)) ([9efb243](9efb243))
* Fix copy method for StreamFeatureView ([#3951](#3951)) ([cf06704](cf06704))
* Fix for materializing entityless feature views in Snowflake ([#3961](#3961)) ([1e64c77](1e64c77))
* Fix type mapping spark ([#4071](#4071)) ([3afa78e](3afa78e))
* Fix typo, as the CLI does not support the shortcut -f option. ([#3954](#3954)) ([dd79dbb](dd79dbb))
* Get container host addresses from testcontainers ([#3946](#3946)) ([2cf1a0f](2cf1a0f))
* Handle ComplexFeastType to None comparison ([#3876](#3876)) ([fa8492d](fa8492d))
* Hashlib md5 errors in FIPS for python 3.9+ ([#4019](#4019)) ([6d9156b](6d9156b))
* Making the query_timeout variable as optional int because upstream is considered to be optional ([#4092](#4092)) ([fd5b620](fd5b620))
* Move gRPC dependencies to an extra ([#3900](#3900)) ([f93c5fd](f93c5fd))
* Prevent spamming pull busybox from dockerhub ([#3923](#3923)) ([7153cad](7153cad))
* Quickstart notebook example ([#3976](#3976)) ([b023aa5](b023aa5))
* Raise error when unable to read file source in spark source ([#4005](#4005)) ([34cabfb](34cabfb))
* Remove unused input parameter in spark source ([#3980](#3980)) ([7c90882](7c90882))
* Remove parentheses in pull_latest_from_table_or_query ([#4026](#4026)) ([dc4671e](dc4671e))
* Remove proto-plus imports ([#4044](#4044)) ([ad8f572](ad8f572))
* Remove unnecessary dependency on mysqlclient ([#3925](#3925)) ([f494f02](f494f02))
* Restore label check for all actions using pull_request_target ([#3978](#3978)) ([591ba4e](591ba4e))
* Revert mypy config ([#3952](#3952)) ([6b8e96c](6b8e96c))
* Rewrite Spark materialization engine to use mapInPandas ([#3936](#3936)) ([dbb59ba](dbb59ba))
* Run feature server w/o gunicorn on windows ([#4024](#4024)) ([584e9b1](584e9b1))
* SqlRegistry _apply_object update statement ([#4042](#4042)) ([ef62def](ef62def))
* Substrait ODFVs for online ([#4064](#4064)) ([26391b0](26391b0))
* Swap security label check on the PR title validation job to explicit permissions instead ([#3987](#3987)) ([f604af9](f604af9))
* Transformation server doesn't generate files from proto ([#3902](#3902)) ([d3a2a45](d3a2a45))
* Trino as an OfflineStore Access Denied when BasicAuthentication ([#3898](#3898)) ([49d2988](49d2988))
* Trying to import pyspark lazily to avoid the dependency on the library ([#4091](#4091)) ([a05cdbc](a05cdbc))
* Typo Correction in Feast UI Readme ([#3939](#3939)) ([c16e5af](c16e5af))
* Update actions/setup-python from v3 to v4 ([#4003](#4003)) ([ee4c4f1](ee4c4f1))
* Update typeguard version to >=4.0.0 ([#3837](#3837)) ([dd96150](dd96150))
* Upgrade sqlalchemy from 1.x to 2.x regarding PVE-2022-51668. ([#4065](#4065)) ([ec4c15c](ec4c15c))
* Use CopyFrom() instead of __deepcopy__() for creating a copy of protobuf object. ([#3999](#3999)) ([5561b30](5561b30))
* Using version args to install the correct feast version ([#3953](#3953)) ([b83a702](b83a702))
* Verify the existence of Registry tables in snowflake before calling CREATE sql command. Allow read-only user to call feast apply. ([#3851](#3851)) ([9a3590e](9a3590e))

### Features

* Add duckdb offline store ([#3981](#3981)) ([161547b](161547b))
* Add Entity df in format of a Spark Dataframe instead of just pd.DataFrame or string for SparkOfflineStore ([#3988](#3988)) ([43b2c28](43b2c28))
* Add gRPC Registry Server ([#3924](#3924)) ([373e624](373e624))
* Add local tests for s3 registry using minio ([#4029](#4029)) ([d82d1ec](d82d1ec))
* Add python bytes to array type conversion support proto ([#3874](#3874)) ([8688acd](8688acd))
* Add python client for remote registry server ([#3941](#3941)) ([42a7b81](42a7b81))
* Add Substrait-based ODFV transformation ([#3969](#3969)) ([9e58bd4](9e58bd4))
* Add support for arrays in snowflake ([#3769](#3769)) ([8d6bec8](8d6bec8))
* Added delete_table to redis online store ([#3857](#3857)) ([03dae13](03dae13))
* Adding support for Native Python feature transformations for ODFVs ([#4045](#4045)) ([73bc853](73bc853))
* Bumping requirements ([#4079](#4079)) ([1943056](1943056))
* Decouple transformation types from ODFVs ([#3949](#3949)) ([0a9fae8](0a9fae8))
* Dropping Python 3.8 from local integration tests and integration tests ([#3994](#3994)) ([817995c](817995c))
* Dropping python 3.8 requirements files from the project. ([#4021](#4021)) ([f09c612](f09c612))
* Dropping the support for python 3.8 version from feast ([#4010](#4010)) ([a0f7472](a0f7472))
* Dropping unit tests for Python 3.8 ([#3989](#3989)) ([60f24f9](60f24f9))
* Enable Arrow-based columnar data transfers  ([#3996](#3996)) ([d8d7567](d8d7567))
* Enable Vector database and retrieve_online_documents API ([#4061](#4061)) ([ec19036](ec19036))
* Kubernetes materialization engine written based on bytewax ([#4087](#4087)) ([7617bdb](7617bdb))
* Lint with ruff ([#4043](#4043)) ([7f1557b](7f1557b))
* Make arrow primary interchange for offline ODFV execution ([#4083](#4083)) ([9ed0a09](9ed0a09))
* Pandas v2 compatibility ([#3957](#3957)) ([64459ad](64459ad))
* Pull duckdb from contribs, add to CI ([#4059](#4059)) ([318a2b8](318a2b8))
* Refactor ODFV schema inference ([#4076](#4076)) ([c50a9ff](c50a9ff))
* Refactor registry caching logic into a separate class ([#3943](#3943)) ([924f944](924f944))
* Rename OnDemandTransformations to Transformations ([#4038](#4038)) ([9b98eaf](9b98eaf))
* Revert updating dependencies so that feast can be run on 3.11. ([#3968](#3968)) ([d3c68fb](d3c68fb)), closes [#3958](#3958)
* Rewrite ibis point-in-time-join w/o feast abstractions ([#4023](#4023)) ([3980e0c](3980e0c))
* Support s3gov schema by snowflake offline store during materialization ([#3891](#3891)) ([ea8ad17](ea8ad17))
* Update odfv test ([#4054](#4054)) ([afd52b8](afd52b8))
* Update pyproject.toml to use Python 3.9 as default ([#4011](#4011)) ([277b891](277b891))
* Update the Pydantic from v1 to v2 ([#3948](#3948)) ([ec11a7c](ec11a7c))
* Updating dependencies so that feast can be run on 3.11. ([#3958](#3958)) ([59639db](59639db))
* Updating protos to separate transformation ([#4018](#4018)) ([c58ef74](c58ef74))

### Reverts

* Reverting bumping requirements ([#4081](#4081)) ([1ba65b4](1ba65b4)), closes [#4079](#4079)
* Verify the existence of Registry tables in snowflake… ([#3907](#3907)) ([c0d358a](c0d358a)), closes [#3851](#3851)
@tokoko deleted the rename-transformations branch July 16, 2024 12:00