Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inferencing of Features in FeatureView and timestamp column of DataSource #1523

Merged

Conversation

mavysavydav
Copy link
Collaborator

@mavysavydav mavysavydav commented Apr 29, 2021

Test cases written and also tested by running with CLI.

What this PR does / why we need it:
Milestone 1 of https://docs.google.com/document/d/1MkWvexE4e5nYWcQLELFnJ5o9OlJDKC2rn_USHMDT9dg/edit

Which issue(s) this PR fixes:

Fixes #1500

Does this PR introduce a user-facing change?:

Feast will be able to infer the features to register if no features are included in the FeatureView definition, and also infer the event timestamp column of DataSources if that parameter is left out. This is milestone 1 of https://docs.google.com/document/d/1MkWvexE4e5nYWcQLELFnJ5o9OlJDKC2rn_USHMDT9dg/edit#heading=h.pqhio4s5uw2s

@feast-ci-bot
Copy link
Collaborator

Hi @mavysavydav. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

…More through testing needed.

Signed-off-by: David Liu <davidl@twitter.com>
@mavysavydav mavysavydav marked this pull request as ready for review April 29, 2021 17:32
@mavysavydav mavysavydav requested review from jklegar, tsotnet, woop and a team as code owners April 29, 2021 17:32
@woop
Copy link
Member

woop commented Apr 29, 2021

/ok-to-test

@woop
Copy link
Member

woop commented Apr 29, 2021

Thanks @mavysavydav, we will have a look.

@woop
Copy link
Member

woop commented Apr 30, 2021

Don't turn columns that start with or end with "__" (double underscore) into a Feature. (any objections?)

Seems reasonable.

Signed-off-by: David Liu <davidl@twitter.com>
Signed-off-by: David Liu <davidl@twitter.com>
@mavysavydav
Copy link
Collaborator Author

About to push up new commit with test cases and some other minor fixes @jklegar. Feel free to wait on review until that commit arrives

David Liu added 3 commits April 30, 2021 14:16
Signed-off-by: David Liu <davidl@twitter.com>
Signed-off-by: David Liu <davidl@twitter.com>
…ion test & added __ rule in inference

Signed-off-by: David Liu <davidl@twitter.com>
@mavysavydav
Copy link
Collaborator Author

Ok, tests have been added and ball is in ur court :D 🏀

Copy link
Collaborator

@jklegar jklegar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you David this is great! Left a few comments, most of them small - let me know if anything doesn't make sense

sdk/python/feast/data_source.py Outdated Show resolved Hide resolved
sdk/python/feast/data_source.py Show resolved Hide resolved
sdk/python/feast/data_source.py Outdated Show resolved Hide resolved
sdk/python/feast/feature_view.py Outdated Show resolved Hide resolved
sdk/python/feast/feature_view.py Outdated Show resolved Hide resolved
sdk/python/feast/type_map.py Outdated Show resolved Hide resolved
… test case for it

Signed-off-by: David Liu <davidl@twitter.com>
@mavysavydav mavysavydav force-pushed the featureRepoSchemaInferencing branch from d097efe to 01fd199 Compare May 1, 2021 23:29
@mavysavydav
Copy link
Collaborator Author

thx for review! i agree w ur suggestions and have made the changes (including adding a test for checking that bigQuerySource is properly inferencing based on query when table_ref is absent.

Copy link
Collaborator

@jklegar jklegar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great - a couple small followup comments and then should be good to go

list(zip(df["COLUMN_NAME"].to_list(), df["DATA_TYPE"].to_list()))
)
else:
bq_columns_query = f"SELECT * FROM {self.query} LIMIT 1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need parentheses around {self.query} here

df, event_timestamp_column
)
return BigQuerySource(
query=bq_source_using_table_ref.table_ref,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query is supposed to be a select statement, so I think something like f"SELECT * FROM {bq_source_using_table_ref.table_ref}" would be better

expected = {
("float_col", ValueType.DOUBLE),
("int64_col", ValueType.INT64),
("string_col", ValueType.STRING),
}
expected2 = { # parsing with the "query" param in bq sources uses different code path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want both code paths to end up producing the same types, see also comment on types above

"INT64": ValueType.INT64,
"STRING": ValueType.STRING,
"FLOAT": ValueType.FLOAT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's pretty weird on BigQuery's part that it's giving different types based on whether you query the INFORMATION_SCHEMA table or look at the field_type on schema_field; however, it seems INTEGER and INT64 are equivalent, and same for FLOAT and FLOAT64 - see https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema.FIELDS.type.
So I think that here you can just have "INTEGER" map to the int64 value type and "FLOAT" map to the double value type, and this should also fix the mapping to different types in the test below

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jklegar, mavysavydav

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jklegar
Copy link
Collaborator

jklegar commented May 4, 2021

/kind feature

@feast-ci-bot feast-ci-bot added kind/feature New feature or request and removed needs-kind labels May 4, 2021
@jklegar
Copy link
Collaborator

jklegar commented May 4, 2021

/lgtm

@feast-ci-bot feast-ci-bot merged commit f55b51c into feast-dev:master May 4, 2021
jklegar pushed a commit to jklegar/feast that referenced this pull request May 12, 2021
…urce (feast-dev#1523)

* Implemented the inferencing. Did cursory runs to make sure it works. More through testing needed.

Signed-off-by: David Liu <davidl@twitter.com>

* fixed issue with mutable default argument in FeatureView

Signed-off-by: David Liu <davidl@twitter.com>

* Fix in example_feature_repo_with_inference.py file

Signed-off-by: David Liu <davidl@twitter.com>

* Added test cases and small fixes.

Signed-off-by: David Liu <davidl@twitter.com>

* fixed missing import with handling for lint error

Signed-off-by: David Liu <davidl@twitter.com>

* marked a test that needs bigquery client requesting to be an integration test & added __ rule in inference

Signed-off-by: David Liu <davidl@twitter.com>

* Code review corrections + BQSource Query arg handling + corresponding test case for it

Signed-off-by: David Liu <davidl@twitter.com>

* CR corrections

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

Co-authored-by: David Liu <davidl@twitter.com>
jklegar pushed a commit that referenced this pull request May 12, 2021
…urce (#1523)

* Implemented the inferencing. Did cursory runs to make sure it works. More through testing needed.

Signed-off-by: David Liu <davidl@twitter.com>

* fixed issue with mutable default argument in FeatureView

Signed-off-by: David Liu <davidl@twitter.com>

* Fix in example_feature_repo_with_inference.py file

Signed-off-by: David Liu <davidl@twitter.com>

* Added test cases and small fixes.

Signed-off-by: David Liu <davidl@twitter.com>

* fixed missing import with handling for lint error

Signed-off-by: David Liu <davidl@twitter.com>

* marked a test that needs bigquery client requesting to be an integration test & added __ rule in inference

Signed-off-by: David Liu <davidl@twitter.com>

* Code review corrections + BQSource Query arg handling + corresponding test case for it

Signed-off-by: David Liu <davidl@twitter.com>

* CR corrections

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

Co-authored-by: David Liu <davidl@twitter.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add schema inferencing to feature views
4 participants