Set default feature naming to not include feature view name. Add option to include feature view name in feature naming. (#1641)
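
As an illustration of the change (a sketch, not code from this PR — the store setup, `entity_df`, and feature names are assumed):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes an existing feature repo

# New default: plain feature names such as "conv_rate" in the output frame.
training_df = store.get_historical_features(
    entity_df=entity_df,  # assumed: a prepared entity dataframe
    feature_refs=["driver_hourly_stats:conv_rate"],
).to_df()

# Opt-in: prefix each feature with its feature view, restoring the old
# "driver_hourly_stats__conv_rate" style and avoiding name collisions.
training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=["driver_hourly_stats:conv_rate"],
    full_feature_names=True,
).to_df()
```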

* test

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* refactored existing tests to exercise the full_feature_names option during data retrieval, and added new tests.

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* removed full_feature_names usage from the quickstart and README to keep the examples simpler. Resolved failing tests.

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Update CHANGELOG for Feast v0.10.8

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* GitBook: [master] 2 pages modified

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Schema Inferencing should happen at apply time (#1646)

* wip1

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* just need to do clean up

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* linted

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* improve test coverage

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* changed placement of inference methods in repo_operations apply_total

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* updated inference method name + changed to void return since it updates in place

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* fixed integration test and added comments

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* Made DataSource event_timestamp_column optional

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* GitBook: [master] 80 pages modified

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* GitBook: [master] 80 pages modified

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Provide descriptive error on invalid table reference (#1627)

* Initial commit to catch nonexistent table

Signed-off-by: Cody Lin <codyjlin@yahoo.com>
Signed-off-by: Cody Lin <codyl@twitter.com>

* simplify nonexistent BQ table test

Signed-off-by: Cody Lin <codyl@twitter.com>

* clean up table_exists exception

Signed-off-by: Cody Lin <codyl@twitter.com>

* remove unneeded variable

Signed-off-by: Cody Lin <codyl@twitter.com>

* function name change to _assert_table_exists

Signed-off-by: Cody Lin <codyl@twitter.com>

* Initial commit to catch nonexistent table

Signed-off-by: Cody Lin <codyjlin@yahoo.com>
Signed-off-by: Cody Lin <codyl@twitter.com>

* simplify nonexistent BQ table test

Signed-off-by: Cody Lin <codyl@twitter.com>

* clean up table_exists exception

Signed-off-by: Cody Lin <codyl@twitter.com>

* function name change to _assert_table_exists

Signed-off-by: Cody Lin <codyl@twitter.com>

* fix lint errors and rebase

Signed-off-by: Cody Lin <codyl@twitter.com>

* Fix get_table(None) error

Signed-off-by: Cody Lin <codyl@twitter.com>

* custom exception for both missing file and BQ source

Signed-off-by: Cody Lin <codyl@twitter.com>

* revert FileSource checks

Signed-off-by: Cody Lin <codyl@twitter.com>

* Use DataSourceNotFoundException instead of subclassing

Signed-off-by: Cody Lin <codyl@twitter.com>

* Moved assert_table_exists out of the BQ constructor to apply_total

Signed-off-by: Cody Lin <codyl@twitter.com>

* rename test and test asset

Signed-off-by: Cody Lin <codyl@twitter.com>

* move validate logic back to data_source

Signed-off-by: Cody Lin <codyl@twitter.com>

* fixed tests

Signed-off-by: Cody Lin <codyl@twitter.com>

* Set pytest.integration for tests that access BQ

Signed-off-by: Cody Lin <codyl@twitter.com>

* Import pytest in failed test files

Signed-off-by: Cody Lin <codyl@twitter.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Refactor OnlineStoreConfig classes into owning modules (#1649)

* Refactor OnlineStoreConfig classes into owning modules

Signed-off-by: Achal Shah <achals@gmail.com>

* make format

Signed-off-by: Achal Shah <achals@gmail.com>

* Move redis too

Signed-off-by: Achal Shah <achals@gmail.com>

* update test_telemetry

Signed-off-by: Achal Shah <achals@gmail.com>

* add a create_repo_config method that should be called instead of RepoConfig ctor directly

Signed-off-by: Achal Shah <achals@gmail.com>

* fix the table reference in repo_operations

Signed-off-by: Achal Shah <achals@gmail.com>

* reuse create_repo_config

Signed-off-by: Achal Shah <achals@gmail.com>

Remove redis provider reference

* CR comments

Signed-off-by: Achal Shah <achals@gmail.com>

* Remove create_repo_config in favor of __init__

Signed-off-by: Achal Shah <achals@gmail.com>

* make format

Signed-off-by: Achal Shah <achals@gmail.com>

* Remove print statement

Signed-off-by: Achal Shah <achals@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Possibility to specify a project for BigQuery queries (#1656)

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>

Co-authored-by: Achal Shah <achals@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>
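
A hedged sketch of what this option might look like (the `project_id` field name is inferred from the PR title, not shown in this diff):

```python
from feast.infra.offline_stores.bigquery import BigQueryOfflineStoreConfig

# Run offline-store queries against an explicit GCP project instead of the
# client's default; the dataset and project names here are hypothetical.
offline_store_config = BigQueryOfflineStoreConfig(
    dataset="feast_staging",
    project_id="my-gcp-project",
)
```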

* Refactor OfflineStoreConfig classes into their owning modules (#1657)

* Refactor OfflineStoreConfig classes into their owning modules

Signed-off-by: Achal Shah <achals@gmail.com>

* Fix error string

Signed-off-by: Achal Shah <achals@gmail.com>

* Generic error class

Signed-off-by: Achal Shah <achals@gmail.com>

* Merge conflicts

Signed-off-by: Achal Shah <achals@gmail.com>

* make the store type work, and add a test that uses the fully qualified name of the OnlineStore

Signed-off-by: Achal Shah <achals@gmail.com>

* Address comments from previous PR

Signed-off-by: Achal Shah <achals@gmail.com>

* CR updates

Signed-off-by: Achal Shah <achals@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Run python unit tests in parallel (#1652)

Signed-off-by: Achal Shah <achals@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Rename telemetry to usage (#1660)

* Rename telemetry to usage

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Update docs

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Update .prow and infra

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Rename file

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Change url

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Re-add telemetry.md for backwards-compatibility

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* resolved final comments on PR (variable renaming, refactor tests)

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* reformatted after merge conflict

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Update CHANGELOG for Feast v0.11.0

Signed-off-by: Willem Pienaar <git@willem.co>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Update charts README (#1659)

Adds the Feast Jupyter link to it.

Also fixes the helm 'feast-serving' name in the aws/azure terraform.

Signed-off-by: szalai1 <szalaipeti.vagyok@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Added Redis to list of online stores for local provider in providers reference doc. (#1668)

Signed-off-by: Nel Swanepoel <c.swanepoel@ucl.ac.uk>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Grouped inferencing statements together in apply methods for easier readability (#1667)

* grouped inferencing statements together

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>

* update in testing

Signed-off-by: David Y Liu <davidyliuliu@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Add RedshiftDataSource (#1669)

* Add RedshiftDataSource

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Call parent __init__ first

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Provide the user with more options for setting the to_bigquery config (#1661)

* Provide more options for to_bigquery config

Signed-off-by: Cody Lin <codyl@twitter.com>

* Fix default job_config when none; remove excessive testing

Signed-off-by: Cody Lin <codyl@twitter.com>

* Add param type and docstring

Signed-off-by: Cody Lin <codyl@twitter.com>

* add docstrings and typing

Signed-off-by: Cody Lin <codyl@twitter.com>

* Apply docstring suggestions from code review

Co-authored-by: Willem Pienaar <6728866+woop@users.noreply.github.com>
Signed-off-by: Cody Lin <codyjlin@yahoomail.com>

Co-authored-by: Willem Pienaar <6728866+woop@users.noreply.github.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>
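
A sketch of the kind of call this enables (assumes `retrieval_job` is a `BigQueryRetrievalJob`; the destination table is made up):

```python
from google.cloud import bigquery

# Write the retrieval result to a caller-chosen table by passing a standard
# BigQuery QueryJobConfig to to_bigquery().
job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string(
        "my_project.my_dataset.training_data"
    ),
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
retrieval_job.to_bigquery(job_config=job_config)
```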

* Add streaming sources to the FeatureView API (#1664)

* Add a streaming source to the FeatureView API

This diff only updates the API. It is currently up to the providers to actually use this information to spin up resources to consume events from the stream sources.

Signed-off-by: Achal Shah <achals@gmail.com>
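
A rough sketch of declaring a stream source under the new API (the constructor arguments are assumptions for illustration, not confirmed by this diff):

```python
from feast.data_format import AvroFormat
from feast.data_source import KafkaSource

# Hypothetical Kafka-backed source; per the commit message, the API merely
# records it — providers decide how (or whether) to consume the stream.
driver_stream = KafkaSource(
    event_timestamp_column="event_timestamp",
    bootstrap_servers="broker.example.com:9092",
    topic="driver_stats",
    message_format=AvroFormat('{"type": "record", "name": "driver_stats"}'),
)
```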

* remove stuff from rebase

Signed-off-by: Achal Shah <achals@gmail.com>

* make format

Signed-off-by: Achal Shah <achals@gmail.com>

* Update protos

Signed-off-by: Achal Shah <achals@gmail.com>

* lint

Signed-off-by: Achal Shah <achals@gmail.com>

* format

Signed-off-by: Achal Shah <achals@gmail.com>

* CR

Signed-off-by: Achal Shah <achals@gmail.com>

* fix test

Signed-off-by: Achal Shah <achals@gmail.com>

* lint

Signed-off-by: Achal Shah <achals@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Add to_table() to RetrievalJob object (#1663)

* Add notion of OfflineJob

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>

* Use RetrievalJob instead of creating a new OfflineJob object

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>

* Add to_table() in integration tests

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>

Co-authored-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Rename to_table to to_arrow (#1671)

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>
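
After the rename, pulling results into memory looks like this (sketch; assumes a retrieval job from `get_historical_features`):

```python
retrieval_job = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=["driver_hourly_stats:conv_rate"],
)
df = retrieval_job.to_df()        # pandas DataFrame
table = retrieval_job.to_arrow()  # pyarrow.Table, skipping pandas entirely
```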

* Cancel BigQuery job if timeout hits (#1672)

* Cancel BigQuery job if timeout hits

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>

* Fix typo

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Fix Feature References example (#1674)

Fix Feature References example by passing `entity_rows` to `get_online_features()`

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>
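
The corrected call shape, for reference (names and values are illustrative):

```python
feature_vector = store.get_online_features(
    feature_refs=["driver_hourly_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```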

* Allow strings for online/offline store instead of dicts (#1673)

Signed-off-by: Achal Shah <achals@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Remove default list from the FeatureView constructor (#1679)

Signed-off-by: Achal Shah <achals@gmail.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* made changes requested by @tsotnet

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Fix unit tests that got broken by Pandas 1.3.0 release (#1683)

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Add support for DynamoDB and S3 registry (#1483)

* Add support for DynamoDB and S3 registry

Signed-off-by: lblokhin <lenin133@yandex.ru>
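
A hedged sketch of wiring both up (field names are inferred from the commit messages below — e.g. the required `region` — and may not match the final API exactly):

```python
from feast import RepoConfig

# S3-backed registry plus the new DynamoDB online store; the bucket, region,
# and project names are hypothetical.
config = RepoConfig(
    project="driver_ranking",
    provider="aws",
    registry="s3://my-feast-bucket/registry.db",
    online_store={"type": "dynamodb", "region": "us-west-2"},
)
```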

* rcu and wcu as parameters of the dynamodb online store

Signed-off-by: lblokhin <lenin133@yandex.ru>

* fix linter

Signed-off-by: lblokhin <lenin133@yandex.ru>

* aws dependency to extras

Signed-off-by: lblokhin <lenin133@yandex.ru>

* FEAST_S3_ENDPOINT_URL

Signed-off-by: lblokhin <lenin133@yandex.ru>

* tests

Signed-off-by: lblokhin <lenin133@yandex.ru>

* fix signature, after merge

Signed-off-by: lblokhin <lenin133@yandex.ru>

* aws default region name configurable

Signed-off-by: lblokhin <lenin133@yandex.ru>

* add offlinestore config type to test

Signed-off-by: lblokhin <lenin133@yandex.ru>

* review changes

Signed-off-by: lblokhin <lenin133@yandex.ru>

* review requested changes

Signed-off-by: lblokhin <lenin133@yandex.ru>

* integration test for Dynamo

Signed-off-by: lblokhin <lenin133@yandex.ru>

* change the rest of table_name to table_instance (where table_name was actually an instance of the DynamoDB Table object)

Signed-off-by: lblokhin <lenin133@yandex.ru>

* fix DynamoDBOnlineStore commit

Signed-off-by: lblokhin <lenin133@yandex.ru>

* move client to _initialize_dynamodb

Signed-off-by: lblokhin <lenin133@yandex.ru>

* rename document_id to entity_id and Row to entity_id

Signed-off-by: lblokhin <lenin133@yandex.ru>

* The default value is None

Signed-off-by: lblokhin <lenin133@yandex.ru>

* Remove Datastore from the docstring.

Signed-off-by: lblokhin <lenin133@yandex.ru>

* get rid of the return statement in S3RegistryStore

Signed-off-by: lblokhin <lenin133@yandex.ru>

* merge two exceptions

Signed-off-by: lblokhin <lenin133@yandex.ru>

* For CI requirement

Signed-off-by: lblokhin <lenin133@yandex.ru>

* remove configuration from test

Signed-off-by: lblokhin <lenin133@yandex.ru>

* feast-integration-tests for tests

Signed-off-by: lblokhin <lenin133@yandex.ru>

* change test path

Signed-off-by: lblokhin <lenin133@yandex.ru>

* add fixture feature_store_with_s3_registry to test

Signed-off-by: lblokhin <lenin133@yandex.ru>

* region required

Signed-off-by: lblokhin <lenin133@yandex.ru>

* Address the rest of the comments

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Update to_table to to_arrow

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

Co-authored-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Parallelize integration tests (#1684)

* Parallelize integration tests

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Update the usage flag

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* BQ exception should be raised before we check the timeout (#1675)

Signed-off-by: Matt Delacour <matt.delacour@shopify.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>
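
The rationale: a job that has already failed should surface its own error rather than a generic timeout. A generic sketch of the polling order (not the exact Feast code; `job` stands in for a BigQuery job handle):

```python
import time

def block_until_done(job, timeout_seconds=1800):
    start = time.monotonic()
    while time.monotonic() - start < timeout_seconds:
        if job.done():
            if job.exception():       # the real failure is raised first
                raise job.exception()
            return job
        time.sleep(2)
    job.cancel()                      # per #1672: don't leave the job running
    raise TimeoutError("BigQuery job timed out and was cancelled")
```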

* Update sdk/python/feast/infra/provider.py

Co-authored-by: Willem Pienaar <6728866+woop@users.noreply.github.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Update sdk/python/feast/feature_store.py

Co-authored-by: Willem Pienaar <6728866+woop@users.noreply.github.com>
Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* made error logic/messages more descriptive

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* made error logic/messages more descriptive.

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* Simplified error messages

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* ran formatter, issue in errors.py

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* python linter issues resolved

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* removed unnecessary default assignment in get_historical_features. default now set only in feature_store.py

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

* added error message assertion for feature name collisions, and other nitpick changes

Signed-off-by: Mwad22 <51929507+Mwad22@users.noreply.github.com>

Co-authored-by: David Y Liu <davidyliuliu@gmail.com>
Co-authored-by: Tsotne Tabidze <tsotne@tecton.ai>
Co-authored-by: Achal Shah <achals@gmail.com>
Co-authored-by: David Y Liu <7172604+mavysavydav@users.noreply.github.com>
Co-authored-by: Willem Pienaar <github@willem.co>
Co-authored-by: codyjlin <31944154+codyjlin@users.noreply.github.com>
Co-authored-by: Matt Delacour <MattDelac@users.noreply.github.com>
Co-authored-by: Willem Pienaar <git@willem.co>
Co-authored-by: Peter Szalai <szalaipeti.vagyok@gmail.com>
Co-authored-by: Nel Swanepoel <nels@users.noreply.github.com>
Co-authored-by: Willem Pienaar <6728866+woop@users.noreply.github.com>
Co-authored-by: Greg Kuhlmann <greg.kuhlmann@gmail.com>
Co-authored-by: Leonid <lenin133@yandex.ru>
14 people authored Jul 8, 2021
1 parent 6a09d49 commit 2e0113e
Showing 20 changed files with 356 additions and 139 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -75,11 +75,11 @@ print(training_df.head())
# model = ml.fit(training_df)
```
```commandline
-    event_timestamp  driver_id  driver_hourly_stats__conv_rate  driver_hourly_stats__acc_rate
-2021-04-12 08:12:10       1002                        0.497279                       0.357702
-2021-04-12 10:59:42       1001                        0.979747                       0.008166
-2021-04-12 15:01:12       1004                        0.151432                       0.551748
-2021-04-12 16:40:26       1003                        0.951506                       0.753572
+            event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
+0 2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
+1 2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
+2 2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
+3 2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959
```

6 changes: 3 additions & 3 deletions docs/quickstart.md
@@ -119,9 +119,9 @@ pprint(feature_vector)
```python
{
'driver_id': [1001],
-    'driver_hourly_stats__conv_rate': [0.49274],
-    'driver_hourly_stats__acc_rate': [0.92743],
-    'driver_hourly_stats__avg_daily_trips': [72],
+    'conv_rate': [0.49274],
+    'acc_rate': [0.92743],
+    'avg_daily_trips': [72],
}
```

23 changes: 22 additions & 1 deletion sdk/python/feast/errors.py
@@ -1,4 +1,4 @@
-from typing import Set
+from typing import List, Set

from colorama import Fore, Style

@@ -88,6 +88,27 @@ def __init__(self, offline_store_name: str, data_source_name: str):
)


+class FeatureNameCollisionError(Exception):
+    def __init__(self, feature_refs_collisions: List[str], full_feature_names: bool):
+        if full_feature_names:
+            collisions = [ref.replace(":", "__") for ref in feature_refs_collisions]
+            error_message = (
+                "To resolve this collision, please ensure that the features in question "
+                "have different names."
+            )
+        else:
+            collisions = [ref.split(":")[1] for ref in feature_refs_collisions]
+            error_message = (
+                "To resolve this collision, either use the full feature name by setting "
+                "'full_feature_names=True', or ensure that the features in question have different names."
+            )
+
+        feature_names = ", ".join(set(collisions))
+        super().__init__(
+            f"Duplicate features named {feature_names} found.\n{error_message}"
+        )


class FeastOnlineStoreInvalidName(Exception):
def __init__(self, online_store_class_name: str):
super().__init__(
107 changes: 70 additions & 37 deletions sdk/python/feast/feature_store.py
@@ -12,8 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import os
-import sys
-from collections import OrderedDict, defaultdict
+from collections import Counter, OrderedDict, defaultdict
from datetime import datetime, timedelta
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
@@ -24,7 +23,7 @@

from feast import utils
from feast.entity import Entity
-from feast.errors import FeastProviderLoginError, FeatureViewNotFoundException
+from feast.errors import FeatureNameCollisionError, FeatureViewNotFoundException
from feast.feature_view import FeatureView
from feast.inference import (
update_data_sources_with_inferred_event_timestamp_col,
@@ -230,9 +229,11 @@ def apply(
update_entities_with_inferred_types_from_feature_views(
entities_to_update, views_to_update, self.config
)
+
update_data_sources_with_inferred_event_timestamp_col(
[view.input for view in views_to_update], self.config
)
+
for view in views_to_update:
view.infer_features_from_input_source(self.config)

@@ -255,7 +256,10 @@

@log_exceptions_and_usage
def get_historical_features(
-        self, entity_df: Union[pd.DataFrame, str], feature_refs: List[str],
+        self,
+        entity_df: Union[pd.DataFrame, str],
+        feature_refs: List[str],
+        full_feature_names: bool = False,
) -> RetrievalJob:
"""Enrich an entity dataframe with historical feature values for either training or batch scoring.
@@ -277,6 +281,9 @@
SQL query. The query must be of a format supported by the configured offline store (e.g., BigQuery)
feature_refs: A list of features that should be retrieved from the offline store. Feature references are of
the format "feature_view:feature", e.g., "customer_fv:daily_transactions".
+            full_feature_names: A boolean that provides the option to add the feature view prefixes to the feature names,
+                changing them from the format "feature" to "feature_view__feature" (e.g., "daily_transactions" changes to
+                "customer_fv__daily_transactions"). By default, this value is set to False.
Returns:
RetrievalJob which can be used to materialize the results.
@@ -289,32 +296,29 @@
>>> fs = FeatureStore(config=RepoConfig(provider="gcp"))
>>> retrieval_job = fs.get_historical_features(
>>> entity_df="SELECT event_timestamp, order_id, customer_id from gcp_project.my_ds.customer_orders",
>>> feature_refs=["customer:age", "customer:avg_orders_1d", "customer:avg_orders_7d"]
>>> )
>>> feature_refs=["customer:age", "customer:avg_orders_1d", "customer:avg_orders_7d"],
>>> )
>>> feature_data = retrieval_job.to_df()
>>> model.fit(feature_data) # insert your modeling framework here.
"""

all_feature_views = self._registry.list_feature_views(project=self.project)
-        try:
-            feature_views = _get_requested_feature_views(
-                feature_refs, all_feature_views
-            )
-        except FeatureViewNotFoundException as e:
-            sys.exit(e)
+        _validate_feature_refs(feature_refs, full_feature_names)
+        feature_views = list(
+            view for view, _ in _group_feature_refs(feature_refs, all_feature_views)
+        )

provider = self._get_provider()
-        try:
-            job = provider.get_historical_features(
-                self.config,
-                feature_views,
-                feature_refs,
-                entity_df,
-                self._registry,
-                self.project,
-            )
-        except FeastProviderLoginError as e:
-            sys.exit(e)
+        job = provider.get_historical_features(
+            self.config,
+            feature_views,
+            feature_refs,
+            entity_df,
+            self._registry,
+            self.project,
+            full_feature_names,
+        )

return job

@@ -480,7 +484,10 @@ def tqdm_builder(length):

@log_exceptions_and_usage
def get_online_features(
-        self, feature_refs: List[str], entity_rows: List[Dict[str, Any]],
+        self,
+        feature_refs: List[str],
+        entity_rows: List[Dict[str, Any]],
+        full_feature_names: bool = False,
) -> OnlineResponse:
"""
Retrieves the latest online feature data.
@@ -548,7 +555,8 @@ def get_online_features(
project=self.project, allow_cache=True
)

-        grouped_refs = _group_refs(feature_refs, all_feature_views)
+        _validate_feature_refs(feature_refs, full_feature_names)
+        grouped_refs = _group_feature_refs(feature_refs, all_feature_views)
for table, requested_features in grouped_refs:
entity_keys = _get_table_entity_keys(
table, union_of_entity_keys, entity_name_to_join_key_map
@@ -565,13 +573,21 @@

if feature_data is None:
for feature_name in requested_features:
-                    feature_ref = f"{table.name}__{feature_name}"
+                    feature_ref = (
+                        f"{table.name}__{feature_name}"
+                        if full_feature_names
+                        else feature_name
+                    )
result_row.statuses[
feature_ref
] = GetOnlineFeaturesResponse.FieldStatus.NOT_FOUND
else:
for feature_name in feature_data:
-                    feature_ref = f"{table.name}__{feature_name}"
+                    feature_ref = (
+                        f"{table.name}__{feature_name}"
+                        if full_feature_names
+                        else feature_name
+                    )
if feature_name in requested_features:
result_row.fields[feature_ref].CopyFrom(
feature_data[feature_name]
@@ -599,7 +615,31 @@ def _entity_row_to_field_values(
return result


-def _group_refs(
+def _validate_feature_refs(feature_refs: List[str], full_feature_names: bool = False):
+    collided_feature_refs = []
+
+    if full_feature_names:
+        collided_feature_refs = [
+            ref for ref, occurrences in Counter(feature_refs).items() if occurrences > 1
+        ]
+    else:
+        feature_names = [ref.split(":")[1] for ref in feature_refs]
+        collided_feature_names = [
+            ref
+            for ref, occurrences in Counter(feature_names).items()
+            if occurrences > 1
+        ]
+
+        for feature_name in collided_feature_names:
+            collided_feature_refs.extend(
+                [ref for ref in feature_refs if ref.endswith(":" + feature_name)]
+            )
+
+    if len(collided_feature_refs) > 0:
+        raise FeatureNameCollisionError(collided_feature_refs, full_feature_names)
+
+
+def _group_feature_refs(
feature_refs: List[str], all_feature_views: List[FeatureView]
) -> List[Tuple[FeatureView, List[str]]]:
""" Get list of feature views and corresponding feature names based on feature references"""
@@ -612,6 +652,7 @@

for ref in feature_refs:
view_name, feat_name = ref.split(":")
+
if view_name not in view_index:
raise FeatureViewNotFoundException(view_name)
views_features[view_name].append(feat_name)
@@ -622,14 +663,6 @@
return result


-def _get_requested_feature_views(
-    feature_refs: List[str], all_feature_views: List[FeatureView]
-) -> List[FeatureView]:
-    """Get list of feature views based on feature references"""
-    # TODO: Get rid of this function. We only need _group_refs
-    return list(view for view, _ in _group_refs(feature_refs, all_feature_views))


def _get_table_entity_keys(
table: FeatureView, entity_keys: List[EntityKeyProto], join_key_map: Dict[str, str],
) -> List[EntityKeyProto]:
2 changes: 2 additions & 0 deletions sdk/python/feast/infra/aws.py
@@ -129,6 +129,7 @@ def get_historical_features(
entity_df: Union[pandas.DataFrame, str],
registry: Registry,
project: str,
+        full_feature_names: bool,
) -> RetrievalJob:
job = self.offline_store.get_historical_features(
config=config,
@@ -137,5 +138,6 @@
entity_df=entity_df,
registry=registry,
project=project,
+            full_feature_names=full_feature_names,
)
return job
2 changes: 2 additions & 0 deletions sdk/python/feast/infra/gcp.py
@@ -131,6 +131,7 @@ def get_historical_features(
entity_df: Union[pandas.DataFrame, str],
registry: Registry,
project: str,
+        full_feature_names: bool,
) -> RetrievalJob:
job = self.offline_store.get_historical_features(
config=config,
@@ -139,5 +140,6 @@
entity_df=entity_df,
registry=registry,
project=project,
+            full_feature_names=full_feature_names,
)
return job
2 changes: 2 additions & 0 deletions sdk/python/feast/infra/local.py
@@ -130,6 +130,7 @@ def get_historical_features(
entity_df: Union[pd.DataFrame, str],
registry: Registry,
project: str,
+        full_feature_names: bool,
) -> RetrievalJob:
return self.offline_store.get_historical_features(
config=config,
Expand All @@ -138,6 +139,7 @@ def get_historical_features(
entity_df=entity_df,
registry=registry,
project=project,
+            full_feature_names=full_feature_names,
)


15 changes: 12 additions & 3 deletions sdk/python/feast/infra/offline_stores/bigquery.py
@@ -95,6 +95,7 @@ def get_historical_features(
entity_df: Union[pandas.DataFrame, str],
registry: Registry,
project: str,
+        full_feature_names: bool = False,
) -> RetrievalJob:
# TODO: Add entity_df validation in order to fail before interacting with BigQuery
assert isinstance(config.offline_store, BigQueryOfflineStoreConfig)
@@ -121,7 +122,11 @@

# Build a query context containing all information required to template the BigQuery SQL query
query_context = get_feature_view_query_context(
-            feature_refs, feature_views, registry, project
+            feature_refs,
+            feature_views,
+            registry,
+            project,
+            full_feature_names=full_feature_names,
)

# TODO: Infer min_timestamp and max_timestamp from entity_df
@@ -132,6 +137,7 @@
max_timestamp=datetime.now() + timedelta(days=1),
left_table_query_string=str(table.reference),
entity_df_event_timestamp_col=entity_df_event_timestamp_col,
+        full_feature_names=full_feature_names,
)

job = BigQueryRetrievalJob(query=query, client=client, config=config)
@@ -373,6 +379,7 @@ def get_feature_view_query_context(
feature_views: List[FeatureView],
registry: Registry,
project: str,
+    full_feature_names: bool = False,
) -> List[FeatureViewQueryContext]:
"""Build a query context containing all information required to template a BigQuery point-in-time SQL query"""

@@ -432,6 +439,7 @@ def build_point_in_time_query(
max_timestamp: datetime,
left_table_query_string: str,
entity_df_event_timestamp_col: str,
+    full_feature_names: bool = False,
):
"""Build point-in-time query between each feature view table and the entity dataframe"""
template = Environment(loader=BaseLoader()).from_string(
@@ -448,6 +456,7 @@
[entity for fv in feature_view_query_contexts for entity in fv.entities]
),
"featureviews": [asdict(context) for context in feature_view_query_contexts],
"full_feature_names": full_feature_names,
}

query = template.render(template_context)
@@ -521,7 +530,7 @@ def _get_bigquery_client(project: Optional[str] = None):
{{ featureview.created_timestamp_column ~ ' as created_timestamp,' if featureview.created_timestamp_column else '' }}
{{ featureview.entity_selections | join(', ')}},
{% for feature in featureview.features %}
-    {{ feature }} as {{ featureview.name }}__{{ feature }}{% if loop.last %}{% else %}, {% endif %}
+    {{ feature }} as {% if full_feature_names %}{{ featureview.name }}__{{feature}}{% else %}{{ feature }}{% endif %}{% if loop.last %}{% else %}, {% endif %}
{% endfor %}
FROM {{ featureview.table_subquery }}
),
@@ -614,7 +623,7 @@ def _get_bigquery_client(project: Optional[str] = None):
SELECT
entity_row_unique_id,
{% for feature in featureview.features %}
-        {{ featureview.name }}__{{ feature }},
+        {% if full_feature_names %}{{ featureview.name }}__{{feature}}{% else %}{{ feature }}{% endif %},
{% endfor %}
FROM {{ featureview.name }}__cleaned
) USING (entity_row_unique_id)
