feat: Adding docs outlining native Python transformations on singletons (#4741)
franciscojavierarceo authored Nov 7, 2024
1 parent 4a89252 commit 0150278
Showing 1 changed file with 105 additions and 71 deletions.
176 changes: 105 additions & 71 deletions docs/reference/beta-on-demand-feature-view.md
@@ -1,122 +1,147 @@
# \[Beta] On demand feature view
# [Beta] On Demand Feature Views

**Warning**: This is an experimental feature. To our knowledge, this is stable, but there are still rough edges in the experience. Contributions are welcome!
**Warning**: This is an experimental feature. While it is stable to our knowledge, there may still be rough edges in the experience. Contributions are welcome!

## Overview

On Demand Feature Views (ODFVs) allow data scientists to use existing features and request-time data (features only
available at request time) to transform and create new features. Users define Python transformation logic which is
executed during both historical retrieval and online retrieval. Additionally, ODFVs provide flexibility in
applying transformations either during data ingestion (at write time) or during feature retrieval (at read time),
controlled via the `write_to_online_store` parameter.
On Demand Feature Views (ODFVs) allow data scientists to use existing features and request-time data to transform and
create new features. Users define transformation logic that is executed during both historical and online retrieval.
Additionally, ODFVs provide flexibility in applying transformations either during data ingestion (at write time) or
during feature retrieval (at read time), controlled via the `write_to_online_store` parameter.

By setting `write_to_online_store=True`, transformations are applied during data ingestion, and the transformed
features are stored in the online store. This can improve online feature retrieval performance by reducing computation
during reads. Conversely, if `write_to_online_store=False` (the default if omitted), transformations are applied during
feature retrieval.
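
The trade-off between the two settings can be pictured with a small, self-contained sketch. It does not use Feast: `ToyOnlineStore` and `adjust_conv_rate` are hypothetical names invented for illustration, and the in-memory dictionary only stands in for an online store. It assumes a transformation that depends solely on stored features (no request-time data).

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


def adjust_conv_rate(row: Row) -> Row:
    """Toy transformation: scale conv_rate by 10%."""
    return {"conv_rate_adjusted": row["conv_rate"] * 1.1}


class ToyOnlineStore:
    """Hypothetical in-memory store for illustration only (not Feast's online store)."""

    def __init__(
        self,
        transform: Callable[[Row], Row],
        write_to_online_store: bool = False,
    ) -> None:
        self.transform = transform
        self.write_to_online_store = write_to_online_store
        self.rows: Dict[int, Row] = {}

    def write(self, key: int, row: Row) -> None:
        stored = dict(row)
        if self.write_to_online_store:
            # Transform once at ingestion and persist the derived feature.
            stored.update(self.transform(row))
        self.rows[key] = stored

    def read(self, key: int) -> Row:
        row = dict(self.rows[key])
        if not self.write_to_online_store:
            # Transform on every read; nothing derived is persisted.
            row.update(self.transform(row))
        return row


on_write = ToyOnlineStore(adjust_conv_rate, write_to_online_store=True)
on_read = ToyOnlineStore(adjust_conv_rate)  # default: transform on read
for toy_store in (on_write, on_read):
    toy_store.write(1001, {"conv_rate": 0.5})

# Both paths return the same features; only where the computation runs differs.
assert on_write.read(1001) == on_read.read(1001)
# Only the write-time store persists the derived feature.
assert "conv_rate_adjusted" in on_write.rows[1001]
assert "conv_rate_adjusted" not in on_read.rows[1001]
```

In this toy model, the write-time store pays the transformation cost once per ingested row, while the read-time store pays it on every lookup.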

### Why use on demand feature views?
### Why Use On Demand Feature Views?

This enables data scientists to easily impact the online feature retrieval path. For example, a data scientist could
ODFVs enable data scientists to easily impact the online feature retrieval path. For example, a data scientist could:

1. Call `get_historical_features` to generate a training dataframe
2. Iterate in notebook on feature engineering in Pandas/Python
3. Copy transformation logic into ODFVs and commit to a development branch of the feature repository
4. Verify with `get_historical_features` (on a small dataset) that the transformation gives expected output over historical data
1. Call `get_historical_features` to generate a training dataset.
2. Iterate in a notebook on feature engineering using Pandas or native Python.
3. Copy transformation logic into ODFVs and commit to a development branch of the feature repository.
4. Verify with `get_historical_features` (on a small dataset) that the transformation gives the expected output over historical data.
5. Decide whether to apply the transformation on writes or on reads by setting the `write_to_online_store` parameter accordingly.
6. Verify with `get_online_features` on dev branch that the transformation correctly outputs online features
7. Submit a pull request to the staging / prod branches which impact production traffic
6. Verify with `get_online_features` on the development branch that the transformation correctly outputs online features.
7. Submit a pull request to the staging or production branches, which impacts production traffic.

## CLI
## Transformation Modes

There are new CLI commands:
When defining an ODFV, you can specify the transformation mode using the `mode` parameter. Feast supports the following modes:

* `feast on-demand-feature-views list` lists all registered on demand feature view after `feast apply` is run
* `feast on-demand-feature-views describe [NAME]` describes the definition of an on demand feature view
- **Pandas Mode (`mode="pandas"`)**: The transformation function takes a Pandas DataFrame as input and returns a Pandas DataFrame as output. This mode is useful for batch transformations over multiple rows.
- **Native Python Mode (`mode="python"`)**: The transformation function uses native Python and can operate on inputs as lists of values or as single dictionaries representing a singleton (single row).

## Example
### Singleton Transformations in Native Python Mode

Native Python mode supports transformations on singleton dictionaries by setting `singleton=True`. This allows you to
write transformation functions that operate on a single row at a time, making the code more intuitive and aligning with
how data scientists typically think about data transformations.
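
To make the difference in input shape concrete, here is a plain-Python sketch that involves no Feast objects. The function names and sample values are hypothetical; the sketch only assumes what is described above: list-style inputs map each feature name to a list of values, while singleton inputs map each feature name to a single value.

```python
from typing import Any, Dict, List


def add_rates_batch(inputs: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """List-style input: each key maps to a list with one value per row."""
    return {
        "conv_rate_plus_acc": [
            conv + acc
            for conv, acc in zip(inputs["conv_rate"], inputs["acc_rate"])
        ]
    }


def add_rates_singleton(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Singleton input: each key maps to one scalar value for a single row."""
    return {"conv_rate_plus_acc": inputs["conv_rate"] + inputs["acc_rate"]}


# Two rows in list (column-oriented) form.
batch = {"conv_rate": [0.25, 0.5], "acc_rate": [0.5, 0.25]}

# The same two rows as singleton dictionaries, one per row.
rows = [dict(zip(batch, values)) for values in zip(*batch.values())]

batch_result = add_rates_batch(batch)["conv_rate_plus_acc"]
singleton_result = [add_rates_singleton(row)["conv_rate_plus_acc"] for row in rows]

# Both styles compute the same values; only the shape of the input differs.
assert batch_result == singleton_result
```

The singleton version avoids the `zip` and list comprehension entirely, which is the readability benefit described above.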

## Example
See [https://github.com/feast-dev/on-demand-feature-views-demo](https://github.com/feast-dev/on-demand-feature-views-demo) for an example on how to use on demand feature views.

### **Registering transformations**

On Demand Transformations support transformations using Pandas and native Python. Note, Native Python is much faster
but not yet tested for offline retrieval.
## Registering Transformations

When defining an ODFV, you can control when the transformation is applied using the write_to_online_store parameter:
When defining an ODFV, you can control when the transformation is applied using the `write_to_online_store` parameter:

- `write_to_online_store=True`: The transformation is applied during data ingestion (on write), and the transformed features are stored in the online store.
- `write_to_online_store=False` (default when omitted): The transformation is applied during feature retrieval (on read).
- `write_to_online_store=False` (default): The transformation is applied during feature retrieval (on read).

We register `RequestSource` inputs and the transform in `on_demand_feature_view`:
### Examples

## Example of an On Demand Transformation on Read
#### Example 1: On Demand Transformation on Read Using Pandas Mode

```python
from feast import Field, RequestSource
from feast import Field, RequestSource, on_demand_feature_view
from feast.types import Float64, Int64
from typing import Any, Dict
import pandas as pd

# Define a request data source which encodes features / information only
# available at request time (e.g. part of the user initiated HTTP request)
# Define a request data source for request-time features
input_request = RequestSource(
    name="vals_to_add",
    schema=[
        Field(name='val_to_add', dtype=Int64),
        Field(name='val_to_add_2', dtype=Int64)
    ]
        Field(name="val_to_add", dtype=Int64),
        Field(name="val_to_add_2", dtype=Int64),
    ],
)

# Use the input data and feature view features to create new features Pandas mode
# Use input data and feature view features to create new features in Pandas mode
@on_demand_feature_view(
    sources=[
        driver_hourly_stats_view,
        input_request
    ],
    schema=[
        Field(name='conv_rate_plus_val1', dtype=Float64),
        Field(name='conv_rate_plus_val2', dtype=Float64)
    ],
    mode="pandas",
    sources=[driver_hourly_stats_view, input_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
    mode="pandas",
)
def transformed_conv_rate(features_df: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df['conv_rate_plus_val1'] = (features_df['conv_rate'] + features_df['val_to_add'])
    df['conv_rate_plus_val2'] = (features_df['conv_rate'] + features_df['val_to_add_2'])
    df["conv_rate_plus_val1"] = features_df["conv_rate"] + features_df["val_to_add"]
    df["conv_rate_plus_val2"] = features_df["conv_rate"] + features_df["val_to_add_2"]
    return df
```

#### Example 2: On Demand Transformation on Read Using Native Python Mode (List Inputs)

```python
from feast import Field, on_demand_feature_view
from feast.types import Float64
from typing import Any, Dict

# Use the input data and feature view features to create new features Python mode
# Use input data and feature view features to create new features in Native Python mode
@on_demand_feature_view(
    sources=[
        driver_hourly_stats_view,
        input_request
    ],
    sources=[driver_hourly_stats_view, input_request],
    schema=[
        Field(name='conv_rate_plus_val1_python', dtype=Float64),
        Field(name='conv_rate_plus_val2_python', dtype=Float64),
        Field(name="conv_rate_plus_val1_python", dtype=Float64),
        Field(name="conv_rate_plus_val2_python", dtype=Float64),
    ],
    mode="python",
)
def transformed_conv_rate_python(inputs: Dict[str, Any]) -> Dict[str, Any]:
    output: Dict[str, Any] = {
    output = {
        "conv_rate_plus_val1_python": [
            conv_rate + val_to_add
            for conv_rate, val_to_add in zip(
                inputs["conv_rate"], inputs["val_to_add"]
            )
            for conv_rate, val_to_add in zip(inputs["conv_rate"], inputs["val_to_add"])
        ],
        "conv_rate_plus_val2_python": [
            conv_rate + val_to_add
            for conv_rate, val_to_add in zip(
                inputs["conv_rate"], inputs["val_to_add_2"]
            )
        ]
        ],
    }
    return output
```

#### **New** Example 3: On Demand Transformation on Read Using Native Python Mode (Singleton Input)

```python
from feast import Field, on_demand_feature_view
from feast.types import Float64
from typing import Any, Dict

# Use input data and feature view features to create new features in Native Python mode with singleton input
@on_demand_feature_view(
    sources=[driver_hourly_stats_view, input_request],
    schema=[
        Field(name="conv_rate_plus_acc_singleton", dtype=Float64),
    ],
    mode="python",
    singleton=True,
)
def transformed_conv_rate_singleton(inputs: Dict[str, Any]) -> Dict[str, Any]:
    output = {
        "conv_rate_plus_acc_singleton": inputs["conv_rate"] + inputs["acc_rate"]
    }
    return output
```

## Example of an On Demand Transformation on Write
In this example, `inputs` is a dictionary representing a single row, and the transformation function returns a dictionary of transformed features for that single row. This approach is more intuitive and aligns with how data scientists typically process single data records.

#### Example 4: On Demand Transformation on Write Using Pandas Mode

```python
from feast import Field, on_demand_feature_view
@@ -126,22 +151,22 @@ import pandas as pd
# Existing Feature View
driver_hourly_stats_view = ...

# Define an ODFV without RequestSource
# Define an ODFV applying transformation during write time
@on_demand_feature_view(
    sources=[driver_hourly_stats_view],
    schema=[
        Field(name='conv_rate_adjusted', dtype=Float64),
        Field(name="conv_rate_adjusted", dtype=Float64),
    ],
    mode="pandas",
    write_to_online_store=True,  # Apply transformation during write time
)
def transformed_conv_rate(features_df: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df['conv_rate_adjusted'] = features_df['conv_rate'] * 1.1  # Adjust conv_rate by 10%
    df["conv_rate_adjusted"] = features_df["conv_rate"] * 1.1  # Adjust conv_rate by 10%
    return df
```
Then to ingest the data with the new feature view make sure to include all of the input features required for the
transformations:

To ingest data with the new feature view, include all input features required for the transformations:

```python
from feast import FeatureStore
@@ -160,17 +185,17 @@ data = pd.DataFrame({

# Ingest data to the online store
store.push("driver_hourly_stats_view", data)
```

### **Feature retrieval**
### Feature Retrieval

{% hint style="info" %}
The on demand feature view's name is the function name (i.e. `transformed_conv_rate`).
**Note**: The name of the on demand feature view is the function name (e.g., `transformed_conv_rate`).
{% endhint %}


#### Offline Features
And then to retrieve historical, we can call this in a feature service or reference individual features:

Retrieve historical features by referencing individual features or using a feature service:

```python
training_df = store.get_historical_features(
@@ -181,14 +206,14 @@ training_df = store.get_historical_features(
"driver_hourly_stats:avg_daily_trips",
"transformed_conv_rate:conv_rate_plus_val1",
"transformed_conv_rate:conv_rate_plus_val2",
"transformed_conv_rate_singleton:conv_rate_plus_acc_singleton",
],
).to_df()

```

#### Online Features

And then to retrieve online, we can call this in a feature service or reference individual features:
Retrieve online features by referencing individual features or using a feature service:

```python
entity_rows = [
@@ -206,6 +231,15 @@ online_response = store.get_online_features(
"driver_hourly_stats:acc_rate",
"transformed_conv_rate_python:conv_rate_plus_val1_python",
"transformed_conv_rate_python:conv_rate_plus_val2_python",
"transformed_conv_rate_singleton:conv_rate_plus_acc_singleton",
],
).to_dict()
```

## CLI Commands
There are new CLI commands to manage on demand feature views:

- `feast on-demand-feature-views list`: Lists all registered on demand feature views after `feast apply` is run.
- `feast on-demand-feature-views describe [NAME]`: Describes the definition of an on demand feature view.

