Add write support for On Demand Feature Views #4376

franciscojavierarceo · 2024-08-02T03:16:13Z

The problem

As discussed in #4365, we should add the ability to write an On Demand Feature View (ODFV) to store the output of the calculation.

The solution

The ideal solution would require a boolean to the ODFV decorator as metadata to control the write behavior and another boolean in the get_online_features method to allow for users to force features to be recomputed.

The writes would be done by calling push() or write_to_online_store() with the underlying raw data (inputs) and storing the transformed feature values (outputs) into the online store. The ODFV would be called before executing the writes.

The change for the ODFV definition would be:

@on_demand_feature_view(
    sources=[
        driver_hourly_stats_view,
        input_request
    ],
    schema=[
        Field(name='conv_rate_plus_val1_python', dtype=Float64),
        Field(name='conv_rate_plus_val2_python', dtype=Float64),
    ],
    mode="python",
    write_to_online_store=True,                #  THIS IS THE FIRST NEW BOOLEAN
)
def transformed_conv_rate_python(inputs: Dict[str, Any]) -> Dict[str, Any]:
    output: Dict[str, Any] = {
        "conv_rate_plus_val1_python": [
            conv_rate + val_to_add
            for conv_rate, val_to_add in zip(
                inputs["conv_rate"], inputs["val_to_add"]
            )
        ],
        "conv_rate_plus_val2_python": [
            conv_rate + val_to_add
            for conv_rate, val_to_add in zip(
                inputs["conv_rate"], inputs["val_to_add_2"]
            )
        ]
    }
    return output

And the change for the get_online_features call would be:

entity_rows = [
    {
        "driver_id": 1001,
        "val_to_add": 1,
        "val_to_add_2": 2,
    }
]

online_response = store.get_online_features(
    entity_rows=entity_rows,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "transformed_conv_rate_python:conv_rate_plus_val1_python",
        "transformed_conv_rate_python:conv_rate_plus_val2_python",
    ],
    force_compute=False,                         #  THIS IS THE SECOND NEW BOOLEAN
).to_dict()

Again the write_to_online_store: bool parameter would dictate whether this ODFV would write to the online store and the force_compute: bool would dictate whether the ODFV would always recalculate the features. There's an argument to be made that we could skip the write_to_online_store in the FeatureView declaration but this metadata would be useful to have in the registry for users.

The write call would be the standard:

store.push("transformed_conv_rate_python", entity_rows, to=PushMode.ONLINE)
# or alternative
store.write_to_online_store("transformed_conv_rate_python", entity_rows, to=PushMode.ONLINE)

Alternatives

We discussed creating a different feature view for this behavior altogether but using the existing ODFV benefits from reusing a lot of existing code and documentation. Moreover, the industry has adopted this language so adding on top of the language feels more natural than adding entirely new language.

Additional context

After the implementation it would be ideal to add this as an example in the local Credit Scoring tutorial.

The text was updated successfully, but these errors were encountered:

tokoko · 2024-08-02T05:26:07Z

There's an argument to be made that we could skip the write_to_online_store in the FeatureView declaration but this metadata would be useful to have in the registry for users.

I was actually thinking the other way around. If we manage to figure out how to use precomputed values only for identical inputs (plus I'm assuming that transformation function itself is deterministic), do we need a force_recompute field? If a user is completely sure that recompute would yield the same result, what's the point of not using the precomputed value? force_recompute in this example is also a bit awkward because it's configured for the whole request which might contain more than one odfvs, not per an individual odfv.

A couple of other points:

We probably need some sort of a ttl for storage. The hypothetical feast cleanup command could be the one that will clean up stale values afterwards.
Another thing that came to my mind is how we would like to handle scenarios when not all features from a particular odfv are requested. I guess this is less of a problem for pandas and python transforms as they have to recompute the whole thing even if not all features are requested, but is a harder problem for substrait.

franciscojavierarceo · 2024-08-02T12:31:40Z

I was actually thinking the other way around. If we manage to figure out how to use precomputed values only for identical inputs (plus I'm assuming that transformation function itself is deterministic), do we need a force_recompute field? If a user is completely sure that recompute would yield the same result, what's the point of not using the precomputed value? force_recompute in this example is also a bit awkward because it's configured for the whole request which might contain more than one odfvs, not per an individual odfv.

We could use an idempotency key for this, this was something we had planned to add in my last job for this exact use case.

We probably need some sort of a ttl for storage. The hypothetical feast cleanup command could be the one that will clean up stale values afterwards.

Yeah, implementing feast cleanup handles it in general.

Another thing that came to my mind is how we would like to handle scenarios when not all features from a particular odfv are requested. I guess this is less of a problem for pandas and python transforms as they have to recompute the whole thing even if not all features are requested, but is a harder problem for substrait.

Agreed.

franciscojavierarceo · 2024-08-05T17:45:33Z

FYI @tokoko I had created a ticket for this before actually here: #4077

franciscojavierarceo · 2024-08-14T12:35:26Z

@tokoko @HaoXuAI @shuchu once this is done I'll consider ODFVs to be out of beta and a complete feature.

tokoko · 2024-08-14T13:37:31Z

I think interface itself is pretty much stable at this point, so in that sense I agree. Still, from the overall ux perspective, there are still known limitations depending on the use case:

pandas is slow both for offline and online.
python doesn't yet support offline.
substrait/ibis is decent for online, but not ideal for offline unless you're using one of the ibis-backed offline stores. plus, it's itself pretty much still experimental.

franciscojavierarceo · 2024-08-14T13:56:45Z

That's a good point.

I think adding Python offline is the only other necessary item. Substrait/ibis is broader scope and not as used by the community yet.

franciscojavierarceo added the kind/feature New feature or request label Aug 2, 2024

franciscojavierarceo self-assigned this Aug 2, 2024

franciscojavierarceo mentioned this issue Aug 2, 2024

Separate transformation and materialization #4365

Closed

This was referenced Aug 5, 2024

Add idempotency key #4077

Open

Add feature transformation support to FeatureView #4277

Open

chore: Update Feast documentation to add information about write patterns and feature transformations #4400

Merged

franciscojavierarceo mentioned this issue Aug 13, 2024

Refactor external add-ons section kubeflow/website#3836

Merged

franciscojavierarceo mentioned this issue Sep 13, 2024

feat: Adding write capabability to online store to on demand feature … #4418

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add write support for On Demand Feature Views #4376

Add write support for On Demand Feature Views #4376

franciscojavierarceo commented Aug 2, 2024 •

edited

Loading

tokoko commented Aug 2, 2024

franciscojavierarceo commented Aug 2, 2024

franciscojavierarceo commented Aug 5, 2024

franciscojavierarceo commented Aug 14, 2024

tokoko commented Aug 14, 2024

franciscojavierarceo commented Aug 14, 2024

Add write support for On Demand Feature Views #4376

Add write support for On Demand Feature Views #4376

Comments

franciscojavierarceo commented Aug 2, 2024 • edited Loading

The problem

The solution

Alternatives

Additional context

tokoko commented Aug 2, 2024

franciscojavierarceo commented Aug 2, 2024

franciscojavierarceo commented Aug 5, 2024

franciscojavierarceo commented Aug 14, 2024

tokoko commented Aug 14, 2024

franciscojavierarceo commented Aug 14, 2024

franciscojavierarceo commented Aug 2, 2024 •

edited

Loading