diff --git a/docs/README.md b/docs/README.md index b838e5fe5b..f387406c3f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -57,7 +57,7 @@ Many companies have used Feast to power real-world ML use cases such as: ## How can I get started? {% hint style="info" %} -The best way to learn Feast is to use it. Head over to our [Quickstart](getting-started/quickstart.md) and try it out! +The best way to learn Feast is to use it. Join our [Slack channel](http://slack.feast.dev) and head over to our [Quickstart](getting-started/quickstart.md) and try it out! {% endhint %} Explore the following resources to get started with Feast: diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 18d1aeb280..6ef1c9c49e 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -1,7 +1,7 @@ # Table of contents * [Introduction](README.md) -* [Community](community.md) +* [Community & getting help](community.md) * [Roadmap](roadmap.md) * [Changelog](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md) @@ -24,7 +24,6 @@ * [Online store](getting-started/architecture-and-components/online-store.md) * [Batch Materialization Engine](getting-started/architecture-and-components/batch-materialization-engine.md) * [Provider](getting-started/architecture-and-components/provider.md) -* [Learning by example](getting-started/feast-workshop.md) * [Third party integrations](getting-started/third-party-integrations.md) * [FAQ](getting-started/faq.md) diff --git a/docs/community.md b/docs/community.md index dc1cc8a0fe..bd01b2a64b 100644 --- a/docs/community.md +++ b/docs/community.md @@ -1,17 +1,19 @@ -# Community +# Community & Getting Help ## Links & Resources -* [Slack](https://slack.feast.dev): Feel free to ask questions or say hello! +* [GitHub Repository](https://github.com/feast-dev/feast/): Find the complete Feast codebase on GitHub. +* [Slack](https://slack.feast.dev): Feel free to ask questions or say hello! This is the main place where maintainers and contributors brainstorm and where users ask questions or discuss best practices. + * Feast users should join `#feast-general` or `#feast-beginners` to ask questions + * Feast developers / contributors should join `#feast-development` * [Mailing list](https://groups.google.com/d/forum/feast-dev): We have both a user and developer mailing list. * Feast users should join [feast-discuss@googlegroups.com](mailto:feast-discuss@googlegroups.com) group by clicking [here](https://groups.google.com/g/feast-discuss). - * Feast developers should join [feast-dev@googlegroups.com](mailto:feast-dev@googlegroups.com) group by clicking [here](https://groups.google.com/d/forum/feast-dev). + * Feast developers / contributors should join [feast-dev@googlegroups.com](mailto:feast-dev@googlegroups.com) group by clicking [here](https://groups.google.com/d/forum/feast-dev). * [Community Calendar](https://calendar.google.com/calendar/u/0?cid=ZTFsZHVhdGM3MDU3YTJucTBwMzNqNW5rajBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ): Includes community calls and design meetings. * [Google Folder](https://drive.google.com/drive/u/0/folders/1jgMHOPDT2DvBlJeO9LCM79DP4lm4eOrR): This folder is used as a central repository for all Feast resources. For example: * Design proposals in the form of Request for Comments (RFC). * User surveys and meeting minutes. * Slide decks of conferences our contributors have spoken at. -* [Feast GitHub Repository](https://github.com/feast-dev/feast/): Find the complete Feast codebase on GitHub. 
* [Feast Linux Foundation Wiki](https://wiki.lfaidata.foundation/display/FEAST/Feast+Home): Our LFAI wiki page contains links to resources for contributors and maintainers. ## How can I get help? @@ -22,17 +24,30 @@ ## Community Calls +### General community call (biweekly) We have a user and contributor community call every two weeks (US & EU friendly). {% hint style="info" %} Please join the above Feast user groups in order to see calendar invites to the community calls {% endhint %} -### Frequency (every 2 weeks) +#### Frequency (every 2 weeks) * Tuesday 10:00 am to 10:30 am PST -### Links +#### Links * Zoom: [https://zoom.us/j/6325193230](https://zoom.us/j/6325193230) * Meeting notes (incl recordings): [https://bit.ly/feast-notes](https://bit.ly/feast-notes) + +### Developers call (biweekly) +We also have a `#feast-development` community call every two weeks, where we discuss contributions + brainstorm best practices. + +#### Frequency (every 2 weeks) + +* Tuesday 8:00 am to 8:30 am PST + +#### Links + +* Meeting notes (incl recordings): [Feast Development Biweekly](https://docs.google.com/document/d/1zUbIWFWjaBEVlToOdupnmKQwgAtFYx41sPoEEEdd2io/edit#) +* Zoom: [https://zoom.us/j/93657748160?pwd=K3ZpdzhqejgrcXNhc3BlSjFMdzUxdz09](https://zoom.us/j/93657748160?pwd=K3ZpdzhqejgrcXNhc3BlSjFMdzUxdz09) diff --git a/docs/getting-started/feast-workshop.md b/docs/getting-started/feast-workshop.md deleted file mode 100644 index 0d64845222..0000000000 --- a/docs/getting-started/feast-workshop.md +++ /dev/null @@ -1,44 +0,0 @@ -# Learning by example - -This workshop aims to teach users about Feast. - -We explain concepts & best practices by example, and also showcase how to address common use cases. - -### Pre-requisites - -This workshop assumes you have the following installed: - -* A local development environment that supports running Jupyter notebooks (e.g. VSCode with Jupyter plugin) -* Python 3.7+ -* Java 11 (for Spark, e.g. `brew install java11`) -* pip -* Docker & Docker Compose (e.g. `brew install docker docker-compose`) -* Terraform ([docs](https://learn.hashicorp.com/tutorials/terraform/install-cli#install-terraform)) -* AWS CLI -* An AWS account setup with credentials via `aws configure` (e.g see [AWS credentials quickstart](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-creds)) - -Since we'll be learning how to leverage Feast in CI/CD, you'll also need to fork this workshop repository. - -#### **Caveats** - -* M1 Macbook development is untested with this flow. See also [How to run / develop for Feast on M1 Macs](https://github.com/feast-dev/feast/issues/2105). -* Windows development has only been tested with WSL. You will need to follow this [guide](https://docs.docker.com/desktop/windows/wsl/) to have Docker play nicely. - -### Modules - -_See also:_ [_Feast quickstart_](https://docs.feast.dev/getting-started/quickstart)_,_ [_Feast x Great Expectations tutorial_](https://docs.feast.dev/tutorials/validating-historical-features) - -These are meant mostly to be done in order, with examples building on previous concepts. 
- -See [https://github.com/feast-dev/feast-workshop](https://github.com/feast-dev/feast-workshop) - -| Time (min) | Description | Module | -| :--------: | ----------------------------------------------------------------------- | -------- | -| 30-45 | Setting up Feast projects & CI/CD + powering batch predictions | Module 0 | -| 15-20 | Streaming ingestion & online feature retrieval with Kafka, Spark, Redis | Module 1 | -| 10-15 | Real-time feature engineering with on demand transformations | Module 2 | -| TBD | Feature server deployment (embed, as a service, AWS Lambda) | TBD | -| TBD | Versioning features / models in Feast | TBD | -| TBD | Data quality monitoring in Feast | TBD | -| TBD | Batch transformations | TBD | -| TBD | Stream transformations | TBD | diff --git a/docs/how-to-guides/feast-snowflake-gcp-aws/README.md b/docs/how-to-guides/feast-snowflake-gcp-aws/README.md index 753650080b..0f6d099349 100644 --- a/docs/how-to-guides/feast-snowflake-gcp-aws/README.md +++ b/docs/how-to-guides/feast-snowflake-gcp-aws/README.md @@ -12,3 +12,7 @@ {% page-ref page="read-features-from-the-online-store.md" %} +{% page-ref page="../scaling-feast.md" %} + +{% page-ref page="../structuring-repos.md" %} + diff --git a/docs/how-to-guides/running-feast-in-production.md b/docs/how-to-guides/running-feast-in-production.md index 278b7edfb1..b7cd108bc3 100644 --- a/docs/how-to-guides/running-feast-in-production.md +++ b/docs/how-to-guides/running-feast-in-production.md @@ -4,7 +4,7 @@ After learning about Feast concepts and playing with Feast locally, you're now ready to use Feast in production. This guide aims to help with the transition from a sandbox project to production-grade deployment in the cloud or on-premise. -Overview of typical production configuration is given below: +A typical production architecture looks like: ![Overview](production-simple.png) @@ -21,10 +21,8 @@ Additionally, please check the how-to guide for some specific recommendations on In this guide we will show you how to: 1. Deploy your feature store and keep your infrastructure in sync with your feature repository -2. Keep the data in your online store up to date +2. Keep the data in your online store up to date (from batch and stream sources) 3. Use Feast for model training and serving -4. Ingest features from a stream source -5. Monitor your production deployment ## 1. Automatically deploying changes to your feature definitions @@ -43,6 +41,7 @@ We recommend typically setting up CI/CD to automatically run `feast plan` and `f ### 1.4 Setting up multiple environments A common scenario when using Feast in production is to want to test changes to Feast object definitions. For this, we recommend setting up a _staging_ environment for your offline and online stores, which mirrors _production_ (with potentially a smaller data set). + Having this separate environment allows users to test changes by first applying them to staging, and then promoting the changes to production after verifying the changes on staging. Different options are presented in the [how-to guide](structuring-repos.md). 
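+
+As an illustration of the multi-environment setup above, a training or CI pipeline can select the staging or production repository at runtime. The sketch below is illustrative rather than part of the documented setup: the `FEAST_ENV` variable and the `staging/` and `production/` directory names (each containing its own `feature_store.yaml`) are assumptions.
+
+```python
+import os
+
+from feast import FeatureStore
+
+# Hypothetical layout: staging/feature_store.yaml and production/feature_store.yaml
+# point at separate registries, offline stores, and online stores.
+# FEAST_ENV is an assumed environment variable set by your CI/CD system.
+repo_path = os.getenv("FEAST_ENV", "staging")
+
+store = FeatureStore(repo_path=repo_path)
+
+# Changes are applied to staging first (e.g. CI runs `feast plan` on pull requests and
+# `feast apply` on merge against staging), verified, and then promoted to production.
+print([fv.name for fv in store.list_feature_views()])
+```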
@@ -79,110 +78,108 @@ batch_engine:
     key: aws-secret-access-key
 ```
 
+### 2.2 Scheduled materialization
+> See also [data ingestion](../getting-started/concepts/data-ingestion.md#batch-data-ingestion) for code snippets.
 
-### 2.2 Manual materialization
-
-The simplest way to schedule materialization is to run an **incremental** materialization using the Feast CLI:
-
-```
-feast materialize-incremental 2022-01-01T00:00:00
-```
-
-The above command will load all feature values from all feature view sources into the online store up to the time `2022-01-01T00:00:00`.
-
-A timestamp is required to set the end date for materialization. If your source is fully up-to-date then the end date would be the current time. However, if you are querying a source where data is not yet available, then you do not want to set the timestamp to the current time. You would want to use a timestamp that ends at a date for which data is available. The next time `materialize-incremental` is run, Feast will load data that starts from the previous end date, so it is important to ensure that the materialization interval does not overlap with time periods for which data has not been made available. This is commonly the case when your source is an ETL pipeline that is scheduled on a daily basis.
-
-An alternative approach to incremental materialization (where Feast tracks the intervals of data that need to be ingested), is to call Feast directly from your scheduler like Airflow. In this case, Airflow is the system that tracks the intervals that have been ingested.
-
-```
-feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2020-01-02T00:00:00
-```
-
-In the above example we are materializing the source data from the `driver_hourly_stats` feature view over a day. This command can be scheduled as the final operation in your Airflow ETL, which runs after you have computed your features and stored them in the source location. Feast will then load your feature data into your online store.
-
-The timestamps above should match the interval of data that has been computed by the data transformation system.
+It is up to you to orchestrate and schedule runs of materialization.
 
-### 2.3 Automate periodic materialization
+Feast keeps the history of materialization in its registry, so your scheduler can be as simple as a [unix cron util](https://en.wikipedia.org/wiki/Cron). A cron util should be sufficient when you have just a few materialization jobs (usually one materialization job per feature view) that are triggered infrequently.
 
-It is up to you which orchestration/scheduler to use to periodically run `$ feast materialize`. Feast keeps the history of materialization in its registry so that the choice could be as simple as a [unix cron util](https://en.wikipedia.org/wiki/Cron). Cron util should be sufficient when you have just a few materialization jobs (it's usually one materialization job per feature view) triggered infrequently. However, the amount of work can quickly outgrow the resources of a single machine. That happens because the materialization job needs to repackage all rows before writing them to an online store. That leads to high utilization of CPU and memory. In this case, you might want to use a job orchestrator to run multiple jobs in parallel using several workers. Kubernetes Jobs or Airflow are good choices for more comprehensive job orchestration.
+However, the amount of work can quickly outgrow the resources of a single machine. That happens because the materialization job needs to repackage all rows before writing them to an online store. That leads to high utilization of CPU and memory. In this case, you might want to use a job orchestrator to run multiple jobs in parallel using several workers. Kubernetes Jobs or Airflow are good choices for more comprehensive job orchestration.
 
-If you are using Airflow as a scheduler, Feast can be invoked through the [BashOperator](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/bash.html) after the [Python SDK](https://pypi.org/project/feast/) has been installed into a virtual environment and your feature repo has been synced:
+If you are using Airflow as a scheduler, Feast can be invoked through a [PythonOperator](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html) after the [Python SDK](https://pypi.org/project/feast/) has been installed into a virtual environment and your feature repo has been synced:
 
 ```python
 import datetime
-
-materialize = BashOperator(
-    task_id='materialize',
-    bash_command=f'feast materialize-incremental {datetime.datetime.now().replace(microsecond=0).isoformat()}',
+from airflow.operators.python_operator import PythonOperator
+from feast import RepoConfig, FeatureStore
+from feast.infra.online_stores.dynamodb import DynamoDBOnlineStoreConfig
+from feast.repo_config import RegistryConfig
+
+# Define Python callable
+def materialize():
+    repo_config = RepoConfig(
+        registry=RegistryConfig(path="s3://[YOUR BUCKET]/registry.pb"),
+        project="feast_demo_aws",
+        provider="aws",
+        offline_store="file",
+        online_store=DynamoDBOnlineStoreConfig(region="us-west-2")
+    )
+    store = FeatureStore(config=repo_config)
+    # Option 1: materialize just one feature view
+    # store.materialize_incremental(datetime.datetime.now(), feature_views=["my_fv_name"])
+    # Option 2: materialize all feature views incrementally
+    store.materialize_incremental(datetime.datetime.now())
+
+# Use Airflow PythonOperator
+materialize_python = PythonOperator(
+    task_id='materialize_python',
+    python_callable=materialize,
 )
 ```
 
 {% hint style="success" %}
-Important note: Airflow worker must have read and write permissions to the registry file on GS / S3 since it pulls configuration and updates materialization history.
+Important note: the Airflow worker must have read and write permissions to the registry file on GCS / S3, since it pulls configuration and updates materialization history.
 {% endhint %}
+
+### 2.3 Stream feature ingestion
+See [data ingestion](../getting-started/concepts/data-ingestion.md) for more details on how to ingest streaming features or 3rd party feature data via a push API.
+
+This supports pushing feature values into Feast, to either the online store, the offline store, or both.
 
 ## 3. How to use Feast for model training
 
-After we've defined our features and data sources in the repository, we can generate training datasets.
+### 3.1. Generating training data
+After we've defined our features and data sources in the repository, we can generate training datasets. We highly recommend you use a `FeatureService` to version the features that go into a specific model version.
 
-The first thing we need to do in our training code is to create a `FeatureStore` object with a path to the registry.
+1. The first thing we need to do in our training code is to create a `FeatureStore` object with a path to the registry.
+ - One way to ensure your production clients have access to the feature store is to provide a copy of the `feature_store.yaml` to those pipelines. This `feature_store.yaml` file will have a reference to the feature store registry, which allows clients to retrieve features from offline or online stores. -One way to ensure your production clients have access to the feature store is to provide a copy of the `feature_store.yaml` to those pipelines. This `feature_store.yaml` file will have a reference to the feature store registry, which allows clients to retrieve features from offline or online stores. + ```python + from feast import FeatureStore -```python -from feast import FeatureStore + fs = FeatureStore(repo_path="production/") + ``` +2. Then, you need to generate an **entity dataframe**. You have two options + - Create an entity dataframe manually and pass it in + - Use a SQL query to dynamically generate lists of entities (e.g. all entities within a time range) and timestamps to pass into Feast +3. Then, training data can be retrieved as follows: -fs = FeatureStore(repo_path="production/") -``` + ```python + training_retrieval_job = fs.get_historical_features( + entity_df=entity_df_or_sql_string, + features=fs.get_feature_service("driver_activity_v1"), + ) -Then, training data can be retrieved as follows: + # Option 1: In memory model training + model = ml.fit(training_retrieval_job.to_df()) -```python -feature_refs = [ - 'driver_hourly_stats:conv_rate', - 'driver_hourly_stats:acc_rate', - 'driver_hourly_stats:avg_daily_trips' -] - -training_df = fs.get_historical_features( - entity_df=entity_df, - features=feature_refs, -).to_df() + # Option 2: Unloading to blob storage. Further post-processing can occur before kicking off distributed training. + training_retrieval_job.to_remote_storage() + ``` -model = ml.fit(training_df) -``` +### 3.2 Versioning features that power ML models +The most common way to productionize ML models is by storing and versioning models in a "model store", and then deploying these models into production. When using Feast, it is recommended that the feature service name and the model versions have some established convention. -The most common way to productionize ML models is by storing and versioning models in a "model store", and then deploying these models into production. When using Feast, it is recommended that the list of feature references also be saved alongside the model. 
This ensures that models and the features they are trained on are paired together when being shipped into production: +For example, in MLflow: ```python -import json -# Save model -model.save('my_model.bin') - -# Save features -with open('feature_refs.json', 'w') as f: - json.dump(feature_refs, f) -``` - -To test your model locally, you can simply create a `FeatureStore` object, fetch online features, and then make a prediction: +import mlflow.pyfunc -```python -# Load model -model = ml.load('my_model.bin') - -# Load feature references -with open('feature_refs.json', 'r') as f: - feature_refs = json.load(f) +# Load model from MLflow +model_name = "my-model" +model_version = 1 +model = mlflow.pyfunc.load_model( + model_uri=f"models:/{model_name}/{model_version}" +) -# Create feature store object fs = FeatureStore(repo_path="production/") -# Read online features +# Read online features using the same model name and model version feature_vector = fs.get_online_features( - features=feature_refs, + features=fs.get_feature_service(f"{model_name}_v{model_version}"), entity_rows=[{"driver_id": 1001}] ).to_dict() @@ -196,7 +193,7 @@ It is important to note that both the training pipeline and model serving servic ## 4. Retrieving online features for prediction -Once you have successfully loaded (or in Feast terminology materialized) your data from batch sources into the online store, you can start consuming features for model inference. There are three approaches for that purpose sorted from the most simple one (in an operational sense) to the most performant (benchmarks to be published soon): +Once you have successfully loaded data from batch / streaming sources into the online store, you can start consuming features for model inference. ### 4.1. Use the Python SDK within an existing Python service @@ -219,18 +216,11 @@ feature_vector = fs.get_online_features( ).to_dict() ``` -### 4.2. Consume features via HTTP API from Serverless Feature Server - -If you don't want to add the Feast Python SDK as a dependency, or your feature retrieval service is written in a non-Python language, Feast can deploy a simple feature server on serverless infrastructure (eg, AWS Lambda, Google Cloud Run) for you. This service will provide an HTTP API with JSON I/O, which can be easily used with any programming language. - -[Read more about this feature](../reference/feature-servers/alpha-aws-lambda-feature-server.md) +### 4.2. Deploy Feast feature servers on Kubernetes -### 4.3. Go feature server deployed on Kubernetes - -For users with very latency-sensitive and high QPS use-cases, Feast offers a high-performance [Go feature server](../reference/feature-servers/go-feature-server.md). It can use either HTTP or gRPC. - -The Go feature server can be deployed to a Kubernetes cluster via Helm charts in a few simple steps: +To deploy a Feast feature server on Kubernetes, you can use the included helm chart. +See [helm chart](https://github.com/feast-dev/feast/tree/master/infra/charts/feast-feature-server) for configuration details. 1. Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) and [helm 3](https://helm.sh/) 2. Add the Feast Helm repository and download the latest charts: @@ -243,43 +233,12 @@ helm repo update ``` helm install feast-release feast-charts/feast-feature-server \ - --set global.registry.path=s3://feast/registries/prod \ - --set global.project= + --set feature_store_yaml_base64=$(base64 feature_store.yaml) ``` -This chart will deploy a single service. 
The service must have read access to the registry file on cloud storage. It will keep a copy of the registry in their memory and periodically refresh it, so expect some delays in update propagation in exchange for better performance. In order for the Go feature server to be enabled, you should set `go_feature_serving: True` in the `feature_store.yaml`.
-
-## 5. Ingesting features from a stream source
-
-Recently Feast added functionality for [stream ingestion](../reference/data-sources/push.md). Please note that this is still in an early phase and new incompatible changes may be introduced.
-
-### 5.1. Using Python SDK in your Apache Spark / Beam pipeline
-
-The default option to write features from a stream is to add the Python SDK into your existing PySpark / Beam pipeline. Feast SDK provides writer implementation that can be called from `foreachBatch` stream writer in PySpark like this:
-
-```python
-from feast import FeatureStore
-
-store = FeatureStore(...)
-
-spark = SparkSession.builder.getOrCreate()
-
-streamingDF = spark.readStream.format(...).load()
-
-def feast_writer(spark_df):
-    pandas_df = spark_df.to_pandas()
-    store.push("driver_hourly_stats", pandas_df)
-
-streamingDF.writeStream.foreachBatch(feast_writer).start()
-```
-
-### 5.2. Push Service (Alpha)
-
-Alternatively, if you want to ingest features directly from a broker (eg, Kafka or Kinesis), you can use the "push service", which will write to an online store and/or offline store. This service will expose an HTTP API or when deployed on Serverless platforms like AWS Lambda or Google Cloud Run, this service can be directly connected to Kinesis or PubSub.
-
-If you are using Kafka, [HTTP Sink](https://docs.confluent.io/kafka-connect-http/current/overview.html) could be utilized as a middleware. In this case, the "push service" can be deployed on Kubernetes or as a Serverless function.
+This chart will deploy a single service. The service must have read access to the registry file on cloud storage. It will keep a copy of the registry in its memory and periodically refresh it, so expect some delays in update propagation in exchange for better performance.
 
-## 6. Using environment variables in your yaml configuration
+## 5. Using environment variables in your yaml configuration
 
 You might want to dynamically set parts of your configuration from your environment. For instance to deploy Feast to production and development with the same configuration, but a different server. Or to inject secrets without exposing them in your git repo. To do this, it is possible to use the `${ENV_VAR}` syntax in your `feature_store.yaml` file. For instance:
 
@@ -307,17 +266,16 @@ online_store:
 
 ## Summary
 
-Summarizing it all together we want to show several options of architecture that will be most frequently used in production:
+In summary, the overall architecture in production may look like the following:
 
-### Current Recommendation
+* The Feast SDK is triggered by CI (e.g., GitHub Actions). It applies the latest changes from the feature repo to the Feast database-backed registry.
+* Data ingestion
+  * **Batch data**: Airflow manages materialization jobs to ingest batch data from the DWH to the online store periodically. When working with large datasets to materialize, we recommend using a batch materialization engine:
+    * If your offline and online workloads are in Snowflake, the Snowflake materialization engine is likely the best option.
+    * If your offline and online workloads are not using Snowflake, but using Kubernetes is an option, the Bytewax materialization engine is likely the best option.
+    * If none of these engines suit your needs, you may continue using the in-process engine, or write a custom engine (e.g. with Spark or Ray).
+  * **Stream data**: The Feast Push API is used within existing Spark / Beam pipelines to push feature values to offline / online stores
 
-* Feast SDK is being triggered by CI (eg, Github Actions). It applies the latest changes from the feature repo to the Feast registry
-* Airflow manages materialization jobs to ingest data from DWH to the online store periodically
-* For the stream ingestion Feast Python SDK is used in the existing Spark / Beam pipeline
-* For Batch Materialization Engine:
-  * If your offline and online workloads are in Snowflake, the Snowflake Engine is likely the best option.
-  * If your offline and online workloads are not using Snowflake, but using Kubernetes is an option, the Bytewax engine is likely the best option.
-  * If none of these engines suite your needs, you may continue using the in-process engine, or write a custom engine.
 * Online features are served via the Python feature server over HTTP, or consumed using the Feast Python SDK.
 * Feast Python SDK is called locally to generate a training dataset
diff --git a/docs/reference/data-sources/push.md b/docs/reference/data-sources/push.md
index 035ee58360..a585de5688 100644
--- a/docs/reference/data-sources/push.md
+++ b/docs/reference/data-sources/push.md
@@ -22,7 +22,7 @@ Streaming data sources are important sources of feature values. A typical setup
 Feast allows users to push features previously registered in a feature view to the online store for fresher features. It also allows users to push batches of stream data to the offline store by specifying that the push be directed to the offline store. This will push the data to the offline store declared in the repository configuration used to initialize the feature store.
 
-## Example
+## Example (basic)
 
 ### Defining a push source
 
 Note that the push schema needs to also include the entity.
@@ -59,3 +59,24 @@ fs.push("push_source_name", feature_data_frame, to=PushMode.ONLINE_AND_OFFLINE)
 ```
 
 See also [Python feature server](../feature-servers/python-feature-server.md) for instructions on how to push data to a deployed feature server.
+
+## Example (Spark Streaming)
+
+The default option to write features from a stream is to add the Python SDK into your existing PySpark pipeline.
+
+```python
+from pyspark.sql import SparkSession
+from feast import FeatureStore
+
+store = FeatureStore(...)
+
+spark = SparkSession.builder.getOrCreate()
+
+streamingDF = spark.readStream.format(...).load()
+
+def feast_writer(spark_df, epoch_id):
+    pandas_df = spark_df.toPandas()
+    store.push("driver_hourly_stats", pandas_df)
+
+streamingDF.writeStream.foreachBatch(feast_writer).start()
+```
+
+This can also be used under the hood by a contrib stream processor (see [Tutorial: Building streaming features](../../tutorials/building-streaming-features.md)).
\ No newline at end of file
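+
+If the same micro-batches should also be appended to the offline store, the writer can pass a push mode, mirroring `PushMode.ONLINE_AND_OFFLINE` from the basic example above. This is a minimal sketch that continues the Spark Streaming snippet above (it reuses its `store` and `streamingDF` objects and the illustrative `driver_hourly_stats` push source):
+
+```python
+from feast.data_source import PushMode
+
+def feast_writer(spark_df, epoch_id):
+    # Convert each micro-batch to pandas and push to both the online and offline stores.
+    pandas_df = spark_df.toPandas()
+    store.push("driver_hourly_stats", pandas_df, to=PushMode.ONLINE_AND_OFFLINE)
+
+streamingDF.writeStream.foreachBatch(feast_writer).start()
+```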