This repository shows how to
- fetch real-time trade data (aka raw data) from the Coinbase WebSocket API
- transform trade data into OHLC data (aka features) in real-time using Bytewax, and
- store these features in a serverless Feature Store like Hopsworks.
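To make the second step concrete, here is a minimal sketch of turning trades into OHLC candles with fixed (tumbling) time windows. It is plain Python, not the Bytewax dataflow used in this repository. The names `Trade`, `Candle`, `window_start`, and `trades_to_ohlc`, the field layout, and the 30-second window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


@dataclass
class Trade:
    # Hypothetical shape of one trade; the real payload may carry more fields.
    product_id: str
    price: float
    size: float
    time: datetime  # timezone-aware timestamp of the trade


@dataclass
class Candle:
    product_id: str
    window_start: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


def window_start(ts: datetime, window_seconds: int) -> datetime:
    """Floor a timestamp to the start of its tumbling window (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


def trades_to_ohlc(trades: Iterable[Trade], window_seconds: int = 30) -> List[Candle]:
    """Aggregate trades into one candle per (product, window).

    Assumes trades arrive in time order, so the first trade seen in a
    window is the open and the last one seen is the close.
    """
    candles: Dict[Tuple[str, datetime], Candle] = {}
    for t in trades:
        key = (t.product_id, window_start(t.time, window_seconds))
        candle = candles.get(key)
        if candle is None:
            # First trade in this window: it sets open, high, low and close.
            candles[key] = Candle(
                t.product_id, key[1], t.price, t.price, t.price, t.price, t.size
            )
        else:
            candle.high = max(candle.high, t.price)
            candle.low = min(candle.low, t.price)
            candle.close = t.price
            candle.volume += t.size
    return sorted(candles.values(), key=lambda c: (c.product_id, c.window_start))
```

A streaming engine applies the same logic incrementally and emits each candle when its window closes, instead of collecting all trades first.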
This repository is a natural continuation of this previous project where we built a Streamlit app with real-time feature engineering, but lacked state persistence: after each re-load of the Streamlit app, we lost all features generated up to that point.
In this project we add state to our system through a Feature Store. We use Hopsworks because
- it is serverless, so we do not need to handle infrastructure
- it has a very generous free tier, with up to 25GB of free storage.
- Create a Python virtual environment with the project dependencies with
  $ make init
- Set your Hopsworks project name and API key as environment variables by running the following script. To generate these values, go to hopsworks.ai, create a free account, create a project, and generate an API key.
  $ . ./set_environment_variables.sh
- To run the feature pipeline locally
  $ make run
- To deploy the feature pipeline on an AWS EC2 instance, you first need an AWS account and the aws-cli tool installed on your local system. Then run the following command to deploy your feature pipeline on an EC2 instance
  $ make deploy
- Feature pipeline logs are sent to AWS CloudWatch. Run the following command to get the URL where you can see the logs.
  $ make list
- To shut down the feature pipeline on AWS and free its resources, run
  $ make delete
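For the environment-variable step above, it can help to fail fast when the Hopsworks credentials are missing. The sketch below shows one way to read and validate them in Python. The variable names `HOPSWORKS_PROJECT_NAME` and `HOPSWORKS_API_KEY`, and the helper `load_hopsworks_config`, are hypothetical; check `set_environment_variables.sh` for the names this repository actually exports.

```python
import os
from typing import Mapping, NamedTuple


class HopsworksConfig(NamedTuple):
    project_name: str
    api_key: str


# Hypothetical variable names; the repository's script may use different ones.
REQUIRED_VARS = ("HOPSWORKS_PROJECT_NAME", "HOPSWORKS_API_KEY")


def load_hopsworks_config(env: Mapping[str, str] = os.environ) -> HopsworksConfig:
    """Read the Hopsworks settings from the environment, or raise if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Did you source set_environment_variables.sh?"
        )
    return HopsworksConfig(env[REQUIRED_VARS[0]], env[REQUIRED_VARS[1]])
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary, without touching the real process environment.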
ℹ️ Implementation details
Check the Real-World ML Program, a hands-on, 3-hour course where you will learn how to design, build, deploy, and monitor complete ML products.