Objective

  1. Use FastAPI to create an API for your model
  2. Run that API on your machine
  3. Put it in production

Context

Now that we have a performant model trained in the cloud, we will expose it to the world 🌍

We will create a prediction API for our model, run it on our machine to make sure that everything works correctly, and then we will deploy it in the cloud so that everyone can play with our model!

To do so, we will:

👉 create a prediction API using FastAPI

👉 create a Docker image containing the environment required to run the code of our API

👉 push this image to Google Cloud Run so that it runs inside a Docker container that will allow developers all over the world to use it

1️⃣ Project Setup 🛠

❓Instructions

Environment

Copy your .env file from the previous package version:

cp ~/code/<user.github_nickname>/{{local_path_to('07-ML-Ops/03-Automate-model-lifecycle/01-Automate-model-lifecycle')}}/.env .env

OR

Use the provided env.sample, replacing the environment variable values with yours.

API Directory

A new taxifare/api directory has been added to the project to contain the code of the API along with 2 new configuration files, which can be found in your project's root directory:

.
├── Dockerfile          # 🎁 NEW: building instructions
├── Makefile            # good old manual task manager
├── README.md
├── requirements.txt    # all the dependencies you need to run the package
├── setup.py
├── taxifare
│   ├── api             # 🎁 NEW: API directory
│   │   ├── __init__.py
│   │   └── fast.py     # 🎁 NEW: where the API lives
│   ├── interface       # package entry point
│   └── ml_logic
└── tests

Now, have a look at requirements.txt. You will notice some newcomers:

# API
fastapi         # API framework
pytz            # time zone management
uvicorn         # web server
# tests
httpx           # HTTP client
pytest-asyncio  # asynchronous I/O support for pytest
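
Wondering why httpx and pytest-asyncio are there? They let the test suite call your endpoints in-process, without starting a real uvicorn server. As a rough, hedged illustration only (the provided tests under tests/ may be organized differently), such a test typically looks like this:

# Illustration only: calling the FastAPI app in-process with httpx
import pytest
from httpx import ASGITransport, AsyncClient

from taxifare.api.fast import app  # the FastAPI instance defined in fast.py


@pytest.mark.asyncio
async def test_root_returns_greeting():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/")
    assert response.status_code == 200

(Depending on your httpx version, AsyncClient(app=app, base_url="http://test") is an equivalent shortcut.)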

⚠️ Make sure to perform a clean installation of the package.

❓How?

make reinstall_package, of course 😉

Running the API with FastAPI and a Uvicorn Server

We provide you with a FastAPI skeleton in the fast.py file.

💻 Try to launch the API now!

💡 Hint

You probably want a uvicorn web server with 🔥 hot-reloading...

In case you can't find the proper syntax, look at your Makefile; we provided you with a new task: run_api.

If you run into the error Address already in use, the port 8000 on your local machine might already be occupied by another application.

You can check this by running lsof -i :8000. If the command returns something, then port 8000 is already in use.

In this case, specify another port in the [0, 65535] range in the run_api command using the --port parameter.

❓ How do you consult your running API?

Answer

💡 Your API is available locally on port 8000, unless otherwise specified 👉 http://localhost:8000. Go visit it!

You have probably not seen much...yet!

❓ Which endpoints are available?

Answer

There is only one endpoint (partially) implemented at the moment, the root endpoint /. The "unimplemented" root page is a little raw, but remember that you can always find more info on the API using the Swagger endpoint 👉 http://localhost:8000/docs

2️⃣ Build the API 📡

❓Instructions

An API is defined by its specifications (see [GitHub repositories API](https://docs.github.com/en/rest/repos/repos)). Below you will find the API specifications you need to implement.

Specifications

Root

  • Denoted by the / character
  • HTTP verb: GET

In order to easily test your root endpoint, use the following response example as a goal:

{
    "greeting": "Hello"
}
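
If you are wondering what that looks like in code, here is a minimal sketch of fast.py (the provided skeleton may already define the app object for you):

# Minimal sketch: FastAPI serializes the returned dict to JSON automatically
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def root():
    return {"greeting": "Hello"}
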
  • 💻 Implement the root endpoint /
  • 👀 Look at your browser 👉 http://localhost:8000
  • 🐛 Inspect the server logs and, if needed, add some breakpoint()s to debug

When and only when your API responds as required:

  1. 🧪 Test your implementation with make test_api_root
  2. 🧪 Track your progress on Kitt with make test_kitt & push your code!

Prediction

  • Denoted by /predict
  • HTTP verb: GET

It should accept the following query parameters:


| Name              | Type     | Sample               |
|-------------------|----------|----------------------|
| pickup_datetime   | DateTime | 2013-07-06 17:18:00  |
| pickup_longitude  | float    | -73.950655           |
| pickup_latitude   | float    | 40.783282            |
| dropoff_longitude | float    | -73.950655           |
| dropoff_latitude  | float    | 40.783282            |
| passenger_count   | int      | 2                    |

It should return the following JSON:

{
    "fare_amount": 5.93
}

❓ How would you proceed to implement the /predict endpoint? Discuss with your buddy 💬

Ask yourselves the following questions:

  • How should we build X_pred? How should we handle timezones?
  • How can we reuse the taxifare model package in the most lightweight way?
  • How do we render the correct response?
💡 Hints
  • Re-use the methods available in the taxifare/ml_logic package rather than the main routes in taxifare/interface; always load the minimum amount of code possible! (A hedged sketch of one possible shape follows below.)
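
Purely as a hedged sketch (the helper names and the timezone choice below are assumptions, not the official solution), the endpoint could look roughly like this:

# Hedged sketch: adapt the commented-out helpers to your own ml_logic package
import pandas as pd
from fastapi import FastAPI

app = FastAPI()
# app.state.model = ...  # see the "Faster Predictions" tip below


@app.get("/predict")
def predict(
    pickup_datetime: str,       # e.g. "2014-07-06 19:18:00"
    pickup_longitude: float,
    pickup_latitude: float,
    dropoff_longitude: float,
    dropoff_latitude: float,
    passenger_count: int,
):
    # One common approach: treat the user's naive datetime as NYC local time,
    # then convert it to UTC (pytz or pandas both work); check what your
    # preprocessing actually expects
    pickup_dt = pd.Timestamp(pickup_datetime, tz="America/New_York").tz_convert("UTC")

    # Build a one-row DataFrame shaped like the training data
    X_pred = pd.DataFrame([{
        "pickup_datetime": pickup_dt,
        "pickup_longitude": pickup_longitude,
        "pickup_latitude": pickup_latitude,
        "dropoff_longitude": dropoff_longitude,
        "dropoff_latitude": dropoff_latitude,
        "passenger_count": passenger_count,
    }])

    # X_processed = preprocess_features(X_pred)     # assumed ml_logic helper
    # y_pred = app.state.model.predict(X_processed)
    # return {"fare_amount": round(float(y_pred[0][0]), 2)}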

👀 Inspect the response in your browser, and inspect the server logs while you're at it

👉 Call on your browser http://localhost:8000/predict?pickup_datetime=2014-07-06%2019:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2

👉 Or call from your CLI

curl -X 'GET' \
  'http://localhost:8000/predict?pickup_datetime=2014-07-06+19:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2' \
  -H 'accept: application/json'

When and only when your API responds as required:

  1. 🧪 Test your implementation with make test_api_predict
  2. 🧪 Track your progress on Kitt with make test_kitt & push your code!

👍 Congrats, you've built your first ML predictive API!


⚡️ Faster Predictions

Did you notice your predictions were a bit slow? Why do you think that is?

The answer is visible in your logs!

We want to avoid loading the heavy Deep Learning model from MLflow at each GET request! The trick is to load the model into memory on startup and store it in a global variable in app.state, which is kept in memory and accessible across all routes!

This will prove very useful for Demo Days!

⚡️ like this ⚡️

app = FastAPI()
app.state.model = ...   # load the model once, when the server starts

@app.get("/predict")
def predict(...):
    ...
    # then reuse the already-loaded model on every request
    app.state.model.predict(...)

3️⃣ Build a Docker Image for our API 🐳

❓ Instructions

We now have a working predictive API that can be queried from our local machine.

We want to make it available to the world. To do that, the first step is to create a Docker image that contains the environment required to run the API and make it run locally on Docker.

❓ What are the 3 steps to run the API on Docker?

Answer
  1. Create a Dockerfile containing the instructions to build the API
  2. Build the image
  3. Run the API on Docker (locally) to ensure that it is responding as required

3.1) Setup

You need to have the Docker daemon running on your machine to be able to build and run the image.

💻 Launch Docker Daemon

macOS

Launch the Docker app; you should see a whale in your menu bar.

verify that Docker Desktop is running

Windows WSL2 & Ubuntu

Launch the Docker app; you should see a whale on your taskbar (Windows).

verify that Docker Desktop is running

✅ Check whether the Docker daemon is up and running with docker info in your Terminal

A nice stack of logs should print.

3.2) Dockerfile

As a reminder, here is the project directory structure:

.
├── Dockerfile          # 🆕 Building instructions
├── Makefile
├── README.md
├── requirements.txt    # All the dependencies you need to run the package
├── setup.py            # Package installer
├── taxifare
│   ├── api
│   │   ├── __init__.py
│   │   └── fast.py     # ✅ Where the API lives
│   ├── interface       # Manual entry points
│   └── ml_logic
└── tests

❓ What are the key ingredients a Dockerfile needs to cook a delicious Docker image?

Answer

Here are the most common instructions for any good Dockerfile:

  • FROM: select a base image for our image (the environment in which we will run our code); this is usually the first instruction
  • COPY: copy files and directories into our image (our package and the associated files, for example)
  • RUN: execute a command inside of the image being built (for example, pip install -r requirements.txt to install package dependencies)
  • CMD: the main command that will be executed when we run our Docker image. There can only be one CMD instruction in a Dockerfile. It is usually the last instruction!

❓ What should the base image contain so we can build our image on top of it?

💡 Hints

You can start from a raw Linux (Ubuntu) image, but then you'll have to install Python and pip before installing taxifare!

OR

You can choose an image with Python (and pip) already installed! (recommended) ✅

💻 In the Dockerfile, write the instructions needed to build the API image following these specifications:
Feel free to use the checkboxes below to help you keep track of what you've already done 😉

The image should contain:
- [ ] the same Python version as your virtual env
- [ ] all the directories from the /taxifare project needed to run the API
- [ ] the list of dependencies (don't forget to install them!)

The web server should:
- [ ] launch when a container is started from the image
- [ ] listen to the HTTP requests coming from outside the container (see the host parameter)
- [ ] be able to listen to a specific port defined by an environment variable $PORT (see the port parameter)

⚡️ Kickstart pack

Here is the skeleton of the Dockerfile:

FROM image
COPY taxifare
COPY dependencies
RUN install dependencies
CMD launch API web server

❓ How do you check if the Dockerfile instructions will execute what you want?

Answer

You can't at this point! 😁 You need to build the image and check if it contains everything required to run the API. Go to the next section: Build the API image.

3.3) Build the API image

Now is the time to build the API image, so you can check that it satisfies all the requirements and run it on Docker.

💻 Choose a Docker image name and add it to your .env. You will be able to reuse it in the docker commands:

GAR_IMAGE=taxifare

💻 Then, make sure you are in the directory containing the Dockerfile, and build the image:

docker build --tag=$GAR_IMAGE:dev .

💻 Once the build completes, the image should appear in the list returned by the following command:

docker images

🤔 The image you are looking for does not appear in the list? Ask for help 🙋‍♂️

3.4) Check the API Image

Now that the image is built, let's verify that it satisfies the specifications to run the predictive API. Docker comes with a handy command to interactively communicate with the shell of the image:

docker run -it -e PORT=8000 -p 8000:8000 $GAR_IMAGE:dev sh
🤖 Command composition
  • docker run $GAR_IMAGE: run the image
  • -it: enable the interactive mode
  • -e PORT=8000: set the environment variable $PORT on which the web server should listen
  • -p 8000:8000: map port 8000 of the container to port 8000 on your machine
  • sh: launch a shell console

A shell console should open; you are now inside a container running the image 👍

💻 Verify that the image is correctly set up:

- [ ] The Python version is the same as in your virtual env
- [ ] The `/taxifare` directory exists
- [ ] The `requirements.txt` file exists
- [ ] The dependencies are all installed
🙈 Solution
  • python --version to check the Python version
  • ls to check the presence of the files and directories
  • pip list to check if the requirements are installed

Exit the terminal and stop the container at any moment with:

exit

✅ ❌ All good? If something is missing, you will probably need to fix your Dockerfile and re-build the image

3.5) Run the API Image

In the previous section you learned how to interact with the shell inside the image. Now is the time to run the predictive API image and test if the API responds as it should.

💻 Try to actually run the image

You want to docker run ... without the sh command at the end, so as to trigger the CMD line of your Dockerfile, instead of just opening a shell.

docker run -it -e PORT=8000 -p 8000:8000 $GAR_IMAGE:dev

😱 It is probably crashing with errors involving environment variables

❓ What's wrong? What's the difference between your local environment and your image environment? 💬 Discuss with your buddy.

Answer

There is no .env in the image! The image has no access to the environment variables 😈

💻 Adapt the run command so that the .env is passed to the container (use docker run --help to help you!)

🙈 Solution

--env-file to the rescue!

docker run -e PORT=8000 -p 8000:8000 --env-file your/path/to/.env $GAR_IMAGE:dev

❓ How would you check that the image runs correctly?

💡 Hints

The API should respond in your browser; go visit it!

Also, you can check if the image runs with docker ps in a new Terminal tab or window

It's alive! 😱 🎉


👀 Inspect your browser response 👉 http://localhost:8000/predict?pickup_datetime=2014-07-06%2019:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2

🛑 You can stop your container with docker container stop <CONTAINER_ID>

πŸ‘ Congrats, you've built your first ML predictive API inside a Docker container!


3.6) Optimized image

3.6.1) Smarter image 🧠

🤔 How do you avoid rebuilding all the pip dependencies each time the taxifare code changes?

🎁 Solution

By leveraging Docker's build cache, layer by layer: as long as a layer and everything before it is unchanged, Docker will not rebuild it!

FROM python:3.8.12-buster

WORKDIR /prod

# First, pip install dependencies
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Then only, install taxifare!
COPY taxifare taxifare
COPY setup.py setup.py
RUN pip install .

# ...

🤔 How do you make use of the local caching mechanism we put in place for CSVs and models?

🎁 Solution

By recreating the same local storage structure!

# [...]
COPY taxifare taxifare
COPY setup.py setup.py
RUN pip install .

# We already have a make command for that!
COPY Makefile Makefile
RUN make reset_local_files

3.6.2) Lighter image 🪶

As a responsible ML Engineer, you know that the size of an image is important when it comes to production. Depending on the base image you used in your Dockerfile, the API image could be huge:

  • python:3.8.12-buster 👉 3.9GB
  • python:3.8.12-slim 👉 3.1GB
  • python:3.8.12-alpine 👉 3.1GB

❓ What is the heaviest requirement used by your API?

Answer

No doubt it is tensorflow with 1.1GB! Let's find a base image that is already optimized for it.

πŸ“ Change your base image [If you DO NOT have an Mac Silicon (M-Chip) or ARM CPU]

Instructions

Let's use a tensorflow Docker image instead! It's an Ubuntu image with Python and TensorFlow already installed!

  • 💻 Update your Dockerfile base image to tensorflow/tensorflow:2.10.0 (Intel/AMD processors only)
  • 💻 Remove tensorflow from your requirements.txt because it is now pre-built into the image
  • 💻 Build a lightweight local image of your API (you can use the tag light on this new image to differentiate it from the heavy one built previously): docker build --tag=$GAR_IMAGE:light .
  • ✅ Make sure the API is still up and running
  • 👀 Inspect the space saved with docker images and feel happy

3.6.3) Prod-ready image (finally!) ☁️

πŸ‘ Everything runs fine on your local machine. Great. We will now deploy your image on servers that are going to run these containers online for you.

However, note that these servers (Google Cloud Run servers) will be running on AMD/Intel x86 processors, not ARM/M1, as most cloud providers still run on Intel.

🚨 If you have a Mac with Apple Silicon (M-chip) or an ARM CPU, read carefully

The solution is to use one image to test your code locally (you have just done it above), and another one to push your code to production.

  • Tell Docker to build the image specifically for Intel/AMD processors and give it a new tag, light-intel: docker build --platform linux/amd64 -t $GAR_IMAGE:light-intel .
  • You will not be able to run this image locally, but this is the one you will be able to push online to the GCP servers!
  • You should now have 3 images:
    • $GAR_IMAGE:dev
    • $GAR_IMAGE:light
    • $GAR_IMAGE:light-intel

πŸ“ Make a final image tagged "prod", by removing useless python packages

  • Create requirement_prod.txt by stripping-out requirement.txt from anything you will not need in production (e.g pytest, ipykernel, matplotlib etc...)
  • Build your final image and tag it docker build -t $GAR_IMAGE:light-intel .

4️⃣ Deploy the API 🌎

❓Instructions

Now that we have built a predictive API Docker image that we can run on our local machine, we are 2 steps away from deploying; we just need to:

  1. push the Docker image to Google Artifact Registry
  2. deploy the image on Google Cloud Run so that it gets instantiated into a Docker container

4.1) Push our prod image to Google Artifact Registry

❓What is the purpose of Google Artifact Registry?

Answer

Google Artifact Registry is a cloud storage service for Docker (and similar technology) images with the purpose of allowing Cloud Run or Kubernetes Engine to serve them.

It is, in a way, similar to GitHub, which allows you to store your git repositories in the cloud, except that Google Artifact Registry lacks a dedicated user interface and additional services such as forks and pull requests.

Build and Push the Image to GAR

Now we are going to build our image again. This should be pretty fast since Docker is smart and is going to reuse all the building blocks that were previously used to build the prediction API image.

First, let's make sure to enable the Google Artifact Registry API for your project in GCP.

Once this is done, let's allow the docker command to push an image to GCP within our region.

gcloud auth configure-docker $GCP_REGION-docker.pkg.dev

Let's create a repo in that region as well!

gcloud artifacts repositories create taxifare --repository-format=docker \
--location=$GCP_REGION --description="Repository for storing taxifare images"

Let's build our image, ready to push to that repo:

docker build -t  $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod .

Again, let's make sure that our image runs correctly, so as to avoid wasting time pushing a broken image to the cloud.

docker run -e PORT=8000 -p 8000:8000 --env-file .env $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod

Visit http://localhost:8000/ and check whether the API is running as expected.

We can now push our image to Google Artifact Registry.

docker push $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod

The image should be visible in the GCP console.

4.2) Deploy the Artifact Registry Image to Google Cloud Run

Add a --memory flag to your project configuration and set it to 2Gi (use GAR_MEMORY in .env)

👉 This will allow your container to run with 2 GiB (gibibytes) of memory

❓ How does Cloud Run know the values of the environment variables to be passed to your container? Discuss with your buddy 💬

Answer

It does not. You need to provide a list of environment variables to your container when you deploy it 😈

💻 Using the gcloud run deploy --help documentation, identify a parameter that allows you to pass environment variables to your container on deployment

Answer

The --env-vars-file is the correct one!

gcloud run deploy --env-vars-file .env.yaml

Tough luck: the `--env-vars-file` parameter takes as input the name of a YAML file (pronounced "yam-el") containing the list of environment variables to be passed to the container.

💻 Create a .env.yaml file containing all the necessary environment variables

You can use the provided .env.sample.yaml file as a source for the syntax (do not forget to update the values of the parameters). All values should be strings.

❓ What is the purpose of Cloud Run?

Answer

Cloud Run will instantiate the image into a container and run the CMD instruction inside of the Dockerfile of the image. This last step will start the uvicorn server, thus serving our predictive API to the world 🌍

Let's run one last command 🤞

gcloud run deploy --image $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod --memory $GAR_MEMORY --region $GCP_REGION --env-vars-file .env.yaml

After confirmation, you should see something like this, indicating that the service is live 🎉

Service name (wagon-data-tpl-image):
Allow unauthenticated invocations to [wagon-data-tpl-image] (y/N)?  y

Deploying container to Cloud Run service [wagon-data-tpl-image] in project [le-wagon-data] region [europe-west1]
✓ Deploying new service... Done.
  ✓ Creating Revision... Revision deployment finished. Waiting for health check to begin.
  ✓ Routing traffic...
  ✓ Setting IAM Policy...
Done.
Service [wagon-data-tpl-image] revision [wagon-data-tpl-image-00001-kup] has been deployed and is serving 100 percent of traffic.
Service URL: https://wagon-data-tpl-image-xi54eseqrq-ew.a.run.app

🧪 Write down your service URL in your local .env file so we can test it!

SERVICE_URL=https://wagon-data-tpl-image-xi54eseqrq-ew.a.run.app

Then finally,

direnv reload
make test_api_on_prod
make test_kitt

πŸ‘πŸ‘πŸ‘πŸ‘ MASSIVE CONGRATS πŸ‘πŸ‘πŸ‘ You deployed your first ML predictive API! Any developer in the world 🌍 is now able to browse to the deployed url and get a prediction using the API πŸ€–!


4.3) Stop everything and save money 💸

⚠️ Keep in mind that you pay for the service as long as it is up 💸

You can look for any running Cloud Run services using:

gcloud run services list

You can shut down any instance with

gcloud run services delete $INSTANCE

You can also stop (or kill) your local Docker container to free up memory on your local machine

docker stop 152e5b79177b  # ⚠️ use the correct CONTAINER ID
docker kill 152e5b79177b  # ☒️ only if the image refuses to stop (did someone create an ∞ loop?)

Remember to stop the Docker daemon in order to free resources on your machine once you are done using it.

macOS

Stop Docker.app by clicking on the whale > Quit Docker Desktop in the menu bar.

Windows WSL2/Ubuntu

Stop the Docker app by right-clicking the whale on your taskbar.

5️⃣ OPTIONAL

❓ Instructions

1) Create a /POST request to be able to return batch predictions

Let's look at our /GET route format

http://localhost:8000/predict?pickup_datetime=2014-07-06%2019:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2

🤯 How would you send a prediction request for 1000 rows at once?

The URL query string (everything after ? in the URL above) is not able to send a large volume of data.

Welcome to /POST HTTP Requests

  • Your goal is to be able to send a batch of 1000 new predictions at once!
  • Try to read more about POST in the FastAPI docs, and implement it in your package (a hedged sketch of the idea follows below)
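
Here is a hedged sketch of the idea (the field names mirror the GET endpoint; the preprocessing and model calls are assumptions about your package, so they are shown commented out):

# Hedged sketch: a POST endpoint that accepts a JSON array of rides
from typing import List

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Ride(BaseModel):
    pickup_datetime: str
    pickup_longitude: float
    pickup_latitude: float
    dropoff_longitude: float
    dropoff_latitude: float
    passenger_count: int


@app.post("/predict")
def predict_batch(rides: List[Ride]):
    # The JSON body can carry thousands of rows, unlike a URL query string
    X_pred = pd.DataFrame([ride.dict() for ride in rides])  # .model_dump() on pydantic v2
    # X_processed = preprocess_features(X_pred)             # assumed ml_logic helper
    # y_pred = app.state.model.predict(X_processed)
    # return {"fare_amounts": [round(float(y), 2) for y in y_pred.flatten()]}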

2) Read about sending images 📸 via /POST requests to CNN models

In anticipation of your Demo Day, you might be wondering how to send unstructured data like images (or videos, sounds, etc.) to your Deep Learning model in prod.

👉 Bookmark Le Wagon - data-template, and try to understand & reproduce the project boilerplate called "sending-images-streamlit-fastapi"
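
To give yourself a head start before reading the boilerplate, here is a hedged sketch of the general idea (opencv-python is an assumed extra dependency; the boilerplate may use different names):

# Hedged sketch: receive an image over POST and decode it into a NumPy array
import cv2
import numpy as np
from fastapi import FastAPI, File, UploadFile

app = FastAPI()


@app.post("/upload_image")
async def upload_image(img: UploadFile = File(...)):
    contents = await img.read()                       # raw bytes of the uploaded file
    array = np.frombuffer(contents, dtype=np.uint8)   # 1-D array of bytes
    image = cv2.imdecode(array, cv2.IMREAD_COLOR)     # H x W x 3 BGR array (or None)
    # prediction = app.state.model.predict(np.expand_dims(image, 0))
    return {"shape": image.shape if image is not None else None}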
