- Use FastAPI to create an API for your model
- Run that API on your machine
- Put it in production
Now that we have a performant model trained in the cloud, we will expose it to the world 🌍
We will create a prediction API for our model, run it on our machine to make sure that everything works correctly, and then we will deploy it in the cloud so that everyone can play with our model!
To do so, we will:
- create a prediction API using FastAPI
- create a Docker image containing the environment required to run the code of our API
- push this image to Google Cloud Run so that it runs inside a Docker container, allowing developers all over the world to use it
Instructions
Copy your `.env` file from the previous package version:
```bash
cp ~/code/<user.github_nickname>/{{local_path_to('07-ML-Ops/03-Automate-model-lifecycle/01-Automate-model-lifecycle')}}/.env .env
```
OR
Use the provided `env.sample`, replacing the environment variable values with yours.
A new `taxifare/api` directory has been added to the project to contain the code of the API, along with 2 new configuration files, which can be found in your project's root directory:
```
.
├── Dockerfile          # NEW: building instructions
├── Makefile            # good old manual task manager
├── README.md
├── requirements.txt    # all the dependencies you need to run the package
├── setup.py
├── taxifare
│   ├── api             # NEW: API directory
│   │   ├── __init__.py
│   │   └── fast.py     # NEW: where the API lives
│   ├── interface       # package entry point
│   └── ml_logic
└── tests
```
Now, have a look at `requirements.txt`. You can spot some newcomers:
```
# API
fastapi         # API framework
pytz            # time zone management
uvicorn         # web server

# tests
httpx           # HTTP client
pytest-asyncio  # asynchronous I/O support for pytest
```
❓ How?
`make reinstall_package`, of course!
We provide you with a FastAPI skeleton in the `fast.py` file.
💻 Try to launch the API now!
💡 Hint
You probably want a `uvicorn` web server with 🔥 hot-reloading...
In case you can't find the proper syntax, have a look at your `Makefile`; we provide you with a new task: `run_api`.
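If it helps, the command behind that task has roughly this shape (assuming the FastAPI instance is called `app` inside `taxifare/api/fast.py`; adapt the module path if your skeleton differs):
```bash
# Serve the `app` object from taxifare/api/fast.py, reloading on every code change
uvicorn taxifare.api.fast:app --reload
```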
If you run into the error `Address already in use`, port `8000` on your local machine might already be occupied by another application.
You can check this by running `lsof -i :8000`. If the command returns something, then port `8000` is already in use.
In this case, specify another port in the [0, 65535] range in the `run_api` command using the `--port` parameter.
❓ How do you consult your running API?
Answer
💡 Your API is available locally on port `8000`, unless otherwise specified 👉 http://localhost:8000
Go visit it!
You have probably not seen much...yet!
❓ Which endpoints are available?
Answer
There is only one endpoint (partially) implemented at the moment: the root endpoint `/`.
The "unimplemented" root page is a little raw, but remember that you can always find more info on the API using the Swagger endpoint 👉 http://localhost:8000/docs
Instructions
An API is defined by its specifications (see the [GitHub repositories API](https://docs.github.com/en/rest/repos/repos) for an example). Below you will find the API specifications you need to implement.

First, the root endpoint:
- Denoted by the `/` character
- HTTP verb: `GET`
In order to easily test your `root` endpoint, use the following response example as a goal:
```python
{
    'greeting': 'Hello'
}
```
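For reference, a minimal root endpoint returning that exact response could look like the sketch below (the provided skeleton in `fast.py` may already structure things differently):
```python
from fastapi import FastAPI

app = FastAPI()

# Root endpoint: return a simple greeting so we can check that the API is alive
@app.get("/")
def root():
    return {'greeting': 'Hello'}
```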
- 💻 Implement the root endpoint `/`
- 👀 Look at your browser 👉 http://localhost:8000
- 👀 Inspect the server logs and, if needed, add some `breakpoint()`s to debug
When and only when your API responds as required:
- 🧪 Test your implementation with `make test_api_root`
- 🧪 Track your progress on Kitt with `make test_kitt` and push your code!
Next, the `/predict` endpoint:
- Denoted by `/predict`
- HTTP verb: `GET`
It should accept the following query parameters:

| Name | Type | Sample |
|---|---|---|
| pickup_datetime | DateTime | 2013-07-06 17:18:00 |
| pickup_longitude | float | -73.950655 |
| pickup_latitude | float | 40.783282 |
| dropoff_longitude | float | -73.950655 |
| dropoff_latitude | float | 40.783282 |
| passenger_count | int | 2 |
It should return the following JSON:
```python
{
    'fare_amount': 5.93
}
```
❓ How would you proceed to implement the `/predict` endpoint? Discuss with your buddy 💬
Ask yourselves the following questions:
- How should we build `X_pred`? How do we handle timezones?
- How can we reuse the `taxifare` model package in the most lightweight way?
- How do we render the correct response?
💡 Hints
- Reuse the methods available in the `taxifare/ml_logic` package rather than the main routes in `taxifare/interface`; always load the minimum amount of code possible! (One possible shape is sketched below.)
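As one possible shape only (the imports below are hypothetical helper names; use whichever functions your own `ml_logic` package exposes, and note that this naive version reloads the model on every request):
```python
import pandas as pd
from fastapi import FastAPI

# Hypothetical helpers: adapt these imports to your own taxifare package
from taxifare.ml_logic.registry import load_model
from taxifare.ml_logic.preprocessor import preprocess_features

app = FastAPI()

@app.get("/predict")
def predict(
    pickup_datetime: str,       # e.g. "2014-07-06 19:18:00"
    pickup_longitude: float,
    pickup_latitude: float,
    dropoff_longitude: float,
    dropoff_latitude: float,
    passenger_count: int,
):
    # Build a one-row DataFrame shaped like the training data.
    # Localize the naive datetime string (assumed here to be New York local time)
    # so the timezone handling matches what the model saw during training.
    pickup_dt = pd.Timestamp(pickup_datetime, tz="US/Eastern")

    X_pred = pd.DataFrame(dict(
        pickup_datetime=[pickup_dt],
        pickup_longitude=[pickup_longitude],
        pickup_latitude=[pickup_latitude],
        dropoff_longitude=[dropoff_longitude],
        dropoff_latitude=[dropoff_latitude],
        passenger_count=[passenger_count],
    ))

    model = load_model()  # note: reloaded on every request in this naive version
    y_pred = model.predict(preprocess_features(X_pred))

    return {'fare_amount': round(float(y_pred[0][0]), 2)}
```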
👀 Inspect the response in your browser, and inspect the server logs while you're at it
👉 Call it from your browser: http://localhost:8000/predict?pickup_datetime=2014-07-06%2019:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2
👉 Or call it from your CLI:
```bash
curl -X 'GET' \
  'http://localhost:8000/predict?pickup_datetime=2014-07-06+19:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2' \
  -H 'accept: application/json'
```
When and only when your API responds as required:
- 🧪 Test your implementation with `make test_api_predict`
- 🧪 Track your progress on Kitt with `make test_kitt` and push your code!
🎉 Congrats, you've built your first ML predictive API!
Did you notice your predictions were a bit slow? Why do you think that is?
The answer is visible in your logs!
We want to avoid loading the heavy Deep Learning model from MLflow at each `GET` request! The trick is to load the model into memory on startup and store it in a global variable in `app.state`, which is kept in memory and accessible across all routes!
This will prove very useful for Demo Days!
➡️ like this ➡️
```python
app = FastAPI()

# Load the model only once, at startup, and keep it in memory
app.state.model = ...

@app.get("/predict")
...
    # Reuse the preloaded model on every request
    app.state.model.predict(...)
```
Instructions
We now have a working predictive API that can be queried from our local machine.
We want to make it available to the world. To do that, the first step is to create a Docker image that contains the environment required to run the API and make it run locally on Docker.
❓ What are the 3 steps to run the API on Docker?
Answer
- Create a `Dockerfile` containing the instructions to build the API
- Build the image
- Run the API on Docker (locally) to ensure that it is responding as required
You need to have the Docker daemon running on your machine to be able to build and run the image.
💻 Launch the Docker daemon
✅ Check whether the Docker daemon is up and running with `docker info` in your Terminal
A nice stack of logs should print.
As a reminder, here is the project directory structure:
```
.
├── Dockerfile          # Building instructions
├── Makefile
├── README.md
├── requirements.txt    # All the dependencies you need to run the package
├── setup.py            # Package installer
├── taxifare
│   ├── api
│   │   ├── __init__.py
│   │   └── fast.py     # ✅ Where the API lives
│   ├── interface       # Manual entry points
│   └── ml_logic
└── tests
```
❓ What are the key ingredients a `Dockerfile` needs to cook a delicious Docker image?
Answer
Here are the most common instructions for any good `Dockerfile`:
- `FROM`: select a base image for our image (the environment in which we will run our code); this is usually the first instruction
- `COPY`: copy files and directories into our image (our package and the associated files, for example)
- `RUN`: execute a command inside of the image being built (for example, `pip install -r requirements.txt` to install package dependencies)
- `CMD`: the main command that will be executed when we run our Docker image. There can only be one `CMD` instruction in a `Dockerfile`. It is usually the last instruction!
❓ What should the base image contain so we can build our image on top of it?
💡 Hints
You can start from a raw Linux (Ubuntu) image, but then you'll have to install Python and `pip` before installing `taxifare`!
OR
You can choose an image with Python (and `pip`) already installed! (recommended) ✅
💻 In the `Dockerfile`, write the instructions needed to build the API image following these specifications:
Feel free to use the checkboxes below to help you keep track of what you've already done 👍
The image should contain:
- [ ] the same Python version as your virtual env
- [ ] all the directories from the `taxifare` project needed to run the API
- [ ] the list of dependencies (don't forget to install them!)
The web server should:
- [ ] launch when a container is started from the image
- [ ] listen to the HTTP requests coming from outside the container (see the `host` parameter)
- [ ] be able to listen to a specific port defined by the `$PORT` environment variable (see the `port` parameter)
➡️ Kickstart pack
Here is the skeleton of the `Dockerfile`:
```dockerfile
FROM image
COPY taxifare
COPY dependencies
RUN install dependencies
CMD launch API web server
```
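For the last line, the command you are aiming for has roughly this shape (a sketch: it assumes your FastAPI instance is called `app` in `taxifare/api/fast.py`, and relies on the shell-form `CMD` so that `$PORT` gets expanded at runtime):
```dockerfile
# Listen on all interfaces, on the port provided by the environment
CMD uvicorn taxifare.api.fast:app --host 0.0.0.0 --port $PORT
```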
❓ How do you check if the `Dockerfile` instructions will execute what you want?
Answer
You can't at this point! You need to build the image and check whether it contains everything required to run the API. Go to the next section: Build the API image.
Now is the time to build the API image, so you can check that it satisfies all the requirements and then run it on Docker.
💻 Choose a Docker image name and add it to your `.env`. You will be able to reuse it in the `docker` commands:
```bash
GAR_IMAGE=taxifare
```
💻 Then, make sure you are in the directory of the `Dockerfile` and build `.`:
```bash
docker build --tag=$GAR_IMAGE:dev .
```
💻 Once built, the image should appear in the list of images displayed by the following command:
```bash
docker images
```
🤔 The image you are looking for does not appear in the list? Ask for help 🙋‍♀️
Now that the image is built, let's verify that it satisfies the specifications to run the predictive API. Docker comes with a handy command to interactively communicate with the shell of the image:
docker run -it -e PORT=8000 -p 8000:8000 $GAR_IMAGE:dev sh
🤔 Command composition
- `docker run $GAR_IMAGE`: run the image
- `-it`: enable the interactive mode
- `-e PORT=8000`: specify the environment variable `$PORT` to which the image should listen
- `-p 8000:8000`: map port `8000` of the container to port `8000` of your local machine
- `sh`: launch a shell console
A shell console should open; you are now inside the image.
💻 Verify that the image is correctly set up:
- The Python version is the same as in your virtual env
- The `/taxifare` directory exists
- The `requirements.txt` file exists
- The dependencies are all installed
🎁 Solution
- `python --version` to check the Python version
- `ls` to check the presence of the files and directories
- `pip list` to check that the requirements are installed
Exit the terminal and stop the container at any moment with:
```bash
exit
```
✅ All good? If something is missing, you will probably need to fix your `Dockerfile` and re-build the image.
In the previous section you learned how to interact with the shell inside the image. Now is the time to run the predictive API image and test if the API responds as it should.
💻 Try to actually run the image
You want to `docker run ...` without the `sh` command at the end, so as to trigger the `CMD` line of your `Dockerfile` instead of just opening a shell:
docker run -it -e PORT=8000 -p 8000:8000 $GAR_IMAGE:dev
😱 It is probably crashing with errors involving environment variables
❓ What's wrong? What's the difference between your local environment and your image environment? 💬 Discuss with your buddy.
Answer
There is no `.env` in the image! The image has no access to the environment variables.
💻 Adapt the run command so the `.env` is sent to the image (use `docker run --help` to help you!)
🎁 Solution
`--env-file` to the rescue!
```bash
docker run -e PORT=8000 -p 8000:8000 --env-file your/path/to/.env $GAR_IMAGE:dev
```
❓ How would you check that the image runs correctly?
💡 Hints
The API should respond in your browser; go visit it!
Also, you can check whether the image runs with `docker ps` in a new Terminal tab or window
👀 Inspect your browser response 👉 http://localhost:8000/predict?pickup_datetime=2014-07-06%2019:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2
👉 You can stop your container with `docker container stop <CONTAINER_ID>`
🎉 Congrats, you've built your first ML predictive API inside a Docker container!
🤔 How do you avoid rebuilding all the pip dependencies each time the taxifare code is changed?
🎁 Solution
By leveraging Docker's build cache, layer by layer: if a layer (and every layer before it) is unchanged, Docker will not rebuild it!
```dockerfile
FROM python:3.8.12-buster

WORKDIR /prod

# First, pip install the dependencies
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Then only, install taxifare!
COPY taxifare taxifare
COPY setup.py setup.py
RUN pip install .

# ...
```
🤔 How do you make use of the local caching mechanism we put in place for CSVs and models?
🎁 Solution
By recreating the same local storage structure!
```dockerfile
# [...]
COPY taxifare taxifare
COPY setup.py setup.py
RUN pip install .

# We already have a make command for that!
COPY Makefile Makefile
RUN make reset_local_files
```
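Putting the two ideas together with a launch command, the overall `Dockerfile` could end up looking roughly like the sketch below (the uvicorn module path assumes your FastAPI instance is `app` inside `taxifare/api/fast.py`):
```dockerfile
FROM python:3.8.12-buster

WORKDIR /prod

# Install the dependencies first so this layer is cached across code changes
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Then install the taxifare package itself
COPY taxifare taxifare
COPY setup.py setup.py
RUN pip install .

# Recreate the local storage structure used for cached CSVs and models
COPY Makefile Makefile
RUN make reset_local_files

# Launch the web server, listening on all interfaces on the port provided at runtime
CMD uvicorn taxifare.api.fast:app --host 0.0.0.0 --port $PORT
```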
As a responsible ML Engineer, you know that the size of an image is important when it comes to production. Depending on the base image you used in your `Dockerfile`, the API image could be huge:
- `python:3.8.12-buster` 👉 3.9GB
- `python:3.8.12-slim` 👉 3.1GB
- `python:3.8.12-alpine` 👉 3.1GB
❓ What is the heaviest requirement used by your API?
Answer
No doubt it is `tensorflow`, at 1.1GB! Let's find a base image that is already optimized for it.
👉 Change your base image [if you DO NOT have a Mac with Apple Silicon (M-chip) or an ARM CPU]
Instructions
Let's use a tensorflow Docker image instead! It's an Ubuntu image with Python and TensorFlow already installed!
- 💻 Update your `Dockerfile` base image to `tensorflow/tensorflow:2.10.0` (if you are on an Intel processor only)
- 💻 Remove `tensorflow` from your `requirements.txt`, because it is now pre-built into the image
- 💻 Build a lightweight local image of your API (you can use a `light` tag on this new image to differentiate it from the heavy one built previously): `docker build --tag=$GAR_IMAGE:light .`
- ✅ Make sure the API is still up and running
- 👀 Inspect the space saved with `docker images` and feel happy
🎉 Everything runs fine on your local machine. Great! We will now deploy your image on servers that are going to run these containers online for you.
However, note that these servers (Google Cloud Run servers) will be running on AMD/Intel x86 processors, not ARM/M1, as most cloud providers still run on x86.
🚨 If you have a Mac with Apple Silicon (M-chip) or an ARM CPU, read carefully
The solution is to use one image to test your code locally (you have just done that above), and another one to push your code to production.
- Tell Docker to build the image specifically for Intel/AMD processors and give it a new tag, `light-intel`: `docker build --platform linux/amd64 -t $GAR_IMAGE:light-intel .`
- You will not be able to run this image locally, but this is the one you will be able to push online to the GCP servers!
- You should now have 3 images:
  - `$GAR_IMAGE:dev`
  - `$GAR_IMAGE:light`
  - `$GAR_IMAGE:light-intel`
👉 Make a final image tagged `prod` by removing useless Python packages
- Create `requirement_prod.txt` by stripping out of `requirements.txt` anything you will not need in production (e.g. pytest, ipykernel, matplotlib, etc.)
- Build your final image and tag it `prod` (ARM users: remember the `--platform linux/amd64` flag): `docker build -t $GAR_IMAGE:prod .`
Instructions
Now that we have built a predictive API Docker image that we can run on our local machine, we are 2 steps away from deploying; we just need to:
- push the Docker image to Google Artifact Registry
- deploy the image on Google Cloud Run so that it gets instantiated into a Docker container
❓ What is the purpose of Google Artifact Registry?
Answer
Google Artifact Registry is a cloud storage service for Docker images (and similar artifacts), whose purpose is to allow Cloud Run or Kubernetes Engine to serve them.
It is, in a way, similar to GitHub allowing you to store your git repositories in the cloud, except that Google Artifact Registry lacks a dedicated user interface and additional services such as forks and pull requests.
Now we are going to build our image again. This should be pretty fast since Docker is smart and is going to reuse all the building blocks that were previously used to build the prediction API image.
First, let's make sure to enable the Google Artifact Registry API for your project in GCP.
Once this is done, let's allow the `docker` command to push an image to GCP within our region:
```bash
gcloud auth configure-docker $GCP_REGION-docker.pkg.dev
```
Let's create a repo in that region as well!
```bash
gcloud artifacts repositories create taxifare --repository-format=docker \
  --location=$GCP_REGION --description="Repository for storing taxifare images"
```
Let's build our image, ready to push to that repo:
```bash
docker build -t $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod .
```
Again, let's make sure that our image runs correctly, so as to avoid wasting time pushing a broken image to the cloud.
docker run -e PORT=8000 -p 8000:8000 --env-file .env $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod
Visit http://localhost:8000/ and check whether the API is running as expected.
We can now push our image to Google Artifact Registry.
docker push $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod
The image should be visible in the GCP console.
Add a `--memory` flag to your project configuration and set it to `2Gi` (use `GAR_MEMORY` in `.env`).
👉 This will allow your container to run with 2GiB (gibibytes) of memory
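Concretely, that means adding a line like this to your `.env` (using the `2Gi` value mentioned above):
```bash
GAR_MEMORY=2Gi
```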
❓ How does Cloud Run know the values of the environment variables to be passed to your container? Discuss with your buddy 💬
Answer
It does not. You need to provide a list of environment variables to your container when you deploy it.
💻 Using the `gcloud run deploy --help` documentation, identify a parameter that allows you to pass environment variables to your container on deployment
Answer
The `--env-vars-file` parameter is the correct one!
```bash
gcloud run deploy --env-vars-file .env.yaml
```
Tough luck: the `--env-vars-file` parameter takes as input the name of a YAML (pronounced "yam-el") file containing the list of environment variables to be passed to the container.
💻 Create a `.env.yaml` file containing all the necessary environment variables
You can use the provided `.env.sample.yaml` file as a source for the syntax (do not forget to update the values of the parameters). All values should be strings.
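The syntax looks roughly like this (the variable names and values below are placeholders; use the ones from your own `.env`):
```yaml
# Every value must be a string (quote numbers too)
GCP_PROJECT: "your-project-id"
GCP_REGION: "europe-west1"
SOME_OTHER_VARIABLE: "some-value"
```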
❓ What is the purpose of Cloud Run?
Answer
Cloud Run will instantiate the image into a container and run the `CMD` instruction of the image's `Dockerfile`. This last step will start the `uvicorn` server, thus serving our predictive API to the world 🌍
Let's run one last command:
gcloud run deploy --image $GCP_REGION-docker.pkg.dev/$GCP_PROJECT/taxifare/$GAR_IMAGE:prod --memory $GAR_MEMORY --region $GCP_REGION --env-vars-file .env.yaml
After confirmation, you should see something like this, indicating that the service is live π
```
Service name (wagon-data-tpl-image):
Allow unauthenticated invocations to [wagon-data-tpl-image] (y/N)? y
Deploying container to Cloud Run service [wagon-data-tpl-image] in project [le-wagon-data] region [europe-west1]
✓ Deploying new service... Done.
✓ Creating Revision... Revision deployment finished. Waiting for health check to begin.
✓ Routing traffic...
✓ Setting IAM Policy...
Done.
Service [wagon-data-tpl-image] revision [wagon-data-tpl-image-00001-kup] has been deployed and is serving 100 percent of traffic.
Service URL: https://wagon-data-tpl-image-xi54eseqrq-ew.a.run.app
```
🧪 Write down your service URL in your local `.env` file so we can test it!
SERVICE_URL=https://wagon-data-tpl-image-xi54eseqrq-ew.a.run.app
Then, finally:
```bash
direnv reload
make test_api_on_prod
make test_kitt
```
🎉🎉🎉🎉 MASSIVE CONGRATS 🎉🎉🎉 You deployed your first ML predictive API! Any developer in the world 🌍 is now able to browse to the deployed URL and get a prediction using the API 🤖!
You can look for any running Cloud Run services with:
```bash
gcloud run services list
```
You can shut down any instance with:
```bash
gcloud run services delete $INSTANCE
```
You can also stop (or kill) your local Docker container to free up memory on your local machine:
```bash
docker stop 152e5b79177b  # ⚠️ use the correct CONTAINER ID
docker kill 152e5b79177b  # ☢️ only if the container refuses to stop (did someone create an ∞ loop?)
```
Remember to stop the Docker daemon in order to free resources on your machine once you are done using it.
macOS
Stop `Docker.app` by clicking whale > Quit Docker Desktop in the menu bar.
Windows WSL2/Ubuntu
Stop the Docker app by right-clicking the whale on your taskbar.
Instructions
Let's look at the format of our `GET /predict` route:
http://localhost:8000/predict?pickup_datetime=2014-07-06%2019:18:00&pickup_longitude=-73.950655&pickup_latitude=40.783282&dropoff_longitude=-73.984365&dropoff_latitude=40.769802&passenger_count=2
🤯 How would you send a prediction request for 1000 rows at once?
The URL query string (everything after the `?` in the URL above) is not able to carry a large volume of data.
- Your goal is to be able to send a batch of 1000 new predictions at once!
- Try to read more about POST in the FastAPI docs and implement it in your package (one possible shape is sketched below)
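For instance, a POST endpoint receiving a JSON body could look like this sketch (the endpoint name, request body layout, and response format are assumptions to adapt to your own package; the prediction itself is left as a placeholder):
```python
from typing import List

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# One ride per row, mirroring the /predict query parameters
class Ride(BaseModel):
    pickup_datetime: str
    pickup_longitude: float
    pickup_latitude: float
    dropoff_longitude: float
    dropoff_latitude: float
    passenger_count: int

class RideBatch(BaseModel):
    rides: List[Ride]

@app.post("/predict_batch")
def predict_batch(batch: RideBatch):
    # FastAPI parses and validates the JSON body into `batch` for us
    X_pred = pd.DataFrame([ride.dict() for ride in batch.rides])

    # Reuse the same preprocessing and the preloaded model as in /predict:
    # y_pred = app.state.model.predict(preprocess_features(X_pred))
    y_pred = [0.0] * len(X_pred)  # placeholder so the sketch runs end to end

    return {'fare_amounts': [float(fare) for fare in y_pred]}
```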
In anticipation of your Demo Day, you might be wondering how to send unstructured data like images (or videos, sounds, etc.) to your Deep Learning model in prod.
👉 Bookmark Le Wagon - data-template, and try to understand & reproduce the project boilerplate called "sending-images-streamlit-fastapi"