TIP: Make sure you have enough quota in your GCP project.
Open the Cloud Console quotas page: https://console.cloud.google.com/iam-admin/quotas. Make sure your project is selected in the top left. There are 3 critical quotas you will need to verify for this guide. The minimum values listed assume that nothing else is running in your project.
Quota | Location | Min Value |
---|---|---|
Preemptible NVIDIA L4 GPUs | <your-region> | 2 |
GPUs (all regions) | - | 2 |
CPUs (all regions) | - | 24 |
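The same limits can also be read from the command line. The following is a minimal sketch, not an official check. `quota_ok`, `quota_limit`, and `check_gke_quotas` are hypothetical helper names, and the metric names (`PREEMPTIBLE_NVIDIA_L4_GPUS`, `GPUS_ALL_REGIONS`, `CPUS_ALL_REGIONS`) are assumptions about how Compute Engine labels the quotas in the table, so confirm them against the console.

```shell
# Hypothetical helper: succeeds when a quota limit meets the required minimum.
# $1 = quota limit (may be a decimal such as 24.0), $2 = minimum value
quota_ok() {
  awk -v limit="$1" -v min="$2" 'BEGIN { exit !(limit + 0 >= min + 0) }'
}

# Hypothetical helper: reads "METRIC,LIMIT" lines on stdin and prints the
# limit for the metric named in $1.
quota_limit() {
  awk -F, -v metric="$1" '$1 == metric { print $2 }'
}

# Not called automatically; requires an authenticated gcloud.
# $1 = region. Metric names below are assumptions, verify them in the console.
check_gke_quotas() {
  region="$1"
  regional=$(gcloud compute regions describe "$region" \
    --flatten="quotas[]" --format="csv[no-heading](quotas.metric,quotas.limit)")
  global=$(gcloud compute project-info describe \
    --flatten="quotas[]" --format="csv[no-heading](quotas.metric,quotas.limit)")
  quota_ok "$(printf '%s\n' "$regional" | quota_limit PREEMPTIBLE_NVIDIA_L4_GPUS)" 2 &&
    quota_ok "$(printf '%s\n' "$global" | quota_limit GPUS_ALL_REGIONS)" 2 &&
    quota_ok "$(printf '%s\n' "$global" | quota_limit CPUS_ALL_REGIONS)" 24
}
```

Run `check_gke_quotas us-central1` (or your own region) and inspect the exit status. A missing metric is treated as a limit of 0, so the check fails rather than passing silently.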
See the following screenshot examples of how these quotas appear in the console:
Create an Autopilot cluster (replace us-central1 with a region where you have quota).
gcloud container clusters create-auto cluster-1 \
  --location=us-central1
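If you expect to change the region or cluster name, it may help to keep them in variables. The following is a small sketch under that assumption. `REGION`, `CLUSTER_NAME`, and `print_create_cluster_cmd` are illustrative names, not part of gcloud or KubeAI.

```shell
# Assumed defaults taken from the command above; override them in the
# environment before sourcing this snippet.
REGION="${REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-cluster-1}"

# Prints the create command instead of running it, so it can be reviewed first.
print_create_cluster_cmd() {
  printf 'gcloud container clusters create-auto %s --location=%s\n' \
    "$CLUSTER_NAME" "$REGION"
}
```

Calling `print_create_cluster_cmd` shows the exact command for the current values. Run the printed command once it looks right.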
TODO: Reference gcloud commands for creating a GKE standard cluster.
Add the KubeAI Helm repository.
helm repo add kubeai https://www.kubeai.org
helm repo update
Make sure you have a Hugging Face Hub token set in your environment (HUGGING_FACE_HUB_TOKEN).
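An unset variable would otherwise pass an empty token to Helm. The following is a minimal guard, assuming a POSIX-compatible shell. `require_hf_token` is a hypothetical helper name.

```shell
# Hypothetical helper: fails early if the token is missing or empty.
require_hf_token() {
  if [ -z "${HUGGING_FACE_HUB_TOKEN:-}" ]; then
    echo "HUGGING_FACE_HUB_TOKEN is not set; export it before running helm." >&2
    return 1
  fi
}
```

Run `require_hf_token` before the Helm install step and continue only if it succeeds.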
Install KubeAI with Helm.
cat <<EOF > kubeai.yaml
resourceProfiles:
  nvidia-gpu-l4:
    nodeSelector:
      cloud.google.com/gke-accelerator: "nvidia-l4"
      cloud.google.com/gke-spot: "true"
EOF
helm upgrade --install kubeai kubeai/kubeai \
  -f ./kubeai.yaml \
  --set secrets.huggingface.token="$HUGGING_FACE_HUB_TOKEN" \
  --wait
Optionally install preconfigured models.
cat <<EOF > kubeai-models.yaml
catalog:
  llama-3.1-8b-instruct-fp8-l4:
    enabled: true
EOF
helm install kubeai-models kubeai/models \
  -f ./kubeai-models.yaml
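The values file above enables a single catalog entry. If you want to generate that file for a different entry, the heredoc can be wrapped in a function. This is a sketch only. `write_models_values` is a hypothetical helper, and the entry name you pass must exist in the chart's catalog.

```shell
# Hypothetical helper: writes a values file that enables one catalog entry.
# $1 = catalog entry name, $2 = output path
write_models_values() {
  cat > "$2" <<EOF
catalog:
  $1:
    enabled: true
EOF
}
```

For example, `write_models_values llama-3.1-8b-instruct-fp8-l4 ./kubeai-models.yaml` reproduces the file used in the step above.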