
# Install on GKE

TIP: Make sure you have enough quota in your GCP project. Open the cloud console quotas page: https://console.cloud.google.com/iam-admin/quotas. Make sure your project is selected in the top left.

There are 3 critical quotas you will need to verify for this guide. The minimum values below assume that nothing else is running in your project.

| Quota | Location | Min Value |
| --- | --- | --- |
| Preemptible NVIDIA L4 GPUs | `<your-region>` | 2 |
| GPUs (all regions) | - | 2 |
| CPUs (all regions) | - | 24 |

See the following screenshot examples of how these quotas appear in the console:

*Regional Preemptible L4 Quota Screenshot*

*Global GPUs Quota Screenshot*

*Global CPUs Quota Screenshot*
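
If you prefer checking from the CLI, the same quotas can be read with gcloud. This is a minimal sketch; the metric names (PREEMPTIBLE_NVIDIA_L4_GPUS, GPUS_ALL_REGIONS, CPUS_ALL_REGIONS) are assumptions based on Compute Engine's usual quota naming.

```bash
# Regional quotas for the region you plan to use
# (look for PREEMPTIBLE_NVIDIA_L4_GPUS).
gcloud compute regions describe us-central1 --format="yaml(quotas)"

# Project-wide quotas (look for GPUS_ALL_REGIONS and CPUS_ALL_REGIONS).
gcloud compute project-info describe --format="yaml(quotas)"
```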

## 1. Create a cluster

### Option: GKE Autopilot

Create an Autopilot cluster (replace us-central1 with a region where you have quota).

```bash
gcloud container clusters create-auto cluster-1 \
    --location=us-central1
```
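
Once the cluster is ready, point kubectl at it (using the same name and location as above):

```bash
gcloud container clusters get-credentials cluster-1 \
    --location=us-central1
```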

### Option: GKE Standard

TODO: Reference gcloud commands for creating a GKE standard cluster.
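
Until that section is written, the sketch below shows one possible shape for it: a Standard cluster with a small CPU-only default node pool plus a spot L4 node pool. The cluster name, machine type, and autoscaling bounds are assumptions, not values from the KubeAI docs; verify them against your quota before running.

```bash
# Create a Standard cluster with a small CPU-only default node pool.
gcloud container clusters create cluster-1 \
    --location=us-central1 \
    --num-nodes=1

# Add a spot node pool with NVIDIA L4 GPUs for model serving.
gcloud container node-pools create l4-spot \
    --cluster=cluster-1 \
    --location=us-central1 \
    --spot \
    --machine-type=g2-standard-8 \
    --accelerator=type=nvidia-l4,count=1,gpu-driver-version=latest \
    --enable-autoscaling --num-nodes=0 --min-nodes=0 --max-nodes=2
```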

## 2. Install KubeAI

Add the KubeAI Helm repository.

```bash
helm repo add kubeai https://www.kubeai.org
helm repo update
```

Make sure you have a HuggingFace Hub token set in your environment (HUGGING_FACE_HUB_TOKEN).
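
For example (replace the placeholder with your own token):

```bash
export HUGGING_FACE_HUB_TOKEN=<your-token>
```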

Install KubeAI with Helm.

```bash
cat <<EOF > kubeai.yaml
resourceProfiles:
  nvidia-gpu-l4:
    nodeSelector:
      cloud.google.com/gke-accelerator: "nvidia-l4"
      cloud.google.com/gke-spot: "true"
EOF
```

```bash
helm upgrade --install kubeai kubeai/kubeai \
    -f ./kubeai.yaml \
    --set secrets.huggingface.token=$HUGGING_FACE_HUB_TOKEN \
    --wait
```
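
To confirm the install, check that the KubeAI pods come up in the namespace you installed into (the default namespace with the command above):

```bash
kubectl get pods
```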

## 3. Optionally configure models

Optionally install preconfigured models.

```bash
cat <<EOF > kubeai-models.yaml
catalog:
  llama-3.1-8b-instruct-fp8-l4:
    enabled: true
EOF
```

```bash
helm install kubeai-models kubeai/models \
    -f ./kubeai-models.yaml
```
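
Once the models chart is installed, you can try listing the resulting resources and hitting the OpenAI-compatible API. The `models` resource name, service name, port, and path below are assumptions based on KubeAI's documented defaults; adjust them if your install differs.

```bash
# List the Model custom resources created by the chart.
kubectl get models

# In one terminal: forward the KubeAI service locally.
kubectl port-forward svc/kubeai 8000:80

# In another terminal: list models via the OpenAI-compatible API.
curl http://localhost:8000/openai/v1/models
```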