Update webhook cert-manager install to work with GKE autopilot clusters (#585)

* Change install-cert-manager to work with autopilot clusters

* Add cert-manager install commands to TPU_GUIDE
ryanaoleary authored and alpha-amundson committed May 3, 2024
1 parent 10f6b6c commit 679d7e5
Showing 2 changed files with 11 additions and 12 deletions.
8 changes: 1 addition & 7 deletions applications/ray/kuberay-tpu-webhook/Makefile
@@ -31,13 +31,7 @@ docker-build:

# Push the docker image
docker-push:
-docker push ${IMG}
-
-install-cert-manager:
-kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml
-
-uninstall-cert-manager:
-kubectl delete -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml
+docker push ${IMG}

deploy-cert:
kubectl apply -f certs/
15 changes: 10 additions & 5 deletions ray-on-gke/guides/tpu/README.md
@@ -46,16 +46,21 @@ accelerator_type = "nvidia-tesla-t4"

### Manually Installing the TPU Initialization Webhook

-The TPU Initialization Webhook automatically bootstraps the TPU environment for TPU clusters. The webhook needs to be installed once per GKE cluster and requires a Kuberay Operator running v1.1+ and GKE cluster version of 1.28+.
+The TPU Initialization Webhook automatically bootstraps the TPU environment for TPU clusters. The webhook needs to be installed once per GKE cluster and requires a Kuberay Operator running v1.1+ and GKE cluster version of 1.28+. The webhook requires [cert-manager](https://github.com/cert-manager/cert-manager) to be installed in-cluster to handle TLS certificate injection. cert-manager can be installed in both GKE standard and autopilot clusters using the following helm commands:
+```
+helm repo add jetstack https://charts.jetstack.io
+helm repo update
+helm install --create-namespace --namespace cert-manager --set installCRDs=true --set global.leaderElection.namespace=cert-manager cert-manager jetstack/cert-manager
+```
+After installing cert-manager, it may take up to two minutes for the certificate to become ready.

Installing the webhook:
1. `git clone https://github.com/GoogleCloudPlatform/ai-on-gke`
2. `cd applications/ray/kuberay-tpu-webhook`
-3. `make install-cert-manager` - it may take up to two minutes for the certificate to become ready
-4. `make deploy`
+3. `make deploy`
- this will create the webhook deployment, configs, and service in the "ray-system" namespace
- to change the namespace, edit the "namespace" value in each .yaml in deployments/ and certs/

-5. `make deploy-cert`
+4. `make deploy-cert`
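
Editorial sketch (not part of the commit): the install flow documented by this README change could be scripted roughly as below, assuming `helm`, `kubectl`, and `make` are on the PATH and the functions are called from `applications/ray/kuberay-tpu-webhook`. The Helm and `make` commands are copied from the diff. The `run` helper, the `DRY_RUN` variable, the function names, and the `kubectl wait` step with its 120-second timeout are illustrative assumptions chosen to mirror the "up to two minutes" note.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Install cert-manager with the Helm commands shown in the README diff.
install_cert_manager() {
  run helm repo add jetstack https://charts.jetstack.io &&
  run helm repo update &&
  run helm install --create-namespace --namespace cert-manager \
    --set installCRDs=true \
    --set global.leaderElection.namespace=cert-manager \
    cert-manager jetstack/cert-manager &&
  # Assumption: block until the cert-manager Deployments report Available.
  run kubectl wait --namespace cert-manager \
    --for=condition=Available deployment --all --timeout=120s
}

# Deploy the webhook and its certificates via the Makefile targets
# (README steps 3 and 4 after this change).
deploy_webhook() {
  run make deploy &&
  run make deploy-cert
}
```

Setting `DRY_RUN=1` before calling `install_cert_manager` and `deploy_webhook` prints the commands without touching the cluster, which allows reviewing them first.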


