Follow this example to run a Katib Experiment on your laptop with a Kind cluster. The example does not require any public or private cloud.
Install the following tools to run the example:
Run the following command to create a Kind cluster with the Katib components:
./deploy.sh
If the script succeeds, the Katib components will be running:
$ kubectl get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
katib-controller-566595bdd8-x7z6w 1/1 Running 0 67s
katib-db-manager-57cd769cdb-x4lnz 1/1 Running 0 67s
katib-mysql-7894994f88-7l8nd 1/1 Running 0 67s
katib-ui-5767cfccdc-nt6mz 1/1 Running 0 67s
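If the pods are still starting, you can block until they are ready instead of polling by hand. The following is a minimal sketch: the helper name `wait_for_katib` and the 300s default timeout are assumptions for illustration and are not part of Katib or of `deploy.sh`.

```shell
# Hypothetical helper: wait until every pod in the kubeflow namespace
# reports the Ready condition, or fail once the timeout expires.
# The first argument is an optional timeout (default 300s, an arbitrary choice).
wait_for_katib() {
  kubectl wait --for=condition=Ready pods --all -n kubeflow --timeout="${1:-300s}"
}
```

Call it as `wait_for_katib` or, with a custom timeout, `wait_for_katib 120s`.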
You can use various Katib interfaces to run your first Katib Experiment.
For example, create a Hyperparameter Tuning Katib Experiment with the random search algorithm using kubectl:
kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1beta1/hp-tuning/random.yaml
This example uses a PyTorch neural network to train an image classification model using the MNIST dataset. You can check the training container source code here. The Experiment runs twelve training jobs (Trials) and tunes the following hyperparameters:
- Learning Rate (lr).
- Momentum (momentum).
After creating the example above, check the Experiment status:
$ kubectl get experiment random -n kubeflow
NAME TYPE STATUS AGE
random Running True 6m19s
Check the Suggestion status:
$ kubectl get suggestion -n kubeflow
NAME TYPE STATUS REQUESTED ASSIGNED AGE
random Running True 4 4 6m21s
Check the Trial statuses:
$ kubectl get trial -n kubeflow
NAME TYPE STATUS AGE
random-9hmdjqk9 Running True 99s
random-cf7tfss2 Succeeded True 5m21s
random-fr5lfn2x Running True 5m21s
random-z9wqm7xh Running True 5m21s
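With twelve Trials the listing gets long, so a per-status summary can be easier to read. The sketch below counts Trials by the TYPE column of the table output. The function name `count_trials_by_type` is hypothetical, and it assumes the default four-column layout shown above.

```shell
# Hypothetical helper: summarize Trials by condition type.
# Reads the table printed by `kubectl get trial` on stdin, skips the
# header row, and counts how often each value appears in the TYPE column.
count_trials_by_type() {
  awk 'NR > 1 && NF >= 2 { count[$2]++ }
       END { for (t in count) printf "%s %d\n", t, count[t] }' | sort
}
```

Use it as `kubectl get trial -n kubeflow | count_trials_by_type`. For the listing above, it would report three Running Trials and one Succeeded Trial.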
You can get the best hyperparameters with the following command:
$ kubectl get experiment random -n kubeflow -o jsonpath='{range .status.currentOptimalTrial.parameterAssignments[*]}{.name}: {.value}{"\n"}{end}'
lr: 0.028162244250364066
momentum: 0.583672196492823
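While the Experiment is still running, the values above are only the best found so far. To read the final result, you can first wait for the Experiment to finish. This is a sketch under assumptions: the helper name `print_best_params` and the 30m default timeout are illustrative, and it relies on the Experiment exposing a Succeeded condition that `kubectl wait` can observe.

```shell
# Hypothetical helper: block until the Experiment reports the Succeeded
# condition, then print the best hyperparameters with the same jsonpath
# query as above. Arguments: Experiment name (default: random) and
# timeout (default: 30m, an arbitrary choice).
print_best_params() {
  kubectl wait --for=condition=Succeeded "experiment/${1:-random}" \
    -n kubeflow --timeout="${2:-30m}" &&
  kubectl get experiment "${1:-random}" -n kubeflow \
    -o jsonpath='{range .status.currentOptimalTrial.parameterAssignments[*]}{.name}: {.value}{"\n"}{end}'
}
```

Run `print_best_params random 20m` to wait up to twenty minutes. If the wait times out or fails, the second command is skipped and the function returns a non-zero status.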
To view the created Experiment in the Katib UI, follow this guide.
To clean up the Kind cluster, run:
kind delete cluster