Some fix to getting-started.md (#94)
* fix md

* fix
lluunn authored and k8s-ci-robot committed May 25, 2018
1 parent ecb27de commit 5a7af98
Showing 1 changed file with 20 additions and 6 deletions.
26 changes: 20 additions & 6 deletions docs/getting-start.md
@@ -11,7 +11,7 @@
First, copy the CLI tool.

```bash
-$ curl -Lo katib-cli https://github.com/kubeflow/katib/releases/download/v0.1.0-alpha/katib-cli-linux-amd64 && chmod +x katib-cli && sudo mv katib-cli /usr/local/bin/
+$ curl -Lo katib-cli https://github.com/kubeflow/katib/releases/download/v0.1.1-alpha/katib-cli-linux-amd64 && chmod +x katib-cli && sudo mv katib-cli /usr/local/bin/
```

The CLI tool will be placed in the `/usr/local/bin/` directory.
@@ -26,6 +26,8 @@ $ ./scripts/deploy.sh

## Use CLI

vizier-core is a service of type NodePort, with port 30678.

Check which node vizier-core was deployed to.
Then access the vizier API on that node.

@@ -40,17 +42,29 @@ vizier-core-864dd6fdd4-r55qv 1/1 Running 0 11m
vizier-db-7b6f8c59bc-mjhh4 1/1 Running 0 11m 10.36.0.4 node1
vizier-suggestion-random-5895dc79b4-pbqkc 1/1 Running 0 11m 10.47.0.5 gpu-node3

-$ katib-cli -s gpu-node2:30678 Getstudies
+$ katib-cli -s gpu-node2:30678 get studies
2018/04/03 05:14:49 connecting gpu-node2:30678
StudyID Name Owner RunningTrial CompletedTrial
```
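
The node lookup described above can be scripted. The following is a minimal sketch, not part of the original guide: the helper name `vizier_core_node` is hypothetical, and it assumes the default `kubectl get pods -o wide` column order (NODE in the seventh column) and that the pods live in the `katib` namespace.

```bash
# Hypothetical helper: read `kubectl get pods -o wide` output on stdin and
# print the node that hosts the running vizier-core pod.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE IP NODE.
vizier_core_node() {
  awk '$1 ~ /^vizier-core-/ && $3 == "Running" { print $7; exit }'
}

# Usage (the katib namespace is an assumption):
#   node=$(kubectl get pods -n katib -o wide | vizier_core_node)
#   katib-cli -s "${node}:30678" get studies
```

Parsing the table keeps the sketch close to the output shown in this guide; a label selector or JSONPath query would be more robust if the pod labels are known.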

If your DNS cannot resolve the node name, connect via the node's IP address instead. Get the node's IP with:

```
kubectl get -n katib node YOUR_NODE -o wide
```
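
To avoid reading the address off the table by hand, kubectl's JSONPath output can extract it directly. This is a sketch under assumptions: the helper `vizier_endpoint` is hypothetical, and the node is assumed to report an address of type `InternalIP` (on some clusters the `ExternalIP` is the one reachable from your machine).

```bash
# Hypothetical helper: build the "host:port" argument for `katib-cli -s`.
# The port defaults to 30678, the vizier-core NodePort used in this guide.
vizier_endpoint() {
  printf '%s:%s\n' "$1" "${2:-30678}"
}

# Usage (assumes the node reports an InternalIP address):
#   ip=$(kubectl get node YOUR_NODE -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
#   katib-cli -s "$(vizier_endpoint "$ip")" get studies
```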

If you are using GKE, create a firewall rule to allow traffic on port 30678.

```
gcloud compute firewall-rules create katibservice --allow tcp:30678
```
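
Before running `katib-cli`, it can help to confirm that the port is actually reachable. The check below is a rough sketch rather than part of the original guide: `port_open` is a hypothetical helper, and it assumes `bash` (for its `/dev/tcp` redirection) and coreutils `timeout` are installed.

```bash
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds
# within 3 seconds, non-zero otherwise.
# Relies on bash's /dev/tcp redirection and on coreutils `timeout`.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   if port_open gpu-node2 30678; then echo "reachable"; else echo "blocked"; fi
```

A failure here only says the connection could not be opened; it does not distinguish a missing firewall rule from a node that is down.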

## Create Example Study

Try `create study`. The study will be created and the hyperparameter search will start.

```bash
-$ katib-cli -s gpu-node2:30678 -f ../examples/random.yml Createstudy
+$ katib-cli -s gpu-node2:30678 -f ../examples/random.yml create study
2018/04/03 05:16:37 connecting gpu-node2:30678
2018/04/03 05:16:37 study conf{cifer10 root MAXIMIZE 0 configs:<name:"--lr" parameter_type:DOUBLE feasible:<max:"0.07" min:"0.03" > > configs:<name:"--lr-factor" parameter_type:DOUBLE feasible:<max:"0.2" min:"0.05" > > configs:<name:"--max-random-h" parameter_type:INT feasible:<max:"46" min:"26" > > configs:<name:"--max-random-l" parameter_type:INT feasible:<max:"75" min:"25" > > configs:<name:"--num-epochs" parameter_type:INT feasible:<max:"3" min:"3" > > [] random median [name:"SuggestionNum" value:"2" name:"MaxParallel" value:"2" ] [] Validation-accuracy [accuracy] mxnet/python:gpu [python /mxnet/example/image-classification/train_cifar10.py --batch-size=512 --gpus=0,1] 2 default-scheduler <nil> }
2018/04/03 05:16:37 req Createstudy
@@ -60,7 +74,7 @@ $ katib-cli -s gpu-node2:30678 -f ../examples/random.yml Createstudy
You can check that the job is running with the `kubectl` command.

```bash
-$ katib-cli -s gpu-node2:30678 Getstudies
+$ katib-cli -s gpu-node2:30678 get studies
2018/04/03 05:19:49 connecting gpu-node2:30678
StudyID Name Owner RunningTrial CompletedTrial
fef3711aa343fae6 cifer10 root 2 0
@@ -74,7 +88,7 @@ wbe8aabd6ad4f50e-worker-0 1 0 1m
Check the status of the jobs with the `katib-cli` command.

```bash
-$ katib-cli -s gpu-node2:30678 Getstudies
+$ katib-cli -s gpu-node2:30678 get studies
2018/04/03 05:26:20 connecting gpu-node2:30678
StudyID Name Owner RunningTrial CompletedTrial
fef3711aa343fae6 cifer10 root 1 1
@@ -215,7 +229,7 @@ parameterconfigs:
```

```bash
-$ katib-cli -s gpu-node2:30678 -f ../examples/random-pv.yml Createstudy
+$ katib-cli -s gpu-node2:30678 -f ../examples/random-pv.yml create study
2018/04/03 05:49:47 connecting gpu-node2:30678
2018/04/03 05:49:47 study conf{cifer10-pv-test root MAXIMIZE 0 configs:<name:"--lr" parameter_type:DOUBLE feasible:<max:"0.07" min:"0.03" > > configs:<name:"--lr-factor" parameter_type:DOUBLE feasible:<max:"0.2" min:"0.05" > > configs:<name:"--max-random-h" parameter_type:INT feasible:<max:"46" min:"26" > > configs:<name:"--max-random-l" parameter_type:INT feasible:<max:"75" min:"25" > > configs:<name:"--num-epochs" parameter_type:INT feasible:<max:"3" min:"3" > > [] random median [name:"SuggestionNum" value:"2" name:"MaxParallel" value:"2" ] [] Validation-accuracy [accuracy] mxnet/python:gpu [python /mxnet/example/image-classification/train_cifar10.py --batch-size=512 --gpus=0,1] 2 default-scheduler pvc:"nfs" path:"/nfs-mnt" }
2018/04/03 05:49:47 req Createstudy