* webui logpath and document (#135)
* Add webui document and logpath as a href
* fix tslint
* fix comments by Chengmin
* Pai training service bug fix and enhancement (#136)
* Add NNI installation scripts
* Update pai script, update NNI_out_dir
* Update NNI dir in nni sdk local.py
* Create .nni folder in nni sdk local.py
* Add check before creating .nni folder
* Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT
* Improve annotation (#138)
* Improve annotation
* Minor bugfix
* Selectively install through pip (#139) Selectively install through pip
* update setup.py
* fix paiTrainingService bugs (#137)
* fix nnictl bug
* add hdfs host validation
* fix bugs
* fix dockerfile
* fix install.sh
* update install.sh
* fix dockerfile
* Set timeout for HDFSUtility exists function
* remove unused TODO
* fix sdk
* add optional for outputDir and dataDir
* refactor dockerfile.base
* Remove unused import in hdfsclientUtility
* Add documentation for NNI PAI mode experiment (#141)
* Add documentation for NNI PAI mode
* Fix typo based on PR comments
* Exit with subprocess return code of trial keeper
* Remove additional exit code
* Fix typo based on PR comments
* update doc for smac tuner (#140)
* Revert "Selectively install through pip (#139)" due to potential pip install issue (#142)
* Revert "Selectively install through pip (#139)" This reverts commit 1d17483.
* Add exit code of subprocess for trial_keeper
* Update README, add link to PAImode doc
Showing 58 changed files with 2,080 additions and 151 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
**Run an Experiment on OpenPAI** | ||
=== | ||
NNI supports running an experiment on [OpenPAI](https://github.com/Microsoft/pai) (also called pai); this is known as pai mode. Before using NNI pai mode, you need an account on an [OpenPAI](https://github.com/Microsoft/pai) cluster. If you don't have an OpenPAI account and want to deploy your own cluster, see the [deployment guide](https://github.com/Microsoft/pai#how-to-deploy). In pai mode, your trial program runs in a Docker container created by OpenPAI. | ||
|
||
## Setup environment | ||
Install NNI by following the [installation guide](GetStarted.md). | ||
|
||
## Run an experiment | ||
This section uses `examples/trials/mnist-annotation` as an example. The NNI config YAML file looks like this: | ||
``` | ||
authorName: your_name | ||
experimentName: auto_mnist | ||
# how many trials could be concurrently running | ||
trialConcurrency: 2 | ||
# maximum experiment running duration | ||
maxExecDuration: 3h | ||
# empty means never stop | ||
maxTrialNum: 100 | ||
# choice: local, remote, pai | ||
trainingServicePlatform: pai | ||
# choice: true, false | ||
useAnnotation: true | ||
tuner: | ||
  builtinTunerName: TPE | ||
  classArgs: | ||
    optimize_mode: maximize | ||
trial: | ||
  command: python3 mnist.py | ||
  codeDir: ~/nni/examples/trials/mnist-annotation | ||
  gpuNum: 0 | ||
  cpuNum: 1 | ||
  memoryMB: 8196 | ||
  image: openpai/pai.example.tensorflow | ||
  dataDir: hdfs://10.1.1.1:9000/nni | ||
  outputDir: hdfs://10.1.1.1:9000/nni | ||
# Configuration to access OpenPAI Cluster | ||
paiConfig: | ||
  userName: your_pai_nni_user | ||
  passWord: your_pai_password | ||
  host: 10.1.1.1 | ||
``` | ||
Note: Set `trainingServicePlatform: pai` in the NNI config YAML file to start the experiment in pai mode. | ||
|
||
Compared with LocalMode and [RemoteMachineMode](RemoteMachineMode.md), the trial configuration in pai mode has five additional keys: | ||
* cpuNum | ||
    * Required key. Must be a positive number that matches your trial program's CPU requirement. | ||
* memoryMB | ||
    * Required key. Must be a positive number that matches your trial program's memory requirement. | ||
* image | ||
    * Required key. In pai mode, your trial program is scheduled by OpenPAI to run in a [Docker container](https://www.docker.com/). This key specifies the Docker image used to create the container in which your trial will run. | ||
* dataDir | ||
    * Optional key. It specifies the HDFS data directory from which the trial downloads data. The format should be something like hdfs://{your HDFS host}:9000/{your data directory} | ||
* outputDir | ||
    * Optional key. It specifies the HDFS output directory for the trial. Once the trial is completed (whether it succeeds or fails), the trial's stdout and stderr will be copied to this directory automatically by the NNI SDK. The format should be something like hdfs://{your HDFS host}:9000/{your output directory} | ||
|
||
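The rules for the five pai-mode keys can be sketched as a small pre-flight check. This is only an illustration, not part of NNI: `check_pai_trial_config` is a hypothetical helper, it takes the `trial` section as an already-parsed dictionary, and it assumes the HDFS locations follow the `hdfs://{host}:{port}/{directory}` shape described above.

```python
from urllib.parse import urlparse

# Keys described in the list above (names come from the documentation).
REQUIRED_KEYS = ("cpuNum", "memoryMB", "image")
NUMERIC_KEYS = ("cpuNum", "memoryMB")
HDFS_KEYS = ("dataDir", "outputDir")


def check_pai_trial_config(trial):
    """Return a list of problems found in a pai-mode `trial` section.

    Hypothetical helper for illustration; NNI performs its own validation.
    """
    problems = []

    # cpuNum, memoryMB and image are required keys.
    for key in REQUIRED_KEYS:
        if key not in trial:
            problems.append(f"missing required key: {key}")

    # cpuNum and memoryMB must be positive numbers.
    for key in NUMERIC_KEYS:
        if key in trial:
            value = trial[key]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value <= 0:
                problems.append(f"{key} must be a positive number")

    # dataDir and outputDir are optional, but should look like HDFS URIs when present.
    for key in HDFS_KEYS:
        if key in trial:
            parsed = urlparse(str(trial[key]))
            if parsed.scheme != "hdfs" or not parsed.hostname or not parsed.path.strip("/"):
                problems.append(f"{key} should look like hdfs://host:port/directory")

    return problems
```

An empty list means the section passed these checks; it does not guarantee that the OpenPAI cluster will accept the job.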
Once you have filled in the NNI experiment config file and saved it (for example, as exp_pai.yaml), run the following command | ||
``` | ||
nnictl create --config exp_pai.yaml | ||
``` | ||
to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`. | ||
You can see the pai jobs created by NNI in your OpenPAI cluster's web portal, for example: | ||
![](./nni_pai_joblist.jpg) | ||
|
||
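When filtering the OpenPAI job list by script, the job name format mentioned above can be built and parsed as follows. This is a rough sketch: the helper names are hypothetical, the documentation only says the format is "something like" this pattern, and the regular expression assumes that neither ID contains an underscore.

```python
import re

# Assumed pattern, taken from the format string quoted in the documentation.
JOB_NAME_PATTERN = re.compile(
    r"^nni_exp_(?P<experiment_id>[^_]+)_trial_(?P<trial_id>[^_]+)$"
)


def make_job_name(experiment_id, trial_id):
    """Build a job name in the documented shape (hypothetical helper)."""
    return f"nni_exp_{experiment_id}_trial_{trial_id}"


def parse_job_name(name):
    """Return (experiment_id, trial_id), or None if the name does not match."""
    match = JOB_NAME_PATTERN.match(name)
    if match is None:
        return None
    return match["experiment_id"], match["trial_id"]
```

For instance, `parse_job_name(make_job_name("exp1", "trialA"))` gives back the two made-up IDs, while a name created by some other tool yields `None`.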
Note: In pai mode, NNIManager starts a REST server that listens on port `51189` to receive metrics from trial jobs running in PAI containers. Open TCP port `51189` in your firewall rules to allow incoming traffic. | ||
|
||
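One way to confirm that the firewall rule works is to attempt a TCP connection to the NNIManager machine from a host inside the cluster network. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper, and the host you pass to it is whatever address the trial containers would use to reach NNIManager.

```python
import socket

NNI_REST_PORT = 51189  # port named in the note above


def can_connect(host, port=NNI_REST_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result only shows that the connection failed from the machine running the check; the cause may be the firewall, routing, or NNIManager not running yet.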
Once a trial job is completed, you can go to the overview page of the NNI WebUI (for example, http://localhost:8080/oview) to check the trial's information. | ||
|
||
Expand a trial's information in the trial list view, then click the logPath link, for example: | ||
![](./nni_webui_joblist.jpg) | ||
|
||
You will be redirected to the HDFS web portal, where you can browse that trial's output files in HDFS: | ||
![](./nni_trial_hdfs_output.jpg) | ||
|
||
You can see there are three files in the output folder: stderr, stdout, and trial.log. | ||
|
||
If you also want to save other trial output to HDFS, such as model files, write those files to the directory given by the environment variable `NNI_OUTPUT_DIR` in your trial code. The NNI SDK will copy all files in `NNI_OUTPUT_DIR` from the trial's container to HDFS. | ||
|
||
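In trial code, writing into `NNI_OUTPUT_DIR` might look like the sketch below. The function name, the file name, and the fallback to the current directory when the variable is unset (for example, when running the script outside NNI) are assumptions made for illustration.

```python
import os


def save_to_nni_output(filename, data, default_dir="."):
    """Write bytes to a file under NNI_OUTPUT_DIR and return the file path.

    Falls back to `default_dir` when NNI_OUTPUT_DIR is not set (an assumption
    for local runs; NNI sets the variable for trials it launches).
    """
    output_dir = os.environ.get("NNI_OUTPUT_DIR", default_dir)
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

A trial could call `save_to_nni_output("model.bin", serialized_model)` after training, where `serialized_model` is whatever byte string your framework produces.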
If you have any problems using NNI in pai mode, please create an issue on the [NNI GitHub repo](https://github.com/Microsoft/nni) or send mail to nni@microsoft.com | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# WebUI | ||
|
||
## View summary page | ||
|
||
Click the tab "Overview". | ||
|
||
* See the experiment parameters. | ||
* See search_space json. | ||
* See the trials with good performance. | ||
|
||
![](./img/overview.jpg) | ||
|
||
## View job accuracy | ||
|
||
Click the tab "Optimization Progress" to see the point graph of all trials. Hover over a point to see its specific accuracy. | ||
|
||
![](./img/accuracy.jpg) | ||
|
||
## View hyper parameter | ||
|
||
Click the tab "Hyper Parameter" to see the parallel graph. | ||
|
||
* Select a percentage to see the top trials. | ||
* Choose two axes to swap their positions. | ||
|
||
![](./img/searchspace.jpg) | ||
|
||
## View trial status | ||
|
||
Click the tab "Trial Status" to see the status of all trials. Specifically: | ||
|
||
* Trial duration: each trial's duration, shown in the bar graph. | ||
* Trial detail: the trial's ID, duration, start time, end time, status, accuracy, and search space file. | ||
|
||
![](./img/openRow.jpg) | ||
|
||
* Kill: you can kill a job whose status is running. | ||
* Tensor: you can view a job's TensorFlow graph; the link opens the TensorBoard page. | ||
|
||
![](./img/trialStatus.jpg) | ||
|
||
* Intermediate Result Graph. | ||
|
||
![](./img/intermediate.jpg) | ||
|
||
## Control | ||
|
||
Click the tab "Control" to add a new trial, or to update the search_space file and some experiment parameters. | ||
|
||
![](./img/control.jpg) | ||
|
||
## Feedback | ||
|
||
[Known Issues](https://github.com/Microsoft/nni/issues). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
#!/bin/bash | ||
make build | ||
make install-dependencies | ||
make build | ||
make dev-install | ||
make install-examples | ||
make update-bash-config | ||
source ~/.bashrc |