pull #105

Merged
merged 12 commits into from
Jul 29, 2020
2 changes: 1 addition & 1 deletion azure-pipelines.yml
Original file line number Diff line number Diff line change
Expand Up @@ -106,8 +106,8 @@ jobs:

steps:
- script: |
echo "##vso[task.setvariable variable=PATH]/usr/local/Cellar/python@3.7/3.7.8_1/bin:${HOME}/Library/Python/3.7/bin:${PATH}"
python3 -m pip install --upgrade pip setuptools
echo "##vso[task.setvariable variable=PATH]${HOME}/Library/Python/3.7/bin:${PATH}"
displayName: 'Install python tools'
- script: |
echo "network-timeout 600000" >> ${HOME}/.yarnrc
Expand Down
4 changes: 2 additions & 2 deletions deployment/docker/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -44,9 +44,9 @@ RUN python3 -m pip --no-cache-dir install \
numpy==1.14.3 scipy==1.1.0

#
# Tensorflow 1.10.0
# Tensorflow 1.15
#
RUN python3 -m pip --no-cache-dir install tensorflow-gpu==1.10.0
RUN python3 -m pip --no-cache-dir install tensorflow-gpu==1.15

#
# Keras 2.1.6
Expand Down
14 changes: 8 additions & 6 deletions deployment/docker/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,17 @@ Dockerfile
This is the Dockerfile of the NNI project. It includes several popular deep learning frameworks and NNI. It is tested on `Ubuntu 16.04 LTS`:

```
CUDA 9.0, CuDNN 7.0
numpy 1.14.3,scipy 1.1.0
TensorFlow-gpu 1.10.0
Keras 2.1.6
PyTorch 0.4.1
CUDA 9.0
CuDNN 7.0
numpy 1.14.3
scipy 1.1.0
tensorflow-gpu 1.15.0
keras 2.1.6
torch 1.4.0
scikit-learn 0.20.0
pandas 0.23.4
lightgbm 2.2.2
NNI v0.7
nni
```
You can take this Dockerfile as a reference for your own customized Dockerfile.

Expand Down
44 changes: 44 additions & 0 deletions docs/en_US/CommunitySharings/NNI_colab_support.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@

# Use NNI on Google Colab
NNI runs easily on the Google Colab platform. However, Colab doesn't expose its public IP and ports, so by default you cannot access NNI's Web UI on Colab. To solve this, you need reverse proxy software such as `ngrok` or `frp`. This tutorial shows how to use ngrok to access NNI's Web UI on Colab.

## How to Open NNI's Web UI on Google Colab

1. Install the required packages and software.


```
! pip install nni # install nni
! wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip # download ngrok and unzip it
! unzip ngrok-stable-linux-amd64.zip
! mkdir -p nni_repo
! git clone https://github.com/microsoft/nni.git nni_repo/nni # clone NNI's official repo to get examples
```

2. Register an ngrok account [here](https://ngrok.com/), then connect to your account using your authtoken.


```
! ./ngrok authtoken <your-authtoken>
```

3. Start an NNI example on a port greater than 1024, then start ngrok on the same port. If you want to use a GPU, make sure `gpuNum >= 1` in config.yml. Use `get_ipython()` to start ngrok, since the cell hangs if you use `! ngrok http 5000 &`.


```
! nnictl create --config nni_repo/nni/examples/trials/mnist-pytorch/config.yml --port 5000 &
get_ipython().system_raw('./ngrok http 5000 &')
```

4. Check the public URL.


```
! curl -s http://localhost:4040/api/tunnels # don't change the port number 4040
```

After step 4 you will see a URL like http://xxxx.ngrok.io. Open this URL and you will find NNI's Web UI. Have fun :)
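Step 4 prints a JSON document rather than a bare URL. A minimal sketch of pulling the public URL out of that response from a Colab cell is given here. It assumes the local ngrok API on port 4040 returns a top-level `tunnels` list whose entries carry a `public_url` field; the helper names `extract_public_urls` and `fetch_public_urls` are hypothetical and not part of NNI or ngrok.

```python
import json
import urllib.request


def extract_public_urls(payload):
    """Return the public_url of every tunnel in a parsed /api/tunnels response."""
    return [t["public_url"] for t in payload.get("tunnels", []) if "public_url" in t]


def fetch_public_urls(api="http://localhost:4040/api/tunnels"):
    """Query the local ngrok API (assumed to listen on port 4040, as in step 4)."""
    with urllib.request.urlopen(api) as resp:
        return extract_public_urls(json.load(resp))
```

Calling `fetch_public_urls()` after step 3 should list the address to open in a browser, provided ngrok is still running in the background.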

## Access Web UI with frp

frp is another reverse proxy with similar functions. However, frp doesn't provide free public URLs, so you may need a server with a public IP to act as the frp server. See [here](https://github.com/fatedier/frp) to learn more about how to deploy frp.
1 change: 1 addition & 0 deletions docs/en_US/CommunitySharings/community_sharings.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,4 @@ Different from the tutorials and examples in the rest of the document which show
Model Compression <model_compression>
Feature Engineering <feature_engineering>
Performance measurement, comparison and analysis <perf_compare>
Use NNI on Google Colab <NNI_colab_support>
11 changes: 11 additions & 0 deletions docs/en_US/NAS/Overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,17 @@ Please refer to [here](NasGuide.md) for the usage of one-shot NAS algorithms.
One-shot NAS can be visualized with our visualization tool. Learn more details [here](./Visualization.md).



## Search Space Zoo

NNI provides some predefined search spaces that can be easily reused. By stacking the extracted cells, users can quickly reproduce those NAS models.

Search Space Zoo contains the following NAS cells:

* [DartsCell](./SearchSpaceZoo.md#DartsCell)
* [ENAS micro](./SearchSpaceZoo.md#ENASMicroLayer)
* [ENAS macro](./SearchSpaceZoo.md#ENASMacroLayer)

## Using NNI API to Write Your Search Space

The programming interface of designing and searching a model is often demanded in two scenarios.
Expand Down
175 changes: 175 additions & 0 deletions docs/en_US/NAS/SearchSpaceZoo.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
# Search Space Zoo

## DartsCell

DartsCell is extracted from the [CNN model](./DARTS.md) designed [here](https://github.com/microsoft/nni/tree/master/examples/nas/darts). A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes, where each node stands for a latent representation (e.g. a feature map in a convolutional network). A directed edge from Node 1 to Node 2 is associated with an operation that transforms Node 1, and the result is stored on Node 2. The [operations](#predefined-operations-darts) between nodes are predefined and unchangeable. Each edge represents one operation, chosen from the predefined ones, that is applied to the starting node of the edge. One cell contains two input nodes, a single output node, and `n_node` other nodes. The input nodes are defined as the cell outputs of the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g. concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. By adjusting the softmax weights on every node, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds a search space. Note that in the DARTS paper all cells in the model share the same structure.

One structure in the Darts search space is shown below. Note that NNI merges the last of the four intermediate nodes with the output node.

![](../../img/NAS_Darts_cell.svg)

The predefined operations are shown in [references](#predefined-operations-darts).

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.DartsCell
    :members:
```
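The softmax relaxation described in the DartsCell overview can be sketched in plain Python. This is only an illustration under stated assumptions: the scalar lambdas in `toy_ops` are hypothetical stand-ins for the real tensor operations, and `mixed_edge` and `discretize` are made-up names, not NNI APIs.

```python
import math


def softmax(weights):
    """Numerically stable softmax over a list of floats."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_edge(x, ops, alphas):
    """Relaxed edge: softmax-weighted sum of every candidate operation applied to x."""
    probs = softmax(alphas)
    return sum(p * op(x) for p, op in zip(probs, ops.values()))


def discretize(ops, alphas):
    """After search, keep only the operation with the largest weight."""
    names = list(ops)
    return names[max(range(len(alphas)), key=alphas.__getitem__)]


# Hypothetical scalar stand-ins for the real tensor operations.
toy_ops = {
    "skipconnect": lambda x: x,
    "zero": lambda x: 0.0,
    "double": lambda x: 2.0 * x,
}
```

During search every edge behaves like `mixed_edge`, so the architecture weights receive gradients; `discretize` corresponds to picking the highest-probability operation for the final structure.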

### Example code

[example code](https://github.com/microsoft/nni/tree/master/examples/nas/search_space_zoo/darts_example.py)

```bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best structure
python3 darts_example.py
```

<a name="predefined-operations-darts"></a>

### References

All supported operations for Darts are listed below.

* MaxPool / AvgPool
    * MaxPool: Call `torch.nn.MaxPool2d`. This operation applies a 2D max pooling over all input channels. Its parameters `kernel_size=3` and `padding=1` are fixed. The pooling result will pass through a BatchNorm2d then return as the result.
    * AvgPool: Call `torch.nn.AvgPool2d`. This operation applies a 2D average pooling over all input channels. Its parameters `kernel_size=3` and `padding=1` are fixed. The pooling result will pass through a BatchNorm2d then return as the result.

MaxPool / AvgPool with `kernel_size=3` and `padding=1` followed by BatchNorm2d
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.PoolBN
```
* SkipConnect

There is no operation between two nodes. Call `torch.nn.Identity` to forward what it gets to the output.
* Zero operation

There is no connection between two nodes.
* DilConv3x3 / DilConv5x5

<a name="DilConv"></a>DilConv3x3: (Dilated) depthwise separable Conv. It's a 3x3 depthwise convolution with `C_in` groups, followed by a 1x1 pointwise convolution. It reduces the number of parameters. Input is first passed through ReLU, then DilConv, and finally BatchNorm2d. **Note that the operation is not Dilated Convolution, but we follow the convention in NAS papers to name it DilConv.** 3x3 DilConv has parameters `kernel_size=3`, `padding=1` and 5x5 DilConv has parameters `kernel_size=5`, `padding=4`.
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.DilConv
```
* SepConv3x3 / SepConv5x5

Composed of two DilConvs applied sequentially, with fixed `kernel_size=3`, `padding=1` or `kernel_size=5`, `padding=2`.
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.SepConv
```
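The DilConv entry above says that a depthwise convolution with `C_in` groups followed by a 1x1 pointwise convolution reduces the number of parameters. A rough count, ignoring biases and BatchNorm parameters, makes the saving concrete. The function names here are hypothetical and chosen only for illustration.

```python
def conv_params(c_in, c_out, kernel_size):
    """Weights in a standard convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_params(c_in, c_out, kernel_size):
    """Weights in a depthwise conv with c_in groups plus a 1x1 pointwise conv (bias ignored)."""
    depthwise = kernel_size * kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise
```

For 16 input and 16 output channels with a 3x3 kernel, this gives 2304 weights for the standard convolution versus 400 for the separable one.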

## ENASMicroLayer

This layer is extracted from the model designed [here](https://github.com/microsoft/nni/tree/master/examples/nas/enas). A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers, and `ENASMicroLayer` is a unified implementation of the two types of layers. The only difference between the two is that reduction layers apply all operations with `stride=2`.

ENAS Micro employs a DAG with N nodes in one cell, where the nodes represent local computations and the edges represent the flow of information between the N nodes. One cell contains two input nodes and a single output node. Each following node chooses two previous nodes as inputs, applies two operations from the [predefined ones](#predefined-operations-enas), and adds the results as the output of this node. For example, Node 4 chooses Node 1 and Node 3 as inputs, applies `MaxPool` and `AvgPool` to them respectively, then sums the results as the output of Node 4. Nodes that do not serve as input to any other node are viewed as the outputs of the layer. If there are multiple output nodes, the model calculates the average of these nodes as the layer output.
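The node rule in the paragraph above can be sketched with scalars in place of feature maps. This is a toy illustration, not NNI's implementation: `run_micro_cell` is a hypothetical name, node indices are 0-based here, and any node that feeds no other node is treated as an output, following the description literally.

```python
def run_micro_cell(inputs, nodes, ops):
    """Evaluate a toy ENAS micro cell.

    inputs: values of the two input nodes.
    nodes:  one (idx_a, op_a, idx_b, op_b) tuple per following node.
    ops:    mapping from operation name to a callable.
    """
    values = list(inputs)
    used = set()
    for idx_a, op_a, idx_b, op_b in nodes:
        # Apply one operation to each chosen input and add the two results.
        values.append(ops[op_a](values[idx_a]) + ops[op_b](values[idx_b]))
        used.update((idx_a, idx_b))
    # Nodes that feed no other node are the outputs; average them.
    loose = [v for i, v in enumerate(values) if i not in used]
    return sum(loose) / len(loose)
```

With tensors instead of floats, the final line corresponds to averaging the loose ends to form the layer output.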

One structure in the ENAS micro search space is shown below.

![](../../img/NAS_ENAS_micro.svg)

The predefined operations can be seen [here](#predefined-operations-enas).

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMicroLayer
    :members:
```

The Reduction Layer is made up of two Conv operations followed by BatchNorm. Each Conv outputs `C_out//2` channels, and the two outputs are concatenated along the channel dimension to form the output. The convolutions have `kernel_size=1` and `stride=2`, and they perform alternate sampling on the input to reduce the resolution without loss of information. This layer is wrapped in `ENASMicroLayer`.

### Example code

[example code](https://github.com/microsoft/nni/tree/master/examples/nas/search_space_zoo/enas_micro_example.py)

```bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_micro_example.py
```

<a name="predefined-operations-enas"></a>

### References

All supported operations for ENAS micro search are listed below.

* MaxPool / AvgPool
    * MaxPool: Call `torch.nn.MaxPool2d`. This operation applies a 2D max pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.
    * AvgPool: Call `torch.nn.AvgPool2d`. This operation applies a 2D average pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.Pool
```

* SepConv
    * SepConvBN3x3: ReLU followed by a [DilConv](#DilConv) and BatchNorm. Convolution parameters are `kernel_size=3`, `stride=1` and `padding=1`.
    * SepConvBN5x5: The same operation as the previous one, but with a kernel size of 5 and a padding of 2.

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.SepConvBN
```

* SkipConnect

Call `torch.nn.Identity` to connect directly to the next cell.
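The fixed `kernel_size`/`padding` pairs listed above are the ones that preserve spatial resolution at `stride=1`. A quick check uses the standard output-size formula for convolution and pooling layers, assuming dilation 1 and floor rounding; `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a conv or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1
```

For a 32-pixel input, `kernel_size=3, padding=1` and `kernel_size=5, padding=2` both return 32 at `stride=1`, while the same 3x3 setting at `stride=2`, as used by reduction layers, returns 16.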

## ENASMacroLayer

In Macro search, the controller makes two decisions for each layer: i) the [operation](#macro-operations) to perform on the result of the previous layer, and ii) which previous layer to connect to for SkipConnects. ENAS uses a controller to design the whole model architecture instead of one of its components. The output of the operation is concatenated with the tensor of the layer chosen for SkipConnect. NNI provides [predefined operations](#macro-operations) for macro search, which are listed in [references](#macro-operations).

Part of one structure in the ENAS macro search space is shown below.

![](../../img/NAS_ENAS_macro.svg)

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMacroLayer
    :members:
```

To describe the whole search space, NNI provides a model, which is built by stacking the layers.

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMacroGeneralModel
    :members:
```

### Example code

[example code](https://github.com/microsoft/nni/tree/master/examples/nas/search_space_zoo/enas_macro_example.py)

```bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_macro_example.py
```

<a name="macro-operations"></a>

### References

All supported operations for ENAS macro search are listed below.

* ConvBranch

All input first passes into a StdConv, which is made up of a 1x1Conv followed by BatchNorm2d and ReLU. Then the intermediate result goes through one of the operations listed below. The final result is calculated through a BatchNorm2d and ReLU as post-procedure.
    * Separable Conv3x3: If `separable=True`, the cell will use [SepConv](#DilConv) instead of the normal Conv operation. SepConv's `kernel_size=3`, `stride=1` and `padding=1`.
    * Separable Conv5x5: SepConv's `kernel_size=5`, `stride=1` and `padding=2`.
    * Normal Conv3x3: If `separable=False`, the cell will use a normal Conv operation with `kernel_size=3`, `stride=1` and `padding=1`.
    * Normal Conv5x5: Conv's `kernel_size=5`, `stride=1` and `padding=2`.

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.ConvBranch
```
* PoolBranch

All input first passes into a StdConv, which is made up of a 1x1Conv followed by BatchNorm2d and ReLU. Then the intermediate result goes through a pooling operation followed by BatchNorm.
    * AvgPool: Call `torch.nn.AvgPool2d`. This operation applies a 2D average pooling over all input channels. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.
    * MaxPool: Call `torch.nn.MaxPool2d`. This operation applies a 2D max pooling over all input channels. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.PoolBranch
```

<!-- push -->
4 changes: 2 additions & 2 deletions docs/en_US/TrainingService/AMLMode.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,8 @@ Step 6. Create an AML cluster as the computeTarget.

Step 7. Open a command line and install AML package environment.
```
python3 -m pip install azureml --user
python3 -m pip install azureml-sdk --user
python3 -m pip install azureml
python3 -m pip install azureml-sdk
```

## Run an experiment
Expand Down
3 changes: 2 additions & 1 deletion docs/en_US/Tutorial/SetupNniDeveloperEnvironment.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,8 @@ Nothing to do, the code is already linked to package folders.
#### TypeScript

* If `src/nni_manager` is changed, run `yarn watch` under this folder. It will watch and build code continually. `nnictl` needs to be restarted to reload NNI manager.
* If `src/webui` or `src/nasui` are changed, run `yarn start` under the corresponding folder. The web UI will refresh automatically if code is changed.
* If `src/webui` is changed, run `yarn dev`, which will run a mock API server and a webpack dev server simultaneously. Use the `EXPERIMENT` environment variable (e.g., `mnist-tfv1-running`) to specify which mock data to use. Built-in mock experiments are listed in `src/webui/mock`. An example of the full command is `EXPERIMENT=mnist-tfv1-running yarn dev`.
* If `src/nasui` is changed, run `yarn start` under the corresponding folder. The web UI will refresh automatically if code is changed. There is also a mock API server that is useful when developing. It can be launched via `node server.js`.

### 5. Submit Pull Request

Expand Down
1 change: 1 addition & 0 deletions docs/en_US/nas.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,5 +23,6 @@ For details, please refer to the following tutorials:
One-shot NAS <NAS/one_shot_nas>
Customize a NAS Algorithm <NAS/Advanced>
NAS Visualization <NAS/Visualization>
Search Space Zoo <NAS/SearchSpaceZoo>
NAS Benchmarks <NAS/Benchmarks>
API Reference <NAS/NasReference>
1 change: 1 addition & 0 deletions docs/img/NAS_Darts_cell.svg
1 change: 1 addition & 0 deletions docs/img/NAS_ENAS_macro.svg
1 change: 1 addition & 0 deletions docs/img/NAS_ENAS_micro.svg
53 changes: 53 additions & 0 deletions examples/nas/search_space_zoo/darts_example.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import logging
import time
from argparse import ArgumentParser

import torch
import torch.nn as nn

import datasets
from nni.nas.pytorch.callbacks import ArchitectureCheckpoint, LRSchedulerCallback
from nni.nas.pytorch.darts import DartsTrainer
from utils import accuracy

from nni.nas.pytorch.search_space_zoo import DartsCell
from darts_search_space import DartsStackedCells

logger = logging.getLogger('nni')

if __name__ == "__main__":
    parser = ArgumentParser("darts")
    parser.add_argument("--layers", default=8, type=int)
    parser.add_argument("--batch-size", default=64, type=int)
    parser.add_argument("--log-frequency", default=10, type=int)
    parser.add_argument("--epochs", default=50, type=int)
    parser.add_argument("--channels", default=16, type=int)
    parser.add_argument("--unrolled", default=False, action="store_true")
    parser.add_argument("--visualization", default=False, action="store_true")
    args = parser.parse_args()

    dataset_train, dataset_valid = datasets.get_dataset("cifar10")

    model = DartsStackedCells(3, args.channels, 10, args.layers, DartsCell)
    criterion = nn.CrossEntropyLoss()

    optim = torch.optim.SGD(model.parameters(), 0.025, momentum=0.9, weight_decay=3.0E-4)
    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, args.epochs, eta_min=0.001)

    trainer = DartsTrainer(model,
                           loss=criterion,
                           metrics=lambda output, target: accuracy(output, target, topk=(1,)),
                           optimizer=optim,
                           num_epochs=args.epochs,
                           dataset_train=dataset_train,
                           dataset_valid=dataset_valid,
                           batch_size=args.batch_size,
                           log_frequency=args.log_frequency,
                           unrolled=args.unrolled,
                           callbacks=[LRSchedulerCallback(lr_scheduler), ArchitectureCheckpoint("./checkpoints")])
    if args.visualization:
        trainer.enable_visualization()
    trainer.train()
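
The script above pairs SGD at an initial learning rate of 0.025 with `CosineAnnealingLR` over `args.epochs` (default 50) and `eta_min=0.001`. As a rough sketch of what that schedule looks like, the closed-form cosine annealing rule can be evaluated without PyTorch. The helper name `cosine_annealing_lr` is hypothetical, and the sketch assumes the scheduler is stepped once per epoch with no restarts.

```python
import math


def cosine_annealing_lr(epoch, base_lr=0.025, eta_min=0.001, t_max=50):
    """Closed-form cosine annealing: decays from base_lr at epoch 0 to eta_min at t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2
```

Under these assumptions the rate starts at 0.025, passes through the midpoint 0.013 at epoch 25, and reaches 0.001 at epoch 50.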