
ENAS and DARTS search space zoo #2589

Merged: 47 commits, merged on Jul 27, 2020
Changes from 32 commits

Commits (47):
673cf3d
add darts cell and search space
tabVersion Jun 23, 2020
72f9f12
move to search_space_zoo
tabVersion Jun 23, 2020
99c841b
accept a cell to build full model
tabVersion Jun 24, 2020
e0e9e2c
fix compile error
tabVersion Jun 24, 2020
b55c6cd
bug fix
tabVersion Jun 24, 2020
3e162f4
change DartsCell signiture
tabVersion Jun 27, 2020
181e4c1
format code
tabVersion Jun 27, 2020
e35ff4b
change signature & inherit sequencial
tabVersion Jun 29, 2020
7896cb4
add search space example
tabVersion Jun 29, 2020
cd4eb1f
structure adjust & comment change
tabVersion Jun 29, 2020
736d196
clearify darts search space doc
tabVersion Jun 29, 2020
8c4f0bc
move dartsStackCells to example
tabVersion Jun 30, 2020
cf720c9
update docs
tabVersion Jul 3, 2020
823b0be
Merge branch 'master' into darts
tabVersion Jul 3, 2020
5d41f19
doc missing fix
tabVersion Jul 3, 2020
510dc38
Merge branch 'darts' of https://github.com/tabVersion/nni into darts
tabVersion Jul 3, 2020
03f7a28
doc fix
tabVersion Jul 6, 2020
40c517a
change code to fix doc
tabVersion Jul 6, 2020
8696f96
enas test
tabVersion Jul 6, 2020
1a46bf0
enas test
tabVersion Jul 6, 2020
40ab64a
enas test
tabVersion Jul 6, 2020
473e247
enas micro
tabVersion Jul 6, 2020
94f3eba
code format & doc fix & add example
tabVersion Jul 6, 2020
9efdb8a
refine doc
tabVersion Jul 7, 2020
5e2ed66
code format
tabVersion Jul 7, 2020
4cfdb10
add enas micro doc
tabVersion Jul 8, 2020
6a7a6ba
fix trailing whitespace
tabVersion Jul 8, 2020
8ddf8f1
add enas macro
tabVersion Jul 9, 2020
c0aecff
format doc
tabVersion Jul 9, 2020
c785024
fix doc
tabVersion Jul 9, 2020
0316d31
fix systax
tabVersion Jul 9, 2020
f6e9565
fix
tabVersion Jul 9, 2020
1b0c398
refine doc
tabVersion Jul 11, 2020
5b3dc94
refine doc
tabVersion Jul 13, 2020
f12df2c
update
tabVersion Jul 13, 2020
6cdfc5c
refine
tabVersion Jul 14, 2020
ec6ac2b
refine doc
tabVersion Jul 15, 2020
7ef03a6
refine doc
tabVersion Jul 16, 2020
199efb7
doc refine
tabVersion Jul 20, 2020
5b9c3ae
change sketch
tabVersion Jul 22, 2020
d5f63e2
change illustration
tabVersion Jul 24, 2020
05875f0
resolution fix
tabVersion Jul 24, 2020
2a5c434
update doc
tabVersion Jul 24, 2020
1d12bec
update doc
tabVersion Jul 24, 2020
de282c5
update doc
tabVersion Jul 24, 2020
63b20ce
doc
tabVersion Jul 24, 2020
8eb2afa
adjust menu sequence
tabVersion Jul 27, 2020
11 changes: 11 additions & 0 deletions docs/en_US/NAS/Overview.md
@@ -54,6 +54,17 @@ Please refer to [here](NasGuide.md) for the usage of one-shot NAS algorithms.
One-shot NAS can be visualized with our visualization tool. Learn more details [here](./Visualization.md).



## Search Space Zoo

NNI provides some predefined search spaces that can be easily reused. By stacking the extracted cells, users can quickly reproduce these NAS models (see the sketch after the list below).

Search Space Zoo contains the following NAS cells:

* [DartsCell](./SearchSpaceZoo.md#DartsCell)
* [ENAS micro](./SearchSpaceZoo.md#ENASMicroLayer)
* [ENAS macro](./SearchSpaceZoo.md#ENASMacroLayer)
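
A minimal usage sketch, based on the `darts_example.py` added in this PR. `DartsStackedCells` is the example stacking network defined in the accompanying example code, not a library class:

```python
# Sketch based on examples/nas/search_space_zoo/darts_example.py from this PR.
from nni.nas.pytorch.search_space_zoo import DartsCell
from darts_stack_cells import DartsStackedCells  # example stacking network from this PR

# 3 input channels, 16 initial channels, 10 classes, 8 stacked cells
model = DartsStackedCells(3, 16, 10, 8, DartsCell)
```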

## Using NNI API to Write Your Search Space

The programming interface of designing and searching a model is often demanded in two scenarios.
160 changes: 160 additions & 0 deletions docs/en_US/NAS/SearchSpaceZoo.md
@@ -0,0 +1,160 @@
# Search Space Zoo

## DartsCell

DartsCell is extracted from the [CNN model](./DARTS.md) designed in this repo. The [operations](#darts-predefined-operations) connecting the nodes contained in the cell structure are fixed.

The predefined operations are shown as follows:

* MaxPool: calls `torch.nn.MaxPool2d`. This operation applies a 2D max pooling over all input channels. Its parameters `kernel_size=3` and `padding=1` are fixed.
* AvgPool: calls `torch.nn.AvgPool2d`. This operation applies a 2D average pooling over all input channels. Its parameters `kernel_size=3` and `padding=1` are fixed.
* Skip Connect: there is no operation between the two nodes; `torch.nn.Identity` forwards the input unchanged to the output.
* SepConv3x3: composed of two [DilConvs](#DilConv) with fixed `kernel_size=3`, applied sequentially.
* SepConv5x5: the same operation as the previous one, but with the kernel size set to 5.
* <a name="DilConv"></a>DilConv3x3: (dilated) depthwise separable Conv. It first calls `torch.nn.Conv2d` with fixed `kernel_size=3` and `groups=C_in` to convolve each input channel separately, then applies a 1x1 convolution to produce `C_out` output channels. This extracts features on each channel independently and reduces the number of parameters (see the sketch after this list).
Contributor: "It makes extracting features on every channel separately possible". Are you sure this is a correct sentence?

Contributor: I believe this is not DilConv? Maybe we should rename it. Or we can state that "we follow the convention in NAS papers to name it DilConv".

Contributor Author: Maybe DepthwiseSepConv?
* DilConv5x5: the same operation as the previous one, but with the kernel size set to 5.
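
The depthwise separable pattern described for DilConv above can be sketched as follows. This is a minimal illustration of the described pattern, not the exact zoo implementation; the class name and default parameters are assumptions:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Sketch of the DilConv pattern: a grouped (depthwise) conv, then a 1x1 conv."""
    def __init__(self, c_in, c_out, kernel_size=3, stride=1, padding=1, dilation=1):
        super().__init__()
        self.op = nn.Sequential(
            # depthwise step: groups=c_in convolves each input channel separately
            nn.Conv2d(c_in, c_in, kernel_size, stride=stride, padding=padding,
                      dilation=dilation, groups=c_in, bias=False),
            # pointwise step: a 1x1 conv mixes channels to produce c_out outputs
            nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),
        )

    def forward(self, x):
        return self.op(x)
```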

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.DartsCell
    :members:
```

### Example Code

[example code](https://github.com/microsoft/nni/tree/master/examples/nas/search_space_zoo/darts_example.py)

```bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best structure
python3 darts_example.py
```

<a name="darts-predefined-operations"></a>

### DARTS predefined operations

* MaxPool / AvgPool

MaxPool / AvgPool with `kernel_size=3` and `padding=1`, followed by BatchNorm2d (see the sketch after this list)
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.PoolBN
```
* Skip Connection

There is no operation between the two nodes; the input is forwarded unchanged.
* DilConv3x3 / DilConv5x5

Dilated Conv with `kernel_size=3` or `kernel_size=5` and `padding=1`
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.DilConv
```
* SepConv3x3 / SepConv5x5

Depthwise separable Conv with `kernel_size=3` or `kernel_size=5` and `padding=1`
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.SepConv
```
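
The MaxPool / AvgPool entry above describes a pool followed by BatchNorm2d; a minimal sketch of that composition (the actual `PoolBN` signature may differ, and `stride` is an assumed parameter):

```python
import torch.nn as nn

def pool_bn(pool_type, channels, stride=1):
    # MaxPool / AvgPool with kernel_size=3 and padding=1, followed by BatchNorm2d
    pool_cls = nn.MaxPool2d if pool_type == "max" else nn.AvgPool2d
    return nn.Sequential(
        pool_cls(kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(channels),
    )
```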

## ENASMicroLayer

This layer is extracted from the model designed [here](./ENAS.md). A model contains several blocks that share the same architecture. A block is made up of some `ENASMicroLayer`s and one `ENASReductionLayer`. The only difference between the two layers is that `ENASReductionLayer` applies all operations with `stride=2`.

An `ENASMicroLayer` contains `num_nodes` nodes and searches the topology among them. The first two nodes in a layer stand for the outputs of the previous layer and of the layer before that, respectively. Each following node chooses two previous nodes as inputs, applies one operation from the [predefined set](#predefined-operations-enas) to each, and adds the two results as its output. For example, if Node 4 chooses Node 1 and Node 3 as inputs and applies `MaxPool` and `AvgPool` respectively, the output of Node 4 is `MaxPool(Node 1) + AvgPool(Node 3)`. Nodes that do not serve as input to any other node are viewed as outputs of the layer. If there are multiple output nodes, the model concatenates them along the channel dimension to form the layer output (see the sketch below).
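
Conceptually, each searched node computes the following. This is a schematic sketch with hypothetical names; the real layer expresses these choices with NNI mutables so the controller can search them:

```python
# prev_nodes: outputs of all earlier nodes in the layer (including the two
# layer inputs); idx_a/idx_b and op_a/op_b are the searched choices.
def node_output(prev_nodes, idx_a, idx_b, op_a, op_b):
    # pick two previous nodes, apply one predefined op to each, then sum
    return op_a(prev_nodes[idx_a]) + op_b(prev_nodes[idx_b])
```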

The predefined operations are listed as follows. Details can be seen [here](#predefined-operations-enas).

* MaxPool: calls `torch.nn.MaxPool2d`. This operation applies a 2D max pooling over all input channels. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.
* AvgPool: calls `torch.nn.AvgPool2d`. This operation applies a 2D average pooling over all input channels. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.
* SepConvBN3x3: ReLU followed by a [DilConv](#DilConv) and BatchNorm. Convolution parameters are `kernel_size=3`, `stride=1` and `padding=1`.
* SepConvBN5x5: the same operation as the previous one, but with the kernel size set to 5.
* Skip Connect: there is no operation between the two nodes; `torch.nn.Identity` forwards the input unchanged to the output.

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMicroLayer
    :members:
```

The reduction layer is made up of two Conv operations; each outputs `C_out//2` channels, and the two results are concatenated along the channel dimension as the layer output. The Convs have `kernel_size=1` and `stride=2`, and they perform alternate sampling on the input so as to reduce the resolution without losing information (see the sketch below).

Contributor: Can you merge the code of MicroLayer and ReductionLayer maybe?

Contributor Author: If we do so, we must pass in a moduleList from MicroNetwork, which is irrelevant to search space.
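
The alternate sampling can be sketched as follows: one `stride=2` convolution reads the input as-is (even positions) while the other reads it shifted by one pixel (odd positions), so together they cover every spatial location. This is a sketch under the stated parameters, not the exact zoo code:

```python
import torch
import torch.nn as nn

class FactorizedReduceSketch(nn.Module):
    """Halve the resolution and emit c_out channels without discarding pixels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out // 2, kernel_size=1, stride=2, bias=False)
        self.conv2 = nn.Conv2d(c_in, c_out // 2, kernel_size=1, stride=2, bias=False)

    def forward(self, x):
        # conv1 samples even offsets; conv2 samples odd offsets (shifted input)
        return torch.cat([self.conv1(x), self.conv2(x[:, :, 1:, 1:])], dim=1)
```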

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASReductionLayer
    :members:
```

### Example Code

[example code](https://github.com/microsoft/nni/tree/master/examples/nas/search_space_zoo/enas_micro_example.py)

```bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_micro_example.py
```

<a name="predefined-operations-enas"></a>

### ENAS Micro predefined operations

* MaxPool / AvgPool

MaxPool / AvgPool with `kernel_size=3`, `stride=1` and `padding=1`, followed by BatchNorm2d
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.Pool
```

* SepConv

ReLU followed by a depthwise separable Conv ([DilConv](#DilConv)) and BatchNorm2d, with `kernel_size=3` or `kernel_size=5`
```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.SepConvBN
```

* Skip Connection

There is no operation between the two nodes; the input is forwarded unchanged.

## ENASMacroLayer

In macro search, the controller makes two decisions for each layer: i) the [operation](#macro-operations) to perform on the output of the previous layer, and ii) the previous layer to connect to for skip connections. NNI provides [predefined operations](#macro-operations) for macro search, listed as follows (a schematic sketch follows the list):

* Conv3x3 (separable and non-separable): Conv parameters are fixed to `kernel_size=3`, `padding=1` and `stride=1`. If `separable=True`, the Conv is replaced with a [DilConv](#DilConv).
* Conv5x5 (separable and non-separable): the same operation as the previous one, but with the kernel size set to 5.
* AvgPool: calls `torch.nn.AvgPool2d`. This operation applies a 2D average pooling over all input channels. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.
* MaxPool: calls `torch.nn.MaxPool2d`. This operation applies a 2D max pooling over all input channels. Its parameters are fixed to `kernel_size=3`, `stride=1` and `padding=1`.
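
Schematically, a macro layer's forward pass combines the two decisions as follows. This is plain-PyTorch pseudocode with hypothetical names; the actual layer uses NNI mutables to make the choices searchable:

```python
# ops: candidate operations for this layer; op_index: decision i)
# prev_outputs: outputs of earlier layers; skip_mask: decision ii)
def macro_layer_forward(x, ops, op_index, prev_outputs, skip_mask):
    out = ops[op_index](x)                # which operation to apply
    for prev, use in zip(prev_outputs, skip_mask):
        if use:                           # which earlier layers to skip-connect
            out = out + prev
    return out
```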

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.ENASMacroLayer
    :members:
```

### Example Code

[example code](https://github.com/microsoft/nni/tree/master/examples/nas/search_space_zoo/enas_macro_example.py)

```bash
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_macro_example.py
```

<a name="macro-operations"></a>

### ENAS Macro predefined operations

* ConvBranch

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.ConvBranch
```
* PoolBranch

```eval_rst
.. autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.PoolBranch
```
1 change: 1 addition & 0 deletions docs/en_US/nas.rst
@@ -25,3 +25,4 @@ For details, please refer to the following tutorials:
NAS Visualization <NAS/Visualization>
NAS Benchmarks <NAS/Benchmarks>
API Reference <NAS/NasReference>
Search Space Zoo <NAS/SearchSpaceZoo>
2 changes: 1 addition & 1 deletion examples/nas/enas/search.py
@@ -23,7 +23,7 @@
parser = ArgumentParser("enas")
parser.add_argument("--batch-size", default=128, type=int)
parser.add_argument("--log-frequency", default=10, type=int)
parser.add_argument("--search-for", choices=["macro", "micro"], default="macro")
# parser.add_argument("--search-for", choices=["macro", "micro"], default="macro")
parser.add_argument("--epochs", default=None, type=int, help="Number of epochs (default: macro 310, micro 150)")
parser.add_argument("--visualization", default=False, action="store_true")
args = parser.parse_args()
53 changes: 53 additions & 0 deletions examples/nas/search_space_zoo/darts_example.py
@@ -0,0 +1,53 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import logging
import time
from argparse import ArgumentParser

import torch
import torch.nn as nn

import datasets
from nni.nas.pytorch.callbacks import ArchitectureCheckpoint, LRSchedulerCallback
from nni.nas.pytorch.darts import DartsTrainer
from utils import accuracy

from nni.nas.pytorch.search_space_zoo import DartsCell
from darts_stack_cells import DartsStackedCells

logger = logging.getLogger('nni')

if __name__ == "__main__":
    parser = ArgumentParser("darts")
    parser.add_argument("--layers", default=8, type=int)
    parser.add_argument("--batch-size", default=64, type=int)
    parser.add_argument("--log-frequency", default=10, type=int)
    parser.add_argument("--epochs", default=50, type=int)
    parser.add_argument("--channels", default=16, type=int)
    parser.add_argument("--unrolled", default=False, action="store_true")
    parser.add_argument("--visualization", default=False, action="store_true")
    args = parser.parse_args()

    dataset_train, dataset_valid = datasets.get_dataset("cifar10")

    model = DartsStackedCells(3, args.channels, 10, args.layers, DartsCell)
    criterion = nn.CrossEntropyLoss()

    optim = torch.optim.SGD(model.parameters(), 0.025, momentum=0.9, weight_decay=3.0E-4)
    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, args.epochs, eta_min=0.001)

    trainer = DartsTrainer(model,
                           loss=criterion,
                           metrics=lambda output, target: accuracy(output, target, topk=(1,)),
                           optimizer=optim,
                           num_epochs=args.epochs,
                           dataset_train=dataset_train,
                           dataset_valid=dataset_valid,
                           batch_size=args.batch_size,
                           log_frequency=args.log_frequency,
                           unrolled=args.unrolled,
                           callbacks=[LRSchedulerCallback(lr_scheduler), ArchitectureCheckpoint("./checkpoints")])
    if args.visualization:
        trainer.enable_visualization()
    trainer.train()
83 changes: 83 additions & 0 deletions examples/nas/search_space_zoo/darts_stack_cells.py
@@ -0,0 +1,83 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import torch.nn as nn
import ops


class DartsStackedCells(nn.Module):
    """
    Built-in DARTS search space.
    Compared to the DARTS example, DartsStackedCells removes the auxiliary head,
    which is considered a trick rather than part of the model.

    Attributes
    ----------
    in_channels: int
        the number of input channels
    channels: int
        the number of initial channels expected
    n_classes: int
        the number of classes for final classification
    n_layers: int
        the number of cells contained in this network
    factory_func: function
        returns a callable instance for the demanded cell structure.
        Users should pass in the ``__init__`` of the cell class with required
        parameters (see nni.nas.DartsCell for details)
    n_nodes: int
        the number of nodes contained in each cell
    stem_multiplier: int
        channel multiplication coefficient applied by the stem
    """

    def __init__(self, in_channels, channels, n_classes, n_layers, factory_func, n_nodes=4,
                 stem_multiplier=3):
        super().__init__()
        self.in_channels = in_channels
        self.channels = channels
        self.n_classes = n_classes
        self.n_layers = n_layers

        c_cur = stem_multiplier * self.channels
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, c_cur, 3, 1, 1, bias=False),
            nn.BatchNorm2d(c_cur)
        )

        # for the first cell, stem is used for both s0 and s1
        # [!] channels_pp and channels_p are output channel sizes, but c_cur is the input channel size
        channels_pp, channels_p, c_cur = c_cur, c_cur, channels

        self.cells = nn.ModuleList()
        reduction_p, reduction = False, False
        for i in range(n_layers):
            reduction_p, reduction = reduction, False
            # reduce feature map size and double channels at 1/3 and 2/3 of the layers
            if i in [n_layers // 3, 2 * n_layers // 3]:
                c_cur *= 2
                reduction = True

            cell = factory_func(n_nodes, channels_pp, channels_p, c_cur, reduction_p, reduction)
            self.cells.append(cell)
            c_cur_out = c_cur * n_nodes
            channels_pp, channels_p = channels_p, c_cur_out

        self.gap = nn.AdaptiveAvgPool2d(1)
        self.linear = nn.Linear(channels_p, n_classes)

    def forward(self, x):
        s0 = s1 = self.stem(x)

        for cell in self.cells:
            s0, s1 = s1, cell(s0, s1)

        out = self.gap(s1)
        out = out.view(out.size(0), -1)  # flatten
        logits = self.linear(out)

        return logits

    def drop_path_prob(self, p):
        for module in self.modules():
            if isinstance(module, ops.DropPath):
                module.p = p
56 changes: 56 additions & 0 deletions examples/nas/search_space_zoo/datasets.py
@@ -0,0 +1,56 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import numpy as np
import torch
from torchvision import transforms
from torchvision.datasets import CIFAR10


class Cutout(object):
    def __init__(self, length):
        self.length = length

    def __call__(self, img):
        h, w = img.size(1), img.size(2)
        mask = np.ones((h, w), np.float32)
        y = np.random.randint(h)
        x = np.random.randint(w)

        y1 = np.clip(y - self.length // 2, 0, h)
        y2 = np.clip(y + self.length // 2, 0, h)
        x1 = np.clip(x - self.length // 2, 0, w)
        x2 = np.clip(x + self.length // 2, 0, w)

        mask[y1: y2, x1: x2] = 0.
        mask = torch.from_numpy(mask)
        mask = mask.expand_as(img)
        img *= mask

        return img


def get_dataset(cls, cutout_length=0):
    MEAN = [0.49139968, 0.48215827, 0.44653124]
    STD = [0.24703233, 0.24348505, 0.26158768]
    transf = [
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip()
    ]
    normalize = [
        transforms.ToTensor(),
        transforms.Normalize(MEAN, STD)
    ]
    cutout = []
    if cutout_length > 0:
        cutout.append(Cutout(cutout_length))

    train_transform = transforms.Compose(transf + normalize + cutout)
    valid_transform = transforms.Compose(normalize)

    if cls == "cifar10":
        dataset_train = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
        dataset_valid = CIFAR10(root="./data", train=False, download=True, transform=valid_transform)
    else:
        raise NotImplementedError
    return dataset_train, dataset_valid
Loading