diff --git a/docs/en_US/Compressor/ActivationRankFilterPruner.md b/docs/en_US/Compressor/ActivationRankFilterPruner.md
new file mode 100644
index 0000000000..7c836cb140
--- /dev/null
+++ b/docs/en_US/Compressor/ActivationRankFilterPruner.md
@@ -0,0 +1,58 @@
+ActivationRankFilterPruner on NNI Compressor
+===
+
+## 1. Introduction
+
+ActivationRankFilterPruner is a series of pruners that prune filters according to an importance criterion calculated from the filters' output activations.
+
+|             Pruner             |       Importance criterion        |                       Reference paper                        |
+| :----------------------------: | :-------------------------------: | :----------------------------------------------------------: |
+| ActivationAPoZRankFilterPruner | APoZ (average percentage of zeros) | [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250) |
+| ActivationMeanRankFilterPruner | mean value of output activations  | [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440) |
+
+## 2. Pruners
+
+### ActivationAPoZRankFilterPruner
+
+Hengyuan Hu, Rui Peng, Yu-Wing Tai and Chi-Keung Tang,
+
+"[Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250)", ICLR 2016.
+
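+ActivationAPoZRankFilterPruner prunes the filters whose output activations have the highest APoZ (average percentage of zeros). As in the referenced paper, a filter whose output is zero most of the time is treated as the least important.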
+
+The APoZ is defined as:
+
+![](../../img/apoz.png)
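+
+For readers who cannot view the image, the definition given in the referenced paper can be written as below. This is transcribed from the paper, not from `apoz.png` itself, so the notation in the image may differ slightly.
+
+```latex
+\mathrm{APoZ}_c^{(i)} = \mathrm{APoZ}\left(O_c^{(i)}\right)
+  = \frac{\sum_{k=1}^{N} \sum_{j=1}^{M} f\left(O_{c,j}^{(i)}(k) = 0\right)}{N \times M}
+```
+
+Here `O_c^{(i)}` is the output of channel `c` in layer `i`, `f(.)` is 1 if its argument is true and 0 otherwise, `M` is the number of elements in the output feature map of that channel, and `N` is the number of input examples used to collect the statistics.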
+
+### ActivationMeanRankFilterPruner
+
+Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila and Jan Kautz,
+
+"[Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440)", ICLR 2017.
+
+ActivationMeanRankFilterPruner prunes the filters with the smallest mean value of output activations.
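+
+As a rough sketch, the mean activation of a filter can be written as below. The symbols are chosen here for illustration, and the exact set over which NNI averages (for example, how many batches are used) is an assumption rather than something stated in this document.
+
+```latex
+\mathrm{Mean}_c^{(i)} = \frac{1}{N \times M} \sum_{k=1}^{N} \sum_{j=1}^{M} O_{c,j}^{(i)}(k)
+```
+
+Here `O_{c,j}^{(i)}(k)` is element `j` of the output feature map of channel `c` in layer `i` for input example `k`, `M` is the number of elements in that feature map, and `N` is the number of input examples.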
+
+## 3. Usage
+
+PyTorch code
+
+```python
+from nni.compression.torch import ActivationAPoZRankFilterPruner
+config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'], 'op_names': ['conv1', 'conv2'] }]
+pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
+pruner.compress()
+```
+
+#### User configuration for ActivationAPoZRankFilterPruner
+
+- **sparsity:** The target sparsity, i.e. the fraction of filters to prune in the specified operations.
+- **op_types:** Only Conv2d is supported in ActivationAPoZRankFilterPruner.
+
+## 4. Experiment
+
+TODO. 
+
+
+
+
+
diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md
index 3782c578e7..f277de5c0f 100644
--- a/docs/en_US/Compressor/Overview.md
+++ b/docs/en_US/Compressor/Overview.md
@@ -14,10 +14,14 @@ We have provided several compression algorithms, including several pruning and q
 |---|---|
 | [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
 | [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
-| [L1Filter Pruner](./Pruner.md#l1filter-pruner) | Pruning least important filters in convolution layers(PRUNING FILTERS FOR EFFICIENT CONVNETS)[Reference Paper](https://arxiv.org/abs/1608.08710) |
-| [Slim Pruner](./Pruner.md#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming)[Reference Paper](https://arxiv.org/abs/1708.06519) |
 | [Lottery Ticket Pruner](./Pruner.md#agp-pruner) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
 | [FPGM Pruner](./Pruner.md#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
+| [L1Filter Pruner](./Pruner.md#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (PRUNING FILTERS FOR EFFICIENT CONVNETS) [Reference Paper](https://arxiv.org/abs/1608.08710) |
+| [L2Filter Pruner](./Pruner.md#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
+| [ActivationAPoZRankFilterPruner](./Pruner.md#ActivationAPoZRankFilterPruner) | Pruning filters with the highest APoZ (average percentage of zeros) of output activations (Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures) [Reference Paper](https://arxiv.org/abs/1607.03250) |
+| [ActivationMeanRankFilterPruner](./Pruner.md#ActivationMeanRankFilterPruner) | Pruning filters with the smallest mean value of output activations (Pruning Convolutional Neural Networks for Resource Efficient Inference) [Reference Paper](https://arxiv.org/abs/1611.06440) |
+| [Slim Pruner](./Pruner.md#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
+
 
 **Quantization**
 
diff --git a/docs/en_US/Compressor/Pruner.md b/docs/en_US/Compressor/Pruner.md
index 298ade1d1f..a96414edae 100644
--- a/docs/en_US/Compressor/Pruner.md
+++ b/docs/en_US/Compressor/Pruner.md
@@ -10,7 +10,7 @@ We first sort the weights in the specified layer by their absolute values. And t
 ### Usage
 
 Tensorflow code
-```
+```python
 from nni.compression.tensorflow import LevelPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
 pruner = LevelPruner(model_graph, config_list)
@@ -18,7 +18,7 @@ pruner.compress()
 ```
 
 PyTorch code
-```
+```python
 from nni.compression.torch import LevelPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
 pruner = LevelPruner(model, config_list)
@@ -40,8 +40,6 @@ This is an iterative pruner, In [To prune, or not to prune: exploring the effica
 ### Usage
 You can prune all weight from 0% to 80% sparsity in 10 epoch with the code below.
 
-First, you should import pruner and add mask to model.
-
 Tensorflow code
 ```python
 from nni.compression.tensorflow import AGP_Pruner
@@ -71,7 +69,7 @@ pruner = AGP_Pruner(model, config_list)
 pruner.compress()
 ```
 
-Second, you should add code below to update epoch number when you finish one epoch in your training code.
+You should add the code below to update the epoch number when you finish one epoch in your training code.
 
 Tensorflow code 
 ```python
@@ -133,13 +131,16 @@ The above configuration means that there are 5 times of iterative pruning. As th
 * **sparsity:** The final sparsity when the compression is done.
 
 ***
-## FPGM Pruner
+## WeightRankFilterPruner
+WeightRankFilterPruner is a series of pruners that prune the filters ranked least important by a criterion calculated from the weights of convolution layers, until a preset level of network sparsity is reached.
+
+### 1, FPGM Pruner
+
 This is an one-shot pruner, FPGM Pruner is an implementation of paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf)
 
 >Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
 
-### Usage
-First, you should import pruner and add mask to model.
+#### Usage
 
 Tensorflow code
 ```python
@@ -163,7 +164,7 @@ pruner.compress()
 ```
 Note: FPGM Pruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.
 
-Second, you should add code below to update epoch number at beginning of each epoch.
+You should add the code below to update the epoch number at the beginning of each epoch.
 
 Tensorflow code
 ```python
@@ -180,7 +181,7 @@ You can view example for more information
 
 ***
 
-## L1Filter Pruner
+### 2, L1Filter Pruner
 
 This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), authors Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
 
@@ -193,12 +194,16 @@ This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https:
 > 1. For each filter ![](http://latex.codecogs.com/gif.latex?F_{i,j}), calculate the sum of its absolute kernel weights![](http://latex.codecogs.com/gif.latex?s_j=\sum_{l=1}^{n_i}\sum|K_l|)
 > 2. Sort the filters by ![](http://latex.codecogs.com/gif.latex?s_j).
 > 3. Prune ![](http://latex.codecogs.com/gif.latex?m) filters with the smallest sum values and their corresponding feature maps. The
->   kernels in the next convolutional layer corresponding to the pruned feature maps are also
->   removed.
+>    kernels in the next convolutional layer corresponding to the pruned feature maps are also
+>    removed.
 > 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
->   weights are copied to the new model.
+>    weights are copied to the new model.
 
-```
+#### Usage
+
+PyTorch code
+
+```python
 from nni.compression.torch import L1FilterPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
 pruner = L1FilterPruner(model, config_list)
@@ -208,7 +213,90 @@ pruner.compress()
 #### User configuration for L1Filter Pruner
 
 - **sparsity:** This is to specify the sparsity operations to be compressed to
-- **op_types:** Only Conv2d is supported in L1Filter Pruner
+- **op_types:** Only Conv1d and Conv2d are supported in L1Filter Pruner
+
+***
+
+### 3, L2Filter Pruner
+
+This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights.
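+
+By analogy with the L1 sum used by L1Filter Pruner, the L2 criterion for filter `F_{i,j}` can be sketched as below. The notation follows the L1Filter Pruner description; the exact form used by the implementation is an assumption.
+
+```latex
+s_j = \sqrt{\sum_{l=1}^{n_i} \sum K_l^{2}}
+```
+
+Here `n_i` is the number of input channels of layer `i`, and the inner sum runs over all weights of kernel `K_l`. Filters with the smallest `s_j` are pruned first.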
+
+#### Usage
+
+PyTorch code
+
+```python
+from nni.compression.torch import L2FilterPruner
+config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
+pruner = L2FilterPruner(model, config_list)
+pruner.compress()
+```
+
+#### User configuration for L2Filter Pruner
+
+- **sparsity:** The target sparsity, i.e. the fraction of filters to prune in the specified operations.
+- **op_types:** Only Conv1d and Conv2d are supported in L2Filter Pruner.
+
+## ActivationRankFilterPruner
+ActivationRankFilterPruner is a series of pruners that prune the filters ranked least important by a criterion calculated from the output activations of convolution layers, until a preset level of network sparsity is reached.
+
+### 1, ActivationAPoZRankFilterPruner
+
+This is a one-shot pruner. ActivationAPoZRankFilterPruner is an implementation of the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).
+
+#### Usage
+
+PyTorch code
+
+```python
+from nni.compression.torch import ActivationAPoZRankFilterPruner
+config_list = [{
+    'sparsity': 0.5,
+    'op_types': ['Conv2d']
+}]
+pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
+pruner.compress()
+```
+
+Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.
+
+See `examples/model_compress/APoZ_torch_cifar10.py` for a complete example.
+
+#### User configuration for ActivationAPoZRankFilterPruner
+
+- **sparsity:** The fraction of convolutional filters to prune.
+- **op_types:** Only Conv2d is supported in ActivationAPoZRankFilterPruner.
+
+***
+
+### 2, ActivationMeanRankFilterPruner
+
+This is a one-shot pruner. ActivationMeanRankFilterPruner is an implementation of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440).
+
+#### Usage
+
+PyTorch code
+
+```python
+from nni.compression.torch import ActivationMeanRankFilterPruner
+config_list = [{
+    'sparsity': 0.5,
+    'op_types': ['Conv2d']
+}]
+pruner = ActivationMeanRankFilterPruner(model, config_list)
+pruner.compress()
+```
+
+Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.
+
+See the examples for more information.
+
+#### User configuration for ActivationMeanRankFilterPruner
+
+- **sparsity:** The fraction of convolutional filters to prune.
+- **op_types:** Only Conv2d is supported in ActivationMeanRankFilterPruner.
+
+***
 
 ## Slim Pruner
 
@@ -222,7 +310,7 @@ This is an one-shot pruner, In ['Learning Efficient Convolutional Networks throu
 
 PyTorch code
 
-```
+```python
 from nni.compression.torch import SlimPruner
 config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
 pruner = SlimPruner(model, config_list)
diff --git a/docs/en_US/Compressor/Quantizer.md b/docs/en_US/Compressor/Quantizer.md
index 5dd99e3432..67791117e1 100644
--- a/docs/en_US/Compressor/Quantizer.md
+++ b/docs/en_US/Compressor/Quantizer.md
@@ -1,6 +1,5 @@
 Quantizer on NNI Compressor
 ===
-
 ## Naive Quantizer
 
 We provide Naive Quantizer to quantizer weight to default 8 bits, you can use it to test quantize algorithm without any configure.
@@ -53,11 +52,24 @@ You can view example for more information
 
 #### User configuration for QAT Quantizer
 * **quant_types:** : list of string
-type of quantization you want to apply, currently support 'weight', 'input', 'output'
+
+Type of quantization to apply. Currently 'weight', 'input' and 'output' are supported.
+
+* **op_types:** list of string
+
+Specify the types of modules to be quantized, e.g. 'Conv2d'.
+
+* **op_names:** list of string
+
+Specify the names of modules to be quantized, e.g. 'conv1'.
+
 * **quant_bits:** int or dict of {str : int}
-bits length of quantization, key is the quantization type, value is the length, eg. {'weight', 8},
-when the type is int, all quantization types share same bits length
+
+Bit length of quantization. In a dict, the key is the quantization type and the value is the bit length, e.g. {'weight': 8}.
+When the value is an int, all quantization types share the same bit length.
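+
+The two accepted forms of `quant_bits` might look like the fragments below. These are illustrative config fragments only: the variable names and the choice of `op_types` are assumptions, not taken from an NNI example.
+
+```python
+# Illustrative config fragments only; names and op_types are assumptions.
+# An int applies the same bit length to every listed quantization type.
+config_list_int = [{
+    'quant_types': ['weight', 'output'],
+    'quant_bits': 8,
+    'op_types': ['Conv2d']
+}]
+
+# A dict sets the bit length per quantization type.
+config_list_dict = [{
+    'quant_types': ['weight', 'output'],
+    'quant_bits': {'weight': 8, 'output': 8},
+    'op_types': ['Conv2d']
+}]
+```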
+
 * **quant_start_step:** int
+
 disable quantization until model are run by certain number of steps, this allows the network to enter a more stable
 state where activation quantization ranges do not exclude a signiﬁcant fraction of values, default value is 0
 
@@ -71,17 +83,14 @@ In [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bit
 ### Usage
 To implement DoReFa Quantizer, you can add code below before your training code
 
-Tensorflow code
-```python
-from nni.compressors.tensorflow import DoReFaQuantizer
-config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
-quantizer = DoReFaQuantizer(tf.get_default_graph(), config_list)
-quantizer.compress()
-```
 PyTorch code
 ```python
 from nni.compressors.torch import DoReFaQuantizer
-config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
+config_list = [{
+    'quant_types': ['weight'],
+    'quant_bits': 8,
+    'op_types': ['default']
+}]
 quantizer = DoReFaQuantizer(model, config_list)
 quantizer.compress()
 ```
@@ -89,4 +98,79 @@ quantizer.compress()
 You can view example for more information
 
 #### User configuration for DoReFa Quantizer
-* **q_bits:** This is to specify the q_bits operations to be quantized to
+* **quant_types:** list of string
+
+Type of quantization to apply. Currently 'weight', 'input' and 'output' are supported.
+
+* **op_types:** list of string
+
+Specify the types of modules to be quantized, e.g. 'Conv2d'.
+
+* **op_names:** list of string
+
+Specify the names of modules to be quantized, e.g. 'conv1'.
+
+* **quant_bits:** int or dict of {str : int}
+
+Bit length of quantization. In a dict, the key is the quantization type and the value is the bit length, e.g. {'weight': 8}.
+When the value is an int, all quantization types share the same bit length.
+
+
+## BNN Quantizer
+In [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830), the authors write:
+
+>We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. At training-time the binary weights and activations are used for computing the parameters gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power-efficiency.
+
+
+### Usage
+
+PyTorch code
+```python
+from nni.compression.torch import BNNQuantizer
+model = VGG_Cifar10(num_classes=10)
+
+configure_list = [{
+    'quant_types': ['weight'],
+    'quant_bits': 1,
+    'op_types': ['Conv2d', 'Linear'],
+    'op_names': ['features.0', 'features.3', 'features.7', 'features.10', 'features.14', 'features.17', 'classifier.0', 'classifier.3']
+}, {
+    'quant_types': ['output'],
+    'quant_bits': 1,
+    'op_types': ['Hardtanh'],
+    'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
+}]
+
+quantizer = BNNQuantizer(model, configure_list)
+model = quantizer.compress()
+```
+
+See [examples/model_compress/BNN_quantizer_cifar10.py](https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py) for more information.
+
+#### User configuration for BNN Quantizer
+* **quant_types:** list of string
+
+Type of quantization to apply. Currently 'weight', 'input' and 'output' are supported.
+
+* **op_types:** list of string
+
+Specify the types of modules to be quantized, e.g. 'Conv2d'.
+
+* **op_names:** list of string
+
+Specify the names of modules to be quantized, e.g. 'conv1'.
+
+* **quant_bits:** int or dict of {str : int}
+
+Bit length of quantization. In a dict, the key is the quantization type and the value is the bit length, e.g. {'weight': 8}.
+When the value is an int, all quantization types share the same bit length.
+
+### Experiment
+We implemented one of the experiments in [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830): we quantized the **VGGNet** for CIFAR-10 described in the paper. Our experiment results are as follows:
+
+| Model         | Accuracy  |
+| ------------- | --------- |
+| VGGNet        | 86.93%    |
+
+
+The experiment code can be found at [examples/model_compress/BNN_quantizer_cifar10.py](https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py).
\ No newline at end of file
diff --git a/docs/en_US/Compressor/L1FilterPruner.md b/docs/en_US/Compressor/WeightRankFilterPruner.md
similarity index 52%
rename from docs/en_US/Compressor/L1FilterPruner.md
rename to docs/en_US/Compressor/WeightRankFilterPruner.md
index 2906fde271..ef99dcff03 100644
--- a/docs/en_US/Compressor/L1FilterPruner.md
+++ b/docs/en_US/Compressor/WeightRankFilterPruner.md
@@ -1,8 +1,20 @@
-L1FilterPruner on NNI Compressor
+WeightRankFilterPruner on NNI Compressor
 ===
 
 ## 1. Introduction
 
+WeightRankFilterPruner is a series of pruners that prune filters according to an importance criterion calculated from the filters' weights.
+
+|     Pruner     |    Importance criterion     |                       Reference paper                        |
+| :------------: | :-------------------------: | :----------------------------------------------------------: |
+| L1FilterPruner |     L1 norm of weights      | [PRUNING FILTERS FOR EFFICIENT CONVNETS](https://arxiv.org/abs/1608.08710) |
+| L2FilterPruner |     L2 norm of weights      |                                                              |
+|   FPGMPruner   | Geometric Median of weights | [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf) |
+
+## 2. Pruners
+
+### L1FilterPruner
+
 L1FilterPruner is a general structured pruning algorithm for pruning filters in the convolutional layers.
 
 In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), authors Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
@@ -16,12 +28,26 @@ In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710),
 > 1. For each filter ![](http://latex.codecogs.com/gif.latex?F_{i,j}), calculate the sum of its absolute kernel weights![](http://latex.codecogs.com/gif.latex?s_j=\sum_{l=1}^{n_i}\sum|K_l|)
 > 2. Sort the filters by ![](http://latex.codecogs.com/gif.latex?s_j).
 > 3. Prune ![](http://latex.codecogs.com/gif.latex?m) filters with the smallest sum values and their corresponding feature maps. The
->   kernels in the next convolutional layer corresponding to the pruned feature maps are also
->   removed.
+>    kernels in the next convolutional layer corresponding to the pruned feature maps are also
+>    removed.
 > 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
->   weights are copied to the new model.
+>    weights are copied to the new model.
+
+### L2FilterPruner
+
+L2FilterPruner is similar to L1FilterPruner, except that the importance criterion is the L2 norm of the weights instead of the L1 norm.
+
+### FPGMPruner
+
+Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang
+
+"[Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/abs/1811.00250)", CVPR 2019.
+
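+FPGMPruner prunes the filters that are closest to the geometric median of the filters in the same layer, since those filters are the most replaceable by the remaining ones.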
+
+ ![](../../img/fpgm_fig1.png)
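+
+The selection rule in the referenced paper can be sketched as below. This is a paraphrase of the paper's practical criterion with simplified notation (`N` for the number of filters in layer `i` is a symbol chosen here), not a description of the NNI implementation.
+
+```latex
+F_{i,x^{*}} = \operatorname*{arg\,min}_{x \in \{F_{i,1}, \dots, F_{i,N}\}}
+  \sum_{j'=1}^{N} \left\lVert x - F_{i,j'} \right\rVert_2
+```
+
+The filter with the smallest summed distance to all filters in the same layer lies nearest to the geometric median, so it is the first candidate for pruning.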
 
-## 2. Usage
+## 3. Usage
 
 PyTorch code
 
@@ -37,9 +63,9 @@ pruner.compress()
 - **sparsity:** This is to specify the sparsity operations to be compressed to
 - **op_types:** Only Conv2d is supported in L1Filter Pruner
 
-## 3. Experiment
+## 4. Experiment
 
-We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), we pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** in the paper, in which $64\%$ parameters are pruned. Our experiments results are as follows:
+We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) with **L1FilterPruner**: we pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** as described in the paper, in which $64\%$ of the parameters are pruned. Our experiment results are as follows:
 
 | Model           | Error(paper/ours) | Parameters      | Pruned   |
 | --------------- | ----------------- | --------------- | -------- |
diff --git a/docs/img/apoz.png b/docs/img/apoz.png
new file mode 100644
index 0000000000..e0c452e978
Binary files /dev/null and b/docs/img/apoz.png differ
diff --git a/docs/img/fpgm_fig1.png b/docs/img/fpgm_fig1.png
new file mode 100644
index 0000000000..f9a1fe4031
Binary files /dev/null and b/docs/img/fpgm_fig1.png differ
diff --git a/examples/model_compress/APoZ_torch_cifar10.py b/examples/model_compress/APoZ_torch_cifar10.py
new file mode 100644
index 0000000000..52bcf8ffd3
--- /dev/null
+++ b/examples/model_compress/APoZ_torch_cifar10.py
@@ -0,0 +1,121 @@
+import math
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torchvision import datasets, transforms
+from nni.compression.torch import ActivationAPoZRankFilterPruner
+from models.cifar10.vgg import VGG
+
+
+def train(model, device, train_loader, optimizer):
+    model.train()
+    for batch_idx, (data, target) in enumerate(train_loader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad()
+        output = model(data)
+        loss = F.cross_entropy(output, target)
+        loss.backward()
+        optimizer.step()
+        if batch_idx % 100 == 0:
+            print('{:2.0f}%  Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
+
+
+def test(model, device, test_loader):
+    model.eval()
+    test_loss = 0
+    correct = 0
+    with torch.no_grad():
+        for data, target in test_loader:
+            data, target = data.to(device), target.to(device)
+            output = model(data)
+            test_loss += F.cross_entropy(output, target, reduction='sum').item()
+            pred = output.argmax(dim=1, keepdim=True)
+            correct += pred.eq(target.view_as(pred)).sum().item()
+    test_loss /= len(test_loader.dataset)
+    acc = 100 * correct / len(test_loader.dataset)
+
+    print('Loss: {}  Accuracy: {}%\n'.format(
+        test_loss, acc))
+    return acc
+
+
+def main():
+    torch.manual_seed(0)
+    device = torch.device('cuda')
+    train_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=True, download=True,
+                         transform=transforms.Compose([
+                             transforms.Pad(4),
+                             transforms.RandomCrop(32),
+                             transforms.RandomHorizontalFlip(),
+                             transforms.ToTensor(),
+                             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+                         ])),
+        batch_size=64, shuffle=True)
+    test_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
+            transforms.ToTensor(),
+            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+        ])),
+        batch_size=200, shuffle=False)
+
+    model = VGG(depth=16)
+    model.to(device)
+
+    # Train the base VGG-16 model
+    print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
+    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
+    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 160, 0)
+    for epoch in range(160):
+        train(model, device, train_loader, optimizer)
+        test(model, device, test_loader)
+        lr_scheduler.step(epoch)
+    torch.save(model.state_dict(), 'vgg16_cifar10.pth')
+
+    # Test base model accuracy
+    print('=' * 10 + 'Test on the original model' + '=' * 10)
+    model.load_state_dict(torch.load('vgg16_cifar10.pth'))
+    test(model, device, test_loader)
+    # top1 = 93.51%
+
+    # Pruning Configuration, in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
+    # Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
+    configure_list = [{
+        'sparsity': 0.5,
+        'op_types': ['default'],
+        'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
+    }]
+
+    # Prune model and test accuracy without fine tuning.
+    print('=' * 10 + 'Test on the pruned model before fine tune' + '=' * 10)
+    pruner = ActivationAPoZRankFilterPruner(model, configure_list)
+    model = pruner.compress()
+    test(model, device, test_loader)
+    # top1 = 88.19%
+
+    # Fine tune the pruned model for 40 epochs and test accuracy
+    print('=' * 10 + 'Fine tuning' + '=' * 10)
+    optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
+    best_top1 = 0
+    for epoch in range(40):
+        pruner.update_epoch(epoch)
+        print('# Epoch {} #'.format(epoch))
+        train(model, device, train_loader, optimizer_finetune)
+        top1 = test(model, device, test_loader)
+        if top1 > best_top1:
+            best_top1 = top1
+            # Export the best model, 'model_path' stores state_dict of the pruned model,
+            # mask_path stores mask_dict of the pruned model
+            pruner.export_model(model_path='pruned_vgg16_cifar10.pth', mask_path='mask_vgg16_cifar10.pth')
+
+    # Test the exported model
+    print('=' * 10 + 'Test on the pruned model after fine tune' + '=' * 10)
+    new_model = VGG(depth=16)
+    new_model.to(device)
+    new_model.load_state_dict(torch.load('pruned_vgg16_cifar10.pth'))
+    test(new_model, device, test_loader)
+    # top1 = 93.53%
+
+
+if __name__ == '__main__':
+    main()
diff --git a/examples/model_compress/BNN_quantizer_cifar10.py b/examples/model_compress/BNN_quantizer_cifar10.py
new file mode 100644
index 0000000000..d4908885c3
--- /dev/null
+++ b/examples/model_compress/BNN_quantizer_cifar10.py
@@ -0,0 +1,155 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torchvision import datasets, transforms
+from nni.compression.torch import BNNQuantizer
+
+
+class VGG_Cifar10(nn.Module):
+    def __init__(self, num_classes=1000):
+        super(VGG_Cifar10, self).__init__()
+        self.features = nn.Sequential(
+            nn.Conv2d(3, 128, kernel_size=3, padding=1, bias=False),
+            nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
+            nn.Hardtanh(inplace=True),
+
+            nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
+            nn.MaxPool2d(kernel_size=2, stride=2),
+            nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
+            nn.Hardtanh(inplace=True),
+
+            nn.Conv2d(128, 256, kernel_size=3, padding=1, bias=False),
+            nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
+            nn.Hardtanh(inplace=True),
+
+
+            nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
+            nn.MaxPool2d(kernel_size=2, stride=2),
+            nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
+            nn.Hardtanh(inplace=True),
+
+
+            nn.Conv2d(256, 512, kernel_size=3, padding=1, bias=False),
+            nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
+            nn.Hardtanh(inplace=True),
+
+
+            nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=False),
+            nn.MaxPool2d(kernel_size=2, stride=2),
+            nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
+            nn.Hardtanh(inplace=True)
+        )
+
+        self.classifier = nn.Sequential(
+            nn.Linear(512 * 4 * 4, 1024, bias=False),
+            nn.BatchNorm1d(1024),
+            nn.Hardtanh(inplace=True),
+            nn.Linear(1024, 1024, bias=False),
+            nn.BatchNorm1d(1024),
+            nn.Hardtanh(inplace=True),
+            nn.Linear(1024, num_classes), # do not quantize output
+            nn.BatchNorm1d(num_classes, affine=False)
+        )
+
+
+    def forward(self, x):
+        x = self.features(x)
+        x = x.view(-1, 512 * 4 * 4)
+        x = self.classifier(x)
+        return x
+
+
+def train(model, device, train_loader, optimizer):
+    model.train()
+    for batch_idx, (data, target) in enumerate(train_loader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad()
+        output = model(data)
+        loss = F.cross_entropy(output, target)
+        loss.backward()
+        optimizer.step()
+        for name, param in model.named_parameters():
+            if name.endswith('old_weight'):
+                param.data.clamp_(-1, 1)  # clamp in place; rebinding `param` would leave the weights unchanged
+        if batch_idx % 100 == 0:
+            print('{:2.0f}%  Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
+
+
+def test(model, device, test_loader):
+    model.eval()
+    test_loss = 0
+    correct = 0
+    with torch.no_grad():
+        for data, target in test_loader:
+            data, target = data.to(device), target.to(device)
+            output = model(data)
+            test_loss += F.cross_entropy(output, target, reduction='sum').item()
+            pred = output.argmax(dim=1, keepdim=True)
+            correct += pred.eq(target.view_as(pred)).sum().item()
+    test_loss /= len(test_loader.dataset)
+    acc = 100 * correct / len(test_loader.dataset)
+
+    print('Loss: {}  Accuracy: {}%\n'.format(
+        test_loss, acc))
+    return acc
+
+def adjust_learning_rate(optimizer, epoch):
+    update_list = [55, 100, 150, 200, 400, 600]
+    if epoch in update_list:
+        for param_group in optimizer.param_groups:
+            param_group['lr'] = param_group['lr'] * 0.1
+    return
+
+def main():
+    torch.manual_seed(0)
+    device = torch.device('cuda')
+    train_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=True, download=True,
+                         transform=transforms.Compose([
+                             transforms.ToTensor(),
+                             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+                         ])),
+        batch_size=64, shuffle=True)
+    test_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
+            transforms.ToTensor(),
+            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+        ])),
+        batch_size=200, shuffle=False)
+
+    model = VGG_Cifar10(num_classes=10)
+    model.to(device)
+
+    configure_list = [{
+        'quant_types': ['weight'],
+        'quant_bits': 1,
+        'op_types': ['Conv2d', 'Linear'],
+        'op_names': ['features.3', 'features.7', 'features.10', 'features.14', 'classifier.0', 'classifier.3']
+    }, {
+        'quant_types': ['output'],
+        'quant_bits': 1,
+        'op_types': ['Hardtanh'],
+        'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
+    }]
+
+    quantizer = BNNQuantizer(model, configure_list)
+    model = quantizer.compress()
+
+    print('=' * 10 + 'train' + '=' * 10)
+    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
+    best_top1 = 0
+    for epoch in range(400):
+        print('# Epoch {} #'.format(epoch))
+        train(model, device, train_loader, optimizer)
+        adjust_learning_rate(optimizer, epoch)
+        top1 = test(model, device, test_loader)
+        if top1 > best_top1:
+            best_top1 = top1
+    print(best_top1)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/examples/model_compress/MeanActivation_torch_cifar10.py b/examples/model_compress/MeanActivation_torch_cifar10.py
new file mode 100644
index 0000000000..40ad2bb023
--- /dev/null
+++ b/examples/model_compress/MeanActivation_torch_cifar10.py
@@ -0,0 +1,121 @@
+import math
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torchvision import datasets, transforms
+from nni.compression.torch import L1FilterPruner
+from models.cifar10.vgg import VGG
+
+
+def train(model, device, train_loader, optimizer):
+    model.train()
+    for batch_idx, (data, target) in enumerate(train_loader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad()
+        output = model(data)
+        loss = F.cross_entropy(output, target)
+        loss.backward()
+        optimizer.step()
+        if batch_idx % 100 == 0:
+            print('{:2.0f}%  Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
+
+
+def test(model, device, test_loader):
+    model.eval()
+    test_loss = 0
+    correct = 0
+    with torch.no_grad():
+        for data, target in test_loader:
+            data, target = data.to(device), target.to(device)
+            output = model(data)
+            test_loss += F.cross_entropy(output, target, reduction='sum').item()
+            pred = output.argmax(dim=1, keepdim=True)
+            correct += pred.eq(target.view_as(pred)).sum().item()
+    test_loss /= len(test_loader.dataset)
+    acc = 100 * correct / len(test_loader.dataset)
+
+    print('Loss: {}  Accuracy: {}%\n'.format(
+        test_loss, acc))
+    return acc
+
+
+def main():
+    torch.manual_seed(0)
+    device = torch.device('cuda')
+    train_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=True, download=True,
+                         transform=transforms.Compose([
+                             transforms.Pad(4),
+                             transforms.RandomCrop(32),
+                             transforms.RandomHorizontalFlip(),
+                             transforms.ToTensor(),
+                             transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+                         ])),
+        batch_size=64, shuffle=True)
+    test_loader = torch.utils.data.DataLoader(
+        datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
+            transforms.ToTensor(),
+            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
+        ])),
+        batch_size=200, shuffle=False)
+
+    model = VGG(depth=16)
+    model.to(device)
+
+    # Train the base VGG-16 model
+    print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
+    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
+    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 160, 0)
+    for epoch in range(160):
+        train(model, device, train_loader, optimizer)
+        test(model, device, test_loader)
+        lr_scheduler.step(epoch)
+    torch.save(model.state_dict(), 'vgg16_cifar10.pth')
+
+    # Test base model accuracy
+    print('=' * 10 + 'Test on the original model' + '=' * 10)
+    model.load_state_dict(torch.load('vgg16_cifar10.pth'))
+    test(model, device, test_loader)
+    # top1 = 93.51%
+
+    # Pruning Configuration, in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
+    # Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
+    configure_list = [{
+        'sparsity': 0.5,
+        'op_types': ['default'],
+        'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
+    }]
+
+    # Prune model and test accuracy without fine tuning.
+    print('=' * 10 + 'Test on the pruned model before fine tune' + '=' * 10)
+    pruner = L1FilterPruner(model, configure_list)
+    model = pruner.compress()
+    test(model, device, test_loader)
+    # top1 = 88.19%
+
+    # Fine tune the pruned model for 40 epochs and test accuracy
+    print('=' * 10 + 'Fine tuning' + '=' * 10)
+    optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
+    best_top1 = 0
+    for epoch in range(40):
+        pruner.update_epoch(epoch)
+        print('# Epoch {} #'.format(epoch))
+        train(model, device, train_loader, optimizer_finetune)
+        top1 = test(model, device, test_loader)
+        if top1 > best_top1:
+            best_top1 = top1
+            # Export the best model, 'model_path' stores state_dict of the pruned model,
+            # mask_path stores mask_dict of the pruned model
+            pruner.export_model(model_path='pruned_vgg16_cifar10.pth', mask_path='mask_vgg16_cifar10.pth')
+
+    # Test the exported model
+    print('=' * 10 + 'Test on the pruned model after fine tune' + '=' * 10)
+    new_model = VGG(depth=16)
+    new_model.to(device)
+    new_model.load_state_dict(torch.load('pruned_vgg16_cifar10.pth'))
+    test(new_model, device, test_loader)
+    # top1 = 93.53%
+
+
+if __name__ == '__main__':
+    main()
diff --git a/examples/nas/.gitignore b/examples/nas/.gitignore
index 8eeb0c2a3f..e26f9a17a1 100644
--- a/examples/nas/.gitignore
+++ b/examples/nas/.gitignore
@@ -1,3 +1,4 @@
 data
 checkpoints
 runs
+nni_auto_gen_search_space.json
diff --git a/examples/nas/spos/README.md b/examples/nas/spos/README.md
new file mode 100644
index 0000000000..ed239f30a1
--- /dev/null
+++ b/examples/nas/spos/README.md
@@ -0,0 +1,88 @@
+# Single Path One-Shot Neural Architecture Search with Uniform Sampling
+
+Single Path One-Shot by Megvii Research. [Paper link](https://arxiv.org/abs/1904.00420). [Official repo](https://github.com/megvii-model/SinglePathOneShot).
+
+Block search only. Channel search is not supported yet.
+
+Only a GPU version is provided here.
+
+## Preparation
+
+### Requirements
+
+* PyTorch >= 1.2
+* NVIDIA DALI >= 0.16, which we use to accelerate ImageNet data loading. [Installation guide](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html)
+
+### Data
+
+Download the FLOPs lookup table from [here](https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN).
+Put `op_flops_dict.pkl` and `checkpoint-150000.pth.tar` (if you don't want to retrain the supernet) under the `data` directory.
+
+Prepare ImageNet in the standard format (follow the script [here](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4)). Linking it to `data/imagenet` is more convenient.
+
+After preparation, the directory is expected to have the following structure:
+
+```
+spos
+├── architecture_final.json
+├── blocks.py
+├── config_search.yml
+├── data
+│   ├── imagenet
+│   │   ├── train
+│   │   └── val
+│   └── op_flops_dict.pkl
+├── dataloader.py
+├── network.py
+├── README.md
+├── scratch.py
+├── supernet.py
+├── tester.py
+├── tuner.py
+└── utils.py
+```
+
+## Step 1. Train Supernet
+
+```
+python supernet.py
+```
+
+This exports the checkpoint to the `checkpoints` directory for use in the next step.
+
+NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5): they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` simulates the original behavior and lets you use the pretrained checkpoints.
+
+## Step 2. Evolution Search
+
+Single Path One-Shot uses an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing each sampled architecture, recalculates all batch norm statistics on a subset of training images and evaluates the architecture on the full validation set.
+
+To prepare a search space for the NNI framework, first run
+
+```
+nnictl ss_gen -t "python tester.py"
+```
+
+This will generate a file called `nni_auto_gen_search_space.json`, which is a serialized representation of your search space.
+
+Then search with the evolution tuner.
+
+```
+nnictl create --config config_search.yml
+```
+
+The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/your_experiment_id/log`.
+
+## Step 3. Train from Scratch
+
+```
+python scratch.py
+```
+
+By default, it uses `architecture_final.json`. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with the `--architecture` option.
+
+## Current Reproduction Results
+
+Reproduction is still in progress. Because of the gap between the official release and the original paper, we compare our current results with both the official repo (our run) and the paper.
+
+* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to [this issue](https://github.com/megvii-model/SinglePathOneShot/issues/6).
+* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still short of the 73.61% from the official release and the 74.3% reported in the original paper.
diff --git a/examples/nas/spos/architecture_final.json b/examples/nas/spos/architecture_final.json
new file mode 100644
index 0000000000..512a73b9d6
--- /dev/null
+++ b/examples/nas/spos/architecture_final.json
@@ -0,0 +1,22 @@
+{
+  "LayerChoice1": [false, false, true, false],
+  "LayerChoice2": [false, true, false, false],
+  "LayerChoice3": [true, false, false, false],
+  "LayerChoice4": [false, true, false, false],
+  "LayerChoice5": [false, false, true, false],
+  "LayerChoice6": [true, false, false, false],
+  "LayerChoice7": [false, false, true, false],
+  "LayerChoice8": [true, false, false, false],
+  "LayerChoice9": [false, false, true, false],
+  "LayerChoice10": [true, false, false, false],
+  "LayerChoice11": [false, false, true, false],
+  "LayerChoice12": [false, false, false, true],
+  "LayerChoice13": [true, false, false, false],
+  "LayerChoice14": [true, false, false, false],
+  "LayerChoice15": [true, false, false, false],
+  "LayerChoice16": [true, false, false, false],
+  "LayerChoice17": [false, false, false, true],
+  "LayerChoice18": [false, false, true, false],
+  "LayerChoice19": [false, false, false, true],
+  "LayerChoice20": [false, false, false, true]
+}
diff --git a/examples/nas/spos/blocks.py b/examples/nas/spos/blocks.py
new file mode 100644
index 0000000000..5908ecf077
--- /dev/null
+++ b/examples/nas/spos/blocks.py
@@ -0,0 +1,89 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import torch
+import torch.nn as nn
+
+
+class ShuffleNetBlock(nn.Module):
+    """
+    When stride = 1, the inp input channels are split in half by channel shuffle; when stride = 2, both branches see all inp channels.
+    """
+
+    def __init__(self, inp, oup, mid_channels, ksize, stride, sequence="pdp"):
+        super().__init__()
+        assert stride in [1, 2]
+        assert ksize in [3, 5, 7]
+        self.channels = inp // 2 if stride == 1 else inp
+        self.inp = inp
+        self.oup = oup
+        self.mid_channels = mid_channels
+        self.ksize = ksize
+        self.stride = stride
+        self.pad = ksize // 2
+        self.oup_main = oup - self.channels
+        assert self.oup_main > 0
+
+        self.branch_main = nn.Sequential(*self._decode_point_depth_conv(sequence))
+
+        if stride == 2:
+            self.branch_proj = nn.Sequential(
+                # dw
+                nn.Conv2d(self.channels, self.channels, ksize, stride, self.pad,
+                          groups=self.channels, bias=False),
+                nn.BatchNorm2d(self.channels, affine=False),
+                # pw-linear
+                nn.Conv2d(self.channels, self.channels, 1, 1, 0, bias=False),
+                nn.BatchNorm2d(self.channels, affine=False),
+                nn.ReLU(inplace=True)
+            )
+
+    def forward(self, x):
+        if self.stride == 2:
+            x_proj, x = self.branch_proj(x), x
+        else:
+            x_proj, x = self._channel_shuffle(x)
+        return torch.cat((x_proj, self.branch_main(x)), 1)
+
+    def _decode_point_depth_conv(self, sequence):
+        result = []
+        first_depth = first_point = True
+        pc = c = self.channels
+        for i, token in enumerate(sequence):
+            # compute output channels of this conv
+            if i + 1 == len(sequence):
+                assert token == "p", "Last conv must be point-wise conv."
+                c = self.oup_main
+            elif token == "p" and first_point:
+                c = self.mid_channels
+            if token == "d":
+                # depth-wise conv
+                assert pc == c, "Depth-wise conv must not change channels."
+                result.append(nn.Conv2d(pc, c, self.ksize, self.stride if first_depth else 1, self.pad,
+                                        groups=c, bias=False))
+                result.append(nn.BatchNorm2d(c, affine=False))
+                first_depth = False
+            elif token == "p":
+                # point-wise conv
+                result.append(nn.Conv2d(pc, c, 1, 1, 0, bias=False))
+                result.append(nn.BatchNorm2d(c, affine=False))
+                result.append(nn.ReLU(inplace=True))
+                first_point = False
+            else:
+                raise ValueError("Conv sequence must consist only of 'd' and 'p'.")
+            pc = c
+        return result
+
+    def _channel_shuffle(self, x):
+        bs, num_channels, height, width = x.data.size()
+        assert (num_channels % 4 == 0)
+        x = x.reshape(bs * num_channels // 2, 2, height * width)
+        x = x.permute(1, 0, 2)
+        x = x.reshape(2, -1, num_channels // 2, height, width)
+        return x[0], x[1]
+
+
+class ShuffleXceptionBlock(ShuffleNetBlock):
+
+    def __init__(self, inp, oup, mid_channels, stride):
+        super().__init__(inp, oup, mid_channels, 3, stride, "dpdpdp")
diff --git a/examples/nas/spos/config_search.yml b/examples/nas/spos/config_search.yml
new file mode 100644
index 0000000000..fe27faefc8
--- /dev/null
+++ b/examples/nas/spos/config_search.yml
@@ -0,0 +1,16 @@
+authorName: unknown
+experimentName: SPOS Search
+trialConcurrency: 4
+maxExecDuration: 7d
+maxTrialNum: 99999
+trainingServicePlatform: local
+searchSpacePath: nni_auto_gen_search_space.json
+useAnnotation: false
+tuner:
+  codeDir: .
+  classFileName: tuner.py
+  className: EvolutionWithFlops
+trial:
+  command: python tester.py --imagenet-dir /path/to/your/imagenet --spos-prep
+  codeDir: .
+  gpuNum: 1
diff --git a/examples/nas/spos/dataloader.py b/examples/nas/spos/dataloader.py
new file mode 100644
index 0000000000..198d637ed1
--- /dev/null
+++ b/examples/nas/spos/dataloader.py
@@ -0,0 +1,106 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import os
+
+import nvidia.dali.ops as ops
+import nvidia.dali.types as types
+import torch.utils.data
+from nvidia.dali.pipeline import Pipeline
+from nvidia.dali.plugin.pytorch import DALIClassificationIterator
+
+
+class HybridTrainPipe(Pipeline):
+    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, seed=12, local_rank=0, world_size=1,
+                 spos_pre=False):
+        super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
+        color_space_type = types.BGR if spos_pre else types.RGB
+        self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size, random_shuffle=True)
+        self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
+        self.res = ops.RandomResizedCrop(device="gpu", size=crop,
+                                         interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
+        self.twist = ops.ColorTwist(device="gpu")
+        self.jitter_rng = ops.Uniform(range=[0.6, 1.4])
+        self.cmnp = ops.CropMirrorNormalize(device="gpu",
+                                            output_dtype=types.FLOAT,
+                                            output_layout=types.NCHW,
+                                            image_type=color_space_type,
+                                            mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
+                                            std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])
+        self.coin = ops.CoinFlip(probability=0.5)
+
+    def define_graph(self):
+        rng = self.coin()
+        self.jpegs, self.labels = self.input(name="Reader")
+        images = self.decode(self.jpegs)
+        images = self.res(images)
+        images = self.twist(images, saturation=self.jitter_rng(),
+                            contrast=self.jitter_rng(), brightness=self.jitter_rng())
+        output = self.cmnp(images, mirror=rng)
+        return [output, self.labels]
+
+
+class HybridValPipe(Pipeline):
+    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, size, seed=12, local_rank=0, world_size=1,
+                 spos_pre=False, shuffle=False):
+        super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
+        color_space_type = types.BGR if spos_pre else types.RGB
+        self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size,
+                                    random_shuffle=shuffle)
+        self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
+        self.res = ops.Resize(device="gpu", resize_shorter=size,
+                              interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
+        self.cmnp = ops.CropMirrorNormalize(device="gpu",
+                                            output_dtype=types.FLOAT,
+                                            output_layout=types.NCHW,
+                                            crop=(crop, crop),
+                                            image_type=color_space_type,
+                                            mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
+                                            std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])
+
+    def define_graph(self):
+        self.jpegs, self.labels = self.input(name="Reader")
+        images = self.decode(self.jpegs)
+        images = self.res(images)
+        output = self.cmnp(images)
+        return [output, self.labels]
+
+
+class ClassificationWrapper:
+    def __init__(self, loader, size):
+        self.loader = loader
+        self.size = size
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        data = next(self.loader)
+        return data[0]["data"], data[0]["label"].view(-1).long().cuda(non_blocking=True)
+
+    def __len__(self):
+        return self.size
+
+
+def get_imagenet_iter_dali(split, image_dir, batch_size, num_threads, crop=224, val_size=256,
+                           spos_preprocessing=False, seed=12, shuffle=False, device_id=None):
+    world_size, local_rank = 1, 0
+    if device_id is None:
+        device_id = torch.cuda.device_count() - 1  # use last gpu
+    if split == "train":
+        pipeline = HybridTrainPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
+                                   data_dir=os.path.join(image_dir, "train"), seed=seed,
+                                   crop=crop, world_size=world_size, local_rank=local_rank,
+                                   spos_pre=spos_preprocessing)
+    elif split == "val":
+        pipeline = HybridValPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
+                                 data_dir=os.path.join(image_dir, "val"), seed=seed,
+                                 crop=crop, size=val_size, world_size=world_size, local_rank=local_rank,
+                                 spos_pre=spos_preprocessing, shuffle=shuffle)
+    else:
+        raise AssertionError
+    pipeline.build()
+    num_samples = pipeline.epoch_size("Reader")
+    return ClassificationWrapper(
+        DALIClassificationIterator(pipeline, size=num_samples, fill_last_batch=split == "train",
+                                   auto_reset=True), (num_samples + batch_size - 1) // batch_size)
diff --git a/examples/nas/spos/network.py b/examples/nas/spos/network.py
new file mode 100644
index 0000000000..ba45095775
--- /dev/null
+++ b/examples/nas/spos/network.py
@@ -0,0 +1,156 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import os
+import pickle
+import re
+
+import torch
+import torch.nn as nn
+from nni.nas.pytorch import mutables
+
+from blocks import ShuffleNetBlock, ShuffleXceptionBlock
+
+
+class ShuffleNetV2OneShot(nn.Module):
+    block_keys = [
+        'shufflenet_3x3',
+        'shufflenet_5x5',
+        'shufflenet_7x7',
+        'xception_3x3',
+    ]
+
+    def __init__(self, input_size=224, first_conv_channels=16, last_conv_channels=1024, n_classes=1000,
+                 op_flops_path="./data/op_flops_dict.pkl"):
+        super().__init__()
+
+        assert input_size % 32 == 0
+        with open(os.path.join(os.path.dirname(__file__), op_flops_path), "rb") as fp:
+            self._op_flops_dict = pickle.load(fp)
+
+        self.stage_blocks = [4, 4, 8, 4]
+        self.stage_channels = [64, 160, 320, 640]
+        self._parsed_flops = dict()
+        self._input_size = input_size
+        self._feature_map_size = input_size
+        self._first_conv_channels = first_conv_channels
+        self._last_conv_channels = last_conv_channels
+        self._n_classes = n_classes
+
+        # building first layer
+        self.first_conv = nn.Sequential(
+            nn.Conv2d(3, first_conv_channels, 3, 2, 1, bias=False),
+            nn.BatchNorm2d(first_conv_channels, affine=False),
+            nn.ReLU(inplace=True),
+        )
+        self._feature_map_size //= 2
+
+        p_channels = first_conv_channels
+        features = []
+        for num_blocks, channels in zip(self.stage_blocks, self.stage_channels):
+            features.extend(self._make_blocks(num_blocks, p_channels, channels))
+            p_channels = channels
+        self.features = nn.Sequential(*features)
+
+        self.conv_last = nn.Sequential(
+            nn.Conv2d(p_channels, last_conv_channels, 1, 1, 0, bias=False),
+            nn.BatchNorm2d(last_conv_channels, affine=False),
+            nn.ReLU(inplace=True),
+        )
+        self.globalpool = nn.AvgPool2d(self._feature_map_size)
+        self.dropout = nn.Dropout(0.1)
+        self.classifier = nn.Sequential(
+            nn.Linear(last_conv_channels, n_classes, bias=False),
+        )
+
+        self._initialize_weights()
+
+    def _make_blocks(self, blocks, in_channels, channels):
+        result = []
+        for i in range(blocks):
+            stride = 2 if i == 0 else 1
+            inp = in_channels if i == 0 else channels
+            oup = channels
+
+            base_mid_channels = channels // 2
+            mid_channels = int(base_mid_channels)  # prepare for scale
+            choice_block = mutables.LayerChoice([
+                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=3, stride=stride),
+                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=5, stride=stride),
+                ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=7, stride=stride),
+                ShuffleXceptionBlock(inp, oup, mid_channels=mid_channels, stride=stride)
+            ])
+            result.append(choice_block)
+
+            # find the corresponding flops
+            flop_key = (inp, oup, mid_channels, self._feature_map_size, self._feature_map_size, stride)
+            self._parsed_flops[choice_block.key] = [
+                self._op_flops_dict["{}_stride_{}".format(k, stride)][flop_key] for k in self.block_keys
+            ]
+            if stride == 2:
+                self._feature_map_size //= 2
+        return result
+
+    def forward(self, x):
+        bs = x.size(0)
+        x = self.first_conv(x)
+        x = self.features(x)
+        x = self.conv_last(x)
+        x = self.globalpool(x)
+
+        x = self.dropout(x)
+        x = x.contiguous().view(bs, -1)
+        x = self.classifier(x)
+        return x
+
+    def get_candidate_flops(self, candidate):
+        conv1_flops = self._op_flops_dict["conv1"][(3, self._first_conv_channels,
+                                                    self._input_size, self._input_size, 2)]
+        # Should use `last_conv_channels` here, but megvii insists that it's `n_classes`. Keeping it.
+        # https://github.com/megvii-model/SinglePathOneShot/blob/36eed6cf083497ffa9cfe7b8da25bb0b6ba5a452/src/Supernet/flops.py#L313
+        rest_flops = self._op_flops_dict["rest_operation"][(self.stage_channels[-1], self._n_classes,
+                                                            self._feature_map_size, self._feature_map_size, 1)]
+        total_flops = conv1_flops + rest_flops
+        for k, m in candidate.items():
+            parsed_flops_dict = self._parsed_flops[k]
+            if isinstance(m, dict):  # to be compatible with classical nas format
+                total_flops += parsed_flops_dict[m["_idx"]]
+            else:
+                total_flops += parsed_flops_dict[torch.max(m, 0)[1]]
+        return total_flops
+
+    def _initialize_weights(self):
+        for name, m in self.named_modules():
+            if isinstance(m, nn.Conv2d):
+                if 'first' in name:
+                    nn.init.normal_(m.weight, 0, 0.01)
+                else:
+                    nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1])
+                if m.bias is not None:
+                    nn.init.constant_(m.bias, 0)
+            elif isinstance(m, nn.BatchNorm2d):
+                if m.weight is not None:
+                    nn.init.constant_(m.weight, 1)
+                if m.bias is not None:
+                    nn.init.constant_(m.bias, 0.0001)
+                nn.init.constant_(m.running_mean, 0)
+            elif isinstance(m, nn.BatchNorm1d):
+                nn.init.constant_(m.weight, 1)
+                if m.bias is not None:
+                    nn.init.constant_(m.bias, 0.0001)
+                nn.init.constant_(m.running_mean, 0)
+            elif isinstance(m, nn.Linear):
+                nn.init.normal_(m.weight, 0, 0.01)
+                if m.bias is not None:
+                    nn.init.constant_(m.bias, 0)
+
+
+def load_and_parse_state_dict(filepath="./data/checkpoint-150000.pth.tar"):
+    checkpoint = torch.load(filepath, map_location=torch.device("cpu"))
+    result = dict()
+    for k, v in checkpoint["state_dict"].items():
+        if k.startswith("module."):
+            k = k[len("module."):]
+        k = re.sub(r"^(features\.\d+)\.(\d+)", "\\1.choices.\\2", k)
+        result[k] = v
+    return result
diff --git a/examples/nas/spos/scratch.py b/examples/nas/spos/scratch.py
new file mode 100644
index 0000000000..3a944a7909
--- /dev/null
+++ b/examples/nas/spos/scratch.py
@@ -0,0 +1,128 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import argparse
+import logging
+import random
+
+import numpy as np
+import torch
+import torch.nn as nn
+from dataloader import get_imagenet_iter_dali
+from nni.nas.pytorch.fixed import apply_fixed_architecture
+from nni.nas.pytorch.utils import AverageMeterGroup
+from torch.utils.tensorboard import SummaryWriter
+
+from network import ShuffleNetV2OneShot
+from utils import CrossEntropyLabelSmooth, accuracy
+
+logger = logging.getLogger("nni.spos.scratch")
+
+
+def train(epoch, model, criterion, optimizer, loader, writer, args):
+    model.train()
+    meters = AverageMeterGroup()
+    cur_lr = optimizer.param_groups[0]["lr"]
+
+    for step, (x, y) in enumerate(loader):
+        cur_step = len(loader) * epoch + step
+        optimizer.zero_grad()
+        logits = model(x)
+        loss = criterion(logits, y)
+        loss.backward()
+        optimizer.step()
+
+        metrics = accuracy(logits, y)
+        metrics["loss"] = loss.item()
+        meters.update(metrics)
+
+        writer.add_scalar("lr", cur_lr, global_step=cur_step)
+        writer.add_scalar("loss/train", loss.item(), global_step=cur_step)
+        writer.add_scalar("acc1/train", metrics["acc1"], global_step=cur_step)
+        writer.add_scalar("acc5/train", metrics["acc5"], global_step=cur_step)
+
+        if step % args.log_frequency == 0 or step + 1 == len(loader):
+            logger.info("Epoch [%d/%d] Step [%d/%d]  %s", epoch + 1,
+                        args.epochs, step + 1, len(loader), meters)
+
+    logger.info("Epoch %d training summary: %s", epoch + 1, meters)
+
+
+def validate(epoch, model, criterion, loader, writer, args):
+    model.eval()
+    meters = AverageMeterGroup()
+    with torch.no_grad():
+        for step, (x, y) in enumerate(loader):
+            logits = model(x)
+            loss = criterion(logits, y)
+            metrics = accuracy(logits, y)
+            metrics["loss"] = loss.item()
+            meters.update(metrics)
+
+            if step % args.log_frequency == 0 or step + 1 == len(loader):
+                logger.info("Epoch [%d/%d] Validation Step [%d/%d]  %s", epoch + 1,
+                            args.epochs, step + 1, len(loader), meters)
+
+    writer.add_scalar("loss/test", meters.loss.avg, global_step=epoch)
+    writer.add_scalar("acc1/test", meters.acc1.avg, global_step=epoch)
+    writer.add_scalar("acc5/test", meters.acc5.avg, global_step=epoch)
+
+    logger.info("Epoch %d validation: top1 = %f, top5 = %f", epoch + 1, meters.acc1.avg, meters.acc5.avg)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser("SPOS Training From Scratch")
+    parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
+    parser.add_argument("--tb-dir", type=str, default="runs")
+    parser.add_argument("--architecture", type=str, default="architecture_final.json")
+    parser.add_argument("--workers", type=int, default=12)
+    parser.add_argument("--batch-size", type=int, default=1024)
+    parser.add_argument("--epochs", type=int, default=240)
+    parser.add_argument("--learning-rate", type=float, default=0.5)
+    parser.add_argument("--momentum", type=float, default=0.9)
+    parser.add_argument("--weight-decay", type=float, default=4E-5)
+    parser.add_argument("--label-smooth", type=float, default=0.1)
+    parser.add_argument("--log-frequency", type=int, default=10)
+    parser.add_argument("--lr-decay", type=str, default="linear")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--spos-preprocessing", default=False, action="store_true")
+    parser.add_argument("--label-smoothing", type=float, default=0.1)
+
+    args = parser.parse_args()
+
+    torch.manual_seed(args.seed)
+    torch.cuda.manual_seed_all(args.seed)
+    np.random.seed(args.seed)
+    random.seed(args.seed)
+    torch.backends.cudnn.deterministic = True
+
+    model = ShuffleNetV2OneShot()
+    model.cuda()
+    apply_fixed_architecture(model, args.architecture)
+    if torch.cuda.device_count() > 1:  # exclude the last GPU, which is reserved for data preprocessing
+        model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
+    criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
+    optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate,
+                                momentum=args.momentum, weight_decay=args.weight_decay)
+    if args.lr_decay == "linear":
+        scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer,
+                                                      lambda step: (1.0 - step / args.epochs)
+                                                      if step <= args.epochs else 0,
+                                                      last_epoch=-1)
+    elif args.lr_decay == "cosine":
+        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, args.epochs, 1E-3)
+    else:
+        raise ValueError("'%s' not supported." % args.lr_decay)
+    writer = SummaryWriter(log_dir=args.tb_dir)
+
+    train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.batch_size, args.workers,
+                                          spos_preprocessing=args.spos_preprocessing)
+    val_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.batch_size, args.workers,
+                                        spos_preprocessing=args.spos_preprocessing)
+
+    for epoch in range(args.epochs):
+        train(epoch, model, criterion, optimizer, train_loader, writer, args)
+        validate(epoch, model, criterion, val_loader, writer, args)
+        scheduler.step()
+
+    writer.close()
diff --git a/examples/nas/spos/supernet.py b/examples/nas/spos/supernet.py
new file mode 100644
index 0000000000..3ab717868c
--- /dev/null
+++ b/examples/nas/spos/supernet.py
@@ -0,0 +1,74 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import argparse
+import logging
+import random
+
+import numpy as np
+import torch
+import torch.nn as nn
+from nni.nas.pytorch.callbacks import LRSchedulerCallback
+from nni.nas.pytorch.callbacks import ModelCheckpoint
+from nni.nas.pytorch.spos import SPOSSupernetTrainingMutator, SPOSSupernetTrainer
+
+from dataloader import get_imagenet_iter_dali
+from network import ShuffleNetV2OneShot, load_and_parse_state_dict
+from utils import CrossEntropyLabelSmooth, accuracy
+
+logger = logging.getLogger("nni.spos.supernet")
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser("SPOS Supernet Training")
+    parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
+    parser.add_argument("--load-checkpoint", action="store_true", default=False)
+    parser.add_argument("--spos-preprocessing", action="store_true", default=False,
+                        help="When true, image values will range from 0 to 255 and use BGR "
+                             "(as in original repo).")
+    parser.add_argument("--workers", type=int, default=4)
+    parser.add_argument("--batch-size", type=int, default=768)
+    parser.add_argument("--epochs", type=int, default=120)
+    parser.add_argument("--learning-rate", type=float, default=0.5)
+    parser.add_argument("--momentum", type=float, default=0.9)
+    parser.add_argument("--weight-decay", type=float, default=4E-5)
+    parser.add_argument("--label-smooth", type=float, default=0.1)
+    parser.add_argument("--log-frequency", type=int, default=10)
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--label-smoothing", type=float, default=0.1)
+
+    args = parser.parse_args()
+
+    torch.manual_seed(args.seed)
+    torch.cuda.manual_seed_all(args.seed)
+    np.random.seed(args.seed)
+    random.seed(args.seed)
+    torch.backends.cudnn.deterministic = True
+
+    model = ShuffleNetV2OneShot()
+    if args.load_checkpoint:
+        if not args.spos_preprocessing:
+            logger.warning("You might want to use SPOS preprocessing if you are loading their checkpoints.")
+        model.load_state_dict(load_and_parse_state_dict())
+    model.cuda()
+    if torch.cuda.device_count() > 1:  # exclude the last GPU, which is reserved for data preprocessing
+        model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
+    mutator = SPOSSupernetTrainingMutator(model, flops_func=getattr(model, "module", model).get_candidate_flops,
+                                          flops_lb=290E6, flops_ub=360E6)
+    criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
+    optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate,
+                                momentum=args.momentum, weight_decay=args.weight_decay)
+    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer,
+                                                  lambda step: (1.0 - step / args.epochs)
+                                                  if step <= args.epochs else 0,
+                                                  last_epoch=-1)
+    train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.batch_size, args.workers,
+                                          spos_preprocessing=args.spos_preprocessing)
+    valid_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.batch_size, args.workers,
+                                          spos_preprocessing=args.spos_preprocessing)
+    trainer = SPOSSupernetTrainer(model, criterion, accuracy, optimizer,
+                                  args.epochs, train_loader, valid_loader,
+                                  mutator=mutator, batch_size=args.batch_size,
+                                  log_frequency=args.log_frequency, workers=args.workers,
+                                  callbacks=[LRSchedulerCallback(scheduler),
+                                             ModelCheckpoint("./checkpoints")])
+    trainer.train()
diff --git a/examples/nas/spos/tester.py b/examples/nas/spos/tester.py
new file mode 100644
index 0000000000..b31b8f2fab
--- /dev/null
+++ b/examples/nas/spos/tester.py
@@ -0,0 +1,115 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import argparse
+import logging
+import random
+import time
+from itertools import cycle
+
+import nni
+import numpy as np
+import torch
+import torch.nn as nn
+from nni.nas.pytorch.classic_nas import get_and_apply_next_architecture
+from nni.nas.pytorch.utils import AverageMeterGroup
+
+from dataloader import get_imagenet_iter_dali
+from network import ShuffleNetV2OneShot, load_and_parse_state_dict
+from utils import CrossEntropyLabelSmooth, accuracy
+
+logger = logging.getLogger("nni.spos.tester")
+
+
+def retrain_bn(model, criterion, max_iters, log_freq, loader):
+    with torch.no_grad():
+        logger.info("Clear BN statistics...")
+        for m in model.modules():
+            if isinstance(m, nn.BatchNorm2d):
+                m.running_mean = torch.zeros_like(m.running_mean)
+                m.running_var = torch.ones_like(m.running_var)
+
+        logger.info("Train BN with training set (BN sanitize)...")
+        model.train()
+        meters = AverageMeterGroup()
+        for step in range(max_iters):
+            inputs, targets = next(loader)
+            logits = model(inputs)
+            loss = criterion(logits, targets)
+            metrics = accuracy(logits, targets)
+            metrics["loss"] = loss.item()
+            meters.update(metrics)
+            if step % log_freq == 0 or step + 1 == max_iters:
+                logger.info("Train Step [%d/%d] %s", step + 1, max_iters, meters)
+
+
+def test_acc(model, criterion, log_freq, loader):
+    logger.info("Start testing...")
+    model.eval()
+    meters = AverageMeterGroup()
+    start_time = time.time()
+    with torch.no_grad():
+        for step, (inputs, targets) in enumerate(loader):
+            logits = model(inputs)
+            loss = criterion(logits, targets)
+            metrics = accuracy(logits, targets)
+            metrics["loss"] = loss.item()
+            meters.update(metrics)
+            if step % log_freq == 0 or step + 1 == len(loader):
+                logger.info("Valid Step [%d/%d] time %.3fs acc1 %.4f acc5 %.4f loss %.4f",
+                            step + 1, len(loader), time.time() - start_time,
+                            meters.acc1.avg, meters.acc5.avg, meters.loss.avg)
+    return meters.acc1.avg
+
+
+def evaluate_acc(model, criterion, args, loader_train, loader_test):
+    acc_before = test_acc(model, criterion, args.log_frequency, loader_test)
+    nni.report_intermediate_result(acc_before)
+
+    retrain_bn(model, criterion, args.train_iters, args.log_frequency, loader_train)
+    acc = test_acc(model, criterion, args.log_frequency, loader_test)
+    assert isinstance(acc, float)
+    nni.report_intermediate_result(acc)
+    nni.report_final_result(acc)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser("SPOS Candidate Tester")
+    parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
+    parser.add_argument("--checkpoint", type=str, default="./data/checkpoint-150000.pth.tar")
+    parser.add_argument("--spos-preprocessing", action="store_true", default=False,
+                        help="When true, image values will range from 0 to 255 and use BGR "
+                             "(as in original repo).")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--workers", type=int, default=6)
+    parser.add_argument("--train-batch-size", type=int, default=128)
+    parser.add_argument("--train-iters", type=int, default=200)
+    parser.add_argument("--test-batch-size", type=int, default=512)
+    parser.add_argument("--log-frequency", type=int, default=10)
+
+    args = parser.parse_args()
+
+    # using a fixed set of images improves the performance
+    torch.manual_seed(args.seed)
+    torch.cuda.manual_seed_all(args.seed)
+    np.random.seed(args.seed)
+    random.seed(args.seed)
+    torch.backends.cudnn.deterministic = True
+
+    assert torch.cuda.is_available()
+
+    model = ShuffleNetV2OneShot()
+    criterion = CrossEntropyLabelSmooth(1000, 0.1)
+    get_and_apply_next_architecture(model)
+    model.load_state_dict(load_and_parse_state_dict(filepath=args.checkpoint))
+    model.cuda()
+
+    train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.train_batch_size, args.workers,
+                                          spos_preprocessing=args.spos_preprocessing,
+                                          seed=args.seed, device_id=0)
+    val_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.test_batch_size, args.workers,
+                                        spos_preprocessing=args.spos_preprocessing, shuffle=True,
+                                        seed=args.seed, device_id=0)
+    train_loader = cycle(train_loader)
+
+    evaluate_acc(model, criterion, args, train_loader, val_loader)
diff --git a/examples/nas/spos/tuner.py b/examples/nas/spos/tuner.py
new file mode 100644
index 0000000000..fb8b9f2aa4
--- /dev/null
+++ b/examples/nas/spos/tuner.py
@@ -0,0 +1,25 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+from nni.nas.pytorch.spos import SPOSEvolution
+
+from network import ShuffleNetV2OneShot
+
+
+class EvolutionWithFlops(SPOSEvolution):
+    """
+    This tuner extends the evolution tuner by rejecting candidates whose FLOPs exceed a given limit.
+    It needs a function that computes the FLOPs of a candidate.
+    """
+
+    def __init__(self, flops_limit=330E6, **kwargs):
+        super().__init__(**kwargs)
+        self.model = ShuffleNetV2OneShot()
+        self.flops_limit = flops_limit
+
+    def _is_legal(self, cand):
+        if not super()._is_legal(cand):
+            return False
+        if self.model.get_candidate_flops(cand) > self.flops_limit:
+            return False
+        return True
diff --git a/examples/nas/spos/utils.py b/examples/nas/spos/utils.py
new file mode 100644
index 0000000000..70ad98b55f
--- /dev/null
+++ b/examples/nas/spos/utils.py
@@ -0,0 +1,41 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import torch
+import torch.nn as nn
+
+
+class CrossEntropyLabelSmooth(nn.Module):
+
+    def __init__(self, num_classes, epsilon):
+        super(CrossEntropyLabelSmooth, self).__init__()
+        self.num_classes = num_classes
+        self.epsilon = epsilon
+        self.logsoftmax = nn.LogSoftmax(dim=1)
+
+    def forward(self, inputs, targets):
+        log_probs = self.logsoftmax(inputs)
+        targets = torch.zeros_like(log_probs).scatter_(1, targets.unsqueeze(1), 1)
+        targets = (1 - self.epsilon) * targets + self.epsilon / self.num_classes
+        loss = (-targets * log_probs).mean(0).sum()
+        return loss
+
+
+def accuracy(output, target, topk=(1, 5)):
+    """ Computes the precision@k for the specified values of k """
+    maxk = max(topk)
+    batch_size = target.size(0)
+
+    _, pred = output.topk(maxk, 1, True, True)
+    pred = pred.t()
+    # one-hot case
+    if target.ndimension() > 1:
+        target = target.max(1)[1]
+
+    correct = pred.eq(target.view(1, -1).expand_as(pred))
+
+    res = dict()
+    for k in topk:
+        correct_k = correct[:k].view(-1).float().sum(0)
+        res["acc{}".format(k)] = correct_k.mul_(1.0 / batch_size).item()
+    return res
diff --git a/src/sdk/pynni/nni/compression/torch/builtin_pruners.py b/src/sdk/pynni/nni/compression/torch/builtin_pruners.py
index b31a8dd77f..8e19ea394d 100644
--- a/src/sdk/pynni/nni/compression/torch/builtin_pruners.py
+++ b/src/sdk/pynni/nni/compression/torch/builtin_pruners.py
@@ -5,7 +5,8 @@
 import torch
 from .compressor import Pruner
 
-__all__ = ['LevelPruner', 'AGP_Pruner', 'SlimPruner', 'L1FilterPruner', 'L2FilterPruner', 'FPGMPruner']
+__all__ = ['LevelPruner', 'AGP_Pruner', 'SlimPruner', 'L1FilterPruner', 'L2FilterPruner', 'FPGMPruner',
+           'ActivationAPoZRankFilterPruner', 'ActivationMeanRankFilterPruner']
 
 logger = logging.getLogger('torch pruner')
 
@@ -26,7 +27,7 @@ def __init__(self, model, config_list):
         """
 
         super().__init__(model, config_list)
-        self.if_init_list = {}
+        self.mask_calculated_ops = set()
 
     def calc_mask(self, layer, config):
         """
@@ -39,22 +40,24 @@ def calc_mask(self, layer, config):
             layer's pruning config
         Returns
         -------
-        torch.Tensor
-            mask of the layer's weight
+        dict
+            dictionary for storing masks
         """
 
         weight = layer.module.weight.data
         op_name = layer.name
-        if self.if_init_list.get(op_name, True):
+        if op_name not in self.mask_calculated_ops:
             w_abs = weight.abs()
             k = int(weight.numel() * config['sparsity'])
             if k == 0:
                 return torch.ones(weight.shape).type_as(weight)
             threshold = torch.topk(w_abs.view(-1), k, largest=False)[0].max()
-            mask = torch.gt(w_abs, threshold).type_as(weight)
+            mask_weight = torch.gt(w_abs, threshold).type_as(weight)
+            mask = {'weight': mask_weight}
             self.mask_dict.update({op_name: mask})
-            self.if_init_list.update({op_name: False})
+            self.mask_calculated_ops.add(op_name)
         else:
+            assert op_name in self.mask_dict, "op_name not in the mask_dict"
             mask = self.mask_dict[op_name]
         return mask
 
@@ -94,8 +97,8 @@ def calc_mask(self, layer, config):
             layer's pruning config
         Returns
         -------
-        torch.Tensor
-            mask of the layer's weight
+        dict
+            dictionary for storing masks
         """
 
         weight = layer.module.weight.data
@@ -104,7 +107,7 @@ def calc_mask(self, layer, config):
         freq = config.get('frequency', 1)
         if self.now_epoch >= start_epoch and self.if_init_list.get(op_name, True) \
                 and (self.now_epoch - start_epoch) % freq == 0:
-            mask = self.mask_dict.get(op_name, torch.ones(weight.shape).type_as(weight))
+            mask = self.mask_dict.get(op_name, {'weight': torch.ones(weight.shape).type_as(weight)})
             target_sparsity = self.compute_target_sparsity(config)
             k = int(weight.numel() * target_sparsity)
             if k == 0 or target_sparsity >= 1 or target_sparsity <= 0:
@@ -112,11 +115,11 @@ def calc_mask(self, layer, config):
             # if we want to generate new mask, we should update weigth first
             w_abs = weight.abs() * mask
             threshold = torch.topk(w_abs.view(-1), k, largest=False)[0].max()
-            new_mask = torch.gt(w_abs, threshold).type_as(weight)
+            new_mask = {'weight': torch.gt(w_abs, threshold).type_as(weight)}
             self.mask_dict.update({op_name: new_mask})
             self.if_init_list.update({op_name: False})
         else:
-            new_mask = self.mask_dict.get(op_name, torch.ones(weight.shape).type_as(weight))
+            new_mask = self.mask_dict.get(op_name, {'weight': torch.ones(weight.shape).type_as(weight)})
         return new_mask
 
     def compute_target_sparsity(self, config):
@@ -208,8 +211,8 @@ def calc_mask(self, layer, config):
             layer's pruning config
         Returns
         -------
-        torch.Tensor
-            mask of the layer's weight
+        dict
+            dictionary for storing masks
         """
 
         weight = layer.module.weight.data
@@ -219,10 +222,17 @@ def calc_mask(self, layer, config):
         if op_name in self.mask_calculated_ops:
             assert op_name in self.mask_dict
             return self.mask_dict.get(op_name)
-        mask = torch.ones(weight.size()).type_as(weight)
+        base_mask = torch.ones(weight.size()).type_as(weight).detach()
+        mask = {'weight': base_mask.detach(), 'bias': base_mask.clone().detach()}
         try:
+            filters = weight.size(0)
+            num_prune = int(filters * config.get('sparsity'))
+            if filters < 2 or num_prune < 1:
+                return mask
             w_abs = weight.abs()
-            mask = torch.gt(w_abs, self.global_threshold).type_as(weight)
+            mask_weight = torch.gt(w_abs, self.global_threshold).type_as(weight)
+            mask_bias = mask_weight.clone()
+            mask = {'weight': mask_weight.detach(), 'bias': mask_bias.detach()}
         finally:
             self.mask_dict.update({layer.name: mask})
             self.mask_calculated_ops.add(layer.name)
@@ -230,7 +240,7 @@ def calc_mask(self, layer, config):
         return mask
 
 
-class RankFilterPruner(Pruner):
+class WeightRankFilterPruner(Pruner):
     """
     A structured pruning base class that prunes the filters with the smallest
     importance criterion in convolution layers to achieve a preset level of network sparsity.
@@ -248,10 +258,10 @@ def __init__(self, model, config_list):
         """
 
         super().__init__(model, config_list)
-        self.mask_calculated_ops = set()
+        self.mask_calculated_ops = set()  # operations whose mask has been calculated
 
     def _get_mask(self, base_mask, weight, num_prune):
-        return torch.ones(weight.size()).type_as(weight)
+        return {'weight': None, 'bias': None}
 
     def calc_mask(self, layer, config):
         """
@@ -265,20 +275,25 @@ def calc_mask(self, layer, config):
             layer's pruning config
         Returns
         -------
-        torch.Tensor
-            mask of the layer's weight
+        dict
+            dictionary for storing masks
         """
 
         weight = layer.module.weight.data
         op_name = layer.name
         op_type = layer.type
-        assert 0 <= config.get('sparsity') < 1
-        assert op_type in ['Conv1d', 'Conv2d']
+        assert 0 <= config.get('sparsity') < 1, "sparsity must be in the range [0, 1)"
+        assert op_type in ['Conv1d', 'Conv2d'], "only Conv1d and Conv2d are supported"
         assert op_type in config.get('op_types')
         if op_name in self.mask_calculated_ops:
             assert op_name in self.mask_dict
             return self.mask_dict.get(op_name)
-        mask = torch.ones(weight.size()).type_as(weight)
+        mask_weight = torch.ones(weight.size()).type_as(weight).detach()
+        if hasattr(layer.module, 'bias') and layer.module.bias is not None:
+            mask_bias = torch.ones(layer.module.bias.size()).type_as(layer.module.bias).detach()
+        else:
+            mask_bias = None
+        mask = {'weight': mask_weight, 'bias': mask_bias}
         try:
             filters = weight.size(0)
             num_prune = int(filters * config.get('sparsity'))
@@ -288,10 +303,10 @@ def calc_mask(self, layer, config):
         finally:
             self.mask_dict.update({op_name: mask})
             self.mask_calculated_ops.add(op_name)
-        return mask.detach()
+        return mask
 
 
-class L1FilterPruner(RankFilterPruner):
+class L1FilterPruner(WeightRankFilterPruner):
     """
     A structured pruning algorithm that prunes the filters of smallest magnitude
     weights sum in the convolution layers to achieve a preset level of network sparsity.
@@ -319,31 +334,33 @@ def _get_mask(self, base_mask, weight, num_prune):
         Filters with the smallest sum of its absolute kernel weights are masked.
         Parameters
         ----------
-        base_mask : torch.Tensor
-            The basic mask with the same shape of weight, all item in the basic mask is 1.
+        base_mask : dict
+            The basic masks, with the same shapes as the weight and bias; all items in the basic masks are 1.
         weight : torch.Tensor
             Layer's weight
         num_prune : int
             Num of filters to prune
+
         Returns
         -------
-        torch.Tensor
-            Mask of the layer's weight
+        dict
+            dictionary for storing masks
         """
 
         filters = weight.shape[0]
         w_abs = weight.abs()
         w_abs_structured = w_abs.view(filters, -1).sum(dim=1)
         threshold = torch.topk(w_abs_structured.view(-1), num_prune, largest=False)[0].max()
-        mask = torch.gt(w_abs_structured, threshold)[:, None, None, None].expand_as(weight).type_as(weight)
+        mask_weight = torch.gt(w_abs_structured, threshold)[:, None, None, None].expand_as(weight).type_as(weight)
+        mask_bias = torch.gt(w_abs_structured, threshold).type_as(weight)
 
-        return mask
+        return {'weight': mask_weight.detach(), 'bias': mask_bias.detach()}
 
 
-class L2FilterPruner(RankFilterPruner):
+class L2FilterPruner(WeightRankFilterPruner):
     """
     A structured pruning algorithm that prunes the filters with the
-    smallest L2 norm of the absolute kernel weights are masked.
+    smallest L2 norm of the weights.
     """
 
     def __init__(self, model, config_list):
@@ -365,27 +382,28 @@ def _get_mask(self, base_mask, weight, num_prune):
         Filters with the smallest L2 norm of the absolute kernel weights are masked.
         Parameters
         ----------
-        base_mask : torch.Tensor
-            The basic mask with the same shape of weight, all item in the basic mask is 1.
+        base_mask : dict
+            The basic masks, with the same shapes as the weight and bias; all items in the basic masks are 1.
         weight : torch.Tensor
             Layer's weight
         num_prune : int
             Num of filters to prune
         Returns
         -------
-        torch.Tensor
-            Mask of the layer's weight
+        dict
+            dictionary for storing masks
         """
         filters = weight.shape[0]
         w = weight.view(filters, -1)
         w_l2_norm = torch.sqrt((w ** 2).sum(dim=1))
         threshold = torch.topk(w_l2_norm.view(-1), num_prune, largest=False)[0].max()
-        mask = torch.gt(w_l2_norm, threshold)[:, None, None, None].expand_as(weight).type_as(weight)
+        mask_weight = torch.gt(w_l2_norm, threshold)[:, None, None, None].expand_as(weight).type_as(weight)
+        mask_bias = torch.gt(w_l2_norm, threshold).type_as(weight)
 
-        return mask
+        return {'weight': mask_weight.detach(), 'bias': mask_bias.detach()}
 
 
-class FPGMPruner(RankFilterPruner):
+class FPGMPruner(WeightRankFilterPruner):
     """
     A filter pruner via geometric median.
     "Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration",
@@ -410,20 +428,22 @@ def _get_mask(self, base_mask, weight, num_prune):
         Filters with the smallest sum of its absolute kernel weights are masked.
         Parameters
         ----------
-        base_mask : torch.Tensor
-            The basic mask with the same shape of weight, all item in the basic mask is 1.
+        base_mask : dict
+            The basic masks, with the same shapes as the weight and bias; all items in the basic masks are 1.
         weight : torch.Tensor
             Layer's weight
         num_prune : int
             Num of filters to prune
         Returns
         -------
-        torch.Tensor
-            Mask of the layer's weight
+        dict
+            dictionary for storing masks
         """
         min_gm_idx = self._get_min_gm_kernel_idx(weight, num_prune)
         for idx in min_gm_idx:
-            base_mask[idx] = 0.
+            base_mask['weight'][idx] = 0.
+            if base_mask['bias'] is not None:
+                base_mask['bias'][idx] = 0.
         return base_mask
 
     def _get_min_gm_kernel_idx(self, weight, n):
@@ -471,3 +491,251 @@ def _get_distance_sum(self, weight, in_idx, out_idx):
 
     def update_epoch(self, epoch):
         self.mask_calculated_ops = set()
+
+
+class ActivationRankFilterPruner(Pruner):
+    """
+    A structured pruning base class that prunes the convolution filters with the smallest
+    activation-based importance criterion to achieve a preset level of network sparsity.
+    Hengyuan Hu, Rui Peng, Yu-Wing Tai and Chi-Keung Tang,
+    "Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", ICLR 2016.
+    https://arxiv.org/abs/1607.03250
+    Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila and Jan Kautz,
+    "Pruning Convolutional Neural Networks for Resource Efficient Inference", ICLR 2017.
+    https://arxiv.org/abs/1611.06440
+    """
+
+    def __init__(self, model, config_list, activation='relu', statistics_batch_num=1):
+        """
+        Parameters
+        ----------
+        model : torch.nn.Module
+            Model to be pruned
+        config_list : list
+            supported keys for each list item:
+                - sparsity: percentage of convolutional filters to be pruned.
+        activation : str
+            Activation function
+        statistics_batch_num : int
+            Num of batches for activation statistics
+        """
+
+        super().__init__(model, config_list)
+        self.mask_calculated_ops = set()
+        self.statistics_batch_num = statistics_batch_num
+        self.collected_activation = {}
+        self.hooks = {}
+        assert activation in ['relu', 'relu6']
+        if activation == 'relu':
+            self.activation = torch.nn.functional.relu
+        elif activation == 'relu6':
+            self.activation = torch.nn.functional.relu6
+        else:
+            self.activation = None
+
+    def compress(self):
+        """
+        Compress the model, register a hook for collecting activations.
+        """
+        modules_to_compress = self.detect_modules_to_compress()
+        for layer, config in modules_to_compress:
+            self._instrument_layer(layer, config)
+            self.collected_activation[layer.name] = []
+
+            def _hook(module_, input_, output, name=layer.name):
+                if len(self.collected_activation[name]) < self.statistics_batch_num:
+                    self.collected_activation[name].append(self.activation(output.detach().cpu()))
+
+            layer.module.register_forward_hook(_hook)
+        return self.bound_model
+
+    def _get_mask(self, base_mask, activations, num_prune):
+        return {'weight': None, 'bias': None}
+
+    def calc_mask(self, layer, config):
+        """
+        Calculate the mask of given layer.
+        Filters with the smallest importance criterion, calculated from the output activations, are masked.
+
+        Parameters
+        ----------
+        layer : LayerInfo
+            the layer to instrument the compression operation
+        config : dict
+            layer's pruning config
+
+        Returns
+        -------
+        dict
+            dictionary for storing masks
+        """
+
+        weight = layer.module.weight.data
+        op_name = layer.name
+        op_type = layer.type
+        assert 0 <= config.get('sparsity') < 1, "sparsity must be in the range [0, 1)"
+        assert op_type in ['Conv2d'], "only Conv2d is supported"
+        assert op_type in config.get('op_types')
+        if op_name in self.mask_calculated_ops:
+            assert op_name in self.mask_dict
+            return self.mask_dict.get(op_name)
+        mask_weight = torch.ones(weight.size()).type_as(weight).detach()
+        if hasattr(layer.module, 'bias') and layer.module.bias is not None:
+            mask_bias = torch.ones(layer.module.bias.size()).type_as(layer.module.bias).detach()
+        else:
+            mask_bias = None
+        mask = {'weight': mask_weight, 'bias': mask_bias}
+        try:
+            filters = weight.size(0)
+            num_prune = int(filters * config.get('sparsity'))
+            if filters < 2 or num_prune < 1 or len(self.collected_activation[layer.name]) < self.statistics_batch_num:
+                return mask
+            mask = self._get_mask(mask, self.collected_activation[layer.name], num_prune)
+        finally:
+            if len(self.collected_activation[layer.name]) == self.statistics_batch_num:
+                self.mask_dict.update({op_name: mask})
+                self.mask_calculated_ops.add(op_name)
+        return mask
+
+
+class ActivationAPoZRankFilterPruner(ActivationRankFilterPruner):
+    """
+    A structured pruning algorithm that prunes the filters with the
+    largest APoZ (average percentage of zeros) of output activations.
+    Hengyuan Hu, Rui Peng, Yu-Wing Tai and Chi-Keung Tang,
+    "Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", ICLR 2016.
+    https://arxiv.org/abs/1607.03250
+    """
+
+    def __init__(self, model, config_list, activation='relu', statistics_batch_num=1):
+        """
+        Parameters
+        ----------
+        model : torch.nn.Module
+            Model to be pruned
+        config_list : list
+            support key for each list item:
+                - sparsity: percentage of convolutional filters to be pruned.
+        activation : str
+            Activation function
+        statistics_batch_num : int
+            Num of batches for activation statistics
+        """
+        super().__init__(model, config_list, activation, statistics_batch_num)
+
+    def _get_mask(self, base_mask, activations, num_prune):
+        """
+        Calculate the mask of given layer.
+        Filters with the largest APoZ (average percentage of zeros) of output activations are masked.
+
+        Parameters
+        ----------
+        base_mask : dict
+            The base mask, with the same shape as the weight; every item in it is initially 1.
+        activations : list
+            Layer's output activations
+        num_prune : int
+            Num of filters to prune
+
+        Returns
+        -------
+        dict
+            dictionary for storing masks
+        """
+        apoz = self._calc_apoz(activations)
+        prune_indices = torch.argsort(apoz, descending=True)[:num_prune]
+        for idx in prune_indices:
+            base_mask['weight'][idx] = 0.
+            if base_mask['bias'] is not None:
+                base_mask['bias'][idx] = 0.
+        return base_mask
+
+    def _calc_apoz(self, activations):
+        """
+        Calculate APoZ(average percentage of zeros) of activations.
+
+        Parameters
+        ----------
+        activations : list
+            Layer's output activations
+
+        Returns
+        -------
+        torch.Tensor
+            Filter's APoZ(average percentage of zeros) of the activations
+        """
+        activations = torch.cat(activations, 0)
+        _eq_zero = torch.eq(activations, torch.zeros_like(activations))
+        _apoz = torch.sum(_eq_zero, dim=(0, 2, 3), dtype=torch.float64) / torch.numel(_eq_zero[:, 0, :, :])
+        return _apoz
+
+
+class ActivationMeanRankFilterPruner(ActivationRankFilterPruner):
+    """
+    A structured pruning algorithm that prunes the filters with the
+    smallest mean value of output activations.
+    Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila and Jan Kautz,
+    "Pruning Convolutional Neural Networks for Resource Efficient Inference", ICLR 2017.
+    https://arxiv.org/abs/1611.06440
+    """
+
+    def __init__(self, model, config_list, activation='relu', statistics_batch_num=1):
+        """
+        Parameters
+        ----------
+        model : torch.nn.Module
+            Model to be pruned
+        config_list : list
+            support key for each list item:
+                - sparsity: percentage of convolutional filters to be pruned.
+        activation : str
+            Activation function
+        statistics_batch_num : int
+            Num of batches for activation statistics
+        """
+        super().__init__(model, config_list, activation, statistics_batch_num)
+
+    def _get_mask(self, base_mask, activations, num_prune):
+        """
+        Calculate the mask of given layer.
+        Filters with the smallest mean value of output activations are masked.
+
+        Parameters
+        ----------
+        base_mask : dict
+            The base mask, with the same shape as the weight; every item in it is initially 1.
+        activations : list
+            Layer's output activations
+        num_prune : int
+            Num of filters to prune
+
+        Returns
+        -------
+        dict
+            dictionary for storing masks
+        """
+        mean_activation = self._cal_mean_activation(activations)
+        prune_indices = torch.argsort(mean_activation)[:num_prune]
+        for idx in prune_indices:
+            base_mask['weight'][idx] = 0.
+            if base_mask['bias'] is not None:
+                base_mask['bias'][idx] = 0.
+        return base_mask
+
+    def _cal_mean_activation(self, activations):
+        """
+        Calculate mean value of activations.
+
+        Parameters
+        ----------
+        activations : list
+            Layer's output activations
+
+        Returns
+        -------
+        torch.Tensor
+            Filter's mean value of the output activations
+        """
+        activations = torch.cat(activations, 0)
+        mean_activation = torch.mean(activations, dim=(0, 2, 3))
+        return mean_activation
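Review note: the two ranking criteria in `pruners.py` can be checked by hand on a tiny input. The following is a pure-Python sketch, not the PyTorch code in this diff. The helper names `apoz_per_filter`, `mean_per_filter`, and `select_filters` are hypothetical, and activations are assumed to be nested lists indexed as `[batch][filter][height][width]`. It follows the diff in masking the filters with the largest APoZ and the filters with the smallest mean activation.

```python
def apoz_per_filter(activations):
    """Fraction of zero entries per filter, over batch and spatial positions."""
    num_filters = len(activations[0])
    zeros = [0] * num_filters
    total = [0] * num_filters
    for sample in activations:
        for f, fmap in enumerate(sample):
            for row in fmap:
                zeros[f] += sum(1 for v in row if v == 0)
                total[f] += len(row)
    # True division: an integer division here would collapse every score to 0 or 1.
    return [z / t for z, t in zip(zeros, total)]


def mean_per_filter(activations):
    """Mean activation per filter, over batch and spatial positions."""
    num_filters = len(activations[0])
    sums = [0.0] * num_filters
    total = [0] * num_filters
    for sample in activations:
        for f, fmap in enumerate(sample):
            for row in fmap:
                sums[f] += sum(row)
                total[f] += len(row)
    return [s / t for s, t in zip(sums, total)]


def select_filters(scores, num_prune, largest):
    """Indices of the num_prune filters with the largest (or smallest) score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=largest)
    return sorted(order[:num_prune])


# One sample, three filters, 2x2 feature maps (values as after a ReLU).
acts = [[
    [[0.0, 0.0], [0.0, 8.0]],   # filter 0: 3 of 4 zeros, mean 2.0
    [[2.0, 3.0], [1.0, 4.0]],   # filter 1: no zeros, mean 2.5
    [[0.0, 0.5], [0.5, 0.0]],   # filter 2: 2 of 4 zeros, mean 0.25
]]
apoz = apoz_per_filter(acts)
means = mean_per_filter(acts)
apoz_pruned = select_filters(apoz, 1, largest=True)    # most zeros goes first
mean_pruned = select_filters(means, 1, largest=False)  # weakest response goes first
```

On this input the two criteria disagree: APoZ selects filter 0, while the mean criterion selects filter 2.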
diff --git a/src/sdk/pynni/nni/compression/torch/builtin_quantizers.py b/src/sdk/pynni/nni/compression/torch/builtin_quantizers.py
index 7f9c3b144a..2204428574 100644
--- a/src/sdk/pynni/nni/compression/torch/builtin_quantizers.py
+++ b/src/sdk/pynni/nni/compression/torch/builtin_quantizers.py
@@ -3,7 +3,7 @@
 
 import logging
 import torch
-from .compressor import Quantizer
+from .compressor import Quantizer, QuantGrad, QuantType
 
 __all__ = ['NaiveQuantizer', 'QAT_Quantizer', 'DoReFaQuantizer']
 
@@ -240,4 +240,34 @@ def quantize_weight(self, weight, config, **kwargs):
     def quantize(self, input_ri, q_bits):
         scale = pow(2, q_bits)-1
         output = torch.round(input_ri*scale)/scale
-        return output
\ No newline at end of file
+        return output
+
+
+class ClipGrad(QuantGrad):
+    @staticmethod
+    def quant_backward(tensor, grad_output, quant_type):
+        if quant_type == QuantType.QUANT_OUTPUT:
+            grad_output[torch.abs(tensor) > 1] = 0
+        return grad_output
+
+
+class BNNQuantizer(Quantizer):
+    """Binarized Neural Networks, as defined in:
+    Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
+    (https://arxiv.org/abs/1602.02830)
+    """
+    def __init__(self, model, config_list):
+        super().__init__(model, config_list)
+        self.quant_grad = ClipGrad
+
+    def quantize_weight(self, weight, config, **kwargs):
+        out = torch.sign(weight)
+        # remove zeros
+        out[out == 0] = 1
+        return out
+
+    def quantize_output(self, output, config, **kwargs):
+        out = torch.sign(output)
+        # remove zeros
+        out[out == 0] = 1
+        return out
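Review note: a minimal pure-Python sketch of what `BNNQuantizer` and `ClipGrad` appear to compute. It assumes that the `tensor` argument of `quant_backward` holds the values before quantization. The function names `binarize` and `clipped_ste_grad` are hypothetical and operate on plain lists rather than tensors.

```python
def binarize(values):
    """Map each value to +1 or -1; zero is mapped to +1, as in the diff."""
    return [1.0 if v >= 0 else -1.0 for v in values]


def clipped_ste_grad(pre_quant, grad_output):
    """Straight-through gradient, zeroed wherever |x| > 1."""
    return [g if abs(x) <= 1 else 0.0 for x, g in zip(pre_quant, grad_output)]


x = [-1.5, -0.2, 0.0, 0.7, 2.0]
binary = binarize(x)
grads = clipped_ste_grad(x, [0.1, 0.2, 0.3, 0.4, 0.5])
```

The first and last gradient entries are dropped because the corresponding inputs lie outside `[-1, 1]`.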
diff --git a/src/sdk/pynni/nni/compression/torch/compressor.py b/src/sdk/pynni/nni/compression/torch/compressor.py
index e7965d837b..d8ae199d43 100644
--- a/src/sdk/pynni/nni/compression/torch/compressor.py
+++ b/src/sdk/pynni/nni/compression/torch/compressor.py
@@ -16,6 +16,7 @@ def __init__(self, name, module):
 
         self._forward = None
 
+
 class Compressor:
     """
     Abstract base PyTorch compressor
@@ -193,10 +194,16 @@ def _instrument_layer(self, layer, config):
         layer._forward = layer.module.forward
 
         def new_forward(*inputs):
+            mask = self.calc_mask(layer, config)
             # apply mask to weight
             old_weight = layer.module.weight.data
-            mask = self.calc_mask(layer, config)
-            layer.module.weight.data = old_weight.mul(mask)
+            mask_weight = mask['weight']
+            layer.module.weight.data = old_weight.mul(mask_weight)
+            # apply mask to bias
+            if 'bias' in mask and hasattr(layer.module, 'bias') and layer.module.bias is not None:
+                old_bias = layer.module.bias.data
+                mask_bias = mask['bias']
+                layer.module.bias.data = old_bias.mul(mask_bias)
             # calculate forward
             ret = layer._forward(*inputs)
             return ret
@@ -224,12 +231,14 @@ def export_model(self, model_path, mask_path=None, onnx_path=None, input_shape=N
         for name, m in self.bound_model.named_modules():
             if name == "":
                 continue
-            mask = self.mask_dict.get(name)
-            if mask is not None:
-                mask_sum = mask.sum().item()
-                mask_num = mask.numel()
+            masks = self.mask_dict.get(name)
+            if masks is not None:
+                mask_sum = masks['weight'].sum().item()
+                mask_num = masks['weight'].numel()
                 _logger.info('Layer: %s  Sparsity: %.2f', name, 1 - mask_sum / mask_num)
-                m.weight.data = m.weight.data.mul(mask)
+                m.weight.data = m.weight.data.mul(masks['weight'])
+                if 'bias' in masks and hasattr(m, 'bias') and m.bias is not None:
+                    m.bias.data = m.bias.data.mul(masks['bias'])
             else:
                 _logger.info('Layer: %s  NOT compressed', name)
         torch.save(self.bound_model.state_dict(), model_path)
@@ -258,7 +267,6 @@ def quantize_weight(self, weight, config, op, op_type, op_name):
         """
         quantize should overload this method to quantize weight.
         This method is effectively hooked to :meth:`forward` of the model.
-
         Parameters
         ----------
         weight : Tensor
@@ -272,7 +280,6 @@ def quantize_output(self, output, config, op, op_type, op_name):
         """
         quantize should overload this method to quantize output.
         This method is effectively hooked to :meth:`forward` of the model.
-
         Parameters
         ----------
         output : Tensor
@@ -286,7 +293,6 @@ def quantize_input(self, *inputs, config, op, op_type, op_name):
         """
         quantize should overload this method to quantize input.
         This method is effectively hooked to :meth:`forward` of the model.
-
         Parameters
         ----------
         inputs : Tensor
@@ -300,7 +306,6 @@ def quantize_input(self, *inputs, config, op, op_type, op_name):
     def _instrument_layer(self, layer, config):
         """
         Create a wrapper forward function to replace the original one.
-
         Parameters
         ----------
         layer : LayerInfo
@@ -365,7 +370,6 @@ def quant_backward(tensor, grad_output, quant_type):
         """
         This method should be overrided by subclass to provide customized backward function,
         default implementation is Straight-Through Estimator
-
         Parameters
         ----------
         tensor : Tensor
@@ -375,7 +379,6 @@ def quant_backward(tensor, grad_output, quant_type):
         quant_type : QuantType
             the type of quantization, it can be `QuantType.QUANT_INPUT`, `QuantType.QUANT_WEIGHT`, `QuantType.QUANT_OUTPUT`,
             you can define different behavior for different types.
-
         Returns
         -------
         tensor
@@ -399,3 +402,4 @@ def _check_weight(module):
         return isinstance(module.weight.data, torch.Tensor)
     except AttributeError:
         return False
+    
\ No newline at end of file
diff --git a/src/sdk/pynni/nni/compression/torch/lottery_ticket.py b/src/sdk/pynni/nni/compression/torch/lottery_ticket.py
index d8e4f78c76..233d90ced8 100644
--- a/src/sdk/pynni/nni/compression/torch/lottery_ticket.py
+++ b/src/sdk/pynni/nni/compression/torch/lottery_ticket.py
@@ -17,6 +17,7 @@ class LotteryTicketPruner(Pruner):
     4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
     5. Repeat step 2, 3, and 4.
     """
+
     def __init__(self, model, config_list, optimizer, lr_scheduler=None, reset_weights=True):
         """
         Parameters
@@ -55,7 +56,8 @@ def _validate_config(self, config_list):
             assert 'prune_iterations' in config, 'prune_iterations must exist in your config'
             assert 'sparsity' in config, 'sparsity must exist in your config'
             if prune_iterations is not None:
-                assert prune_iterations == config['prune_iterations'], 'The values of prune_iterations must be equal in your config'
+                assert prune_iterations == config['prune_iterations'], \
+                    'The values of prune_iterations must be equal in your config'
             prune_iterations = config['prune_iterations']
         return prune_iterations
 
@@ -67,8 +69,8 @@ def _print_masks(self, print_mask=False):
             if print_mask:
                 print('mask: ', mask)
             # calculate current sparsity
-            mask_num = mask.sum().item()
-            mask_size = mask.numel()
+            mask_num = mask['weight'].sum().item()
+            mask_size = mask['weight'].numel()
             print('sparsity: ', 1 - mask_num / mask_size)
         torch.set_printoptions(profile='default')
 
@@ -84,11 +86,11 @@ def _calc_mask(self, weight, sparsity, op_name):
             curr_sparsity = self._calc_sparsity(sparsity)
             assert self.mask_dict.get(op_name) is not None
             curr_mask = self.mask_dict.get(op_name)
-            w_abs = weight.abs() * curr_mask
+            w_abs = weight.abs() * curr_mask['weight']
             k = int(w_abs.numel() * curr_sparsity)
             threshold = torch.topk(w_abs.view(-1), k, largest=False).values.max()
             mask = torch.gt(w_abs, threshold).type_as(weight)
-        return mask
+        return {'weight': mask}
 
     def calc_mask(self, layer, config):
         """
diff --git a/src/sdk/pynni/nni/nas/pytorch/spos/__init__.py b/src/sdk/pynni/nni/nas/pytorch/spos/__init__.py
new file mode 100644
index 0000000000..ed432b0845
--- /dev/null
+++ b/src/sdk/pynni/nni/nas/pytorch/spos/__init__.py
@@ -0,0 +1,6 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+from .evolution import SPOSEvolution
+from .mutator import SPOSSupernetTrainingMutator
+from .trainer import SPOSSupernetTrainer
diff --git a/src/sdk/pynni/nni/nas/pytorch/spos/evolution.py b/src/sdk/pynni/nni/nas/pytorch/spos/evolution.py
new file mode 100644
index 0000000000..3541c81fd7
--- /dev/null
+++ b/src/sdk/pynni/nni/nas/pytorch/spos/evolution.py
@@ -0,0 +1,222 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import json
+import logging
+import os
+import re
+from collections import deque
+
+import numpy as np
+from nni.tuner import Tuner
+from nni.nas.pytorch.classic_nas.mutator import LAYER_CHOICE, INPUT_CHOICE
+
+
+_logger = logging.getLogger(__name__)
+
+
+class SPOSEvolution(Tuner):
+
+    def __init__(self, max_epochs=20, num_select=10, num_population=50, m_prob=0.1,
+                 num_crossover=25, num_mutation=25):
+        """
+        Initialize SPOS Evolution Tuner.
+
+        Parameters
+        ----------
+        max_epochs : int
+            Maximum number of epochs to run.
+        num_select : int
+            Number of candidates that survive each epoch.
+        num_population : int
+            Number of candidates at the start of each epoch. If candidates generated by
+            crossover and mutation are not enough, the rest will be filled with random
+            candidates.
+        m_prob : float
+            The probability of mutation.
+        num_crossover : int
+            Number of candidates generated by crossover in each epoch.
+        num_mutation : int
+            Number of candidates generated by mutation in each epoch.
+        """
+        assert num_population >= num_select
+        self.max_epochs = max_epochs
+        self.num_select = num_select
+        self.num_population = num_population
+        self.m_prob = m_prob
+        self.num_crossover = num_crossover
+        self.num_mutation = num_mutation
+        self.epoch = 0
+        self.candidates = []
+        self._search_space = None
+        self.random_state = np.random.RandomState(0)
+
+        # async status
+        self._to_evaluate_queue = deque()
+        self._sending_parameter_queue = deque()
+        self._pending_result_ids = set()
+        self._reward_dict = dict()
+        self._id2candidate = dict()
+        self._st_callback = None
+
+    def update_search_space(self, search_space):
+        """
+        Handle the initialization/update event of search space.
+        """
+        self._search_space = search_space
+        self._next_round()
+
+    def _next_round(self):
+        _logger.info("Epoch %d, generating...", self.epoch)
+        if self.epoch == 0:
+            self._get_random_population()
+            self.export_results(self.candidates)
+        else:
+            best_candidates = self._select_top_candidates()
+            self.export_results(best_candidates)
+            if self.epoch >= self.max_epochs:
+                return
+            self.candidates = self._get_mutation(best_candidates) + self._get_crossover(best_candidates)
+            self._get_random_population()
+        self.epoch += 1
+
+    def _random_candidate(self):
+        chosen_arch = dict()
+        for key, val in self._search_space.items():
+            if val["_type"] == LAYER_CHOICE:
+                choices = val["_value"]
+                index = self.random_state.randint(len(choices))
+                chosen_arch[key] = {"_value": choices[index], "_idx": index}
+            elif val["_type"] == INPUT_CHOICE:
+                raise NotImplementedError("Input choice is not implemented yet.")
+        return chosen_arch
+
+    def _add_to_evaluate_queue(self, cand):
+        _logger.info("Generate candidate %s, adding to eval queue.", self._get_architecture_repr(cand))
+        self._reward_dict[self._hashcode(cand)] = 0.
+        self._to_evaluate_queue.append(cand)
+
+    def _get_random_population(self):
+        while len(self.candidates) < self.num_population:
+            cand = self._random_candidate()
+            if self._is_legal(cand):
+                _logger.info("Random candidate generated.")
+                self._add_to_evaluate_queue(cand)
+                self.candidates.append(cand)
+
+    def _get_crossover(self, best):
+        result = []
+        for _ in range(10 * self.num_crossover):
+            cand_p1 = best[self.random_state.randint(len(best))]
+            cand_p2 = best[self.random_state.randint(len(best))]
+            assert cand_p1.keys() == cand_p2.keys()
+            cand = {k: cand_p1[k] if self.random_state.randint(2) == 0 else cand_p2[k]
+                    for k in cand_p1.keys()}
+            if self._is_legal(cand):
+                result.append(cand)
+                self._add_to_evaluate_queue(cand)
+            if len(result) >= self.num_crossover:
+                break
+        _logger.info("Found %d architectures with crossover.", len(result))
+        return result
+
+    def _get_mutation(self, best):
+        result = []
+        for _ in range(10 * self.num_mutation):
+            cand = best[self.random_state.randint(len(best))].copy()
+            mutation_sample = self.random_state.random_sample(len(cand))
+            for s, k in zip(mutation_sample, cand):
+                if s < self.m_prob:
+                    choices = self._search_space[k]["_value"]
+                    index = self.random_state.randint(len(choices))
+                    cand[k] = {"_value": choices[index], "_idx": index}
+            if self._is_legal(cand):
+                result.append(cand)
+                self._add_to_evaluate_queue(cand)
+            if len(result) >= self.num_mutation:
+                break
+        _logger.info("Found %d architectures with mutation.", len(result))
+        return result
+
+    def _get_architecture_repr(self, cand):
+        return re.sub(r"\".*?\": \{\"_idx\": (\d+), \"_value\": \".*?\"\}", r"\1",
+                      self._hashcode(cand))
+
+    def _is_legal(self, cand):
+        if self._hashcode(cand) in self._reward_dict:
+            return False
+        return True
+
+    def _select_top_candidates(self):
+        reward_query = lambda cand: self._reward_dict[self._hashcode(cand)]
+        _logger.info("All candidate rewards: %s", list(map(reward_query, self.candidates)))
+        result = sorted(self.candidates, key=reward_query, reverse=True)[:self.num_select]
+        _logger.info("Best candidate rewards: %s", list(map(reward_query, result)))
+        return result
+
+    @staticmethod
+    def _hashcode(d):
+        return json.dumps(d, sort_keys=True)
+
+    def _bind_and_send_parameters(self):
+        """
+        There are two types of resources: parameter ids and candidates. This function is called at
+        necessary times to bind these resources to send new trials with st_callback.
+        """
+        result = []
+        while self._sending_parameter_queue and self._to_evaluate_queue:
+            parameter_id = self._sending_parameter_queue.popleft()
+            parameters = self._to_evaluate_queue.popleft()
+            self._id2candidate[parameter_id] = parameters
+            result.append(parameters)
+            self._pending_result_ids.add(parameter_id)
+            self._st_callback(parameter_id, parameters)
+            _logger.info("Send parameter [%d] %s.", parameter_id, self._get_architecture_repr(parameters))
+        return result
+
+    def generate_multiple_parameters(self, parameter_id_list, **kwargs):
+        """
+        Callback function necessary to implement a tuner. This will put more parameter ids into the
+        parameter id queue.
+        """
+        if "st_callback" in kwargs and self._st_callback is None:
+            self._st_callback = kwargs["st_callback"]
+        for parameter_id in parameter_id_list:
+            self._sending_parameter_queue.append(parameter_id)
+        self._bind_and_send_parameters()
+        return []  # parameters are sent through st_callback; returning them here as well could over-send
+
+    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
+        """
+        Callback function. Receive a trial result.
+        """
+        _logger.info("Candidate %d, reported reward %f", parameter_id, value)
+        self._reward_dict[self._hashcode(self._id2candidate[parameter_id])] = value
+
+    def trial_end(self, parameter_id, success, **kwargs):
+        """
+        Callback function when a trial is ended and resource is released.
+        """
+        self._pending_result_ids.remove(parameter_id)
+        if not self._pending_result_ids and not self._to_evaluate_queue:
+            # a new epoch now
+            self._next_round()
+            assert self._st_callback is not None
+            self._bind_and_send_parameters()
+
+    def export_results(self, result):
+        """
+        Export a number of candidates to `checkpoints` dir.
+
+        Parameters
+        ----------
+        result : list of dict
+        """
+        os.makedirs("checkpoints", exist_ok=True)
+        for i, cand in enumerate(result):
+            converted = dict()
+            for cand_key, cand_val in cand.items():
+                onehot = [k == cand_val["_idx"] for k in range(len(self._search_space[cand_key]["_value"]))]
+                converted[cand_key] = onehot
+            with open(os.path.join("checkpoints", "%03d_%03d.json" % (self.epoch, i)), "w") as fp:
+                json.dump(converted, fp)
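Review note: the crossover, mutation, and one-hot export steps of `SPOSEvolution` can be illustrated with a simplified stdlib-only sketch. As an assumption for brevity, a candidate here maps each key directly to a choice index instead of the `{"_value": ..., "_idx": ...}` dict used in the diff. The function names and the toy `search_space` are hypothetical.

```python
import random


def crossover(parent1, parent2, rng):
    """Pick each choice from one of the two parents with equal probability."""
    return {k: parent1[k] if rng.randint(0, 1) == 0 else parent2[k] for k in parent1}


def mutate(cand, search_space, m_prob, rng):
    """Resample each choice independently with probability m_prob."""
    child = dict(cand)
    for key in child:
        if rng.random() < m_prob:
            child[key] = rng.randrange(len(search_space[key]))
    return child


def to_onehot(cand, search_space):
    """Convert chosen indices to the boolean one-hot lists written to checkpoints."""
    return {k: [i == idx for i in range(len(search_space[k]))] for k, idx in cand.items()}


search_space = {
    "layer1": ["conv3x3", "conv5x5", "skip"],
    "layer2": ["conv3x3", "conv5x5", "skip"],
}
rng = random.Random(0)  # a single seeded generator keeps the search reproducible
p1 = {"layer1": 0, "layer2": 0}
p2 = {"layer1": 2, "layer2": 2}
child = crossover(p1, p2, rng)
mutant = mutate(child, search_space, m_prob=0.1, rng=rng)
onehot = to_onehot(p2, search_space)
```

Drawing every random number from one seeded generator is what makes a run repeatable; mixing in the global `np.random` state would break that property.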
diff --git a/src/sdk/pynni/nni/nas/pytorch/spos/mutator.py b/src/sdk/pynni/nni/nas/pytorch/spos/mutator.py
new file mode 100644
index 0000000000..88a01eeeaf
--- /dev/null
+++ b/src/sdk/pynni/nni/nas/pytorch/spos/mutator.py
@@ -0,0 +1,63 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import logging
+
+import numpy as np
+from nni.nas.pytorch.random import RandomMutator
+
+_logger = logging.getLogger(__name__)
+
+
+class SPOSSupernetTrainingMutator(RandomMutator):
+    def __init__(self, model, flops_func=None, flops_lb=None, flops_ub=None,
+                 flops_bin_num=7, flops_sample_timeout=500):
+        """
+
+        Parameters
+        ----------
+        model : nn.Module
+        flops_func : callable
+            Callable that takes a candidate from `sample_search` and returns its flops. When `flops_func`
+            is None, functions related to flops will be deactivated.
+        flops_lb : number
+            Lower bound of flops.
+        flops_ub : number
+            Upper bound of flops.
+        flops_bin_num : number
+            Number of bins divided for the interval of flops to ensure the uniformity. Bigger number will be more
+            uniform, but the sampling will be slower.
+        flops_sample_timeout : int
+            Maximum number of sampling attempts before giving up and using a random candidate.
+        """
+        super().__init__(model)
+        self._flops_func = flops_func
+        if self._flops_func is not None:
+            self._flops_bin_num = flops_bin_num
+            self._flops_bins = [flops_lb + (flops_ub - flops_lb) / flops_bin_num * i for i in range(flops_bin_num + 1)]
+            self._flops_sample_timeout = flops_sample_timeout
+
+    def sample_search(self):
+        """
+        Sample a candidate for training. When `flops_func` is not None, candidates will be sampled uniformly
+        relative to flops.
+
+        Returns
+        -------
+        dict
+        """
+        if self._flops_func is not None:
+            for times in range(self._flops_sample_timeout):
+                idx = np.random.randint(self._flops_bin_num)
+                cand = super().sample_search()
+                if self._flops_bins[idx] <= self._flops_func(cand) <= self._flops_bins[idx + 1]:
+                    _logger.debug("Sampled candidate flops %f in %d times.", self._flops_func(cand), times)
+                    return cand
+            _logger.warning("Failed to sample a flops-valid candidate within %d tries.", self._flops_sample_timeout)
+        return super().sample_search()
+
+    def sample_final(self):
+        """
+        Implement only to suffice the interface of Mutator.
+        """
+        return self.sample_search()
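Review note: the flops-binned sampling in `sample_search` can be sketched without NNI as follows. This is an illustration under stated assumptions, not the mutator itself: `make_bins` and `sample_in_random_bin` are hypothetical names, a candidate is just an integer, and its "flops" is that integer as a float.

```python
import random


def make_bins(lower, upper, num_bins):
    """Edges of num_bins equal-width intervals covering [lower, upper]."""
    width = (upper - lower) / num_bins
    return [lower + width * i for i in range(num_bins + 1)]


def sample_in_random_bin(sample_fn, flops_fn, bins, max_tries, rng):
    """Pick a random bin, then accept the sampled candidate only if its flops fall in it."""
    for _ in range(max_tries):
        idx = rng.randrange(len(bins) - 1)
        cand = sample_fn()
        if bins[idx] <= flops_fn(cand) <= bins[idx + 1]:
            return cand
    # Give up and fall back to an unconstrained random candidate.
    return sample_fn()


rng = random.Random(0)
bins = make_bins(0.0, 70.0, 7)


def sample_fn():
    # Toy candidate: an integer in [0, 70).
    return rng.randrange(70)


cand = sample_in_random_bin(sample_fn, float, bins, max_tries=500, rng=rng)
```

Choosing the bin first and the candidate second is what pushes the accepted samples towards a uniform distribution over flops, at the cost of rejected draws.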
diff --git a/src/sdk/pynni/nni/nas/pytorch/spos/trainer.py b/src/sdk/pynni/nni/nas/pytorch/spos/trainer.py
new file mode 100644
index 0000000000..ab23760bf9
--- /dev/null
+++ b/src/sdk/pynni/nni/nas/pytorch/spos/trainer.py
@@ -0,0 +1,63 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import logging
+
+import torch
+from nni.nas.pytorch.trainer import Trainer
+from nni.nas.pytorch.utils import AverageMeterGroup
+
+from .mutator import SPOSSupernetTrainingMutator
+
+logger = logging.getLogger(__name__)
+
+
+class SPOSSupernetTrainer(Trainer):
+    """
+    This trainer trains a supernet that can be used for evolution search.
+    """
+
+    def __init__(self, model, loss, metrics,
+                 optimizer, num_epochs, train_loader, valid_loader,
+                 mutator=None, batch_size=64, workers=4, device=None, log_frequency=None,
+                 callbacks=None):
+        assert torch.cuda.is_available()
+        super().__init__(model, mutator if mutator is not None else SPOSSupernetTrainingMutator(model),
+                         loss, metrics, optimizer, num_epochs, None, None,
+                         batch_size, workers, device, log_frequency, callbacks)
+
+        self.train_loader = train_loader
+        self.valid_loader = valid_loader
+
+    def train_one_epoch(self, epoch):
+        self.model.train()
+        meters = AverageMeterGroup()
+        for step, (x, y) in enumerate(self.train_loader):
+            self.optimizer.zero_grad()
+            self.mutator.reset()
+            logits = self.model(x)
+            loss = self.loss(logits, y)
+            loss.backward()
+            self.optimizer.step()
+
+            metrics = self.metrics(logits, y)
+            metrics["loss"] = loss.item()
+            meters.update(metrics)
+            if self.log_frequency is not None and step % self.log_frequency == 0:
+                logger.info("Epoch [%s/%s] Step [%s/%s]  %s", epoch + 1,
+                            self.num_epochs, step + 1, len(self.train_loader), meters)
+
+    def validate_one_epoch(self, epoch):
+        self.model.eval()
+        meters = AverageMeterGroup()
+        with torch.no_grad():
+            for step, (x, y) in enumerate(self.valid_loader):
+                self.mutator.reset()
+                logits = self.model(x)
+                loss = self.loss(logits, y)
+                metrics = self.metrics(logits, y)
+                metrics["loss"] = loss.item()
+                meters.update(metrics)
+                if self.log_frequency is not None and step % self.log_frequency == 0:
+                    logger.info("Epoch [%s/%s] Validation Step [%s/%s]  %s", epoch + 1,
+                                self.num_epochs, step + 1, len(self.valid_loader), meters)
diff --git a/src/sdk/pynni/tests/test_compressor.py b/src/sdk/pynni/tests/test_compressor.py
index 0632858cec..778f4341e9 100644
--- a/src/sdk/pynni/tests/test_compressor.py
+++ b/src/sdk/pynni/tests/test_compressor.py
@@ -136,12 +136,12 @@ def test_torch_fpgm_pruner(self):
         model.conv2.weight.data = torch.tensor(w).float()
         layer = torch_compressor.compressor.LayerInfo('conv2', model.conv2)
         masks = pruner.calc_mask(layer, config_list[0])
-        assert all(torch.sum(masks, (1, 2, 3)).numpy() == np.array([45., 45., 45., 45., 0., 0., 45., 45., 45., 45.]))
+        assert all(torch.sum(masks['weight'], (1, 2, 3)).numpy() == np.array([45., 45., 45., 45., 0., 0., 45., 45., 45., 45.]))
 
         pruner.update_epoch(1)
         model.conv2.weight.data = torch.tensor(w).float()
         masks = pruner.calc_mask(layer, config_list[1])
-        assert all(torch.sum(masks, (1, 2, 3)).numpy() == np.array([45., 45., 0., 0., 0., 0., 0., 0., 45., 45.]))
+        assert all(torch.sum(masks['weight'], (1, 2, 3)).numpy() == np.array([45., 45., 0., 0., 0., 0., 0., 0., 45., 45.]))
 
     @tf2
     def test_tf_fpgm_pruner(self):
@@ -190,8 +190,8 @@ def test_torch_l1filter_pruner(self):
         mask1 = pruner.calc_mask(layer1, config_list[0])
         layer2 = torch_compressor.compressor.LayerInfo('conv2', model.conv2)
         mask2 = pruner.calc_mask(layer2, config_list[1])
-        assert all(torch.sum(mask1, (1, 2, 3)).numpy() == np.array([0., 27., 27., 27., 27.]))
-        assert all(torch.sum(mask2, (1, 2, 3)).numpy() == np.array([0., 0., 0., 27., 27.]))
+        assert all(torch.sum(mask1['weight'], (1, 2, 3)).numpy() == np.array([0., 27., 27., 27., 27.]))
+        assert all(torch.sum(mask2['weight'], (1, 2, 3)).numpy() == np.array([0., 0., 0., 27., 27.]))
 
     def test_torch_slim_pruner(self):
         """
@@ -218,8 +218,10 @@ def test_torch_slim_pruner(self):
         mask1 = pruner.calc_mask(layer1, config_list[0])
         layer2 = torch_compressor.compressor.LayerInfo('bn2', model.bn2)
         mask2 = pruner.calc_mask(layer2, config_list[0])
-        assert all(mask1.numpy() == np.array([0., 1., 1., 1., 1.]))
-        assert all(mask2.numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask1['weight'].numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask2['weight'].numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask1['bias'].numpy() == np.array([0., 1., 1., 1., 1.]))
+        assert all(mask2['bias'].numpy() == np.array([0., 1., 1., 1., 1.]))
 
         config_list = [{'sparsity': 0.6, 'op_types': ['BatchNorm2d']}]
         model.bn1.weight.data = torch.tensor(w).float()
@@ -230,8 +232,10 @@ def test_torch_slim_pruner(self):
         mask1 = pruner.calc_mask(layer1, config_list[0])
         layer2 = torch_compressor.compressor.LayerInfo('bn2', model.bn2)
         mask2 = pruner.calc_mask(layer2, config_list[0])
-        assert all(mask1.numpy() == np.array([0., 0., 0., 1., 1.]))
-        assert all(mask2.numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask1['weight'].numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask2['weight'].numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask1['bias'].numpy() == np.array([0., 0., 0., 1., 1.]))
+        assert all(mask2['bias'].numpy() == np.array([0., 0., 0., 1., 1.]))
 
     def test_torch_QAT_quantizer(self):
         model = TorchModel()
diff --git a/tools/nni_cmd/config_schema.py b/tools/nni_cmd/config_schema.py
index 413f7b94a2..8017946ce9 100644
--- a/tools/nni_cmd/config_schema.py
+++ b/tools/nni_cmd/config_schema.py
@@ -65,7 +65,7 @@ def setPathCheck(key):
         'builtinTunerName': 'SMAC',
         Optional('classArgs'): {
             'optimize_mode': setChoice('optimize_mode', 'maximize', 'minimize'),
-            'config_dedup': setType('config_dedup', bool)
+            Optional('config_dedup'): setType('config_dedup', bool)
         },
         Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
         Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),