diff --git a/docs/en_US/Compressor/DependencyAware.md b/docs/en_US/Compressor/DependencyAware.md new file mode 100644 index 0000000000..6881198ef4 --- /dev/null +++ b/docs/en_US/Compressor/DependencyAware.md @@ -0,0 +1,55 @@ +# Dependency-aware Mode for Filter Pruning + +Currently, we have several filter pruning algorithms for convolutional layers: FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner. In these filter pruning algorithms, the pruner prunes each convolutional layer separately. While pruning a convolutional layer, the algorithm quantifies the importance of each filter based on some specific rule (such as the L1 norm) and prunes the less important filters. + +As the [dependency analysis utils](./CompressionUtils.md) show, if the output channels of two convolutional layers (conv1, conv2) are added together, then these two conv layers have a channel dependency with each other (see [Compression Utils](./CompressionUtils.md) for more details). Take the following figure as an example. +![](../../img/mask_conflict.jpg) + +Suppose we prune the first 50% of the output channels (filters) of conv1 and the last 50% of the output channels of conv2. Although both layers have 50% of their filters pruned, the speedup module still needs to add zeros to align the output channels. In this case, we cannot harvest any speed benefit from the model pruning. + +To better gain the speed benefit of model pruning, we add a dependency-aware mode to the filter pruners. In the dependency-aware mode, the pruner prunes the model based not only on the L1 norm of each filter, but also on the topology of the whole network architecture. + +In the dependency-aware mode (`dependency_aware` is set to `True`), the pruner tries to prune the same output channels for the layers that have channel dependencies with each other, as shown in the following figure. + +![](../../img/dependency-aware.jpg) + +Take the dependency-aware mode of the L1Filter Pruner as an example. For each channel, the pruner calculates the sum of the L1 norms over all the layers in the dependency set. The number of channels that can actually be pruned from this dependency set is determined by the minimum sparsity among the layers in the set (denoted by `min_sparsity`). According to the summed L1 norm of each channel, the pruner first prunes the same `min_sparsity` fraction of channels for all the layers. Next, the pruner additionally prunes `sparsity` - `min_sparsity` of the channels for each convolutional layer based on its own per-channel L1 norms. For example, suppose the output channels of `conv1` and `conv2` are added together and the configured sparsities of `conv1` and `conv2` are 0.3 and 0.2 respectively. In this case, the `dependency-aware pruner` will + + - First, prune the same 20% of channels for `conv1` and `conv2` according to the summed L1 norms of `conv1` and `conv2`. + - Second, additionally prune 10% of the channels of `conv1` according to the L1 norm of each channel of `conv1`. + +In addition, for the convolutional layers that have more than one filter group, the `dependency-aware pruner` will also try to prune the same number of channels for each filter group. Overall, this pruner prunes the model according to the L1 norm of each filter while trying to meet the topological constraints (channel dependency, etc.) to improve the final speed gain after the speedup process.
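To make the channel-selection rule above concrete, here is a minimal, self-contained sketch (not NNI's actual implementation) of how the common channels could be chosen for one dependency set: it sums the per-channel L1 norms across the layers in the set, prunes the `min_sparsity` fraction of channels jointly, and then lets each layer prune its remaining quota based on its own norms. The layer names, tensor shapes, and sparsities below are made up for illustration.

```python
import torch

def dependency_aware_selection(weights, sparsities):
    """Hypothetical sketch, not the NNI source.
    weights: dict layer_name -> Conv2d weight of shape (out_channels, in_channels, k, k);
             all layers in the dependency set share the same number of output channels.
    sparsities: dict layer_name -> configured sparsity (float in [0, 1]).
    Returns: dict layer_name -> set of output-channel indices to prune."""
    num_channels = next(iter(weights.values())).shape[0]
    # Per-layer L1 norm of each filter (output channel).
    l1 = {name: w.abs().sum(dim=(1, 2, 3)) for name, w in weights.items()}
    # Sum the L1 norms across the dependency set for each channel.
    summed = torch.stack(list(l1.values())).sum(dim=0)
    # The channels pruned jointly are limited by the smallest sparsity in the set.
    min_sparsity = min(sparsities.values())
    num_common = int(num_channels * min_sparsity)
    common_pruned = set(torch.argsort(summed)[:num_common].tolist())
    pruned = {}
    for name, norms in l1.items():
        # Each layer additionally prunes (sparsity - min_sparsity) of its channels
        # according to its own per-channel L1 norms.
        extra = int(num_channels * (sparsities[name] - min_sparsity))
        remaining = [c for c in torch.argsort(norms).tolist() if c not in common_pruned]
        pruned[name] = common_pruned | set(remaining[:extra])
    return pruned

# Toy example mirroring the conv1/conv2 case above (sparsities 0.3 and 0.2).
weights = {'conv1': torch.randn(10, 16, 3, 3), 'conv2': torch.randn(10, 16, 3, 3)}
print(dependency_aware_selection(weights, {'conv1': 0.3, 'conv2': 0.2}))
```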
+ +In the dependency-aware mode, the pruner will provide a better speed gain from the model pruning. + +## Usage +In this section, we show how to enable the dependency-aware mode for a filter pruner. Currently, only the one-shot pruners (FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner) support the dependency-aware mode. + +To enable the dependency-aware mode for `L1FilterPruner`: +```python +from nni.compression.torch import L1FilterPruner +config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }] +# dummy_input is necessary for the dependency_aware mode +dummy_input = torch.ones(1, 3, 224, 224).cuda() +pruner = L1FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input) +# for L2FilterPruner +# pruner = L2FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input) +# for FPGMPruner +# pruner = FPGMPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input) +# for ActivationAPoZRankFilterPruner +# pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input) +# for ActivationMeanRankFilterPruner +# pruner = ActivationMeanRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input) +# for TaylorFOWeightFilterPruner +# pruner = TaylorFOWeightFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input) + +pruner.compress() +``` + +## Evaluation +In order to compare the performance of the pruner with and without the dependency-aware mode, we use L1FilterPruner to prune Mobilenet_v2 with the dependency-aware mode turned on and off. To simplify the experiment, we use uniform pruning, which means we allocate the same sparsity to all convolutional layers in the model. +We trained a Mobilenet_v2 model on the CIFAR-10 dataset and pruned the model from this pretrained checkpoint. The following figure shows the accuracy and FLOPs of the model pruned by different pruners. +![](../../img/mobilev2_l1_cifar.jpg) + +In the figure, `Dependency-aware` represents the L1FilterPruner with the dependency-aware mode enabled, `L1 Filter` is the normal `L1FilterPruner` without the dependency-aware mode, and `No-Dependency` means the pruner only prunes the layers that have no channel dependency with other layers. As the figure shows, with the dependency-aware mode enabled, the pruner achieves higher accuracy under the same FLOPs. \ No newline at end of file diff --git a/docs/en_US/Compressor/Pruner.md b/docs/en_US/Compressor/Pruner.md index 0901c2a46d..cc0de93768 100644 --- a/docs/en_US/Compressor/Pruner.md +++ b/docs/en_US/Compressor/Pruner.md @@ -114,7 +114,9 @@ FPGMPruner prune filters with the smallest geometric median. ![](../../img/fpgm_fig1.png) ->Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements.
Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance. +>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance. + +We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference [dependency-aware](./DependencyAware.md) for more details. ### Usage @@ -154,6 +156,8 @@ This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https: > 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel > weights are copied to the new model. +In addition, we also provide a dependency-aware mode for the L1FilterPruner. For more details about the dependency-aware mode, please reference [dependency-aware mode](./DependencyAware.md). + ### Usage PyTorch code @@ -189,6 +193,8 @@ The experiments code can be found at [examples/model_compress]( https://github.c This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner. +We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference [dependency-aware](./DependencyAware.md) for more details. + ### Usage PyTorch code @@ -200,6 +206,7 @@ pruner = L2FilterPruner(model, config_list) pruner.compress() ``` + ### User configuration for L2Filter Pruner ##### PyTorch @@ -208,6 +215,7 @@ pruner.compress() ``` *** + ## ActivationAPoZRankFilter Pruner ActivationAPoZRankFilter Pruner is a pruner which prunes the filters with the smallest importance criterion `APoZ` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion `APoZ` is explained in the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250). @@ -216,6 +224,8 @@ The APoZ is defined as: ![](../../img/apoz.png) +We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference [dependency-aware](./DependencyAware.md) for more details. + ### Usage PyTorch code @@ -234,6 +244,8 @@ Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers withi You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information. 
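As a rough illustration (not the NNI source), the APoZ criterion defined in the figure above can be read as the average fraction of zero activations a filter produces over the calibration batches; filters with a higher APoZ are considered less important. The helper below is a hypothetical sketch under that reading.

```python
import torch

def apoz_per_filter(activations):
    """Hypothetical sketch of the APoZ criterion.
    activations: output of a Conv2d + ReLU, shape (batch, channels, height, width).
    Returns the Average Percentage of Zeros for each output channel (filter)."""
    # Fraction of zero entries per channel, averaged over the batch and spatial positions.
    zero_mask = (activations == 0).float()
    return zero_mask.mean(dim=(0, 2, 3))

# Illustrative usage: filters with the largest APoZ would be pruned first.
acts = torch.relu(torch.randn(8, 16, 32, 32))
apoz = apoz_per_filter(acts)
print(torch.argsort(apoz, descending=True)[:4])  # indices of the 4 "least important" filters
```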
+ + ### User configuration for ActivationAPoZRankFilter Pruner ##### PyTorch @@ -247,6 +259,8 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod ActivationMeanRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion `mean activation` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion `mean activation` is explained in section 2.2 of the paper[Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440). Other pruning criteria mentioned in this paper will be supported in future release. +We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference [dependency-aware](./DependencyAware.md) for more details. + ### Usage PyTorch code @@ -265,6 +279,7 @@ Note: ActivationMeanRankFilterPruner is used to prune convolutional layers withi You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information. + ### User configuration for ActivationMeanRankFilterPruner ##### PyTorch @@ -273,6 +288,7 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod ``` *** + ## TaylorFOWeightFilter Pruner TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based on estimated importance calculated from the first order taylor expansion on weights to achieve a preset level of network sparsity. The estimated importance of filters is defined as the paper [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Other pruning criteria mentioned in this paper will be supported in future release. @@ -281,6 +297,8 @@ TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based ![](../../img/importance_estimation_sum.png) +We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please reference [dependency-aware](./DependencyAware.md) for more details. + ### Usage PyTorch code diff --git a/docs/en_US/TrainingService/RemoteMachineMode.md b/docs/en_US/TrainingService/RemoteMachineMode.md index bb8c9d5d67..f25aa364bc 100644 --- a/docs/en_US/TrainingService/RemoteMachineMode.md +++ b/docs/en_US/TrainingService/RemoteMachineMode.md @@ -107,3 +107,79 @@ Files in `codeDir` will be uploaded to remote machines automatically. You can ru ```bash nnictl create --config examples/trials/mnist-annotation/config_remote.yml ``` + +### Configure python environment + +By default, commands and scripts will be executed in the default environment in remote machine. If there are multiple python virtual environments in your remote machine, and you want to run experiments in a specific environment, then use __preCommand__ to specify a python environment on your remote machine. + +Use `examples/trials/mnist-tfv2` as the example. 
Below is content of `examples/trials/mnist-tfv2/config_remote.yml`: + +```yaml +authorName: default +experimentName: example_mnist +trialConcurrency: 1 +maxExecDuration: 1h +maxTrialNum: 10 +#choice: local, remote, pai +trainingServicePlatform: remote +searchSpacePath: search_space.json +#choice: true, false +useAnnotation: false +tuner: + #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner + #SMAC (SMAC should be installed through nnictl) + builtinTunerName: TPE + classArgs: + #choice: maximize, minimize + optimize_mode: maximize +trial: + command: python3 mnist.py + codeDir: . + gpuNum: 0 +#machineList can be empty if the platform is local +machineList: + - ip: ${replace_to_your_remote_machine_ip} + username: ${replace_to_your_remote_machine_username} + sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath} + # Pre-command will be executed before the remote machine executes other commands. + # Below is an example of specifying python environment. + # If you want to execute multiple commands, please use "&&" to connect them. + # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate + # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name} + preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH +``` + +The __preCommand__ will be executed before the remote machine executes other commands. So you can configure python environment path like this: + +```yaml +# Linux remote machine +preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH +# Windows remote machine +preCommand: set path=${replace_to_python_environment_path_in_your_remote_machine};%path% +``` + +Or if you want to activate the `virtualenv` environment: + +```yaml +# Linux remote machine +preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate +# Windows remote machine +preCommand: ${replace_to_absolute_path_recommended_here}\\scripts\\activate +``` + +Or if you want to activate the `conda` environment: + +```yaml +# Linux remote machine +preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name} +# Windows remote machine +preCommand: call activate ${replace_to_conda_env_name} +``` + +If you want multiple commands to be executed, you can use `&&` to connect these commands: + +```yaml +preCommand: command1 && command2 && command3 +``` + +__Note__: Because __preCommand__ will execute before other commands each time, it is strongly not recommended to set __preCommand__ that will make changes to system, i.e. `mkdir` or `touch`. diff --git a/docs/en_US/TrialExample/SklearnExamples.md b/docs/en_US/TrialExample/SklearnExamples.md index 0c481ee2ac..469db6b201 100644 --- a/docs/en_US/TrialExample/SklearnExamples.md +++ b/docs/en_US/TrialExample/SklearnExamples.md @@ -67,7 +67,7 @@ It is easy to use NNI in your scikit-learn code, there are only a few steps. 
"kernel": {"_type":"choice","_value":["linear", "rbf", "poly", "sigmoid"]}, "degree": {"_type":"choice","_value":[1, 2, 3, 4]}, "gamma": {"_type":"uniform","_value":[0.01, 0.1]}, - "coef0 ": {"_type":"uniform","_value":[0.01, 0.1]} + "coef0": {"_type":"uniform","_value":[0.01, 0.1]} } ``` diff --git a/docs/en_US/Tutorial/ExperimentConfig.md b/docs/en_US/Tutorial/ExperimentConfig.md index 687e9980f6..0321154e58 100644 --- a/docs/en_US/Tutorial/ExperimentConfig.md +++ b/docs/en_US/Tutorial/ExperimentConfig.md @@ -58,6 +58,7 @@ This document describes the rules to write the config file, and provides some ex - [gpuIndices](#gpuindices-3) - [maxTrialNumPerGpu](#maxtrialnumpergpu-1) - [useActiveGpu](#useactivegpu-1) + - [preCommand](#preCommand) + [kubeflowConfig](#kubeflowconfig) - [operator](#operator) - [storage](#storage) @@ -583,6 +584,14 @@ Optional. Bool. Default: false. Used to specify whether to use a GPU if there is another process. By default, NNI will use the GPU only if there is no other active process in the GPU. If __useActiveGpu__ is set to true, NNI will use the GPU regardless of another processes. This field is not applicable for NNI on Windows. +#### preCommand + +Optional. String. + +Specifies the pre-command that will be executed before the remote machine executes other commands. Users can configure the experimental environment on remote machine by setting __preCommand__. If there are multiple commands need to execute, use `&&` to connect them, such as `preCommand: command1 && command2 && ...`. + +__Note__: Because __preCommand__ will execute before other commands each time, it is strongly not recommended to set __preCommand__ that will make changes to system, i.e. `mkdir` or `touch`. + ### kubeflowConfig #### operator @@ -795,6 +804,12 @@ If run trial jobs in remote machine, users could specify the remote machine info username: test sshKeyPath: /nni/sshkey passphrase: qwert + # Pre-command will be executed before the remote machine executes other commands. + # Below is an example of specifying python environment. + # If you want to execute multiple commands, please use "&&" to connect them. + # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate + # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name} + preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH ``` ### PAI mode diff --git a/docs/en_US/Tutorial/Nnictl.md b/docs/en_US/Tutorial/Nnictl.md index 28084494a3..edb25afacd 100644 --- a/docs/en_US/Tutorial/Nnictl.md +++ b/docs/en_US/Tutorial/Nnictl.md @@ -578,6 +578,7 @@ Debug mode will disable version check function in Trialkeeper. |--path, -p| True| |the file path of nni package| |--codeDir, -c| True| |the path of codeDir for loaded experiment, this path will also put the code in the loaded experiment package| |--logDir, -l| False| |the path of logDir for loaded experiment| + |--searchSpacePath, -s| True| |the path of search space file for loaded experiment, this path contains file name. 
Default in $codeDir/search_space.json| * Examples diff --git a/docs/en_US/model_compression.rst b/docs/en_US/model_compression.rst index e594ba1fb7..8e5dce684a 100644 --- a/docs/en_US/model_compression.rst +++ b/docs/en_US/model_compression.rst @@ -17,7 +17,7 @@ For details, please refer to the following tutorials: Overview Quick Start - Pruners + Pruning Quantizers Automatic Model Compression Model Speedup diff --git a/docs/en_US/pruning.rst b/docs/en_US/pruning.rst new file mode 100644 index 0000000000..0f06d2efc8 --- /dev/null +++ b/docs/en_US/pruning.rst @@ -0,0 +1,17 @@ +################# +Pruning +################# + +NNI provides several pruning algorithms that support fine-grained weight pruning and structural filter pruning. +It supports Tensorflow and PyTorch with unified interface. +For users to prune their models, they only need to add several lines in their code. +For the structural filter pruning, NNI also provides a dependency-aware mode. In the dependency-aware mode, the +filter pruner will get better speed gain after the speedup. + +For details, please refer to the following tutorials: + +.. toctree:: + :maxdepth: 2 + + Pruners + Dependency Aware Mode diff --git a/docs/img/dependency-aware.jpg b/docs/img/dependency-aware.jpg new file mode 100644 index 0000000000..d2f9b57db3 Binary files /dev/null and b/docs/img/dependency-aware.jpg differ diff --git a/docs/img/mask_conflict.jpg b/docs/img/mask_conflict.jpg new file mode 100644 index 0000000000..d28bacf520 Binary files /dev/null and b/docs/img/mask_conflict.jpg differ diff --git a/docs/img/mobilev2_l1_cifar.jpg b/docs/img/mobilev2_l1_cifar.jpg new file mode 100644 index 0000000000..202e5740e1 Binary files /dev/null and b/docs/img/mobilev2_l1_cifar.jpg differ diff --git a/examples/model_compress/model_prune_torch.py b/examples/model_compress/model_prune_torch.py index 9129509ae7..885666b586 100644 --- a/examples/model_compress/model_prune_torch.py +++ b/examples/model_compress/model_prune_torch.py @@ -48,7 +48,7 @@ 'dataset_name': 'mnist', 'model_name': 'naive', 'pruner_class': FPGMPruner, - 'config_list':[{ + 'config_list': [{ 'sparsity': 0.5, 'op_types': ['Conv2d'] }] @@ -85,6 +85,7 @@ } } + def get_data_loaders(dataset_name='mnist', batch_size=128): assert dataset_name in ['cifar10', 'mnist'] @@ -98,20 +99,23 @@ def get_data_loaders(dataset_name='mnist', batch_size=128): train_loader = DataLoader( ds_class( './data', train=True, download=True, - transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize(MEAN, STD)]) + transform=transforms.Compose( + [transforms.ToTensor(), transforms.Normalize(MEAN, STD)]) ), batch_size=batch_size, shuffle=True ) test_loader = DataLoader( ds_class( './data', train=False, download=True, - transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize(MEAN, STD)]) + transform=transforms.Compose( + [transforms.ToTensor(), transforms.Normalize(MEAN, STD)]) ), batch_size=batch_size, shuffle=False ) return train_loader, test_loader + class NaiveModel(torch.nn.Module): def __init__(self): super().__init__() @@ -132,6 +136,7 @@ def forward(self, x): x = self.fc2(x) return x + def create_model(model_name='naive'): assert model_name in ['naive', 'vgg16', 'vgg19'] @@ -142,10 +147,18 @@ def create_model(model_name='naive'): else: return VGG(19) -def create_pruner(model, pruner_name, optimizer=None): + +def create_pruner(model, pruner_name, optimizer=None, dependency_aware=False, dummy_input=None): pruner_class = prune_config[pruner_name]['pruner_class'] config_list = 
prune_config[pruner_name]['config_list'] - return pruner_class(model, config_list, optimizer) + kw_args = {} + if dependency_aware: + print('Enable the dependency_aware mode') + # note that, not all pruners support the dependency_aware mode + kw_args['dependency_aware'] = True + kw_args['dummy_input'] = dummy_input + pruner = pruner_class(model, config_list, optimizer, **kw_args) + return pruner def train(model, device, train_loader, optimizer): model.train() @@ -157,7 +170,9 @@ def train(model, device, train_loader, optimizer): loss.backward() optimizer.step() if batch_idx % 100 == 0: - print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item())) + print('{:2.0f}% Loss {}'.format( + 100 * batch_idx / len(train_loader), loss.item())) + def test(model, device, test_loader): model.eval() @@ -167,7 +182,8 @@ def test(model, device, test_loader): for data, target in test_loader: data, target = data.to(device), target.to(device) output = model(data) - test_loss += F.cross_entropy(output, target, reduction='sum').item() + test_loss += F.cross_entropy(output, + target, reduction='sum').item() pred = output.argmax(dim=1, keepdim=True) correct += pred.eq(target.view_as(pred)).sum().item() test_loss /= len(test_loader.dataset) @@ -177,20 +193,25 @@ def test(model, device, test_loader): test_loss, acc)) return acc + def main(args): - device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') + device = torch.device( + 'cuda') if torch.cuda.is_available() else torch.device('cpu') os.makedirs(args.checkpoints_dir, exist_ok=True) model_name = prune_config[args.pruner_name]['model_name'] dataset_name = prune_config[args.pruner_name]['dataset_name'] train_loader, test_loader = get_data_loaders(dataset_name, args.batch_size) + dummy_input, _ = next(iter(train_loader)) + dummy_input = dummy_input.to(device) model = create_model(model_name).cuda() if args.resume_from is not None and os.path.exists(args.resume_from): print('loading checkpoint {} ...'.format(args.resume_from)) model.load_state_dict(torch.load(args.resume_from)) test(model, device, test_loader) else: - optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4) + optimizer = torch.optim.SGD( + model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4) if args.multi_gpu and torch.cuda.device_count(): model = nn.DataParallel(model) @@ -204,17 +225,21 @@ def main(args): print('start model pruning...') - model_path = os.path.join(args.checkpoints_dir, 'pruned_{}_{}_{}.pth'.format(model_name, dataset_name, args.pruner_name)) - mask_path = os.path.join(args.checkpoints_dir, 'mask_{}_{}_{}.pth'.format(model_name, dataset_name, args.pruner_name)) + model_path = os.path.join(args.checkpoints_dir, 'pruned_{}_{}_{}.pth'.format( + model_name, dataset_name, args.pruner_name)) + mask_path = os.path.join(args.checkpoints_dir, 'mask_{}_{}_{}.pth'.format( + model_name, dataset_name, args.pruner_name)) # pruner needs to be initialized from a model not wrapped by DataParallel if isinstance(model, nn.DataParallel): model = model.module - optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4) + optimizer_finetune = torch.optim.SGD( + model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4) best_top1 = 0 - pruner = create_pruner(model, args.pruner_name, optimizer_finetune) + pruner = create_pruner(model, args.pruner_name, + optimizer_finetune, args.dependency_aware, dummy_input) model = pruner.compress() if args.multi_gpu and 
torch.cuda.device_count() > 1: @@ -231,15 +256,23 @@ def main(args): # mask_path stores mask_dict of the pruned model pruner.export_model(model_path=model_path, mask_path=mask_path) + if __name__ == '__main__': parser = argparse.ArgumentParser() - parser.add_argument("--pruner_name", type=str, default="level", help="pruner name") + parser.add_argument("--pruner_name", type=str, + default="level", help="pruner name") parser.add_argument("--batch_size", type=int, default=256) - parser.add_argument("--pretrain_epochs", type=int, default=10, help="training epochs before model pruning") - parser.add_argument("--prune_epochs", type=int, default=10, help="training epochs for model pruning") - parser.add_argument("--checkpoints_dir", type=str, default="./checkpoints", help="checkpoints directory") - parser.add_argument("--resume_from", type=str, default=None, help="pretrained model weights") - parser.add_argument("--multi_gpu", action="store_true", help="Use multiple GPUs for training") - + parser.add_argument("--pretrain_epochs", type=int, + default=10, help="training epochs before model pruning") + parser.add_argument("--prune_epochs", type=int, default=10, + help="training epochs for model pruning") + parser.add_argument("--checkpoints_dir", type=str, + default="./checkpoints", help="checkpoints directory") + parser.add_argument("--resume_from", type=str, + default=None, help="pretrained model weights") + parser.add_argument("--multi_gpu", action="store_true", + help="Use multiple GPUs for training") + parser.add_argument("--dependency_aware", action="store_true", default=False, + help="If enable the dependency_aware mode for the pruner") args = parser.parse_args() main(args) diff --git a/examples/trials/mnist-tfv2/config_remote.yml b/examples/trials/mnist-tfv2/config_remote.yml new file mode 100644 index 0000000000..0c39914dbc --- /dev/null +++ b/examples/trials/mnist-tfv2/config_remote.yml @@ -0,0 +1,32 @@ +authorName: default +experimentName: example_mnist +trialConcurrency: 1 +maxExecDuration: 1h +maxTrialNum: 10 +#choice: local, remote, pai +trainingServicePlatform: remote +searchSpacePath: search_space.json +#choice: true, false +useAnnotation: false +tuner: + #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner + #SMAC (SMAC should be installed through nnictl) + builtinTunerName: TPE + classArgs: + #choice: maximize, minimize + optimize_mode: maximize +trial: + command: python3 mnist.py + codeDir: . + gpuNum: 0 +#machineList can be empty if the platform is local +machineList: + - ip: ${replace_to_your_remote_machine_ip} + username: ${replace_to_your_remote_machine_username} + sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath} + # Pre-command will be executed before the remote machine executes other commands. + # Below is an example of specifying python environment. + # If you want to execute multiple commands, please use "&&" to connect them. 
+ # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate + # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name} + preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH diff --git a/examples/trials/sklearn/classification/search_space.json b/examples/trials/sklearn/classification/search_space.json index f63e07bda6..c4b4ffb0c8 100644 --- a/examples/trials/sklearn/classification/search_space.json +++ b/examples/trials/sklearn/classification/search_space.json @@ -3,5 +3,5 @@ "kernel": {"_type":"choice","_value":["linear", "rbf", "poly", "sigmoid"]}, "degree": {"_type":"choice","_value":[1, 2, 3, 4]}, "gamma": {"_type":"uniform","_value":[0.01, 0.1]}, - "coef0 ": {"_type":"uniform","_value":[0.01, 0.1]} + "coef0": {"_type":"uniform","_value":[0.01, 0.1]} } \ No newline at end of file diff --git a/src/nni_manager/common/manager.ts b/src/nni_manager/common/manager.ts index c003598abc..1f3972ae43 100644 --- a/src/nni_manager/common/manager.ts +++ b/src/nni_manager/common/manager.ts @@ -87,6 +87,7 @@ abstract class Manager { public abstract getExperimentProfile(): Promise; public abstract updateExperimentProfile(experimentProfile: ExperimentProfile, updateType: ProfileUpdateType): Promise; public abstract importData(data: string): Promise; + public abstract getImportedData(): Promise; public abstract exportData(): Promise; public abstract addCustomizedTrialJob(hyperParams: string): Promise; diff --git a/src/nni_manager/core/nnimanager.ts b/src/nni_manager/core/nnimanager.ts index ad243f4835..6ec4d0e21d 100644 --- a/src/nni_manager/core/nnimanager.ts +++ b/src/nni_manager/core/nnimanager.ts @@ -108,6 +108,10 @@ class NNIManager implements Manager { return this.dataStore.storeTrialJobEvent('IMPORT_DATA', '', data); } + public getImportedData(): Promise { + return this.dataStore.getImportedData(); + } + public async exportData(): Promise { return this.dataStore.exportTrialHpConfigs(); } diff --git a/src/nni_manager/rest_server/restHandler.ts b/src/nni_manager/rest_server/restHandler.ts index af44d71a01..1a4d87ec02 100644 --- a/src/nni_manager/rest_server/restHandler.ts +++ b/src/nni_manager/rest_server/restHandler.ts @@ -47,6 +47,7 @@ class NNIRestHandler { this.getExperimentProfile(router); this.updateExperimentProfile(router); this.importData(router); + this.getImportedData(router); this.startExperiment(router); this.getTrialJobStatistics(router); this.setClusterMetaData(router); @@ -143,6 +144,16 @@ class NNIRestHandler { }); } + private getImportedData(router: Router): void { + router.get('/experiment/imported-data', (req: Request, res: Response) => { + this.nniManager.getImportedData().then((importedData: string[]) => { + res.send(JSON.stringify(importedData)); + }).catch((err: Error) => { + this.handleError(err, res); + }); + }); + } + private startExperiment(router: Router): void { router.post('/experiment', expressJoi(ValidationSchemas.STARTEXPERIMENT), (req: Request, res: Response) => { if (isNewExperiment()) { diff --git a/src/nni_manager/rest_server/restValidationSchemas.ts b/src/nni_manager/rest_server/restValidationSchemas.ts index cb1a1282e7..337bdcf0c8 100644 --- a/src/nni_manager/rest_server/restValidationSchemas.ts +++ b/src/nni_manager/rest_server/restValidationSchemas.ts @@ -17,7 +17,8 @@ export namespace ValidationSchemas { passphrase: joi.string(), gpuIndices: joi.string(), maxTrialNumPerGpu: joi.number(), - useActiveGpu: joi.boolean() + useActiveGpu: joi.boolean(), + preCommand: 
joi.string() })), local_config: joi.object({ // eslint-disable-line @typescript-eslint/camelcase gpuIndices: joi.string(), diff --git a/src/nni_manager/rest_server/test/mockedNNIManager.ts b/src/nni_manager/rest_server/test/mockedNNIManager.ts index e45819d6cb..b3bd549c03 100644 --- a/src/nni_manager/rest_server/test/mockedNNIManager.ts +++ b/src/nni_manager/rest_server/test/mockedNNIManager.ts @@ -33,6 +33,10 @@ export class MockedNNIManager extends Manager { public importData(data: string): Promise { return Promise.resolve(); } + public getImportedData(): Promise { + const ret: string[] = ["1", "2"]; + return Promise.resolve(ret); + } public async exportData(): Promise { const ret: string = ''; return Promise.resolve(ret); diff --git a/src/nni_manager/training_service/remote_machine/extends/linuxCommands.ts b/src/nni_manager/training_service/remote_machine/extends/linuxCommands.ts index 76d4ac19fc..0dab8a63ea 100644 --- a/src/nni_manager/training_service/remote_machine/extends/linuxCommands.ts +++ b/src/nni_manager/training_service/remote_machine/extends/linuxCommands.ts @@ -123,11 +123,19 @@ class LinuxCommands extends OsCommands { if (isFile) { command = `bash '${script}'`; } else { - script = script.replace('"', '\\"'); + script = script.replace(/"/g, '\\"'); command = `bash -c "${script}"`; } return command; } + + public addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined{ + if (command === undefined || command === '' || preCommand === undefined || preCommand === ''){ + return command; + } else { + return `${preCommand} && ${command}`; + } + } } export { LinuxCommands }; diff --git a/src/nni_manager/training_service/remote_machine/extends/windowsCommands.ts b/src/nni_manager/training_service/remote_machine/extends/windowsCommands.ts index fd2fd2118b..2a81157e29 100644 --- a/src/nni_manager/training_service/remote_machine/extends/windowsCommands.ts +++ b/src/nni_manager/training_service/remote_machine/extends/windowsCommands.ts @@ -46,7 +46,7 @@ class WindowsCommands extends OsCommands { } public generateGpuStatsScript(scriptFolder: string): string { - return `powershell -command $env:METRIC_OUTPUT_DIR='${scriptFolder}';$app = Start-Process -FilePath python -NoNewWindow -passthru -ArgumentList '-m nni_gpu_tool.gpu_metrics_collector' -RedirectStandardOutput ${scriptFolder}\\scriptstdout -RedirectStandardError ${scriptFolder}\\scriptstderr;Write $PID ^| Out-File ${scriptFolder}\\pid -NoNewline -encoding utf8;wait-process $app.ID`; + return `powershell -command $env:Path=If($env:prePath){$env:prePath}Else{$env:Path};$env:METRIC_OUTPUT_DIR='${scriptFolder}';$app = Start-Process -FilePath python -NoNewWindow -passthru -ArgumentList '-m nni_gpu_tool.gpu_metrics_collector' -RedirectStandardOutput ${scriptFolder}\\scriptstdout -RedirectStandardError ${scriptFolder}\\scriptstderr;Write $PID ^| Out-File ${scriptFolder}\\pid -NoNewline -encoding utf8;wait-process $app.ID`; } public createFolder(folderName: string, sharedFolder: boolean = false): string { @@ -122,6 +122,14 @@ class WindowsCommands extends OsCommands { const command = `${script}`; return command; } + + public addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined{ + if (command === undefined || command === '' || preCommand === undefined || preCommand === ''){ + return command; + } else { + return `${preCommand} && set prePath=%path% && ${command}`; + } + } } export { WindowsCommands }; diff --git 
a/src/nni_manager/training_service/remote_machine/osCommands.ts b/src/nni_manager/training_service/remote_machine/osCommands.ts index 3c4bcb0f1c..cb110c6694 100644 --- a/src/nni_manager/training_service/remote_machine/osCommands.ts +++ b/src/nni_manager/training_service/remote_machine/osCommands.ts @@ -28,6 +28,7 @@ abstract class OsCommands { public abstract killChildProcesses(pidFileName: string, killSelf: boolean): string; public abstract extractFile(tarFileName: string, targetFolder: string): string; public abstract executeScript(script: string, isFile: boolean): string; + public abstract addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined; public joinPath(...paths: string[]): string { let dir: string = paths.filter((path: any) => path !== '').join(this.pathSpliter); diff --git a/src/nni_manager/training_service/remote_machine/remoteMachineData.ts b/src/nni_manager/training_service/remote_machine/remoteMachineData.ts index 61024c1fdd..e48d1fbf57 100644 --- a/src/nni_manager/training_service/remote_machine/remoteMachineData.ts +++ b/src/nni_manager/training_service/remote_machine/remoteMachineData.ts @@ -23,6 +23,7 @@ export class RemoteMachineMeta { //TODO: initialize varialbe in constructor public occupiedGpuIndexMap?: Map; public readonly useActiveGpu?: boolean = false; + public readonly preCommand?: string; } /** diff --git a/src/nni_manager/training_service/remote_machine/shellExecutor.ts b/src/nni_manager/training_service/remote_machine/shellExecutor.ts index 5f3a0bc78b..b093dfcd0e 100644 --- a/src/nni_manager/training_service/remote_machine/shellExecutor.ts +++ b/src/nni_manager/training_service/remote_machine/shellExecutor.ts @@ -32,6 +32,7 @@ class ShellExecutor { private tempPath: string = ""; private isWindows: boolean = false; private channelDefaultOutputs: string[] = []; + private preCommand: string | undefined; constructor() { this.log = getLogger(); @@ -47,6 +48,7 @@ class ShellExecutor { username: rmMeta.username, tryKeyboard: true, }; + this.preCommand = rmMeta.preCommand; this.name = `${rmMeta.username}@${rmMeta.ip}:${rmMeta.port}`; if (rmMeta.passwd !== undefined) { connectConfig.password = rmMeta.passwd; @@ -349,6 +351,9 @@ class ShellExecutor { let exitCode: number; const commandIndex = randomInt(10000); + if(this.osCommands !== undefined){ + command = this.osCommands.addPreCommand(this.preCommand, command); + } this.log.debug(`remoteExeCommand(${commandIndex}): [${command}]`); // Windows always uses shell, and it needs to disable to get it works. 
diff --git a/src/nni_manager/training_service/remote_machine/test/shellExecutor.test.ts b/src/nni_manager/training_service/remote_machine/test/shellExecutor.test.ts index 4e9d9ffb68..3aee8ba020 100644 --- a/src/nni_manager/training_service/remote_machine/test/shellExecutor.test.ts +++ b/src/nni_manager/training_service/remote_machine/test/shellExecutor.test.ts @@ -36,6 +36,7 @@ async function getRemoteFileContentLoop(executor: ShellExecutor): Promise describe('ShellExecutor test', () => { let skip: boolean = false; + let isWindows: boolean; let rmMeta: any; try { rmMeta = JSON.parse(fs.readFileSync('../../.vscode/rminfo.json', 'utf8')); @@ -86,4 +87,28 @@ describe('ShellExecutor test', () => { await getRemoteFileContentLoop(executor); await executor.close(); }); + + it('Test preCommand-1', async () => { + if (skip) { + return; + } + const executor: ShellExecutor = new ShellExecutor(); + await executor.initialize(rmMeta); + const result = await executor.executeScript("ver", false, false); + isWindows = result.exitCode == 0 && result.stdout.search("Windows") > -1; + await executor.close(); + }); + + it('Test preCommand-2', async () => { + if (skip) { + return; + } + const executor: ShellExecutor = new ShellExecutor(); + rmMeta.preCommand = isWindows ? "set TEST_PRE_COMMAND=test_pre_command" : "export TEST_PRE_COMMAND=test_pre_command"; + await executor.initialize(rmMeta); + const command = isWindows ? "python -c \"import os; print(os.environ.get(\'TEST_PRE_COMMAND\'))\"" : "python3 -c \"import os; print(os.environ.get(\'TEST_PRE_COMMAND\'))\""; + const result = (await executor.executeScript(command, false, false)).stdout.replace(/[\ +\r\n]/g, ""); + chai.expect(result).eq("test_pre_command"); + await executor.close(); + }); }); diff --git a/src/nni_manager/training_service/test/remoteMachineTrainingService.test.ts b/src/nni_manager/training_service/test/remoteMachineTrainingService.test.ts index d2446460ee..551d281f76 100644 --- a/src/nni_manager/training_service/test/remoteMachineTrainingService.test.ts +++ b/src/nni_manager/training_service/test/remoteMachineTrainingService.test.ts @@ -25,8 +25,8 @@ describe('Unit Test for RemoteMachineTrainingService', () => { Default/.vscode/rminfo.json, whose content looks like: { "ip": "10.172.121.40", - "user": "user1", - "password": "mypassword" + "username": "user1", + "passwd": "mypassword" } */ let skip: boolean = false; diff --git a/src/sdk/pynni/nni/compression/torch/pruning/__init__.py b/src/sdk/pynni/nni/compression/torch/pruning/__init__.py index 9787ba5291..7b6060d80f 100644 --- a/src/sdk/pynni/nni/compression/torch/pruning/__init__.py +++ b/src/sdk/pynni/nni/compression/torch/pruning/__init__.py @@ -13,4 +13,3 @@ from .auto_compress_pruner import AutoCompressPruner from .sensitivity_pruner import SensitivityPruner from .amc import AMCPruner - diff --git a/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py b/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py index b58477a653..3a096176d4 100644 --- a/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py +++ b/src/sdk/pynni/nni/compression/torch/pruning/one_shot.py @@ -3,14 +3,19 @@ import logging from schema import And, Optional +from nni._graph_utils import TorchModuleGraph +from nni.compression.torch.utils.shape_dependency import ChannelDependency, GroupDependency from .constants import MASKER_DICT from ..utils.config_validation import CompressorSchema from ..compressor import Pruner -__all__ = ['LevelPruner', 'SlimPruner', 'L1FilterPruner', 'L2FilterPruner', 'FPGMPruner', \ - 
'TaylorFOWeightFilterPruner', 'ActivationAPoZRankFilterPruner', 'ActivationMeanRankFilterPruner'] -logger = logging.getLogger('torch pruner') +__all__ = ['LevelPruner', 'SlimPruner', 'L1FilterPruner', 'L2FilterPruner', 'FPGMPruner', + 'TaylorFOWeightFilterPruner', 'ActivationAPoZRankFilterPruner', 'ActivationMeanRankFilterPruner'] + +logger = logging.getLogger(__name__) +logger.setLevel(logging.INFO) + class OneshotPruner(Pruner): """ @@ -35,7 +40,8 @@ def __init__(self, model, config_list, pruning_algorithm='level', optimizer=None super().__init__(model, config_list, optimizer) self.set_wrappers_attribute("if_calculated", False) - self.masker = MASKER_DICT[pruning_algorithm](model, self, **algo_kwargs) + self.masker = MASKER_DICT[pruning_algorithm]( + model, self, **algo_kwargs) def validate_config(self, model, config_list): """ @@ -75,7 +81,8 @@ def calc_mask(self, wrapper, wrapper_idx=None): sparsity = wrapper.config['sparsity'] if not wrapper.if_calculated: - masks = self.masker.calc_mask(sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx) + masks = self.masker.calc_mask( + sparsity=sparsity, wrapper=wrapper, wrapper_idx=wrapper_idx) # masker.calc_mask returns None means calc_mask is not calculated sucessfully, can try later if masks is not None: @@ -84,6 +91,7 @@ def calc_mask(self, wrapper, wrapper_idx=None): else: return None + class LevelPruner(OneshotPruner): """ Parameters @@ -97,9 +105,11 @@ class LevelPruner(OneshotPruner): optimizer: torch.optim.Optimizer Optimizer used to train model """ + def __init__(self, model, config_list, optimizer=None): super().__init__(model, config_list, pruning_algorithm='level', optimizer=optimizer) + class SlimPruner(OneshotPruner): """ Parameters @@ -113,6 +123,7 @@ class SlimPruner(OneshotPruner): optimizer: torch.optim.Optimizer Optimizer used to train model """ + def __init__(self, model, config_list, optimizer=None): super().__init__(model, config_list, pruning_algorithm='slim', optimizer=optimizer) @@ -128,9 +139,50 @@ def validate_config(self, model, config_list): if len(config_list) > 1: logger.warning('Slim pruner only supports 1 configuration') + class _StructuredFilterPruner(OneshotPruner): - def __init__(self, model, config_list, pruning_algorithm, optimizer=None, **algo_kwargs): - super().__init__(model, config_list, pruning_algorithm=pruning_algorithm, optimizer=optimizer, **algo_kwargs) + """ + _StructuredFilterPruner has two ways to calculate the masks + for conv layers. In the normal way, the _StructuredFilterPruner + will calculate the mask of each layer separately. For example, each + conv layer determine which filters should be pruned according to its L1 + norm. In constrast, in the dependency-aware way, the layers that in a + dependency group will be pruned jointly and these layers will be forced + to prune the same channels. 
+ """ + + def __init__(self, model, config_list, pruning_algorithm, optimizer=None, dependency_aware=False, dummy_input=None, **algo_kwargs): + super().__init__(model, config_list, pruning_algorithm=pruning_algorithm, + optimizer=optimizer, **algo_kwargs) + self.dependency_aware = dependency_aware + # set the dependency-aware switch for the masker + self.masker.dependency_aware = dependency_aware + self.dummy_input = dummy_input + if self.dependency_aware: + errmsg = "When dependency_aware is set, the dummy_input should not be None" + assert self.dummy_input is not None, errmsg + # Get the TorchModuleGraph of the target model + # to trace the model, we need to unwrap the wrappers + self._unwrap_model() + self.graph = TorchModuleGraph(model, dummy_input) + self._wrap_model() + self.channel_depen = ChannelDependency( + traced_model=self.graph.trace) + self.group_depen = GroupDependency(traced_model=self.graph.trace) + self.channel_depen = self.channel_depen.dependency_sets + self.channel_depen = { + name: sets for sets in self.channel_depen for name in sets} + self.group_depen = self.group_depen.dependency_sets + + def update_mask(self): + if not self.dependency_aware: + # if we use the normal way to update the mask, + # then call the update_mask of the father class + super(_StructuredFilterPruner, self).update_mask() + else: + # if we update the mask in a dependency-aware way + # then we call _dependency_update_mask + self._dependency_update_mask() def validate_config(self, model, config_list): schema = CompressorSchema([{ @@ -141,6 +193,71 @@ def validate_config(self, model, config_list): schema.validate(config_list) + def _dependency_calc_mask(self, wrappers, channel_dsets, wrappers_idx=None): + """ + calculate the masks for the conv layers in the same + channel dependecy set. All the layers passed in have + the same number of channels. + + Parameters + ---------- + wrappers: list + The list of the wrappers that in the same channel dependency + set. + wrappers_idx: list + The list of the indexes of wrapppers. + Returns + ------- + masks: dict + A dict object that contains the masks of the layers in this + dependency group, the key is the name of the convolutional layers. + """ + # The number of the groups for each conv layers + # Note that, this number may be different from its + # original number of groups of filters. + groups = [self.group_depen[_w.name] for _w in wrappers] + sparsities = [_w.config['sparsity'] for _w in wrappers] + masks = self.masker.calc_mask( + sparsities, wrappers, wrappers_idx, channel_dsets=channel_dsets, groups=groups) + if masks is not None: + # if masks is None, then the mask calculation fails. + # for example, in activation related maskers, we should + # pass enough batches of data to the model, so that the + # masks can be calculated successfully. + for _w in wrappers: + _w.if_calculated = True + return masks + + def _dependency_update_mask(self): + """ + In the original update_mask, the wraper of each layer will update its + own mask according to the sparsity specified in the config_list. However, in + the _dependency_update_mask, we may prune several layers at the same + time according the sparsities and the channel/group dependencies. 
+ """ + name2wrapper = {x.name: x for x in self.get_modules_wrapper()} + wrapper2index = {x: i for i, x in enumerate(self.get_modules_wrapper())} + for wrapper in self.get_modules_wrapper(): + if wrapper.if_calculated: + continue + # find all the conv layers that have channel dependecy with this layer + # and prune all these layers at the same time. + _names = [x for x in self.channel_depen[wrapper.name]] + logger.info('Pruning the dependent layers: %s', ','.join(_names)) + _wrappers = [name2wrapper[name] + for name in _names if name in name2wrapper] + _wrapper_idxes = [wrapper2index[_w] for _w in _wrappers] + + masks = self._dependency_calc_mask( + _wrappers, _names, wrappers_idx=_wrapper_idxes) + if masks is not None: + for layer in masks: + for mask_type in masks[layer]: + assert hasattr( + name2wrapper[layer], mask_type), "there is no attribute '%s' in wrapper on %s" % (mask_type, layer) + setattr(name2wrapper[layer], mask_type, masks[layer][mask_type]) + + class L1FilterPruner(_StructuredFilterPruner): """ Parameters @@ -153,9 +270,23 @@ class L1FilterPruner(_StructuredFilterPruner): - op_types : Only Conv2d is supported in L1FilterPruner. optimizer: torch.optim.Optimizer Optimizer used to train model + dependency_aware: bool + If prune the model in a dependency-aware way. If it is `True`, this pruner will + prune the model according to the l2-norm of weights and the channel-dependency or + group-dependency of the model. In this way, the pruner will force the conv layers + that have dependencies to prune the same channels, so the speedup module can better + harvest the speed benefit from the pruned model. Note that, if this flag is set True + , the dummy_input cannot be None, because the pruner needs a dummy input to trace the + dependency between the conv layers. + dummy_input : torch.Tensor + The dummy input to analyze the topology constraints. Note that, the dummy_input + should on the same device with the model. """ - def __init__(self, model, config_list, optimizer=None): - super().__init__(model, config_list, pruning_algorithm='l1', optimizer=optimizer) + + def __init__(self, model, config_list, optimizer=None, dependency_aware=False, dummy_input=None): + super().__init__(model, config_list, pruning_algorithm='l1', optimizer=optimizer, + dependency_aware=dependency_aware, dummy_input=dummy_input) + class L2FilterPruner(_StructuredFilterPruner): """ @@ -169,9 +300,23 @@ class L2FilterPruner(_StructuredFilterPruner): - op_types : Only Conv2d is supported in L2FilterPruner. optimizer: torch.optim.Optimizer Optimizer used to train model + dependency_aware: bool + If prune the model in a dependency-aware way. If it is `True`, this pruner will + prune the model according to the l2-norm of weights and the channel-dependency or + group-dependency of the model. In this way, the pruner will force the conv layers + that have dependencies to prune the same channels, so the speedup module can better + harvest the speed benefit from the pruned model. Note that, if this flag is set True + , the dummy_input cannot be None, because the pruner needs a dummy input to trace the + dependency between the conv layers. + dummy_input : torch.Tensor + The dummy input to analyze the topology constraints. Note that, the dummy_input + should on the same device with the model. 
""" - def __init__(self, model, config_list, optimizer=None): - super().__init__(model, config_list, pruning_algorithm='l2', optimizer=optimizer) + + def __init__(self, model, config_list, optimizer=None, dependency_aware=False, dummy_input=None): + super().__init__(model, config_list, pruning_algorithm='l2', optimizer=optimizer, + dependency_aware=dependency_aware, dummy_input=dummy_input) + class FPGMPruner(_StructuredFilterPruner): """ @@ -185,9 +330,23 @@ class FPGMPruner(_StructuredFilterPruner): - op_types : Only Conv2d is supported in FPGM Pruner. optimizer: torch.optim.Optimizer Optimizer used to train model + dependency_aware: bool + If prune the model in a dependency-aware way. If it is `True`, this pruner will + prune the model according to the l2-norm of weights and the channel-dependency or + group-dependency of the model. In this way, the pruner will force the conv layers + that have dependencies to prune the same channels, so the speedup module can better + harvest the speed benefit from the pruned model. Note that, if this flag is set True + , the dummy_input cannot be None, because the pruner needs a dummy input to trace the + dependency between the conv layers. + dummy_input : torch.Tensor + The dummy input to analyze the topology constraints. Note that, the dummy_input + should on the same device with the model. """ - def __init__(self, model, config_list, optimizer=None): - super().__init__(model, config_list, pruning_algorithm='fpgm', optimizer=optimizer) + + def __init__(self, model, config_list, optimizer=None, dependency_aware=False, dummy_input=None): + super().__init__(model, config_list, pruning_algorithm='fpgm', + dependency_aware=dependency_aware, dummy_input=dummy_input, optimizer=optimizer) + class TaylorFOWeightFilterPruner(_StructuredFilterPruner): """ @@ -201,9 +360,28 @@ class TaylorFOWeightFilterPruner(_StructuredFilterPruner): - op_types : Currently only Conv2d is supported in TaylorFOWeightFilterPruner. optimizer: torch.optim.Optimizer Optimizer used to train model + statistics_batch_num: int + The number of batches to statistic the activation. + dependency_aware: bool + If prune the model in a dependency-aware way. If it is `True`, this pruner will + prune the model according to the l2-norm of weights and the channel-dependency or + group-dependency of the model. In this way, the pruner will force the conv layers + that have dependencies to prune the same channels, so the speedup module can better + harvest the speed benefit from the pruned model. Note that, if this flag is set True + , the dummy_input cannot be None, because the pruner needs a dummy input to trace the + dependency between the conv layers. + dummy_input : torch.Tensor + The dummy input to analyze the topology constraints. Note that, the dummy_input + should on the same device with the model. 
+ """ - def __init__(self, model, config_list, optimizer=None, statistics_batch_num=1): - super().__init__(model, config_list, pruning_algorithm='taylorfo', optimizer=optimizer, statistics_batch_num=statistics_batch_num) + + def __init__(self, model, config_list, optimizer=None, statistics_batch_num=1, + dependency_aware=False, dummy_input=None): + super().__init__(model, config_list, pruning_algorithm='taylorfo', + dependency_aware=dependency_aware, dummy_input=dummy_input, + optimizer=optimizer, statistics_batch_num=statistics_batch_num) + class ActivationAPoZRankFilterPruner(_StructuredFilterPruner): """ @@ -217,10 +395,30 @@ class ActivationAPoZRankFilterPruner(_StructuredFilterPruner): - op_types : Only Conv2d is supported in ActivationAPoZRankFilterPruner. optimizer: torch.optim.Optimizer Optimizer used to train model + activation: str + The activation type. + statistics_batch_num: int + The number of batches to statistic the activation. + dependency_aware: bool + If prune the model in a dependency-aware way. If it is `True`, this pruner will + prune the model according to the l2-norm of weights and the channel-dependency or + group-dependency of the model. In this way, the pruner will force the conv layers + that have dependencies to prune the same channels, so the speedup module can better + harvest the speed benefit from the pruned model. Note that, if this flag is set True + , the dummy_input cannot be None, because the pruner needs a dummy input to trace the + dependency between the conv layers. + dummy_input : torch.Tensor + The dummy input to analyze the topology constraints. Note that, the dummy_input + should on the same device with the model. + """ - def __init__(self, model, config_list, optimizer=None, activation='relu', statistics_batch_num=1): - super().__init__(model, config_list, pruning_algorithm='apoz', optimizer=optimizer, \ - activation=activation, statistics_batch_num=statistics_batch_num) + + def __init__(self, model, config_list, optimizer=None, activation='relu', + statistics_batch_num=1, dependency_aware=False, dummy_input=None): + super().__init__(model, config_list, pruning_algorithm='apoz', optimizer=optimizer, + dependency_aware=dependency_aware, dummy_input=dummy_input, + activation=activation, statistics_batch_num=statistics_batch_num) + class ActivationMeanRankFilterPruner(_StructuredFilterPruner): """ @@ -233,8 +431,26 @@ class ActivationMeanRankFilterPruner(_StructuredFilterPruner): - sparsity : How much percentage of convolutional filters are to be pruned. - op_types : Only Conv2d is supported in ActivationMeanRankFilterPruner. optimizer: torch.optim.Optimizer - Optimizer used to train model + Optimizer used to train model. + activation: str + The activation type. + statistics_batch_num: int + The number of batches to statistic the activation. + dependency_aware: bool + If prune the model in a dependency-aware way. If it is `True`, this pruner will + prune the model according to the l2-norm of weights and the channel-dependency or + group-dependency of the model. In this way, the pruner will force the conv layers + that have dependencies to prune the same channels, so the speedup module can better + harvest the speed benefit from the pruned model. Note that, if this flag is set True + , the dummy_input cannot be None, because the pruner needs a dummy input to trace the + dependency between the conv layers. + dummy_input : torch.Tensor + The dummy input to analyze the topology constraints. 
+        Note that the dummy_input should be on the same device as the model.
     """
-    def __init__(self, model, config_list, optimizer=None, activation='relu', statistics_batch_num=1):
-        super().__init__(model, config_list, pruning_algorithm='mean_activation', optimizer=optimizer, \
-                         activation=activation, statistics_batch_num=statistics_batch_num)
+
+    def __init__(self, model, config_list, optimizer=None, activation='relu',
+                 statistics_batch_num=1, dependency_aware=False, dummy_input=None):
+        super().__init__(model, config_list, pruning_algorithm='mean_activation', optimizer=optimizer,
+                         dependency_aware=dependency_aware, dummy_input=dummy_input,
+                         activation=activation, statistics_batch_num=statistics_batch_num)
diff --git a/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py b/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py
index e1b3dc12ce..4eec7844a7 100644
--- a/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py
+++ b/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py
@@ -7,12 +7,13 @@ import torch
 from .weight_masker import WeightMasker
-__all__ = ['L1FilterPrunerMasker', 'L2FilterPrunerMasker', 'FPGMPrunerMasker', \
-           'TaylorFOWeightFilterPrunerMasker', 'ActivationAPoZRankFilterPrunerMasker', \
-           'ActivationMeanRankFilterPrunerMasker', 'SlimPrunerMasker', 'AMCWeightMasker']
+__all__ = ['L1FilterPrunerMasker', 'L2FilterPrunerMasker', 'FPGMPrunerMasker',
+           'TaylorFOWeightFilterPrunerMasker', 'ActivationAPoZRankFilterPrunerMasker',
+           'ActivationMeanRankFilterPrunerMasker', 'SlimPrunerMasker', 'AMCWeightMasker']
 logger = logging.getLogger('torch filter pruners')
+
 class StructuredWeightMasker(WeightMasker):
     """
     A structured pruning masker base class that prunes convolutional layer filters.
@@ -31,14 +32,48 @@ class StructuredWeightMasker(WeightMasker):
     be round up to 28 (which can be divided by 4) and only 4 filters are pruned.
     """
-    def __init__(self, model, pruner, preserve_round=1):
+
+    def __init__(self, model, pruner, preserve_round=1, dependency_aware=False):
         self.model = model
         self.pruner = pruner
         self.preserve_round = preserve_round
+        self.dependency_aware = dependency_aware
-    def calc_mask(self, sparsity, wrapper, wrapper_idx=None):
+    def calc_mask(self, sparsity, wrapper, wrapper_idx=None, **depen_kwargs):
         """
-        Calculate the mask of given layer.
+        Calculate the mask for `wrapper`.
+        Parameters
+        ----------
+        sparsity: float/list of float
+            The target sparsity of the wrapper. If we calculate the mask in
+            the normal way, then sparsity is a float number. In contrast, if
+            we calculate the mask in the dependency-aware way, sparsity is a
+            list of floats, and each float corresponds to the sparsity
+            of one layer.
+        wrapper: PrunerModuleWrapper/list of PrunerModuleWrappers
+            The wrapper of the target layer. If we calculate the mask in the normal
+            way, then `wrapper` is an instance of PrunerModuleWrapper; otherwise `wrapper`
+            is a list of PrunerModuleWrapper instances.
+        wrapper_idx: int/list of int
+            The index of the wrapper.
+        depen_kwargs: dict
+            The keyword arguments for the dependency-aware mode.
+ """ + if not self.dependency_aware: + # calculate the mask in the normal way, each layer calculate its + # own mask separately + return self._normal_calc_mask(sparsity, wrapper, wrapper_idx) + else: + # if the dependency_aware switch is on, then calculate the mask + # in the dependency-aware way + return self._dependency_calc_mask(sparsity, wrapper, wrapper_idx, **depen_kwargs) + + def _get_current_state(self, sparsity, wrapper, wrapper_idx=None): + """ + Some pruner may prune the layers in a iterative way. In each pruning iteration, + we may get the current state of this wrapper/layer, and continue to prune this layer + based on the current state. This function is to get the current pruning state of the + target wrapper/layer. Parameters ---------- sparsity: float @@ -49,10 +84,14 @@ def calc_mask(self, sparsity, wrapper, wrapper_idx=None): index of this wrapper in pruner's all wrappers Returns ------- - dict - dictionary for storing masks, keys of the dict: - 'weight_mask': weight mask tensor - 'bias_mask': bias mask tensor (optional) + base_mask: dict + dict object that stores the mask of this wrapper in this iteration, if it is the + first iteration, then we create a new mask with all ones. If there is already a + mask in this wrapper, then we return the existing mask. + weight: tensor + the current weight of this layer + num_prune: int + how many filters we should prune """ msg = 'module type {} is not supported!'.format(wrapper.type) assert wrapper.type == 'Conv2d', msg @@ -78,17 +117,178 @@ def calc_mask(self, sparsity, wrapper, wrapper_idx=None): num_prune = int(num_total * sparsity) if self.preserve_round > 1: num_preserve = num_total - num_prune - num_preserve = int(math.ceil(num_preserve * 1. / self.preserve_round) * self.preserve_round) + num_preserve = int( + math.ceil(num_preserve * 1. / self.preserve_round) * self.preserve_round) if num_preserve > num_total: - num_preserve = int(math.floor(num_total * 1. / self.preserve_round) * self.preserve_round) + num_preserve = int(math.floor( + num_total * 1. / self.preserve_round) * self.preserve_round) num_prune = num_total - num_preserve + # weight*mask_weight: apply base mask for iterative pruning + return mask, weight * mask_weight, num_prune + def _normal_calc_mask(self, sparsity, wrapper, wrapper_idx=None): + """ + Calculate the mask of given layer. + Parameters + ---------- + sparsity: float + pruning ratio, preserved weight ratio is `1 - sparsity` + wrapper: PrunerModuleWrapper + layer wrapper of this layer + wrapper_idx: int + index of this wrapper in pruner's all wrappers + Returns + ------- + dict + dictionary for storing masks, keys of the dict: + 'weight_mask': weight mask tensor + 'bias_mask': bias mask tensor (optional) + """ + mask, weight, num_prune = self._get_current_state( + sparsity, wrapper, wrapper_idx) + num_total = weight.size(0) if num_total < 2 or num_prune < 1: return mask - # weight*mask_weight: apply base mask for iterative pruning - return self.get_mask(mask, weight*mask_weight, num_prune, wrapper, wrapper_idx) - def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx): + return self.get_mask(mask, weight, num_prune, wrapper, wrapper_idx) + + def _common_channel_to_prune(self, sparsities, wrappers, wrappers_idx, channel_dsets, groups): + """ + Calculate the common channels should be pruned by all the layers in this group. + This function is for filter pruning of Conv layers. 
+        To support the dependency-aware mode for other ops, you need to inherit this class
+        and override `_common_channel_to_prune`.
+
+        Parameters
+        ----------
+        sparsities : list
+            List of floats that specify the sparsity of each conv layer.
+        wrappers : list
+            List of wrappers
+        groups : list
+            The number of the filter groups of each layer.
+        wrappers_idx : list
+            The indices of the wrappers.
+        """
+        # sparsity configs for each wrapper
+        # sparsities = [_w.config['sparsity'] for _w in wrappers]
+        # check the type of the input wrappers
+        for _w in wrappers:
+            msg = 'module type {} is not supported!'.format(_w.type)
+            assert _w.type == 'Conv2d', msg
+        # Among the dependent layers, the layer with the smallest
+        # sparsity determines the final benefit of the speedup
+        # module. To better harvest the speed benefit, we need
+        # to ensure that these dependent layers prune at least
+        # the same `min_sparsity` of channels.
+        if len(channel_dsets) == len(wrappers):
+            # all the layers in the dependency set are pruned
+            min_sparsity = min(sparsities)
+        else:
+            # not all the layers in the dependency set
+            # are pruned
+            min_sparsity = 0
+            # do not prune the channels from which we cannot harvest any speed benefit
+            sparsities = [min_sparsity] * len(sparsities)
+        # find the max number of the filter groups of the dependent
+        # layers. The group constraint of this dependency set is decided
+        # by the layer with the max groups.
+
+        # we should use the least common multiple of all the groups;
+        # max_group is not larger than channel_count, because the number
+        # of filters is always divisible by the number of groups
+        max_group = np.lcm.reduce(groups)
+        channel_count = wrappers[0].module.weight.data.size(0)
+        device = wrappers[0].module.weight.device
+        channel_sum = torch.zeros(channel_count).to(device)
+        for _w, _w_idx in zip(wrappers, wrappers_idx):
+            # calculate the L1/L2 sum for all channels
+            c_sum = self.get_channel_sum(_w, _w_idx)
+
+            if c_sum is None:
+                # if the channel sum cannot be calculated
+                # now, return None
+                return None
+            channel_sum += c_sum
+
+        # prune the same `min_sparsity` of the channels, based on channel_sum,
+        # for all the layers in the dependency set
+        target_pruned = int(channel_count * min_sparsity)
+        # pruned_per_group may be zero, for example dw conv
+        pruned_per_group = int(target_pruned / max_group)
+        group_step = int(channel_count / max_group)
+
+        channel_masks = []
+        for gid in range(max_group):
+            _start = gid * group_step
+            _end = (gid + 1) * group_step
+            if pruned_per_group > 0:
+                threshold = torch.topk(
+                    channel_sum[_start: _end], pruned_per_group, largest=False)[0].max()
+                group_mask = torch.gt(channel_sum[_start:_end], threshold)
+            else:
+                group_mask = torch.ones(group_step).to(device)
+            channel_masks.append(group_mask)
+        channel_masks = torch.cat(channel_masks, dim=0)
+        pruned_channel_index = (
+            channel_masks == False).nonzero().squeeze(1).tolist()
+        logger.info('Prune the channels %s for all the dependent layers',
+                    ','.join([str(x) for x in pruned_channel_index]))
+        return channel_masks
+
+    def _dependency_calc_mask(self, sparsities, wrappers, wrappers_idx, channel_dsets, groups):
+        """
+        Calculate the masks for the layers in the same dependency set.
+        Similar to the original calc_mask, _dependency_calc_mask
+        will prune the target layers based on the L1/L2 norm of the weights.
+        However, StructuredWeightMasker prunes the filters solely based on the
+        L1/L2 norm of each filter.
+        In contrast, _dependency_calc_mask will try to satisfy the channel/group
+        dependency (see nni.compression.torch.utils.shape_dependency for details).
+        Specifically, _dependency_calc_mask will try to prune the same channels for
+        the layers that have a channel dependency. In addition, this mask calculator
+        will also ensure that the number of filters pruned in each group is the same
+        (to meet the group dependency).
+
+        Parameters
+        ----------
+        sparsities : list
+            List of floats that specify the sparsity of each conv layer.
+        wrappers : list
+            List of wrappers
+        groups : list
+            The number of the filter groups of each layer.
+        wrappers_idx : list
+            The indices of the wrappers.
+        """
+        channel_masks = self._common_channel_to_prune(
+            sparsities, wrappers, wrappers_idx, channel_dsets, groups)
+        # Calculate the mask for each layer based on channel_masks. First,
+        # every layer prunes the same channels masked out in channel_masks.
+        # If the sparsity of a layer is larger than min_sparsity, it will
+        # continue to prune sparsity - min_sparsity channels to meet its
+        # sparsity config.
+        masks = {}
+        for _pos, _w in enumerate(wrappers):
+            _w_idx = wrappers_idx[_pos]
+            sparsity = sparsities[_pos]
+            name = _w.name
+
+            # _tmp_mask = self._normal_calc_mask(
+            #     sparsity, _w, _w_idx, channel_masks)
+            base_mask, current_weight, num_prune = self._get_current_state(
+                sparsity, _w, _w_idx)
+            num_total = current_weight.size(0)
+            if num_total < 2 or num_prune < 1:
+                return base_mask
+            _tmp_mask = self.get_mask(
+                base_mask, current_weight, num_prune, _w, _w_idx, channel_masks)
+
+            if _tmp_mask is None:
+                # if the mask calculation fails
+                return None
+            masks[name] = _tmp_mask
+        return masks
+
+    def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None):
         """
         Calculate the mask of given layer.
         Parameters
@@ -103,12 +303,38 @@ def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx):
             layer wrapper of this layer
         wrapper_idx: int
             index of this wrapper in pruner's all wrappers
+        channel_masks: Tensor
+            The channel masks calculated in advance for this layer. In the dependency-aware
+            mode, before calculating the masks for each layer, we first calculate a common
+            mask for all the layers in the dependency set. Pruners that do not support the
+            dependency-aware mode can simply ignore this parameter.
         Returns
         -------
         dict
            dictionary for storing masks
         """
-        raise NotImplementedError('{} get_mask is not implemented'.format(self.__class__.__name__))
+        raise NotImplementedError(
+            '{} get_mask is not implemented'.format(self.__class__.__name__))
+
+    def get_channel_sum(self, wrapper, wrapper_idx):
+        """
+        Calculate the importance weight for each channel. To support the
+        dependency-aware mode for this one-shot pruner, this function must be
+        implemented.
+ Parameters + ---------- + wrapper: PrunerModuleWrapper + layer wrapper of this layer + wrapper_idx: int + index of this wrapper in pruner's all wrappers + Returns + ------- + tensor + Tensor that indicates the importance of each channel + """ + raise NotImplementedError( + '{} get_channel_sum is not implemented'.format(self.__class__.__name__)) + class L1FilterPrunerMasker(StructuredWeightMasker): """ @@ -119,30 +345,56 @@ class L1FilterPrunerMasker(StructuredWeightMasker): https://arxiv.org/abs/1608.08710 """ - def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx): + def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None): + # get the l1-norm sum for each filter + w_abs_structured = self.get_channel_sum(wrapper, wrapper_idx) + if channel_masks is not None: + # if we need to mask some channels in advance + w_abs_structured = w_abs_structured * channel_masks + threshold = torch.topk(w_abs_structured.view(-1), + num_prune, largest=False)[0].max() + mask_weight = torch.gt(w_abs_structured, threshold)[ + :, None, None, None].expand_as(weight).type_as(weight) + mask_bias = torch.gt(w_abs_structured, threshold).type_as( + weight).detach() if base_mask['bias_mask'] is not None else None + + return {'weight_mask': mask_weight.detach(), 'bias_mask': mask_bias} + + def get_channel_sum(self, wrapper, wrapper_idx): + weight = wrapper.module.weight.data filters = weight.shape[0] w_abs = weight.abs() w_abs_structured = w_abs.view(filters, -1).sum(dim=1) - threshold = torch.topk(w_abs_structured.view(-1), num_prune, largest=False)[0].max() - mask_weight = torch.gt(w_abs_structured, threshold)[:, None, None, None].expand_as(weight).type_as(weight) - mask_bias = torch.gt(w_abs_structured, threshold).type_as(weight).detach() if base_mask['bias_mask'] is not None else None + return w_abs_structured - return {'weight_mask': mask_weight.detach(), 'bias_mask': mask_bias} class L2FilterPrunerMasker(StructuredWeightMasker): """ A structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. 
""" - def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx): + + def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None): + # get the l2-norm sum for each filter + w_l2_norm = self.get_channel_sum(wrapper, wrapper_idx) + if channel_masks is not None: + # if we need to mask some channels in advance + w_l2_norm = w_l2_norm * channel_masks + threshold = torch.topk( + w_l2_norm.view(-1), num_prune, largest=False)[0].max() + mask_weight = torch.gt(w_l2_norm, threshold)[ + :, None, None, None].expand_as(weight).type_as(weight) + mask_bias = torch.gt(w_l2_norm, threshold).type_as( + weight).detach() if base_mask['bias_mask'] is not None else None + + return {'weight_mask': mask_weight.detach(), 'bias_mask': mask_bias} + + def get_channel_sum(self, wrapper, wrapper_idx): + weight = wrapper.module.weight.data filters = weight.shape[0] w = weight.view(filters, -1) w_l2_norm = torch.sqrt((w ** 2).sum(dim=1)) - threshold = torch.topk(w_l2_norm.view(-1), num_prune, largest=False)[0].max() - mask_weight = torch.gt(w_l2_norm, threshold)[:, None, None, None].expand_as(weight).type_as(weight) - mask_bias = torch.gt(w_l2_norm, threshold).type_as(weight).detach() if base_mask['bias_mask'] is not None else None - - return {'weight_mask': mask_weight.detach(), 'bias_mask': mask_bias} + return w_l2_norm class FPGMPrunerMasker(StructuredWeightMasker): @@ -151,22 +403,23 @@ class FPGMPrunerMasker(StructuredWeightMasker): "Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration", https://arxiv.org/pdf/1811.00250.pdf """ - def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx): - min_gm_idx = self._get_min_gm_kernel_idx(weight, num_prune) + + def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None): + min_gm_idx = self._get_min_gm_kernel_idx( + num_prune, wrapper, wrapper_idx, channel_masks) for idx in min_gm_idx: base_mask['weight_mask'][idx] = 0. if base_mask['bias_mask'] is not None: base_mask['bias_mask'][idx] = 0. return base_mask - def _get_min_gm_kernel_idx(self, weight, n): - assert len(weight.size()) in [3, 4] - - dist_list = [] - for out_i in range(weight.size(0)): - dist_sum = self._get_distance_sum(weight, out_i) - dist_list.append((dist_sum, out_i)) - min_gm_kernels = sorted(dist_list, key=lambda x: x[0])[:n] + def _get_min_gm_kernel_idx(self, num_prune, wrapper, wrapper_idx, channel_masks): + channel_dist = self.get_channel_sum(wrapper, wrapper_idx) + if channel_masks is not None: + channel_dist = channel_dist * channel_masks + dist_list = [(channel_dist[i], i) + for i in range(channel_dist.size(0))] + min_gm_kernels = sorted(dist_list, key=lambda x: x[0])[:num_prune] return [x[1] for x in min_gm_kernels] def _get_distance_sum(self, weight, out_idx): @@ -195,6 +448,16 @@ def _get_distance_sum(self, weight, out_idx): x = torch.sqrt(x) return x.sum() + def get_channel_sum(self, wrapper, wrapper_idx): + weight = wrapper.module.weight.data + assert len(weight.size()) in [3, 4] + dist_list = [] + for out_i in range(weight.size(0)): + dist_sum = self._get_distance_sum(weight, out_i) + dist_list.append(dist_sum) + return torch.Tensor(dist_list).to(weight.device) + + class TaylorFOWeightFilterPrunerMasker(StructuredWeightMasker): """ A structured pruning algorithm that prunes the filters with the smallest @@ -203,6 +466,7 @@ class TaylorFOWeightFilterPrunerMasker(StructuredWeightMasker): "Importance Estimation for Neural Network Pruning", CVPR 2019. 
http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf """ + def __init__(self, model, pruner, statistics_batch_num=1): super().__init__(model, pruner) self.pruner.statistics_batch_num = statistics_batch_num @@ -210,14 +474,14 @@ def __init__(self, model, pruner, statistics_batch_num=1): self.pruner.iterations = 0 self.pruner.patch_optimizer(self.calc_contributions) - def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx): - if self.pruner.iterations < self.pruner.statistics_batch_num: - return None - - if wrapper.contribution is None: + def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None): + channel_contribution = self.get_channel_sum(wrapper, wrapper_idx) + if channel_contribution is None: + # iteration is not enough return None - - prune_indices = torch.argsort(wrapper.contribution)[:num_prune] + if channel_masks is not None: + channel_contribution = channel_contribution * channel_masks + prune_indices = torch.argsort(channel_contribution)[:num_prune] for idx in prune_indices: base_mask['weight_mask'][idx] = 0. if base_mask['bias_mask'] is not None: @@ -233,7 +497,8 @@ def calc_contributions(self): return for wrapper in self.pruner.get_modules_wrapper(): filters = wrapper.module.weight.size(0) - contribution = (wrapper.module.weight*wrapper.module.weight.grad).data.pow(2).view(filters, -1).sum(dim=1) + contribution = ( + wrapper.module.weight*wrapper.module.weight.grad).data.pow(2).view(filters, -1).sum(dim=1) if wrapper.contribution is None: wrapper.contribution = contribution else: @@ -241,6 +506,13 @@ def calc_contributions(self): self.pruner.iterations += 1 + def get_channel_sum(self, wrapper, wrapper_idx): + if self.pruner.iterations < self.pruner.statistics_batch_num: + return None + if wrapper.contribution is None: + return None + return wrapper.contribution + class ActivationFilterPrunerMasker(StructuredWeightMasker): def __init__(self, model, pruner, statistics_batch_num=1, activation='relu'): @@ -259,7 +531,8 @@ def __init__(self, model, pruner, statistics_batch_num=1, activation='relu'): def _add_activation_collector(self, pruner): def collector(collected_activation): def hook(module_, input_, output): - collected_activation.append(pruner.activation(output.detach().cpu())) + collected_activation.append( + pruner.activation(output.detach().cpu())) return hook pruner.collected_activation = {} pruner._fwd_hook_id += 1 @@ -267,11 +540,13 @@ def hook(module_, input_, output): for wrapper_idx, wrapper in enumerate(pruner.get_modules_wrapper()): pruner.collected_activation[wrapper_idx] = [] - handle = wrapper.register_forward_hook(collector(pruner.collected_activation[wrapper_idx])) + handle = wrapper.register_forward_hook( + collector(pruner.collected_activation[wrapper_idx])) pruner._fwd_hook_handles[pruner._fwd_hook_id].append(handle) return pruner._fwd_hook_id + class ActivationAPoZRankFilterPrunerMasker(ActivationFilterPrunerMasker): """ A structured pruning algorithm that prunes the filters with the @@ -280,19 +555,22 @@ class ActivationAPoZRankFilterPrunerMasker(ActivationFilterPrunerMasker): "Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", ICLR 2016. 
https://arxiv.org/abs/1607.03250 """ - def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx): - assert wrapper_idx is not None - activations = self.pruner.collected_activation[wrapper_idx] - if len(activations) < self.statistics_batch_num: + + def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None): + apoz = self.get_channel_sum(wrapper, wrapper_idx) + if apoz is None: + # the collected activations are not enough return None - apoz = self._calc_apoz(activations) - prune_indices = torch.argsort(apoz, descending=True)[:num_prune] + if channel_masks is not None: + apoz = apoz * channel_masks + + prune_indices = torch.argsort(apoz)[:num_prune] for idx in prune_indices: base_mask['weight_mask'][idx] = 0. if base_mask['bias_mask'] is not None: base_mask['bias_mask'][idx] = 0. - if len(activations) >= self.statistics_batch_num and self.pruner.hook_id in self.pruner._fwd_hook_handles: + if self.pruner.hook_id in self.pruner._fwd_hook_handles: self.pruner.remove_activation_collector(self.pruner.hook_id) return base_mask @@ -313,8 +591,18 @@ def _calc_apoz(self, activations): """ activations = torch.cat(activations, 0) _eq_zero = torch.eq(activations, torch.zeros_like(activations)) - _apoz = torch.sum(_eq_zero, dim=(0, 2, 3)) / torch.numel(_eq_zero[:, 0, :, :]) - return _apoz + _apoz = torch.sum(_eq_zero, dim=(0, 2, 3), dtype=torch.float64) / \ + torch.numel(_eq_zero[:, 0, :, :]) + return torch.ones_like(_apoz) - _apoz + + def get_channel_sum(self, wrapper, wrapper_idx): + assert wrapper_idx is not None + activations = self.pruner.collected_activation[wrapper_idx] + if len(activations) < self.statistics_batch_num: + # collected activations is not enough + return None + return self._calc_apoz(activations).to(wrapper.module.weight.device) + class ActivationMeanRankFilterPrunerMasker(ActivationFilterPrunerMasker): """ @@ -324,19 +612,24 @@ class ActivationMeanRankFilterPrunerMasker(ActivationFilterPrunerMasker): "Pruning Convolutional Neural Networks for Resource Efficient Inference", ICLR 2017. https://arxiv.org/abs/1611.06440 """ - def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx): - assert wrapper_idx is not None - activations = self.pruner.collected_activation[wrapper_idx] - if len(activations) < self.statistics_batch_num: + + def get_mask(self, base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None): + + mean_activation = self.get_channel_sum(wrapper, wrapper_idx) + if mean_activation is None: + # the collected activation is not enough return None - mean_activation = self._cal_mean_activation(activations) + if channel_masks is not None: + mean_activation = mean_activation * channel_masks + prune_indices = torch.argsort(mean_activation)[:num_prune] for idx in prune_indices: base_mask['weight_mask'][idx] = 0. if base_mask['bias_mask'] is not None: base_mask['bias_mask'][idx] = 0. 
- - if len(activations) >= self.statistics_batch_num and self.pruner.hook_id in self.pruner._fwd_hook_handles: + # if len(activations) < self.statistics_batch_num, the code + # cannot reach here + if self.pruner.hook_id in self.pruner._fwd_hook_handles: self.pruner.remove_activation_collector(self.pruner.hook_id) return base_mask @@ -359,6 +652,17 @@ def _cal_mean_activation(self, activations): mean_activation = torch.mean(activations, dim=(0, 2, 3)) return mean_activation + def get_channel_sum(self, wrapper, wrapper_idx): + assert wrapper_idx is not None + activations = self.pruner.collected_activation[wrapper_idx] + if len(activations) < self.statistics_batch_num: + return None + # the memory overhead here is acceptable, because only + # the mean_activation tensor returned by _cal_mean_activation + # is transfer to gpu. + return self._cal_mean_activation(activations).to(wrapper.module.weight.device) + + class SlimPrunerMasker(WeightMasker): """ A structured pruning algorithm that prunes channels by pruning the weights of BN layers. @@ -374,7 +678,8 @@ def __init__(self, model, pruner, **kwargs): weight_list.append(layer.module.weight.data.abs().clone()) all_bn_weights = torch.cat(weight_list) k = int(all_bn_weights.shape[0] * pruner.config_list[0]['sparsity']) - self.global_threshold = torch.topk(all_bn_weights.view(-1), k, largest=False)[0].max() + self.global_threshold = torch.topk( + all_bn_weights.view(-1), k, largest=False)[0].max() def calc_mask(self, sparsity, wrapper, wrapper_idx=None): assert wrapper.type == 'BatchNorm2d', 'SlimPruner only supports 2d batch normalization layer pruning' @@ -384,22 +689,27 @@ def calc_mask(self, sparsity, wrapper, wrapper_idx=None): weight = weight * wrapper.weight_mask base_mask = torch.ones(weight.size()).type_as(weight).detach() - mask = {'weight_mask': base_mask.detach(), 'bias_mask': base_mask.clone().detach()} + mask = {'weight_mask': base_mask.detach( + ), 'bias_mask': base_mask.clone().detach()} filters = weight.size(0) num_prune = int(filters * sparsity) if filters >= 2 and num_prune >= 1: w_abs = weight.abs() - mask_weight = torch.gt(w_abs, self.global_threshold).type_as(weight) + mask_weight = torch.gt( + w_abs, self.global_threshold).type_as(weight) mask_bias = mask_weight.clone() - mask = {'weight_mask': mask_weight.detach(), 'bias_mask': mask_bias.detach()} + mask = {'weight_mask': mask_weight.detach( + ), 'bias_mask': mask_bias.detach()} return mask + def least_square_sklearn(X, Y): from sklearn.linear_model import LinearRegression reg = LinearRegression(fit_intercept=False) reg.fit(X, Y) return reg.coef_ + class AMCWeightMasker(WeightMasker): """ Weight maskser class for AMC pruner. Currently, AMCPruner only supports pruning kernel @@ -420,6 +730,7 @@ class AMCWeightMasker(WeightMasker): 32 - 6 = 26 filters are preserved. If preserve_round is 4, preserved filters will be round up to 28 (which can be divided by 4) and only 4 filters are pruned. """ + def __init__(self, model, pruner, preserve_round=1): self.model = model self.pruner = pruner @@ -467,7 +778,8 @@ def calc_mask(self, sparsity, wrapper, wrapper_idx=None, preserve_idx=None): num_prune = int(num_total * sparsity) if self.preserve_round > 1: num_preserve = num_total - num_prune - num_preserve = int(math.ceil(num_preserve * 1. / self.preserve_round) * self.preserve_round) + num_preserve = int( + math.ceil(num_preserve * 1. 
/ self.preserve_round) * self.preserve_round) if num_preserve > num_total: num_preserve = num_total num_prune = num_total - num_preserve @@ -484,7 +796,8 @@ def get_mask(self, base_mask, weight, num_preserve, wrapper, wrapper_idx, preser if preserve_idx is None: importance = np.abs(w).sum((0, 2, 3)) - sorted_idx = np.argsort(-importance) # sum magnitude along C_in, sort descend + # sum magnitude along C_in, sort descend + sorted_idx = np.argsort(-importance) d_prime = num_preserve preserve_idx = sorted_idx[:d_prime] # to preserve index else: @@ -499,10 +812,13 @@ def get_mask(self, base_mask, weight, num_preserve, wrapper, wrapper_idx, preser masked_X = X[:, mask] if w.shape[2] == 1: # 1x1 conv or fc rec_weight = least_square_sklearn(X=masked_X, Y=Y) - rec_weight = rec_weight.reshape(-1, 1, 1, d_prime) # (C_out, K_h, K_w, C_in') - rec_weight = np.transpose(rec_weight, (0, 3, 1, 2)) # (C_out, C_in', K_h, K_w) + # (C_out, K_h, K_w, C_in') + rec_weight = rec_weight.reshape(-1, 1, 1, d_prime) + # (C_out, C_in', K_h, K_w) + rec_weight = np.transpose(rec_weight, (0, 3, 1, 2)) else: - raise NotImplementedError('Current code only supports 1x1 conv now!') + raise NotImplementedError( + 'Current code only supports 1x1 conv now!') rec_weight_pad = np.zeros_like(w) # pylint: disable=all rec_weight_pad[:, mask, :, :] = rec_weight @@ -513,7 +829,8 @@ def get_mask(self, base_mask, weight, num_preserve, wrapper, wrapper_idx, preser assert len(rec_weight.shape) == 2 # now assign - wrapper.module.weight.data = torch.from_numpy(rec_weight).to(weight.device) + wrapper.module.weight.data = torch.from_numpy( + rec_weight).to(weight.device) mask_weight = torch.zeros_like(weight) if wrapper.type == 'Linear': diff --git a/src/sdk/pynni/nni/compression/torch/utils/counter.py b/src/sdk/pynni/nni/compression/torch/utils/counter.py index 85141db7ac..f4a3db7aa7 100644 --- a/src/sdk/pynni/nni/compression/torch/utils/counter.py +++ b/src/sdk/pynni/nni/compression/torch/utils/counter.py @@ -12,7 +12,7 @@ raise -def count_flops_params(model: nn.Module, input_size, verbose=True): +def count_flops_params(model: nn.Module, input_size, custom_ops=None, verbose=True): """ Count FLOPs and Params of the given model. This function would identify the mask on the module @@ -28,7 +28,10 @@ def count_flops_params(model: nn.Module, input_size, verbose=True): target model. input_size: list, tuple the input shape of data - + custom_ops: dict + a mapping of (module: custom operation) + the custom operation will overwrite the default operation. + for reference, please see ``custom_mask_ops``. 
Returns ------- @@ -44,11 +47,14 @@ def count_flops_params(model: nn.Module, input_size, verbose=True): inputs = torch.randn(input_size).to(device) hook_module_list = [] + if custom_ops is None: + custom_ops = {} + custom_mask_ops.update(custom_ops) prev_m = None for m in model.modules(): weight_mask = None m_type = type(m) - if m_type in custom_ops: + if m_type in custom_mask_ops: if isinstance(prev_m, PrunerModuleWrapper): weight_mask = prev_m.weight_mask @@ -56,7 +62,7 @@ def count_flops_params(model: nn.Module, input_size, verbose=True): hook_module_list.append(m) prev_m = m - flops, params = profile(model, inputs=(inputs, ), custom_ops=custom_ops, verbose=verbose) + flops, params = profile(model, inputs=(inputs, ), custom_ops=custom_mask_ops, verbose=verbose) for m in hook_module_list: @@ -74,7 +80,6 @@ def count_flops_params(model: nn.Module, input_size, verbose=True): def count_convNd_mask(m, x, y): """ The forward hook to count FLOPs and Parameters of convolution operation. - Parameters ---------- m : torch.nn.Module @@ -101,7 +106,6 @@ def count_convNd_mask(m, x, y): def count_linear_mask(m, x, y): """ The forward hook to count FLOPs and Parameters of linear transformation. - Parameters ---------- m : torch.nn.Module @@ -111,22 +115,21 @@ def count_linear_mask(m, x, y): y : torch.Tensor output data """ - output_channel = y.size()[1] - output_size = torch.zeros(y.size()[2:]).numel() + output_channel = y.numel() bias_flops = 1 if m.bias is not None else 0 if m.weight_mask is not None: output_channel = m.weight_mask.sum() // m.in_features - total_ops = output_channel * output_size * (m.in_features + bias_flops) + total_ops = output_channel * (m.in_features + bias_flops) m.total_ops += torch.DoubleTensor([int(total_ops)]) -custom_ops = { +custom_mask_ops = { nn.Conv1d: count_convNd_mask, nn.Conv2d: count_convNd_mask, nn.Conv3d: count_convNd_mask, nn.Linear: count_linear_mask, -} +} \ No newline at end of file diff --git a/src/sdk/pynni/nni/compression/torch/utils/mask_conflict.py b/src/sdk/pynni/nni/compression/torch/utils/mask_conflict.py index 15ef0cc521..ffbcfce3ad 100644 --- a/src/sdk/pynni/nni/compression/torch/utils/mask_conflict.py +++ b/src/sdk/pynni/nni/compression/torch/utils/mask_conflict.py @@ -290,4 +290,5 @@ def fix_mask(self): _logger.info('Pruned Filters after fixing conflict:') pruned_filters = set(list(range(ori_channels)))-channel_remain _logger.info(str(sorted(pruned_filters))) + return self.masks diff --git a/src/sdk/pynni/nni/compression/torch/utils/shape_dependency.py b/src/sdk/pynni/nni/compression/torch/utils/shape_dependency.py index 40e445cf19..a238848d86 100644 --- a/src/sdk/pynni/nni/compression/torch/utils/shape_dependency.py +++ b/src/sdk/pynni/nni/compression/torch/utils/shape_dependency.py @@ -484,3 +484,6 @@ def export(self, filepath): for name in self.dependency: group = self.dependency[name] csv_w.writerow([name, group]) + @property + def dependency_sets(self): + return self.dependency diff --git a/src/sdk/pynni/nni/msg_dispatcher.py b/src/sdk/pynni/nni/msg_dispatcher.py index d66aca458c..7e3232e4f9 100644 --- a/src/sdk/pynni/nni/msg_dispatcher.py +++ b/src/sdk/pynni/nni/msg_dispatcher.py @@ -114,6 +114,7 @@ def handle_import_data(self, data): data: a list of dictionaries, each of which has at least two keys, 'parameter' and 'value' """ for entry in data: + entry['value'] = entry['value'] if type(entry['value']) is str else json_tricks.dumps(entry['value']) entry['value'] = json_tricks.loads(entry['value']) self.tuner.import_data(data) diff --git 
a/src/sdk/pynni/tests/test_dependecy_aware.py b/src/sdk/pynni/tests/test_dependecy_aware.py new file mode 100644 index 0000000000..769663f16d --- /dev/null +++ b/src/sdk/pynni/tests/test_dependecy_aware.py @@ -0,0 +1,147 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT license. + + +import random +import unittest +from unittest import TestCase, main +import torch +import torch.nn as nn +import torchvision.models as models +import numpy as np + +from nni.compression.torch import L1FilterPruner, L2FilterPruner, FPGMPruner, \ + TaylorFOWeightFilterPruner, ActivationAPoZRankFilterPruner, \ + ActivationMeanRankFilterPruner +from nni.compression.torch import ModelSpeedup + +unittest.TestLoader.sortTestMethodsUsing = None + +MODEL_FILE, MASK_FILE = './model.pth', './mask.pth' + +def generate_random_sparsity(model): + """ + generate a random sparsity for all conv layers in the + model. + """ + cfg_list = [] + for name, module in model.named_modules(): + if isinstance(module, nn.Conv2d): + sparsity = np.random.uniform(0.5, 0.99) + cfg_list.append({'op_types': ['Conv2d'], 'op_names': [name], + 'sparsity': sparsity}) + return cfg_list + +def generate_random_sparsity_v2(model): + """ + only generate a random sparsity for some conv layers in + in the model. + """ + cfg_list = [] + for name, module in model.named_modules(): + # randomly pick 50% layers + if isinstance(module, nn.Conv2d) and random.uniform(0, 1) > 0.5: + sparsity = np.random.uniform(0.5, 0.99) + cfg_list.append({'op_types': ['Conv2d'], 'op_names': [name], + 'sparsity': sparsity}) + return cfg_list + + +class DependencyawareTest(TestCase): + @unittest.skipIf(torch.__version__ < "1.3.0", "not supported") + def test_dependency_aware_pruning(self): + model_zoo = ['resnet18'] + pruners = [L1FilterPruner, L2FilterPruner, FPGMPruner, TaylorFOWeightFilterPruner] + sparsity = 0.7 + cfg_list = [{'op_types': ['Conv2d'], 'sparsity':sparsity}] + dummy_input = torch.ones(1, 3, 224, 224) + for model_name in model_zoo: + for pruner in pruners: + print('Testing on ', pruner) + ori_filters = {} + Model = getattr(models, model_name) + net = Model(pretrained=True, progress=False) + # record the number of the filter of each conv layer + for name, module in net.named_modules(): + if isinstance(module, nn.Conv2d): + ori_filters[name] = module.out_channels + + # for the pruners that based on the activations, we need feed + # enough data before we call the compress function. 
+                optimizer = torch.optim.SGD(net.parameters(), lr=0.0001,
+                                            momentum=0.9,
+                                            weight_decay=4e-5)
+                criterion = torch.nn.CrossEntropyLoss()
+                tmp_pruner = pruner(
+                    net, cfg_list, optimizer, dependency_aware=True, dummy_input=dummy_input)
+                # train a single batch so that the pruner can collect the
+                # statistics
+                optimizer.zero_grad()
+                out = net(dummy_input)
+                batchsize = dummy_input.size(0)
+                loss = criterion(out, torch.zeros(batchsize, dtype=torch.int64))
+                loss.backward()
+                optimizer.step()
+
+                tmp_pruner.compress()
+                tmp_pruner.export_model(MODEL_FILE, MASK_FILE)
+                # if we want to use the same model, we should unwrap the pruner before the speedup
+                tmp_pruner._unwrap_model()
+                ms = ModelSpeedup(net, dummy_input, MASK_FILE)
+                ms.speedup_model()
+                for name, module in net.named_modules():
+                    if isinstance(module, nn.Conv2d):
+                        expected = int(ori_filters[name] * (1-sparsity))
+                        filter_diff = abs(expected - module.out_channels)
+                        errmsg = '%s Ori: %d, Expected: %d, Real: %d' % (
+                            name, ori_filters[name], expected, module.out_channels)
+
+                        # since we are using the dependency-aware mode, the number of
+                        # filters after speedup should be ori_filters[name] * (1 - sparsity)
+                        print(errmsg)
+                        assert filter_diff <= 1, errmsg
+
+    @unittest.skipIf(torch.__version__ < "1.3.0", "not supported")
+    def test_dependency_aware_random_config(self):
+        model_zoo = ['resnet18']
+        pruners = [L1FilterPruner, L2FilterPruner, FPGMPruner, TaylorFOWeightFilterPruner,
+                   ActivationMeanRankFilterPruner, ActivationAPoZRankFilterPruner]
+        dummy_input = torch.ones(1, 3, 224, 224)
+        for model_name in model_zoo:
+            for pruner in pruners:
+                Model = getattr(models, model_name)
+                cfg_generator = [generate_random_sparsity, generate_random_sparsity_v2]
+                for _generator in cfg_generator:
+                    net = Model(pretrained=True, progress=False)
+                    cfg_list = _generator(net)
+
+                    print('\n\nModel:', model_name)
+                    print('Pruner', pruner)
+                    print('Config_list:', cfg_list)
+                    # for the pruners that are based on the activations, we need to feed
+                    # enough data before we call the compress function.
+ optimizer = torch.optim.SGD(net.parameters(), lr=0.0001, + momentum=0.9, + weight_decay=4e-5) + criterion = torch.nn.CrossEntropyLoss() + tmp_pruner = pruner( + net, cfg_list, optimizer, dependency_aware=True, dummy_input=dummy_input) + # train one single batch so that the the pruner can collect the + # statistic + optimizer.zero_grad() + out = net(dummy_input) + batchsize = dummy_input.size(0) + loss = criterion(out, torch.zeros(batchsize, dtype=torch.int64)) + loss.backward() + optimizer.step() + + tmp_pruner.compress() + tmp_pruner.export_model(MODEL_FILE, MASK_FILE) + # if we want to use the same model, we should unwrap the pruner before the speedup + tmp_pruner._unwrap_model() + ms = ModelSpeedup(net, dummy_input, MASK_FILE) + ms.speedup_model() + + +if __name__ == '__main__': + main() diff --git a/src/webui/src/static/json_util.ts b/src/webui/src/static/json_util.ts index 91f5a31a21..0475096661 100644 --- a/src/webui/src/static/json_util.ts +++ b/src/webui/src/static/json_util.ts @@ -29,8 +29,7 @@ function batchFormat( width: number, keyOrKeys?: string | string[] ): string[] { - - let keys: string[]; // dict key as prefix string + let keys: string[]; // dict key as prefix string if (keyOrKeys === undefined) { keys = objects.map(() => ''); } else if (typeof keyOrKeys === 'string') { @@ -50,7 +49,7 @@ function batchFormat( const hasNested = nonNull.some(obj => detectNested(obj)); - if (!hasNested && lines.every(line => (line.length + curIndent.length < width))) { + if (!hasNested && lines.every(line => line.length + curIndent.length < width)) { return lines; } @@ -62,7 +61,15 @@ function batchFormat( if (obj === null) { return keys[i] + 'null'; } else { - return keys[i] + createBlock(curIndent, indent, '[]', obj.map(() => iter.next().value)); + return ( + keys[i] + + createBlock( + curIndent, + indent, + '[]', + obj.map(() => iter.next().value) + ) + ); } }); } @@ -79,10 +86,17 @@ function batchFormat( if (obj === null) { return keys[i] + 'null'; } else { - return keys[i] + createBlock(curIndent, indent, '{}', Object.keys(obj).map(() => iter.next().value)); + return ( + keys[i] + + createBlock( + curIndent, + indent, + '{}', + Object.keys(obj).map(() => iter.next().value) + ) + ); } }); - } else { // these objects look like class instances, so we will try to group their fields const uniqueKeys = new Set(childrenKeys); @@ -90,9 +104,11 @@ function batchFormat( for (const key of uniqueKeys) { const fields = nonNull.map(obj => obj[key]).filter(v => v !== undefined); let elements; - if (detectBatch(fields)) { // look like same field of class instances + if (detectBatch(fields)) { + // look like same field of class instances elements = batchFormat(fields, curIndent + indent, indent, width, key); - } else { // no idea what these are, fallback to format them independently + } else { + // no idea what these are, fallback to format them independently elements = fields.map(field => batchFormat([field], curIndent + indent, indent, width, key)); } iters.set(key, elements[Symbol.iterator]()); @@ -138,18 +154,18 @@ function detectBatch(objects: any[]): boolean { return sameType(concat(nonNull)); } - if (nonNull.every(obj => (typeof obj === 'object' && !Array.isArray(obj)))) { + if (nonNull.every(obj => typeof obj === 'object' && !Array.isArray(obj))) { const totalKeys = new Set(concat(nonNull.map(obj => Object.keys(obj)))).size; - const missKeys = nonNull.map(obj => (totalKeys - Object.keys(obj).length)); + const missKeys = nonNull.map(obj => totalKeys - Object.keys(obj).length); const missSum = 
missKeys.reduce((a, b) => a + b, 0); - return missSum < (totalKeys * nonNull.length) * batchThreshold; + return missSum < totalKeys * nonNull.length * batchThreshold; } return sameType(nonNull); } function detectNested(obj: any): boolean { - return typeof(obj) == 'object' && Object.values(obj).some(child => typeof(child) == 'object'); + return typeof obj == 'object' && Object.values(obj).some(child => typeof child == 'object'); } function concat(arrays: any[][]): any[] { @@ -168,7 +184,7 @@ function createBlock(curIndent: string, indent: string, brackets: string, elemen function sameType(objects: any[]): boolean { const nonNull = objects.filter(obj => obj !== undefined); - return nonNull.length > 0 ? nonNull.every(obj => (typeof obj === typeof nonNull[0])) : true; + return nonNull.length > 0 ? nonNull.every(obj => typeof obj === typeof nonNull[0]) : true; } function stringifySingleLine(obj: any): string { @@ -179,9 +195,15 @@ function stringifySingleLine(obj: any): string { } else if (typeof obj === 'string') { return `"${obj}"`; } else if (Array.isArray(obj)) { - return '[' + obj.map(x => stringifySingleLine(x)).join(', ') + ']' + return '[' + obj.map(x => stringifySingleLine(x)).join(', ') + ']'; } else { - return '{' + Object.keys(obj).map(key => `"${key}": ${stringifySingleLine(obj[key])}`).join(', ') + '}'; + return ( + '{' + + Object.keys(obj) + .map(key => `"${key}": ${stringifySingleLine(obj[key])}`) + .join(', ') + + '}' + ); } } diff --git a/test/config/integration_tests.yml b/test/config/integration_tests.yml index 97e3e8adad..fa17fd26c3 100644 --- a/test/config/integration_tests.yml +++ b/test/config/integration_tests.yml @@ -135,6 +135,13 @@ testCases: validator: class: ExportValidator +- name: experiment-import + configFile: test/config/nnictl_experiment/sklearn-classification.yml + validator: + class: ImportValidator + kwargs: + import_data_file_path: config/nnictl_experiment/test_import.json + - name: nnicli configFile: test/config/examples/sklearn-regression.yml config: diff --git a/test/config/nnictl_experiment/sklearn-classification.yml b/test/config/nnictl_experiment/sklearn-classification.yml new file mode 100644 index 0000000000..5a803e40d2 --- /dev/null +++ b/test/config/nnictl_experiment/sklearn-classification.yml @@ -0,0 +1,23 @@ +authorName: nni +experimentName: default_test +maxExecDuration: 5m +maxTrialNum: 4 +trialConcurrency: 2 +searchSpacePath: ../../../examples/trials/sklearn/classification/search_space.json + +tuner: + builtinTunerName: TPE +assessor: + builtinAssessorName: Medianstop + classArgs: + optimize_mode: maximize +trial: + codeDir: ../../../examples/trials/sklearn/classification + command: python3 main.py + gpuNum: 0 + +useAnnotation: false +multiPhase: false +multiThread: false + +trainingServicePlatform: local diff --git a/test/config/nnictl_experiment/test_import.json b/test/config/nnictl_experiment/test_import.json new file mode 100644 index 0000000000..6a8c396b9e --- /dev/null +++ b/test/config/nnictl_experiment/test_import.json @@ -0,0 +1,4 @@ +[ + {"parameter": {"C": 0.15940134774738896, "kernel": "sigmoid", "degree": 3, "gamma": 0.07295826917955316, "coef0": 0.0978204758732429}, "value": 0.6}, + {"parameter": {"C": 0.5556430724708544, "kernel": "linear", "degree": 3, "gamma": 0.04957496655414671, "coef0": 0.08520868779907687}, "value": 0.7} +] diff --git a/test/nni_test/nnitest/utils.py b/test/nni_test/nnitest/utils.py index 795bd8aefe..bdd3fd7fea 100644 --- a/test/nni_test/nnitest/utils.py +++ b/test/nni_test/nnitest/utils.py @@ -24,6 
+24,7 @@ STATUS_URL = API_ROOT_URL + '/check-status' TRIAL_JOBS_URL = API_ROOT_URL + '/trial-jobs' METRICS_URL = API_ROOT_URL + '/metric-data' +GET_IMPORTED_DATA_URL = API_ROOT_URL + '/experiment/imported-data' def read_last_line(file_name): '''read last line of a file and return None if file not found''' diff --git a/test/nni_test/nnitest/validators.py b/test/nni_test/nnitest/validators.py index 5ad9090c18..ff349ebc2b 100644 --- a/test/nni_test/nnitest/validators.py +++ b/test/nni_test/nnitest/validators.py @@ -7,7 +7,8 @@ import json import requests from nnicli import Experiment -from utils import METRICS_URL +from nni_cmd.updater import load_search_space +from utils import METRICS_URL, GET_IMPORTED_DATA_URL class ITValidator: @@ -33,6 +34,17 @@ def __call__(self, rest_endpoint, experiment_dir, nni_source_dir, **kwargs): print('\n\n') remove('report.json') +class ImportValidator(ITValidator): + def __call__(self, rest_endpoint, experiment_dir, nni_source_dir, **kwargs): + exp_id = osp.split(experiment_dir)[-1] + import_data_file_path = kwargs.get('import_data_file_path') + proc = subprocess.run(['nnictl', 'experiment', 'import', exp_id, '-f', import_data_file_path]) + assert proc.returncode == 0, \ + '`nnictl experiment import {0} -f {1}` failed with code {2}'.format(exp_id, import_data_file_path, proc.returncode) + imported_data = requests.get(GET_IMPORTED_DATA_URL).json() + origin_data = load_search_space(import_data_file_path).replace(' ', '') + assert origin_data in imported_data + class MetricsValidator(ITValidator): def __call__(self, rest_endpoint, experiment_dir, nni_source_dir, **kwargs): self.check_metrics(nni_source_dir, **kwargs) diff --git a/tools/nni_cmd/config_schema.py b/tools/nni_cmd/config_schema.py index 81dc1c77a5..2a7d929779 100644 --- a/tools/nni_cmd/config_schema.py +++ b/tools/nni_cmd/config_schema.py @@ -382,7 +382,8 @@ def validate(self, data): Optional('passphrase'): setType('passphrase', str), Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'), Optional('maxTrialNumPerGpu'): setType('maxTrialNumPerGpu', int), - Optional('useActiveGpu'): setType('useActiveGpu', bool) + Optional('useActiveGpu'): setType('useActiveGpu', bool), + Optional('preCommand'): setType('preCommand', str) }, { 'ip': setType('ip', str), @@ -391,7 +392,8 @@ def validate(self, data): 'passwd': setType('passwd', str), Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'), Optional('maxTrialNumPerGpu'): setType('maxTrialNumPerGpu', int), - Optional('useActiveGpu'): setType('useActiveGpu', bool) + Optional('useActiveGpu'): setType('useActiveGpu', bool), + Optional('preCommand'): setType('preCommand', str) })] } diff --git a/tools/nni_cmd/nnictl.py b/tools/nni_cmd/nnictl.py index 213554d5e8..cc5d4cdd11 100644 --- a/tools/nni_cmd/nnictl.py +++ b/tools/nni_cmd/nnictl.py @@ -159,6 +159,8 @@ def parse_args(): parser_load_experiment.add_argument('--codeDir', '-c', required=True, help='the path of codeDir for loaded experiment, \ this path will also put the code in the loaded experiment package') parser_load_experiment.add_argument('--logDir', '-l', required=False, help='the path of logDir for loaded experiment') + parser_load_experiment.add_argument('--searchSpacePath', '-s', required=False, help='the path of search space file for \ + loaded experiment, this path contains file name. 
Default in $codeDir/search_space.json') parser_load_experiment.set_defaults(func=load_experiment) #parse platform command diff --git a/tools/nni_cmd/nnictl_utils.py b/tools/nni_cmd/nnictl_utils.py index 3833c9317f..e59c293e2a 100644 --- a/tools/nni_cmd/nnictl_utils.py +++ b/tools/nni_cmd/nnictl_utils.py @@ -827,7 +827,18 @@ def save_experiment(args): temp_code_dir = os.path.join(temp_root_dir, 'code') shutil.copytree(nni_config.get_config('experimentConfig')['trial']['codeDir'], temp_code_dir) - # Step4. Archive folder + # Step4. Copy searchSpace file + search_space_path = nni_config.get_config('experimentConfig').get('searchSpacePath') + if search_space_path: + if not os.path.exists(search_space_path): + print_warning('search space %s does not exist!' % search_space_path) + else: + temp_search_space_dir = os.path.join(temp_root_dir, 'searchSpace') + os.makedirs(temp_search_space_dir, exist_ok=True) + search_space_name = os.path.basename(search_space_path) + shutil.copyfile(search_space_path, os.path.join(temp_search_space_dir, search_space_name)) + + # Step5. Archive folder zip_package_name = 'nni_experiment_%s' % args.id if args.path: os.makedirs(args.path, exist_ok=True) @@ -844,6 +855,9 @@ def load_experiment(args): if not os.path.exists(args.path): print_error('file path %s does not exist!' % args.path) exit(1) + if args.searchSpacePath and os.path.isdir(args.searchSpacePath): + print_error('search space path should be a full path with filename, not a directory!') + exit(1) temp_root_dir = generate_temp_dir() shutil.unpack_archive(package_path, temp_root_dir) print_normal('Loading...') @@ -929,7 +943,32 @@ def load_experiment(args): else: shutil.copy(src_path, target_path) - # Step5. Create experiment metadata + # Step5. Copy searchSpace file + archive_search_space_dir = os.path.join(temp_root_dir, 'searchSpace') + if args.searchSpacePath: + target_path = os.path.expanduser(args.searchSpacePath) + else: + # set default path to codeDir + target_path = os.path.join(codeDir, 'search_space.json') + if not os.path.isabs(target_path): + target_path = os.path.join(os.getcwd(), target_path) + print_normal('Expand search space path to %s' % target_path) + nnictl_exp_config['searchSpacePath'] = target_path + # if the path already has a search space file, use the original one, otherwise use archived one + if not os.path.isfile(target_path): + if len(os.listdir(archive_search_space_dir)) == 0: + print_error('Archive file does not contain search space file!') + exit(1) + else: + for file in os.listdir(archive_search_space_dir): + source_path = os.path.join(archive_search_space_dir, file) + os.makedirs(os.path.dirname(target_path), exist_ok=True) + shutil.copyfile(source_path, target_path) + break + elif not args.searchSpacePath: + print_warning('%s exist, will not load search_space file' % target_path) + + # Step6. 
Create experiment metadata nni_config.set_config('experimentConfig', nnictl_exp_config) experiment_config.add_experiment(experiment_id, experiment_metadata.get('port'), diff --git a/tools/nni_cmd/updater.py b/tools/nni_cmd/updater.py index c9991b8bab..071a167f70 100644 --- a/tools/nni_cmd/updater.py +++ b/tools/nni_cmd/updater.py @@ -7,7 +7,7 @@ from .url_utils import experiment_url, import_data_url from .config_utils import Config from .common_utils import get_json_content, print_normal, print_error, print_warning -from .nnictl_utils import get_experiment_port, get_config_filename +from .nnictl_utils import get_experiment_port, get_config_filename, detect_process from .launcher_utils import parse_time from .constants import REST_TIME_OUT, TUNERS_SUPPORTING_IMPORT_DATA, TUNERS_NO_NEED_TO_IMPORT_DATA @@ -115,7 +115,19 @@ def import_data(args): validate_file(args.filename) validate_dispatcher(args) content = load_search_space(args.filename) - args.port = get_experiment_port(args) + + nni_config = Config(get_config_filename(args)) + rest_port = nni_config.get_config('restServerPort') + rest_pid = nni_config.get_config('restServerPid') + if not detect_process(rest_pid): + print_error('Experiment is not running...') + return + running, _ = check_rest_server_quick(rest_port) + if not running: + print_error('Restful server is not running') + return + + args.port = rest_port if args.port is not None: if import_data_to_restful_server(args, content): pass
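To make the dependency-aware channel selection implemented by `_common_channel_to_prune` above easier to follow, here is a minimal, self-contained sketch of the idea. It is only an illustration, not part of this patch and not the NNI API; the layer sizes, weights, and sparsities are made-up toy values. Two conv layers whose outputs are added together form one dependency set, so they must prune the same output channels; the sketch sums the per-filter L1 norms across the set and prunes the channels with the smallest summed importance, limited by the smallest configured sparsity in the set.

```python
import torch

# Toy dependency set: two conv layers (8 filters each) whose outputs are added.
w1 = torch.randn(8, 4, 3, 3)   # hypothetical weight of conv1
w2 = torch.randn(8, 4, 3, 3)   # hypothetical weight of conv2
sparsities = [0.5, 0.25]       # hypothetical configured sparsity of conv1 / conv2

# Per-filter L1 norms, summed across the dependency set.
channel_sum = w1.abs().view(8, -1).sum(dim=1) + w2.abs().view(8, -1).sum(dim=1)

# Only the smallest configured sparsity can be shared by every layer in the set.
min_sparsity = min(sparsities)
num_common_prune = int(8 * min_sparsity)

# Channels with the smallest summed importance are pruned in every layer.
threshold = torch.topk(channel_sum, num_common_prune, largest=False)[0].max()
common_mask = torch.gt(channel_sum, threshold)   # False -> pruned in all layers
print(common_mask)

# Each layer may then prune additional channels (its own sparsity minus
# min_sparsity) based on its own importance scores, as the patch does per wrapper.
```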