Commit

Merge pull request #72 from microsoft/master

pull code

chicm-ms authored Feb 25, 2020
2 parents 0856813 + ff2728c commit 9e97bed
Showing 12 changed files with 120 additions and 24 deletions.
16 changes: 9 additions & 7 deletions README.md
@@ -124,10 +124,12 @@ Within the following table, we summarized the current NNI capabilities, we are g
<a href="docs/en_US/NAS/Overview.md">Neural Architecture Search</a>
<ul>
<ul>
-<li><a href="docs/en_US/NAS/Overview.md#enas">ENAS</a></li>
-<li><a href="docs/en_US/NAS/Overview.md#darts">DARTS</a></li>
-<li><a href="docs/en_US/NAS/Overview.md#p-darts">P-DARTS</a></li>
-<li><a href="docs/en_US/NAS/Overview.md#cdarts">CDARTS</a></li>
+<li><a href="docs/en_US/NAS/ENAS.md">ENAS</a></li>
+<li><a href="docs/en_US/NAS/DARTS.md">DARTS</a></li>
+<li><a href="docs/en_US/NAS/PDARTS.md">P-DARTS</a></li>
+<li><a href="docs/en_US/NAS/CDARTS.md">CDARTS</a></li>
+<li><a href="docs/en_US/NAS/SPOS.md">SPOS</a></li>
+<li><a href="docs/en_US/NAS/Proxylessnas.md">ProxylessNAS</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a> </li>
</ul>
</ul>
@@ -224,7 +226,7 @@ Note:

* If there is any privilege issue, add `--user` to install NNI in the user directory.
* Currently NNI on Windows supports local, remote and pai mode. Anaconda or Miniconda is highly recommended to install NNI on Windows.
-* If there is any error like `Segmentation fault`, please refer to [FAQ](docs/en_US/Tutorial/FAQ.md). For FAQ on Windows, please refer to [NNI on Windows](docs/en_US/Tutorial/NniOnWindows.md).
+* If there is any error like `Segmentation fault`, please refer to [FAQ](docs/en_US/Tutorial/FAQ.md). For FAQ on Windows, please refer to [NNI on Windows](docs/en_US/Tutorial/InstallationWin.md#faq).

### **Verify installation**

@@ -288,7 +290,7 @@ You can use these commands to get more information about the experiment
## **Documentation**
* To learn about what's NNI, read the [NNI Overview](https://nni.readthedocs.io/en/latest/Overview.html).
* To get yourself familiar with how to use NNI, read the [documentation](https://nni.readthedocs.io/en/latest/index.html).
-* To get started and install NNI on your system, please refer to [Install NNI](docs/en_US/Tutorial/Installation.md).
+* To get started and install NNI on your system, please refer to [Install NNI](https://nni.readthedocs.io/en/latest/installation.html).

## **Contributing**
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
@@ -304,7 +306,7 @@ After getting familiar with contribution agreements, you are ready to create you
* If you have any questions on usage, review [FAQ](https://github.com/microsoft/nni/blob/master/docs/en_US/Tutorial/FAQ.md) first, if there are no relevant issues and answers to your question, try contact NNI dev team and users in [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) or [File an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* [Customize your own Tuner](docs/en_US/Tuner/CustomizeTuner.md)
* [Implement customized TrainingService](docs/en_US/TrainingService/HowToImplementTrainingService.md)
-* [Implement a new NAS trainer on NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/NasInterface.md#implement-a-new-nas-trainer-on-nni)
+* [Implement a new NAS trainer on NNI](docs/en_US/NAS/Advanced.md)
* [Customize your own Advisor](docs/en_US/Tuner/CustomizeAdvisor.md)

## **External Repositories and References**
4 changes: 2 additions & 2 deletions docs/en_US/Compressor/ModelSpeedup.md
@@ -14,7 +14,7 @@ There are two types of pruning. One is fine-grained pruning, it does not change
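A quick illustrative sketch of the distinction in the hunk context above (editor's addition, not part of the NNI doc): a fine-grained mask zeroes individual weights and leaves tensor shapes unchanged, while a coarse-grained mask zeroes whole channels, which is what makes a physically smaller replacement layer possible.

import torch

# Conv2d weight layout: (out_channels, in_channels, kH, kW)
weight = torch.randn(8, 4, 3, 3)

# Fine-grained mask: zero individual weights; shape stays (8, 4, 3, 3).
fine_mask = (torch.rand_like(weight) > 0.5).float()
fine_pruned = weight * fine_mask

# Coarse-grained mask: zero entire output channels (here channels 0 and 3).
coarse_mask = torch.ones(8)
coarse_mask[[0, 3]] = 0
coarse_pruned = weight * coarse_mask.view(-1, 1, 1, 1)

# Only the coarse case can be sped up by physically dropping channels:
smaller_weight = weight[coarse_mask.bool()]   # shape becomes (6, 4, 3, 3)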

## Design and Implementation

-To speed up a model, the pruned layers should be replaced, either replaced with smaller layer for coarse-grained mask, or replaced with sparse kernel for fine-grained mask. Coarse-grained mask usually changes the shape of weights or input/output tensors, thus, we should do shape inference to check are there other unpruned layers should be replaced as well due to shape change. Therefore, in our design, there are two main steps: first, do shape inference to find out all the modules that should be replaced; second, replace the modules. The first step requires topology (i.e., connections) of the model, we use `jit.trace` to obtain the model grpah for PyTorch.
+To speed up a model, the pruned layers should be replaced, either replaced with smaller layer for coarse-grained mask, or replaced with sparse kernel for fine-grained mask. Coarse-grained mask usually changes the shape of weights or input/output tensors, thus, we should do shape inference to check are there other unpruned layers should be replaced as well due to shape change. Therefore, in our design, there are two main steps: first, do shape inference to find out all the modules that should be replaced; second, replace the modules. The first step requires topology (i.e., connections) of the model, we use `jit.trace` to obtain the model graph for PyTorch.
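The `jit.trace` step can be illustrated standalone (a minimal editor's sketch; NNI wraps this in its own graph utilities):

import torch
import torchvision.models as models

model = models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Tracing records the executed graph; layer connectivity (which module
# feeds which) can then be read off for shape inference.
traced = torch.jit.trace(model, dummy_input)
print(traced.graph)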

For each module, we should prepare four functions, three for shape inference and one for module replacement. The three shape inference functions are: given weight shape infer input/output shape, given input shape infer weight/output shape, given output shape infer weight/input shape. The module replacement function returns a newly created module which is smaller.
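A hedged sketch of that per-module contract for `Conv2d` under channel pruning (the function names below are hypothetical, for illustration only; NNI's actual internal interface differs):

import torch.nn as nn

# Given the (possibly pruned) weight shape, infer input/output channel counts.
def infer_io_from_weight(weight_shape):
    out_ch, in_ch, _, _ = weight_shape
    return in_ch, out_ch

# Given a pruned input channel count, infer the new weight shape
# (output side unchanged).
def infer_weight_from_input(in_ch, old_weight_shape):
    out_ch, _, kh, kw = old_weight_shape
    return (out_ch, in_ch, kh, kw)

# Given a pruned output channel count, infer the new weight shape
# (input side unchanged).
def infer_weight_from_output(out_ch, old_weight_shape):
    _, in_ch, kh, kw = old_weight_shape
    return (out_ch, in_ch, kh, kw)

# Module replacement: build a smaller Conv2d, keeping the other hyperparameters.
def replace_conv2d(old: nn.Conv2d, in_ch: int, out_ch: int) -> nn.Conv2d:
    return nn.Conv2d(in_ch, out_ch,
                     kernel_size=old.kernel_size, stride=old.stride,
                     padding=old.padding, dilation=old.dilation,
                     bias=old.bias is not None)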

@@ -102,4 +102,4 @@ input tensor: `torch.randn(64, 3, 32, 32)`
| 4 | 0.02521 | 0.014008 |
| 8 | 0.03386 | 0.023923 |
| 16 | 0.06042 | 0.046183 |
-| 32 | 0.12421 | 0.087113 |
\ No newline at end of file
+| 32 | 0.12421 | 0.087113 |
2 changes: 1 addition & 1 deletion examples/model_compress/APoZ_torch_cifar10.py
@@ -41,7 +41,7 @@ def test(model, device, test_loader):

def main():
    torch.manual_seed(0)
-    device = torch.device('cuda')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('./data.cifar10', train=True, download=True,
                         transform=transforms.Compose([
2 changes: 1 addition & 1 deletion examples/model_compress/BNN_quantizer_cifar10.py
@@ -105,7 +105,7 @@ def adjust_learning_rate(optimizer, epoch):

def main():
    torch.manual_seed(0)
-    device = torch.device('cuda')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('./data.cifar10', train=True, download=True,
                         transform=transforms.Compose([
89 changes: 89 additions & 0 deletions examples/model_compress/DoReFaQuantizer_torch_mnist.py
@@ -0,0 +1,89 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.torch import DoReFaQuantizer


class Mnist(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
        self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
        self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
        self.fc2 = torch.nn.Linear(500, 10)
        self.relu1 = torch.nn.ReLU6()
        self.relu2 = torch.nn.ReLU6()
        self.relu3 = torch.nn.ReLU6()

    def forward(self, x):
        x = self.relu1(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = self.relu2(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = self.relu3(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, quantizer, device, train_loader, optimizer):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)

    print('Loss: {} Accuracy: {}%\n'.format(
        test_loss, 100 * correct / len(test_loader.dataset)))

def main():
    torch.manual_seed(0)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=True, download=True, transform=trans),
        batch_size=64, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=False, transform=trans),
        batch_size=1000, shuffle=True)

    model = Mnist()
    model = model.to(device)
    configure_list = [{
        'quant_types': ['weight'],
        'quant_bits': {
            'weight': 8,
        },  # a plain `int` also works here when all `quant_types` share the same bit width
        'op_types': ['Conv2d', 'Linear']
    }]
    quantizer = DoReFaQuantizer(model, configure_list)
    quantizer.compress()

    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.5)
    for epoch in range(10):
        print('# Epoch {} #'.format(epoch))
        train(model, quantizer, device, train_loader, optimizer)
        test(model, device, test_loader)


if __name__ == '__main__':
    main()
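Per the inline comment in `configure_list` above, the per-type dict can apparently be collapsed to a plain `int` when every entry in `quant_types` uses the same bit width; a minimal variant of the same config (based on that comment, not independently verified against this NNI release):

# Hypothetical shorthand based on the inline comment above:
# a single int applies to every type listed in 'quant_types'.
configure_list = [{
    'quant_types': ['weight'],
    'quant_bits': 8,
    'op_types': ['Conv2d', 'Linear']
}]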
2 changes: 1 addition & 1 deletion examples/model_compress/L1_torch_cifar10.py
@@ -41,7 +41,7 @@ def test(model, device, test_loader):

def main():
    torch.manual_seed(0)
-    device = torch.device('cuda')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('./data.cifar10', train=True, download=True,
                         transform=transforms.Compose([
8 changes: 5 additions & 3 deletions examples/model_compress/MeanActivation_torch_cifar10.py
@@ -1,4 +1,5 @@
import math
+import os
import argparse
import torch
import torch.nn as nn
@@ -48,7 +49,7 @@ def main():

    args = parser.parse_args()
    torch.manual_seed(0)
-    device = torch.device('cuda')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('./data.cifar10', train=True, download=True,
                         transform=transforms.Compose([
@@ -79,10 +80,11 @@ def main():
        test(model, device, test_loader)
        lr_scheduler.step(epoch)
        torch.save(model.state_dict(), 'vgg16_cifar10.pth')
+
    else:
+        assert os.path.isfile('vgg16_cifar10.pth'), "can not find checkpoint 'vgg16_cifar10.pth'"
        model.load_state_dict(torch.load('vgg16_cifar10.pth'))
-    # Test base model accuracy
+    print('=' * 10 + 'Test on the original model' + '=' * 10)
-    model.load_state_dict(torch.load('vgg16_cifar10.pth'))
    test(model, device, test_loader)
    # top1 = 93.51%

4 changes: 2 additions & 2 deletions examples/model_compress/QAT_torch_quantizer.py
@@ -56,7 +56,7 @@ def test(model, device, test_loader):

def main():
    torch.manual_seed(0)
-    device = torch.device('cpu')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    train_loader = torch.utils.data.DataLoader(
@@ -67,7 +67,6 @@ def main():
        batch_size=1000, shuffle=True)

    model = Mnist()
-
    '''you can change this to DoReFaQuantizer to implement it
    DoReFaQuantizer(configure_list).compress(model)
    '''
@@ -86,6 +85,7 @@ def main():
    quantizer = QAT_Quantizer(model, configure_list)
    quantizer.compress()

+    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(10):
        print('# Epoch {} #'.format(epoch))
5 changes: 3 additions & 2 deletions examples/model_compress/fpgm_torch_mnist.py
@@ -72,7 +72,7 @@ def test(model, device, test_loader):

def main():
    torch.manual_seed(0)
-    device = torch.device('cpu')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    train_loader = torch.utils.data.DataLoader(
@@ -83,6 +83,7 @@ def main():
        batch_size=1000, shuffle=True)

    model = Mnist()
+    model.to(device)
    model.print_conv_filter_sparsity()

    configure_list = [{
@@ -92,7 +93,7 @@ def main():

    pruner = FPGMPruner(model, configure_list)
    pruner.compress()
-
+    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(10):
        pruner.update_epoch(epoch)
2 changes: 1 addition & 1 deletion examples/model_compress/main_torch_pruner.py
@@ -55,7 +55,7 @@ def test(model, device, test_loader):

def main():
    torch.manual_seed(0)
-    device = torch.device('cuda')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    train_loader = torch.utils.data.DataLoader(
2 changes: 1 addition & 1 deletion examples/model_compress/pruning_kd.py
@@ -49,7 +49,7 @@ def test(model, device, test_loader):

def main():
    torch.manual_seed(0)
-    device = torch.device('cuda')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('./data.cifar10', train=True, download=True,
                         transform=transforms.Compose([
8 changes: 5 additions & 3 deletions examples/model_compress/slim_torch_cifar10.py
@@ -1,4 +1,5 @@
import math
+import os
import argparse
import torch
import torch.nn as nn
@@ -57,7 +58,7 @@ def main():
    args = parser.parse_args()

    torch.manual_seed(0)
-    device = torch.device('cuda')
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10('./data.cifar10', train=True, download=True,
                         transform=transforms.Compose([
@@ -90,10 +91,11 @@ def main():
        train(model, device, train_loader, optimizer, True)
        test(model, device, test_loader)
        torch.save(model.state_dict(), 'vgg19_cifar10.pth')
+
    else:
+        assert os.path.isfile('vgg19_cifar10.pth'), "can not find checkpoint 'vgg19_cifar10.pth'"
        model.load_state_dict(torch.load('vgg19_cifar10.pth'))
-    # Test base model accuracy
+    print('=' * 10 + 'Test the original model' + '=' * 10)
-    model.load_state_dict(torch.load('vgg19_cifar10.pth'))
    test(model, device, test_loader)
    # top1 = 93.60%

