
Commit

Merge pull request #216 from microsoft/master
merge master
SparkSnail authored Nov 21, 2019
2 parents d16dbe9 + cb52d44 commit 9ce751d
Showing 24 changed files with 802 additions and 86 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -352,6 +352,7 @@ With authors' permission, we listed a set of NNI usage examples and relevant art
* Run [Neural Network Architecture Search](examples/trials/nas_cifar10/README.md) with NNI
* [Automatic Feature Engineering](examples/trials/auto-feature-engineering/README.md) with NNI
* [Hyperparameter Tuning for Matrix Factorization](https://github.com/microsoft/recommenders/blob/master/notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb) with NNI
* [scikit-nni](https://github.com/ksachdeva/scikit-nni) Hyper-parameter search for scikit-learn pipelines using NNI

* ### **Relevant Articles** ###

@@ -360,6 +361,7 @@ With authors' permission, we listed a set of NNI usage examples and relevant art
* [Parallelizing a Sequential Algorithm TPE](docs/en_US/CommunitySharings/ParallelizingTpeSearch.md)
* [Automatically tuning SVD with NNI](docs/en_US/CommunitySharings/RecommendersSvd.md)
* [Automatically tuning SPTAG with NNI](docs/en_US/CommunitySharings/SptagAutoTune.md)
* [Find thy hyper-parameters for scikit-learn pipelines using Microsoft NNI](https://towardsdatascience.com/find-thy-hyper-parameters-for-scikit-learn-pipelines-using-microsoft-nni-f1015b1224c1)
* **Blog (in Chinese)** - [AutoML tools (Advisor, NNI and Google Vizier) comparison](http://gaocegege.com/Blog/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/katib-new#%E6%80%BB%E7%BB%93%E4%B8%8E%E5%88%86%E6%9E%90) by [@gaocegege](https://github.com/gaocegege) - 总结与分析 section of design and implementation of kubeflow/katib

## **Feedback**
23 changes: 23 additions & 0 deletions docs/en_US/Compressor/LotteryTicketHypothesis.md
@@ -0,0 +1,23 @@
Lottery Ticket Hypothesis on NNI
===

## Introduction

The paper [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635) is mainly a measurement and analysis paper that delivers very interesting insights. To support it on NNI, we mainly implement the training approach for finding *winning tickets*.

In this paper, the authors use the following process, called *iterative pruning*, to prune a model:
>1. Randomly initialize a neural network f(x;theta_0) (where theta_0 follows D_{theta}).
>2. Train the network for j iterations, arriving at parameters theta_j.
>3. Prune p% of the parameters in theta_j, creating a mask m.
>4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
>5. Repeat steps 2, 3, and 4.

If the configured final sparsity is P (e.g., 0.8) and pruning runs for n iterations, each iteration prunes 1-(1-P)^(1/n) of the weights that survived the previous round.
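
As a quick sanity check on this formula, here is a minimal sketch (plain Python, independent of the NNI API) showing that pruning 1-(1-P)^(1/n) of the surviving weights in each of n rounds leaves exactly a fraction 1-P of the weights, i.e., the final sparsity is P:

```python
# Per-round pruning rate needed to reach final sparsity P after n rounds.
def per_round_prune_rate(P, n):
    return 1 - (1 - P) ** (1 / n)

P, n = 0.8, 5
rate = per_round_prune_rate(P, n)   # fraction of surviving weights pruned each round
surviving = 1.0
for _ in range(n):
    surviving *= (1 - rate)         # weights that survive this round

print(round(rate, 3))       # ~0.275
print(round(surviving, 3))  # ~0.2 == 1 - P, so the final sparsity is P
```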

## Reproduce Results

We try to reproduce the result of the fully connected network on MNIST using the same configuration as in the paper. The code is available [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/lottery_torch_mnist_fc.py). In this experiment, we prune 10 times; in each pruning iteration we train the pruned model for 50 epochs.
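
For reference, the pruner configuration used in that experiment (a minimal sketch mirroring the example script added in this commit, with `model` and `optimizer` defined as in that script) looks like the following:

```python
from nni.compression.torch import LotteryTicketPruner

# 10 pruning rounds, pruning the model down to 96% sparsity overall.
configure_list = [{
    'prune_iterations': 10,
    'sparsity': 0.96,
    'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, configure_list, optimizer)
pruner.compress()
```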

![](../../img/lottery_ticket_mnist_fc.png)

The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% of the weights achieves performance comparable to no pruning and converges a little faster. Pruning too much, e.g., more than 94%, lowers the accuracy and slows convergence slightly. One small difference from the paper is that the trend in the paper's data is clearer than in ours.
1 change: 1 addition & 0 deletions docs/en_US/Compressor/Overview.md
@@ -12,6 +12,7 @@ We have provided two naive compression algorithms and three popular ones for use
|---|---|
| [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](./Pruner.md#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [FPGM Pruner](./Pruner.md#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
| [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
41 changes: 41 additions & 0 deletions docs/en_US/Compressor/Pruner.md
@@ -92,6 +92,47 @@ You can view example for more information

***

## Lottery Ticket Hypothesis
[The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis, and articulates the *lottery ticket hypothesis*: dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.

In this paper, the authors use the following process, called *iterative pruning*, to prune a model:
>1. Randomly initialize a neural network f(x;theta_0) (where theta_0 follows D_{theta}).
>2. Train the network for j iterations, arriving at parameters theta_j.
>3. Prune p% of the parameters in theta_j, creating a mask m.
>4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
>5. Repeat steps 2, 3, and 4.

If the configured final sparsity is P (e.g., 0.8) and pruning runs for n iterations, each iteration prunes 1-(1-P)^(1/n) of the weights that survived the previous round.

### Usage

PyTorch code
```python
from nni.compression.torch import LotteryTicketPruner
config_list = [{
    'prune_iterations': 5,
    'sparsity': 0.8,
    'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, config_list, optimizer)
pruner.compress()
for _ in pruner.get_prune_iterations():
    pruner.prune_iteration_start()
    for epoch in range(epoch_num):
        ...
```

The above configuration specifies 5 rounds of iterative pruning. Because all 5 rounds are executed in the same run, LotteryTicketPruner needs `model` and `optimizer` (**note: also pass `lr_scheduler` if you use one**) so that it can reset their states every time a new prune iteration starts. Use `get_prune_iterations` to get the pruning iterations, and invoke `prune_iteration_start` at the beginning of each iteration. `epoch_num` should be large enough for the model to converge, because the hypothesis is that the accuracy obtained in later rounds with high sparsity is comparable to that obtained in the first round. Simple reproduced results can be found [here](./LotteryTicketHypothesis.md).
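
Below is a minimal end-to-end sketch of this loop. It assumes `LotteryTicketPruner` also accepts an `lr_scheduler` keyword argument, and that `model`, `train_loader`, `test_loader`, `epoch_num`, `train`, and `test` are defined by your own training code:

```python
import torch
from nni.compression.torch import LotteryTicketPruner

config_list = [{'prune_iterations': 5, 'sparsity': 0.8, 'op_types': ['default']}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Pass lr_scheduler so that its state is reset, together with model and optimizer,
# at the start of every prune iteration.
pruner = LotteryTicketPruner(model, config_list, optimizer, lr_scheduler=scheduler)
pruner.compress()

for _ in pruner.get_prune_iterations():
    pruner.prune_iteration_start()
    for epoch in range(epoch_num):
        train(model, train_loader, optimizer)  # your training step
        scheduler.step()
    accuracy = test(model, test_loader)        # your evaluation step
```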


*TensorFlow version will be supported later.*

#### User configuration for LotteryTicketPruner

* **prune_iterations:** The number of rounds of iterative pruning.
* **sparsity:** The final sparsity when the compression is done.

***
## FPGM Pruner
FPGM Pruner is an implementation of the paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf).

Binary file added docs/img/lottery_ticket_mnist_fc.png
83 changes: 83 additions & 0 deletions examples/model_compress/lottery_torch_mnist_fc.py
@@ -0,0 +1,83 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from nni.compression.torch import LotteryTicketPruner

class fc1(nn.Module):

    def __init__(self, num_classes=10):
        super(fc1, self).__init__()
        self.classifier = nn.Sequential(
            nn.Linear(28*28, 300),
            nn.ReLU(inplace=True),
            nn.Linear(300, 100),
            nn.ReLU(inplace=True),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

def train(model, train_loader, optimizer, criterion):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.train()
    for batch_idx, (imgs, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        imgs, targets = imgs.to(device), targets.to(device)
        output = model(imgs)
        train_loss = criterion(output, targets)
        train_loss.backward()
        optimizer.step()
    return train_loss.item()

def test(model, test_loader, criterion):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.data.max(1, keepdim=True)[1]  # get the index of the max log-probability
            correct += pred.eq(target.data.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)
    return accuracy


if __name__ == '__main__':
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    traindataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
    testdataset = datasets.MNIST('./data', train=False, transform=transform)
    train_loader = torch.utils.data.DataLoader(traindataset, batch_size=60, shuffle=True, num_workers=0, drop_last=False)
    test_loader = torch.utils.data.DataLoader(testdataset, batch_size=60, shuffle=False, num_workers=0, drop_last=True)

    model = fc1().to("cuda" if torch.cuda.is_available() else "cpu")
    optimizer = torch.optim.Adam(model.parameters(), lr=1.2e-3)
    criterion = nn.CrossEntropyLoss()

    configure_list = [{
        'prune_iterations': 10,
        'sparsity': 0.96,
        'op_types': ['default']
    }]
    pruner = LotteryTicketPruner(model, configure_list, optimizer)
    pruner.compress()

    for i in pruner.get_prune_iterations():
        pruner.prune_iteration_start()
        loss = 0
        accuracy = 0
        for epoch in range(50):
            loss = train(model, train_loader, optimizer, criterion)
            accuracy = test(model, test_loader, criterion)
            print('current epoch: {0}, loss: {1}, accuracy: {2}'.format(epoch, loss, accuracy))
        print('prune iteration: {0}, loss: {1}, accuracy: {2}'.format(i, loss, accuracy))
    pruner.export_model('model.pth', 'mask.pth')
6 changes: 3 additions & 3 deletions examples/trials/sklearn/classification/main.py
@@ -23,13 +23,13 @@
import logging
import numpy as np


LOG = logging.getLogger('sklearn_classification')

def load_data():
'''Load dataset, use 20newsgroups dataset'''
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=99, test_size=0.25)
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, random_state=99, test_size=0.25)

ss = StandardScaler()
X_train = ss.fit_transform(X_train)
@@ -59,7 +59,7 @@ def get_model(PARAMS):

return model

def run(X_train, X_test, y_train, y_test, PARAMS):
def run(X_train, X_test, y_train, y_test, model):
'''Train model and predict result'''
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
14 changes: 6 additions & 8 deletions examples/trials/sklearn/regression/main.py
@@ -33,23 +33,22 @@
def load_data():
'''Load dataset, use boston dataset'''
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=99, test_size=0.25)
X_train, X_test, y_train, y_test = train_test_split(
boston.data, boston.target, random_state=99, test_size=0.25)
#normalize data
ss_X = StandardScaler()
ss_y = StandardScaler()

X_train = ss_X.fit_transform(X_train)
X_test = ss_X.transform(X_test)
y_train = ss_y.fit_transform(y_train[:, None])[:,0]
y_test = ss_y.transform(y_test[:, None])[:,0]
y_train = ss_y.fit_transform(y_train[:, None])[:, 0]
y_test = ss_y.transform(y_test[:, None])[:, 0]

return X_train, X_test, y_train, y_test

def get_default_parameters():
'''get default parameters'''
params = {
'model_name': 'LinearRegression'
}
params = {'model_name': 'LinearRegression'}
return params

def get_model(PARAMS):
@@ -76,8 +75,7 @@ def get_model(PARAMS):
raise
return model


def run(X_train, X_test, y_train, y_test, PARAMS):
def run(X_train, X_test, y_train, y_test, model):
'''Train model and predict result'''
model.fit(X_train, y_train)
predict_y = model.predict(X_test)
2 changes: 1 addition & 1 deletion src/nni_manager/common/manager.ts
@@ -105,7 +105,7 @@ abstract class Manager {
public abstract importData(data: string): Promise<void>;
public abstract exportData(): Promise<string>;

public abstract addCustomizedTrialJob(hyperParams: string): Promise<void>;
public abstract addCustomizedTrialJob(hyperParams: string): Promise<number>;
public abstract cancelTrialJobByUser(trialJobId: string): Promise<void>;

public abstract listTrialJobs(status?: TrialJobStatus): Promise<TrialJobInfo[]>;
5 changes: 0 additions & 5 deletions src/nni_manager/common/trainingService.ts
@@ -58,11 +58,6 @@ interface TrialJobDetail {
isEarlyStopped?: boolean;
}

interface HostJobDetail {
readonly id: string;
readonly status: string;
}

/**
* define TrialJobMetric
*/
71 changes: 35 additions & 36 deletions src/nni_manager/core/nnimanager.ts
@@ -50,13 +50,12 @@ class NNIManager implements Manager {
private dispatcher: IpcInterface | undefined;
private currSubmittedTrialNum: number; // need to be recovered
private trialConcurrencyChange: number; // >0: increase, <0: decrease
private customizedTrials: string[]; // need to be recovered
private log: Logger;
private dataStore: DataStore;
private experimentProfile: ExperimentProfile;
private dispatcherPid: number;
private status: NNIManagerStatus;
private waitingTrials: string[];
private waitingTrials: TrialJobApplicationForm[];
private trialJobs: Map<string, TrialJobDetail>;
private trialDataForTuner: string;
private readonly: boolean;
@@ -66,7 +65,6 @@
constructor() {
this.currSubmittedTrialNum = 0;
this.trialConcurrencyChange = 0;
this.customizedTrials = [];
this.trainingService = component.get(TrainingService);
assert(this.trainingService);
this.dispatcherPid = 0;
@@ -131,19 +129,34 @@
return this.dataStore.exportTrialHpConfigs();
}

public addCustomizedTrialJob(hyperParams: string): Promise<void> {
public addCustomizedTrialJob(hyperParams: string): Promise<number> {
if (this.readonly) {
return Promise.reject(new Error('Error: can not add customized trial job in readonly mode!'));
}
if (this.currSubmittedTrialNum >= this.experimentProfile.params.maxTrialNum) {
return Promise.reject(
new Error('reach maxTrialNum')
);
return Promise.reject(new Error('reach maxTrialNum'));
}
this.customizedTrials.push(hyperParams);

// TODO: NNI manager should not peek tuner's internal protocol, let's refactor this later
const packedParameter = {
parameter_id: null,
parameter_source: 'customized',
parameters: JSON.parse(hyperParams)
}

const form: TrialJobApplicationForm = {
sequenceId: this.experimentProfile.nextSequenceId++,
hyperParameters: {
value: JSON.stringify(packedParameter),
index: 0
}
};
this.waitingTrials.push(form);

// trial id has not been generated yet, thus use '' instead
return this.dataStore.storeTrialJobEvent('ADD_CUSTOMIZED', '', hyperParams);
this.dataStore.storeTrialJobEvent('ADD_CUSTOMIZED', '', hyperParams);

return Promise.resolve(form.sequenceId);
}

public async cancelTrialJobByUser(trialJobId: string): Promise<void> {
@@ -560,18 +573,7 @@
this.trialConcurrencyChange = requestTrialNum;
}

const requestCustomTrialNum: number = Math.min(requestTrialNum, this.customizedTrials.length);
for (let i: number = 0; i < requestCustomTrialNum; i++) {
// ask tuner for more trials
if (this.customizedTrials.length > 0) {
const hyperParams: string | undefined = this.customizedTrials.shift();
this.dispatcher.sendCommand(ADD_CUSTOMIZED_TRIAL_JOB, hyperParams);
}
}

if (requestTrialNum - requestCustomTrialNum > 0) {
this.requestTrialJobs(requestTrialNum - requestCustomTrialNum);
}
this.requestTrialJobs(requestTrialNum);

// check maxtrialnum and maxduration here
// NO_MORE_TRIAL is more like a subset of RUNNING, because during RUNNING tuner
@@ -609,26 +611,16 @@
this.currSubmittedTrialNum >= this.experimentProfile.params.maxTrialNum) {
break;
}
const hyperParams: string | undefined = this.waitingTrials.shift();
if (hyperParams === undefined) {
throw new Error(`Error: invalid hyper-parameters for job submission: ${hyperParams}`);
}
const form = this.waitingTrials.shift() as TrialJobApplicationForm;
this.currSubmittedTrialNum++;
const trialJobAppForm: TrialJobApplicationForm = {
sequenceId: this.experimentProfile.nextSequenceId++,
hyperParameters: {
value: hyperParams,
index: 0
}
};
this.log.info(`submitTrialJob: form: ${JSON.stringify(trialJobAppForm)}`);
const trialJobDetail: TrialJobDetail = await this.trainingService.submitTrialJob(trialJobAppForm);
this.log.info(`submitTrialJob: form: ${JSON.stringify(form)}`);
const trialJobDetail: TrialJobDetail = await this.trainingService.submitTrialJob(form);
await this.storeExperimentProfile();
this.trialJobs.set(trialJobDetail.id, Object.assign({}, trialJobDetail));
const trialJobDetailSnapshot: TrialJobDetail | undefined = this.trialJobs.get(trialJobDetail.id);
if (trialJobDetailSnapshot != undefined) {
await this.dataStore.storeTrialJobEvent(
trialJobDetailSnapshot.status, trialJobDetailSnapshot.id, hyperParams, trialJobDetailSnapshot);
trialJobDetailSnapshot.status, trialJobDetailSnapshot.id, form.hyperParameters.value, trialJobDetailSnapshot);
} else {
assert(false, `undefined trialJobDetail in trialJobs: ${trialJobDetail.id}`);
}
@@ -734,7 +726,14 @@
this.log.warning('It is not supposed to receive more trials after NO_MORE_TRIAL is set');
this.setStatus('RUNNING');
}
this.waitingTrials.push(content);
const form: TrialJobApplicationForm = {
sequenceId: this.experimentProfile.nextSequenceId++,
hyperParameters: {
value: content,
index: 0
}
};
this.waitingTrials.push(form);
break;
case SEND_TRIAL_JOB_PARAMETER:
const tunerCommand: any = JSON.parse(content);
