From afb9d61d7ed891920cbd1d5a241e04856871fce1 Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Thu, 26 Dec 2019 23:52:36 +0800 Subject: [PATCH 1/9] update docs --- docs/en_US/Compressor/Overview.md | 66 ++++++++++++++++++++++++++---- docs/en_US/Compressor/Quantizer.md | 38 +---------------- 2 files changed, 59 insertions(+), 45 deletions(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index f277de5c0f..b03cd2afec 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -30,6 +30,7 @@ We have provided several compression algorithms, including several pruning and q | [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits | | [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)| | [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)| +| [BNN Quantizer](./Quantizer.md#BNN-Quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)| ## Usage of built-in compression algorithms @@ -61,17 +62,27 @@ The function call `pruner.compress()` modifies user defined model (in Tensorflow When instantiate a compression algorithm, there is `config_list` passed in. We describe how to write this config below. ### User configuration for a compression algorithm +When compressing a model, users may want to specify the ratio for sparsity, to specify different ratios for different types of operations, to exclude certain types of operations, or to compress only a certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a python `list` object, where each element is a `dict` object. -When compressing a model, users may want to specify the ratio for sparsity, to specify different ratios for different types of operations, to exclude certain types of operations, or to compress only a certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a python `list` object, where each element is a `dict` object. In each `dict`, there are some keys commonly supported by NNI compression: +The `dict`s in the `list` are applied one by one, that is, the configurations in latter `dict` will overwrite the configurations in former ones on the operations that are within the scope of both of them. + +#### Common keys +In each `dict`, there are some keys commonly supported by NNI compression: * __op_types__: This is to specify what types of operations to be compressed. 'default' means following the algorithm's default setting. * __op_names__: This is to specify by name what operations to be compressed. If this field is omitted, operations will not be filtered by it. * __exclude__: Default is False. If this field is True, it means the operations with specified types and names will be excluded from the compression. +#### Keys for quantization algorithms +**If you use quantization algorithms, you need to especify more keys. If you use pruning algorithms, you can safely skip these keys** + +* __quant_types__ : list of string. 
Type of quantization you want to apply, currently support 'weight', 'input', 'output'. + +#### Other keys specified for every compression algorithm There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, some , some. -The `dict`s in the `list` are applied one by one, that is, the configurations in latter `dict` will overwrite the configurations in former ones on the operations that are within the scope of both of them. +#### example A simple example of configuration is shown below: ```python @@ -183,11 +194,9 @@ Some algorithms may want global information for generating masks, for example, a The interface for customizing quantization algorithm is similar to that of pruning algorithms. The only difference is that `calc_mask` is replaced with `quantize_weight`. `quantize_weight` directly returns the quantized weights rather than mask, because for quantization the quantized weights cannot be obtained by applying mask. ```python -# This is writing a Quantizer in tensorflow. -# For writing a Quantizer in PyTorch, you can simply replace -# nni.compression.tensorflow.Quantizer with -# nni.compression.torch.Quantizer -class YourQuantizer(nni.compression.tensorflow.Quantizer): +from nni.compression.torch.compressor import Quantizer + +class YourQuantizer(Quantizer): def __init__(self, model, config_list): """ Suggest you to use the NNI defined spec for config @@ -257,7 +266,46 @@ class YourQuantizer(nni.compression.tensorflow.Quantizer): """ pass ``` +#### customize backward function +Sometimes it's necessary for a quantization operation to have a customized backward function, such as Straight-Through Estimator, +user can customize a backward function as follow: + +```python +from nni.compression.torch.compressor import Quantizer, QuantGrad, QuantType -### Usage of user customized compression algorithm +class ClipGrad(QuantGrad): + @staticmethod + def quant_backward(tensor, grad_output, quant_type): + """ + This method should be overrided by subclass to provide customized backward function, + default implementation is Straight-Through Estimator + Parameters + ---------- + tensor : Tensor + input of quantization operation + grad_output : Tensor + gradient of the output of quantization operation + quant_type : QuantType + the type of quantization, it can be `QuantType.QUANT_INPUT`, `QuantType.QUANT_WEIGHT`, `QuantType.QUANT_OUTPUT`, + you can define different behavior for different types. + Returns + ------- + tensor + gradient of the input of quantization operation + """ + + # for quant_output function, set grad to zero if the absolute value of tensor is larger than 1 + if quant_type == QuantType.QUANT_OUTPUT: + grad_output[torch.abs(tensor) > 1] = 0 + return grad_output + + +class YourQuantizer(Quantizer): + def __init__(self, model, config_list): + super().__init__(model, config_list) + # set your customized backward function to overwrite default backward function + self.quant_grad = ClipGrad + +``` -__[TODO]__ ... +The default backward function for quant_weight, quant_input, quant_output is Straight-Through Estimator. 
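For reference, below is a minimal usage sketch of such a customized quantizer. It mirrors the usage of the built-in quantizers and assumes `YourQuantizer` is defined as above; the `Mnist` model class and the chosen `config_list` values are only placeholders and follow the common configuration spec described earlier:

```python
# Build the model to be quantized (placeholder model class).
model = Mnist()

# Quantize the weights of all Conv2d layers to 8 bits.
config_list = [{
    'quant_types': ['weight'],
    'quant_bits': {'weight': 8},
    'op_types': ['Conv2d']
}]

quantizer = YourQuantizer(model, config_list)
quantizer.compress()

# Continue training or fine-tuning as usual; the customized quantize_weight
# (and the customized backward function, if any) is applied during the
# forward/backward passes of the compressed model.
```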
\ No newline at end of file diff --git a/docs/en_US/Compressor/Quantizer.md b/docs/en_US/Compressor/Quantizer.md index 67791117e1..9f63f63681 100644 --- a/docs/en_US/Compressor/Quantizer.md +++ b/docs/en_US/Compressor/Quantizer.md @@ -51,16 +51,6 @@ quantizer.compress() You can view example for more information #### User configuration for QAT Quantizer -* **quant_types:** : list of string - -type of quantization you want to apply, currently support 'weight', 'input', 'output'. - -* **op_types:** list of string - -specify the type of modules that will be quantized. eg. 'Conv2D' - -* **op_names:** list of string - specify the name of modules that will be quantized. eg. 'conv1' * **quant_bits:** int or dict of {str : int} @@ -98,18 +88,6 @@ quantizer.compress() You can view example for more information #### User configuration for DoReFa Quantizer -* **quant_types:** : list of string - -type of quantization you want to apply, currently support 'weight', 'input', 'output'. - -* **op_types:** list of string - -specify the type of modules that will be quantized. eg. 'Conv2D' - -* **op_names:** list of string - -specify the name of modules that will be quantized. eg. 'conv1' - * **quant_bits:** int or dict of {str : int} bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8}, @@ -130,13 +108,13 @@ from nni.compression.torch import BNNQuantizer model = VGG_Cifar10(num_classes=10) configure_list = [{ - 'quant_types': ['weight'], 'quant_bits': 1, + 'quant_types': ['weight'], 'op_types': ['Conv2d', 'Linear'], 'op_names': ['features.0', 'features.3', 'features.7', 'features.10', 'features.14', 'features.17', 'classifier.0', 'classifier.3'] }, { - 'quant_types': ['output'], 'quant_bits': 1, + 'quant_types': ['output'], 'op_types': ['Hardtanh'], 'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5'] }] @@ -148,18 +126,6 @@ model = quantizer.compress() You can view example [examples/model_compress/BNN_quantizer_cifar10.py]( https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py) for more information. #### User configuration for BNN Quantizer -* **quant_types:** : list of string - -type of quantization you want to apply, currently support 'weight', 'input', 'output'. - -* **op_types:** list of string - -specify the type of modules that will be quantized. eg. 'Conv2D' - -* **op_names:** list of string - -specify the name of modules that will be quantized. eg. 'conv1' - * **quant_bits:** int or dict of {str : int} bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8}, From 0d2cf1a85f695b6e97836c08533050db25eb700b Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 00:01:07 +0800 Subject: [PATCH 2/9] update docs --- docs/en_US/Compressor/Overview.md | 9 ++++++++- docs/en_US/Compressor/Quantizer.md | 17 ++++++----------- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index b03cd2afec..3c4f856c25 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -76,7 +76,14 @@ In each `dict`, there are some keys commonly supported by NNI compression: #### Keys for quantization algorithms **If you use quantization algorithms, you need to especify more keys. If you use pruning algorithms, you can safely skip these keys** -* __quant_types__ : list of string. 
Type of quantization you want to apply, currently support 'weight', 'input', 'output'. +* __quant_types__ : list of string. + +Type of quantization you want to apply, currently support 'weight', 'input', 'output'. + +* __quant_bits__ : int or dict of {str : int} + +bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8}, +when the type is int, all quantization types share same bits length. #### Other keys specified for every compression algorithm There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, some , some. diff --git a/docs/en_US/Compressor/Quantizer.md b/docs/en_US/Compressor/Quantizer.md index 9f63f63681..4eedcd3bb5 100644 --- a/docs/en_US/Compressor/Quantizer.md +++ b/docs/en_US/Compressor/Quantizer.md @@ -51,12 +51,9 @@ quantizer.compress() You can view example for more information #### User configuration for QAT Quantizer -specify the name of modules that will be quantized. eg. 'conv1' +common configuration needed by compression algorithms can be found at : [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm) -* **quant_bits:** int or dict of {str : int} - -bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8}, -when the type is int, all quantization types share same bits length. +configuration needed by this algorithm : * **quant_start_step:** int @@ -88,10 +85,9 @@ quantizer.compress() You can view example for more information #### User configuration for DoReFa Quantizer -* **quant_bits:** int or dict of {str : int} +common configuration needed by compression algorithms can be found at : [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm) -bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8}, -when the type is int, all quantization types share same bits length. +configuration needed by this algorithm : ## BNN Quantizer @@ -126,10 +122,9 @@ model = quantizer.compress() You can view example [examples/model_compress/BNN_quantizer_cifar10.py]( https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py) for more information. #### User configuration for BNN Quantizer -* **quant_bits:** int or dict of {str : int} +common configuration needed by compression algorithms can be found at : [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm) -bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8}, -when the type is int, all quantization types share same bits length. +configuration needed by this algorithm : ### Experiment We implemented one of the experiments in [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830), we quantized the **VGGNet** for CIFAR-10 in the paper. 
Our experiments results are as follows: From 7eb05b69d5938084b44190227d8b863355eeb10e Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 10:42:31 +0800 Subject: [PATCH 3/9] update doc --- docs/en_US/Compressor/Overview.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index 3c4f856c25..cd969e2040 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -86,7 +86,7 @@ bits length of quantization, key is the quantization type, value is the length, when the type is int, all quantization types share same bits length. #### Other keys specified for every compression algorithm -There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, some , some. +There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, [Level Pruner](./Pruner.md#level-pruner) requires `sparsity` key to specify how much a model should be pruned. #### example @@ -273,9 +273,8 @@ class YourQuantizer(Quantizer): """ pass ``` -#### customize backward function -Sometimes it's necessary for a quantization operation to have a customized backward function, such as Straight-Through Estimator, -user can customize a backward function as follow: +#### Customize backward function +Sometimes it's necessary for a quantization operation to have a customized backward function, such as [Straight-Through Estimator](https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste), user can customize a backward function as follow: ```python from nni.compression.torch.compressor import Quantizer, QuantGrad, QuantType @@ -315,4 +314,4 @@ class YourQuantizer(Quantizer): ``` -The default backward function for quant_weight, quant_input, quant_output is Straight-Through Estimator. \ No newline at end of file +If you do not customize `QuantGrad`, the default backward is Straight-Through Estimator. \ No newline at end of file From dbbebafd79a95aef4b0775b0efd6cc33bc88e52b Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 10:51:22 +0800 Subject: [PATCH 4/9] update doc --- docs/en_US/Compressor/Overview.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index cd969e2040..44e08c9692 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -261,12 +261,10 @@ class YourQuantizer(Quantizer): return new_input - # note for pytorch version, there is no sess in input arguments - def update_epoch(self, epoch_num, sess): + def update_epoch(self, epoch_num): pass - # note for pytorch version, there is no sess in input arguments - def step(self, sess): + def step(self): """ Can do some processing based on the model or weights binded in the func bind_model From 27593954ff79bbf428c2b617fddacded7b5482f1 Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 11:41:23 +0800 Subject: [PATCH 5/9] add explain quant_types and quant_bits --- docs/en_US/Compressor/Overview.md | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index 44e08c9692..3b26053df0 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -74,17 +74,30 @@ In each `dict`, there are some keys commonly supported by NNI compression: * __exclude__: Default is False. 
If this field is True, it means the operations with specified types and names will be excluded from the compression. #### Keys for quantization algorithms -**If you use quantization algorithms, you need to especify more keys. If you use pruning algorithms, you can safely skip these keys** +**If you use quantization algorithms, you need to specify more keys. If you use pruning algorithms, you can safely skip these keys** * __quant_types__ : list of string. -Type of quantization you want to apply, currently support 'weight', 'input', 'output'. +Type of quantization you want to apply, currently support 'weight', 'input', 'output'. 'weight' means applying quantization operation +to the weight parameter of modules. 'input' means applying quantization operation to the input of module forward method. 'output' means applying quantization operation to the output of module forward method, which is often called as 'activation' in some papers. * __quant_bits__ : int or dict of {str : int} -bits length of quantization, key is the quantization type, value is the length, eg. {'weight': 8}, -when the type is int, all quantization types share same bits length. - +bits length of quantization, key is the quantization type, value is the quantization bits length, eg. +``` +{ + quant_bits: { + 'weight': 8, + 'output': 4, + }, +} +``` +when the key is int type, all quantization types share same bits length. eg. +``` +{ + quant_bits: 8, # weight or output quantization are all 8 bits +} +``` #### Other keys specified for every compression algorithm There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, [Level Pruner](./Pruner.md#level-pruner) requires `sparsity` key to specify how much a model should be pruned. From bda7ff5ebcd909131fe556cd1b9d9e3b43b31087 Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 13:55:17 +0800 Subject: [PATCH 6/9] update doc --- docs/en_US/Compressor/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index 3b26053df0..d2960075db 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -92,7 +92,7 @@ bits length of quantization, key is the quantization type, value is the quantiza }, } ``` -when the key is int type, all quantization types share same bits length. eg. +when the value is int type, all quantization types share same bits length. eg. ``` { quant_bits: 8, # weight or output quantization are all 8 bits From 2b2ad9949a7715919f54df19cbeeb95cdd296e12 Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 14:21:56 +0800 Subject: [PATCH 7/9] update doc --- docs/en_US/Compressor/Overview.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index d2960075db..ac7df38028 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -1,8 +1,11 @@ # Compressor +As larger neural networks with more layers and nodes are considered, reducing their storage and computational cost becomes critical, especially for some real-time applications. Model compression can be used to address this problem. We are glad to announce the alpha release for model compression toolkit on top of NNI, it's still in the experiment phase which might evolve based on usage feedback. We'd like to invite you to use, feedback and even contribute. 
-NNI provides an easy-to-use toolkit to help user design and use compression algorithms. It supports Tensorflow and PyTorch with unified interface. For users to compress their models, they only need to add several lines in their code. There are some popular model compression algorithms built-in in NNI. Users could further use NNI's auto tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users could easily customize their new compression algorithms using NNI's interface, refer to the tutorial [here](#customize-new-compression-algorithms). +NNI provides an easy-to-use toolkit to help user design and use compression algorithms. It currently supports PyTorch with unified interface. For users to compress their models, they only need to add several lines in their code. There are some popular model compression algorithms built-in in NNI. Users could further use NNI's auto tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users could easily customize their new compression algorithms using NNI's interface, refer to the tutorial [here](#customize-new-compression-algorithms). + +For a survey of model compression, you can refer to this paper: [Recent Advances in Efficient Computation of Deep Convolutional Neural Networks](https://arxiv.org/pdf/1802.00939.pdf). ## Supported algorithms @@ -10,6 +13,8 @@ We have provided several compression algorithms, including several pruning and q **Pruning** +Pruning algorithms compresses the original network by removing redundant weights, which can reduce model complexity and address the over-fitting issue. + |Name|Brief Introduction of Algorithm| |---|---| | [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights | @@ -25,6 +30,8 @@ We have provided several compression algorithms, including several pruning and q **Quantization** +Quantization algorithms compresses the original network by reducing the number of bits required to represent weights or activations, which can reduce the computations and the inference time. + |Name|Brief Introduction of Algorithm| |---|---| | [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits | From 2c6def24f9c9df8a00200df410fd93ef1e807f81 Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 14:24:47 +0800 Subject: [PATCH 8/9] update doc --- docs/en_US/Compressor/Overview.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md index ac7df38028..b8e2903afb 100644 --- a/docs/en_US/Compressor/Overview.md +++ b/docs/en_US/Compressor/Overview.md @@ -13,7 +13,7 @@ We have provided several compression algorithms, including several pruning and q **Pruning** -Pruning algorithms compresses the original network by removing redundant weights, which can reduce model complexity and address the over-fitting issue. +Pruning algorithms compress the original network by removing redundant weights or channels of layers, which can reduce model complexity and address the over-fitting issue. 
|Name|Brief Introduction of Algorithm| |---|---| @@ -30,7 +30,7 @@ Pruning algorithms compresses the original network by removing redundant weights **Quantization** -Quantization algorithms compresses the original network by reducing the number of bits required to represent weights or activations, which can reduce the computations and the inference time. +Quantization algorithms compress the original network by reducing the number of bits required to represent weights or activations, which can reduce the computations and the inference time. |Name|Brief Introduction of Algorithm| |---|---| From fc9c62c991e5354342a1d4c5dd6bfb6228507997 Mon Sep 17 00:00:00 2001 From: cjkkkk Date: Fri, 27 Dec 2019 21:27:27 +0800 Subject: [PATCH 9/9] fix issue #1884 --- docs/en_US/Compressor/Quantizer.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/docs/en_US/Compressor/Quantizer.md b/docs/en_US/Compressor/Quantizer.md index 4eedcd3bb5..3308f25c1b 100644 --- a/docs/en_US/Compressor/Quantizer.md +++ b/docs/en_US/Compressor/Quantizer.md @@ -6,12 +6,10 @@ We provide Naive Quantizer to quantizer weight to default 8 bits, you can use it ### Usage tensorflow -```python -nni.compressors.tensorflow.NaiveQuantizer(model_graph).compress() +```python nni.compression.tensorflow.NaiveQuantizer(model_graph).compress() ``` pytorch -```python -nni.compressors.torch.NaiveQuantizer(model).compress() +```python nni.compression.torch.NaiveQuantizer(model).compress() ``` *** @@ -29,7 +27,7 @@ You can quantize your model to 8 bits with the code below before your training c PyTorch code ```python -from nni.compressors.torch import QAT_Quantizer +from nni.compression.torch import QAT_Quantizer model = Mnist() config_list = [{ @@ -72,7 +70,7 @@ To implement DoReFa Quantizer, you can add code below before your training code PyTorch code ```python -from nni.compressors.torch import DoReFaQuantizer +from nni.compression.torch import DoReFaQuantizer config_list = [{ 'quant_types': ['weight'], 'quant_bits': 8,