This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Refactor doc of NNI model compression #2595

Merged 15 commits on Jun 29, 2020
149 changes: 141 additions & 8 deletions docs/en_US/Compressor/Framework.md
# Customize A New Compression Algorithm

## Overview
```eval_rst
.. contents::
```

To simplify writing a new compression algorithm, we have designed programming interfaces that are simple yet flexible. There are interfaces for pruning and for quantization. Below, we first demonstrate how to customize a new pruning algorithm and then how to customize a new quantization algorithm.

## Customize a new pruning algorithm

To better demonstrate how to customize a new pruning algorithm, it is necessary for users to first understand the framework for supporting various pruning algorithms in NNI.

### Framework overview for pruning algorithms
Member: The title doesn't help to understand the purpose of its content. The content should read like step-by-step guidance for creating the user's pruning algorithm.

Contributor Author: also leave it to @chicm-ms :)

The following example shows how to use a pruner:

A pruner receives `model`, `config_list` and `optimizer` as arguments.

From the implementation perspective, a pruner consists of a `weight masker` instance and multiple `module wrapper` instances.

#### Weight masker

A `weight masker` is the implementation of a pruning algorithm; it can prune a specified layer, wrapped by a `module wrapper`, with a specified sparsity.

#### Module wrapper

A `module wrapper` is a module containing:

The reasons to use `module wrapper` are:

1. Some buffers are needed by `calc_mask` to calculate masks, and these buffers should be registered in the `module wrapper` so that the original modules are not contaminated.
2. A new `forward` method is needed to apply the masks to the weight before calling the real `forward` method.
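The wrapper pattern above can be sketched in plain Python/numpy. This is a mock of the idea with hypothetical `Linear` and `ModuleWrapper` names, not NNI's actual `module wrapper`:

```python
import numpy as np

class Linear:
    """Toy 'module' with a weight and a forward pass."""
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return x @ self.weight

class ModuleWrapper:
    """Holds the original module plus a mask buffer; its forward applies
    the mask to the weight before delegating to the real forward."""
    def __init__(self, module):
        self.module = module
        # the buffer lives on the wrapper, so the original module stays clean
        self.weight_mask = np.ones_like(module.weight)

    def forward(self, x):
        original = self.module.weight
        self.module.weight = original * self.weight_mask  # apply mask
        try:
            return self.module.forward(x)
        finally:
            self.module.weight = original                 # restore raw weight

layer = Linear(np.array([[1.0], [2.0]]))
wrapped = ModuleWrapper(layer)
wrapped.weight_mask = np.array([[1.0], [0.0]])  # prune the second weight
print(wrapped.forward(np.array([[3.0, 4.0]])))  # [[3.]]
```

Note how the raw weight survives inside the module; only the computation sees the masked version, which mirrors why buffers belong on the wrapper rather than on the original module.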

#### Pruner

A `pruner` is responsible for:

3. Use `weight masker` to calculate masks of layers while pruning.
4. Export pruned model weights and masks.

### Implement a new pruning algorithm

Implementing a new pruning algorithm requires implementing a `weight masker` class, which should be a subclass of `WeightMasker`, and a `pruner` class, which should be a subclass of `Pruner`.

An implementation of `weight masker` may look like this:

```python
class MyMasker(WeightMasker):
    ...
```

You can refer to the NNI-provided [weight masker](https://github.com/microsoft/nni/blob/master/src/sdk/pynni/nni/compression/torch/pruning/structured_pruning.py) implementations to implement your own weight masker.
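For intuition, the mask calculation of a magnitude-based masker could be sketched as follows. This is a hypothetical stand-alone numpy function, not NNI's implementation:

```python
import numpy as np

def calc_mask(weight, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    num_prune = int(weight.size * sparsity)
    if num_prune == 0:
        return np.ones_like(weight)
    # the num_prune-th smallest magnitude becomes the pruning threshold
    threshold = np.sort(np.abs(weight).ravel())[num_prune - 1]
    return (np.abs(weight) > threshold).astype(weight.dtype)

w = np.array([0.3, -0.1, 0.7, 0.05])
print(calc_mask(w, 0.5))   # [1. 0. 1. 0.]
```

A real masker would return the mask keyed per parameter and handle ties and structured sparsity, but the thresholding idea is the same.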

A basic `pruner` looks like this:

```python
class MyPruner(Pruner):
    ...
```

```python
self.pruner.remove_activation_collector(collector_id)
```
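The division of labor between a `pruner` and its `weight masker` can be mocked in a few lines. These are self-contained toy classes with hypothetical names, not NNI's actual `Pruner`/`WeightMasker` API:

```python
import numpy as np

class ToyMasker:
    """Mock masker: keep the largest-magnitude weights."""
    def calc_mask(self, weight, sparsity):
        k = int(weight.size * sparsity)        # number of weights to prune
        if k == 0:
            return np.ones_like(weight)
        threshold = np.sort(np.abs(weight).ravel())[k]
        return (np.abs(weight) >= threshold).astype(weight.dtype)

class ToyPruner:
    """Mock pruner: interprets the config and delegates to the masker."""
    def __init__(self, config):
        self.config = config
        self.masker = ToyMasker()

    def calc_mask(self, weight):
        return self.masker.calc_mask(weight, self.config['sparsity'])

pruner = ToyPruner({'sparsity': 0.5})
w = np.array([0.1, -0.8, 0.05, 1.2])
print(pruner.calc_mask(w))   # [0. 1. 0. 1.]
```

The real `Pruner` additionally wraps modules, hooks the optimizer, and exports weights and masks; the delegation of mask computation to the masker is the part shown here.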

During multi-GPU training, buffers and parameters are copied to each GPU every time the `forward` method runs. If buffers or parameters are updated in the `forward` method, an in-place update is needed to make the update effective.
Since `calc_mask` is called in the `optimizer.step` method, which runs after the `forward` method and only on one GPU, it supports multi-GPU naturally.
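The in-place requirement can be illustrated with a plain-numpy analogy (not NNI or PyTorch code): rebinding a name inside a function loses the update, while an in-place write goes through to the shared buffer.

```python
import numpy as np

master = np.ones(4)
replica = master            # shares the same underlying buffer

def rebind_update(buf):
    buf = buf * 0           # creates a new array and rebinds the local name
    return buf

def inplace_update(buf):
    buf *= 0                # writes into the shared buffer in place
    return buf

rebind_update(replica)
print(master)               # [1. 1. 1. 1.]  -- the update was lost
inplace_update(replica)
print(master)               # [0. 0. 0. 0.]  -- the update is visible
```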

***

## Customize a new quantization algorithm

To write a new quantization algorithm, you can write a class that inherits `nni.compression.torch.Quantizer` and override its member functions with the logic of your algorithm. The key member function to override is `quantize_weight`, which directly returns the quantized weights rather than a mask, because for quantization the quantized weights cannot be obtained by applying a mask.

```python
from nni.compression.torch import Quantizer

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        """
        It is suggested to use the NNI-defined spec for `config_list`
        """
        super().__init__(model, config_list)

    def quantize_weight(self, weight, config, **kwargs):
        """
        Quantizers should overload this method to quantize weight tensors.
        This method is effectively hooked to :meth:`forward` of the model.

        Parameters
        ----------
        weight : Tensor
            weight that needs to be quantized
        config : dict
            the configuration for weight quantization
        """

        # Put your code to generate `new_weight` here

        return new_weight

    def quantize_output(self, output, config, **kwargs):
        """
        Quantizers should overload this method to quantize output.
        This method is effectively hooked to :meth:`forward` of the model.

        Parameters
        ----------
        output : Tensor
            output that needs to be quantized
        config : dict
            the configuration for output quantization
        """

        # Put your code to generate `new_output` here

        return new_output

    def quantize_input(self, *inputs, config, **kwargs):
        """
        Quantizers should overload this method to quantize input.
        This method is effectively hooked to :meth:`forward` of the model.

        Parameters
        ----------
        inputs : Tensor
            inputs that need to be quantized
        config : dict
            the configuration for input quantization
        """

        # Put your code to generate `new_input` here

        return new_input

    def update_epoch(self, epoch_num):
        pass

    def step(self):
        """
        Can do some processing based on the model or weights bound
        in the function `bind_model`
        """
        pass
```
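As an illustration of what a `quantize_weight` body might compute, here is a naive symmetric linear (fake-)quantization in plain numpy. This is a sketch under an assumed 8-bit setting, not NNI's implementation:

```python
import numpy as np

def quantize_weight(weight, bits=8):
    """Naive symmetric linear quantization: scale to the integer grid,
    round, then dequantize back to floats (a sketch, not NNI's code)."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 127 for 8 bits
    scale = np.abs(weight).max() / qmax
    if scale == 0:                         # all-zero weights: nothing to do
        return weight
    q = np.clip(np.round(weight / scale), -qmax - 1, qmax)
    return q * scale                       # dequantized "fake-quant" weights

w = np.array([-1.0, -0.5, 0.0, 0.26, 1.0])
print(quantize_weight(w))
```

Returning the dequantized values is what makes the quantized weights directly usable in the model's `forward`, matching the contract described above where a mask cannot express the result.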

### Customize backward function

Sometimes it's necessary for a quantization operation to have a customized backward function, such as the [Straight-Through Estimator](https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste). Users can customize a backward function as follows:

```python
import torch

from nni.compression.torch.compressor import Quantizer, QuantGrad, QuantType

class ClipGrad(QuantGrad):
    @staticmethod
    def quant_backward(tensor, grad_output, quant_type):
        """
        This method should be overridden by subclasses to provide a customized
        backward function; the default implementation is the Straight-Through Estimator.

        Parameters
        ----------
        tensor : Tensor
            input of the quantization operation
        grad_output : Tensor
            gradient of the output of the quantization operation
        quant_type : QuantType
            the type of quantization; it can be `QuantType.QUANT_INPUT`,
            `QuantType.QUANT_WEIGHT` or `QuantType.QUANT_OUTPUT`,
            and you can define different behavior for each type.

        Returns
        -------
        tensor
            gradient of the input of the quantization operation
        """

        # for quantized output, set grad to zero if the absolute value of tensor is larger than 1
        if quant_type == QuantType.QUANT_OUTPUT:
            grad_output[torch.abs(tensor) > 1] = 0
        return grad_output


class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        super().__init__(model, config_list)
        # set your customized backward function to overwrite the default backward function
        self.quant_grad = ClipGrad
```

If you do not customize `QuantGrad`, the default backward is Straight-Through Estimator.
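To make the clipping rule concrete, here is the same backward behavior expressed as a stand-alone numpy function (a hypothetical `clip_grad_backward` helper, not the NNI class):

```python
import numpy as np

def clip_grad_backward(tensor, grad_output):
    """STE with clipping: pass gradients through unchanged, except where
    the pre-quantization value's magnitude exceeds 1 (a plain-numpy sketch
    of the ClipGrad rule above)."""
    grad = grad_output.copy()
    grad[np.abs(tensor) > 1] = 0
    return grad

t = np.array([0.5, -2.0, 1.5, 0.9])
g = np.ones_like(t)
print(clip_grad_backward(t, g))   # [1. 0. 0. 1.]
```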
_Coming Soon_ ...
23 changes: 0 additions & 23 deletions docs/en_US/Compressor/LotteryTicketHypothesis.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/en_US/Compressor/ModelSpeedup.md
# Speed up Masked Model

*This feature is in Beta version.*

## Introduction
