Conversation
docs/en_US/Compressor/Pruner.md
Outdated
- **trainer:** Function used for the first optimization subproblem.
This function should include `model, optimizer, criterion, epoch, callback` as parameters, where callback should be inserted after loss.backward of the normal training process.
- **optimize_iteration:** ADMM optimize iterations.
- **training_epochs:** training epochs of the first optimization subproblem.
It is not clear what "the first optimization subproblem" is; better to give a little more description in the introduction of this pruner.
Added.
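For context on what "the first optimization subproblem" refers to: ADMM-style pruning alternates between two subproblems. A hedged sketch of the standard ADMM formulation (notation assumed here, not taken from this PR):

```latex
\begin{aligned}
W^{k+1} &= \arg\min_{W}\; f(W) + \tfrac{\rho}{2}\,\lVert W - Z^{k} + U^{k} \rVert_F^{2}
  && \text{(first subproblem: solved by the user-provided trainer via SGD)} \\
Z^{k+1} &= \Pi_{S}\!\left(W^{k+1} + U^{k}\right)
  && \text{(second subproblem: projection onto the sparsity constraint set } S\text{)} \\
U^{k+1} &= U^{k} + W^{k+1} - Z^{k+1}
  && \text{(dual variable update)}
\end{aligned}
```

This is why the `trainer` argument only covers the first subproblem: the second one is a closed-form projection handled by the pruner itself.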
docs/en_US/Compressor/Pruner.md
Outdated
- **optimize_iteration:** ADMM optimize iterations.
- **training_epochs:** training epochs of the first optimization subproblem.
- **row:** penalty parameters for ADMM training.
- **base_algo:** base pruning algorithm. 'level', 'l1' or 'l2', by default 'l1'.
Why does this one not have `experiment_data_dir`?
ADMMPruner is not an auto pruner, so no experiment data is generated. I added more explanation of what is included as experiment data for the auto pruners.
docs/en_US/Compressor/Pruner.md
Outdated
## AutoCompress Pruner
For each round t, AutoCompressPruner prune the model for the same sparsity each round to achive the ovrall sparsity:
ovrall -> overall
Fixed
docs/en_US/Compressor/Pruner.md
Outdated
- **sparsity:** How much percentage of convolutional filters are to be pruned.
- **op_types:** "Conv2d" or "default".
- **trainer:** Function used for the first optimization subproblem.
It is not clear how to write `trainer`. Who should provide `callback`? What is the reason to provide `callback`? Why should it be put after `loss.backward`?
updated
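To illustrate the questions above, here is a minimal, framework-agnostic sketch of the shape such a `trainer(model, optimizer, criterion, epoch, callback)` function could take. The tiny model, optimizer, and hand-computed gradient below are stand-ins, not the NNI API; the point is the ordering: the pruner-supplied `callback` runs after backpropagation has populated the gradients (so it can adjust them, e.g. with the ADMM penalty term) and before the optimizer step.

```python
# Hedged sketch of a trainer whose callback runs AFTER backward().
# ToyModel and the lambdas are stand-ins for a real model/optimizer/criterion.

class ToyModel:
    def __init__(self):
        self.w = 1.0      # single scalar weight
        self.grad = 0.0   # its gradient


def trainer(model, optimizer, criterion, epoch, callback=None):
    for x, y in [(1.0, 2.0), (2.0, 4.0)]:       # stand-in data loader
        pred = model.w * x
        loss = criterion(pred, y)                # loss computed (analogue of forward)
        model.grad = 2 * (pred - y) * x          # stand-in for loss.backward()
        if callback is not None:
            callback()                           # pruner hook: sees/edits gradients
        optimizer(model)                         # stand-in for optimizer.step()


calls = []
model = ToyModel()
trainer(model,
        optimizer=lambda m: setattr(m, "w", m.w - 0.1 * m.grad),
        criterion=lambda p, y: (p - y) ** 2,
        epoch=0,
        callback=lambda: calls.append(model.grad))
```

Here the pruner (not the user) provides `callback`; the user's only job is to invoke it at the right point in the loop.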
docs/en_US/Compressor/Pruner.md
Outdated
This function should include `model, optimizer, criterion, epoch, callback` as parameters, where callback should be inserted after loss.backward of the normal training process.
- **evaluator:** Function to evaluate the masked model. This function should include `model` as the only parameter, and returns a scalar value.
- **dummy_input:** The dummy input for model speed up, users should put it on right device before pass in.
Why is there model speedup here?
`speedup` is called inside AutoCompress to keep the model un-masked and realize real pruning after each iteration.
docs/en_US/Compressor/Pruner.md
Outdated
- **dummy_input:** The dummy input for model speed up, users should put it on right device before pass in.
- **iterations:** The number of overall iterations.
- **optimize_mode:** Optimize mode, 'maximize' or 'minimize', by default 'maximize'.
Does only this auto pruner support `optimize_mode`?
`optimize_mode` is supported in NetAdapt, SimulatedAnnealing, and AutoCompress. Sorry, I missed this arg for NetAdaptPruner.
docs/en_US/Compressor/Pruner.md
Outdated
- **cool_down_rate:** Simulated Annealing related parameter.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.
- **optimize_iteration:** ADMM optimize iterations.
what is the relation with ADMM?
AutoCompress Pruner calls SimulatedAnnealing Pruner and ADMM Pruner iteratively.
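That iterative structure can be sketched as follows. All the callees here are placeholders standing in for the real pruners, not the NNI API; the sketch only shows the control flow the reply describes: each iteration allocates per-layer sparsities via simulated annealing, prunes to them via ADMM, then runs speedup so the next iteration starts from a really-pruned model.

```python
# Hedged sketch of AutoCompress's outer loop; all functions are placeholders.

def simulated_annealing(model, sparsity):
    # placeholder: search for a per-layer sparsity allocation at this target
    return {"conv1": sparsity, "conv2": sparsity}

def admm_prune(model, allocation):
    # placeholder: return a model pruned to the given per-layer allocation
    return {name: f"pruned@{s:.3f}" for name, s in allocation.items()}

def speedup(model):
    # placeholder: replace masks with physically smaller layers
    return model

def autocompress(model, overall_sparsity, iterations):
    # equal per-round sparsity so the rounds compound to the overall target
    sparsity_each_round = 1 - (1 - overall_sparsity) ** (1 / iterations)
    for _ in range(iterations):
        allocation = simulated_annealing(model, sparsity_each_round)
        model = admm_prune(model, allocation)
        model = speedup(model)   # un-mask so the next round sees a real model
    return model

result = autocompress({}, overall_sparsity=0.5, iterations=3)
```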
docs/en_US/Compressor/Pruner.md
Outdated
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.
- **optimize_iteration:** ADMM optimize iterations.
- **epochs:** training epochs of the first optimization subproblem.
this one also has two subproblems?
These are args for ADMM.
""" | ||
_logger.info('Starting AutoCompress pruning...') | ||
|
||
sparsity_each_round = 1 - pow(1-self._sparsity, 1/self._optimize_iterations) |
Why use this sparsity strategy?
This strategy prunes the same fraction of the remaining weights in each iteration, so that the per-round sparsities compound to the overall target sparsity.
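A quick numeric check of why the `sparsity_each_round` formula quoted above reaches the overall target: keeping `(1 - sparsity_each_round)` of the weights for `iterations` rounds leaves exactly `(1 - overall_sparsity)` of them. The example values below are illustrative, not from the PR.

```python
# Verify that applying sparsity_each_round repeatedly compounds to the target.
overall_sparsity = 0.8
iterations = 4
sparsity_each_round = 1 - pow(1 - overall_sparsity, 1 / iterations)

remaining = 1.0
for _ in range(iterations):
    remaining *= 1 - sparsity_each_round   # fraction of weights kept each round

achieved = 1 - remaining                   # overall sparsity after all rounds
```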
1. Con = Res_i - delta_Res
2. for every layer:
    Choose Num Filters to prune
    Choose which filter to prunee
prunee -> prune
fixed
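The "choose which filter to prune" step above is decided by the `base_algo` criterion discussed later in this review. A hedged sketch of what that selection could look like for `base_algo='l1'` (toy weight lists, not real conv kernels or NNI code): filters with the smallest L1 norm are dropped first.

```python
# Hedged sketch: pick WHICH filters to prune from a layer using an l1 criterion.

def choose_filters_to_prune(filters, num_to_prune):
    # rank filters by L1 norm (sum of absolute weights); prune the smallest
    l1_norms = [sum(abs(w) for w in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: l1_norms[i])
    return sorted(ranked[:num_to_prune])

# four toy "filters"; L1 norms are 1.0, 0.03, 2.0, 0.15
layer = [[0.5, -0.5], [0.01, 0.02], [1.0, 1.0], [-0.1, 0.05]]
pruned_idx = choose_filters_to_prune(layer, num_to_prune=2)
```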
and fine tune the model for a short term after each pruning iteration.
optimize_mode : str
    optimize mode, 'maximize' or 'minimize', by default 'maximize'
base_algo : str
Better to add a description that we use `base_algo` to choose which filter to prune.
added
@@ -398,5 +402,176 @@ We try to reproduce the experiment result of the fully connected network on MNIS
The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% also obtain similar performance compared to non-pruning, and converges a little faster. If pruning too much, e.g., larger than 94%, the accuracy becomes lower and convergence becomes a little slower. A little different from the paper, the trend of the data in the paper is relatively more clear.

## NetAdapt Pruner
The order of the sections should be consistent with the content directory/list at the beginning.
fixed
# use speed up to prune the model before next iteration, because SimulatedAnnealingPruner & ADMMPruner don't take masked models
self._model_to_prune.load_state_dict(torch.load(os.path.join(
    self._experiment_data_dir, 'model_admm_masked.pth')))
Why reload the checkpoint?
The model weights have changed after the ADMM pruner ran, so the saved checkpoint is reloaded.
Penalty parameters for ADMM training.
base_algo : str
    Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
    Given the sparsity distrution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
distrution -> distribution
fixed
Add algo implementation / examples / test / doc for the following pruning algos: