ImageNet training code of ResNet, ShuffleNet, DoReFa-Net, AlexNet, Inception, VGG with tensorpack.
To train any of the models, just run `./{model}.py --data /path/to/ilsvrc`.
More options are available via `./{model}.py -h`.
The expected format of the data directory is described in the docs.
Some pretrained models can be downloaded from the tensorpack model zoo.
Reproduce ImageNet results of the following two papers:
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Model | Flops | Top-1 Error | Flags |
---|---|---|---|
ShuffleNetV1 0.5x | 40M | 40.8% | `-r=0.5` |
ShuffleNetV1 1x | 140M | 32.6% | `-r=1` |
ShuffleNetV2 0.5x | 41M | 39.5% | `-r=0.5 --v2` |
ShuffleNetV2 1x | 146M | 30.6% | `-r=1 --v2` |
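A core operation in both architectures is the channel shuffle between grouped convolutions. Below is a minimal numpy sketch of that op, for illustration only; it is not the TensorFlow code the script uses.

```python
import numpy as np

def channel_shuffle(x, groups):
    # x: an NHWC feature map. Interleave channels across groups so that
    # information can flow between the grouped 1x1 convolutions.
    n, h, w, c = x.shape
    assert c % groups == 0
    x = x.reshape(n, h, w, groups, c // groups)
    x = x.transpose(0, 1, 2, 4, 3)  # swap the group and per-group channel axes
    return x.reshape(n, h, w, c)
```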
To print flops:
./shufflenet.py --flops [--other-flags]
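FLOPs counting like this can be done with TensorFlow's graph profiler. The sketch below uses the TF1-style profiler API and is not necessarily the exact mechanism the script uses.

```python
import tensorflow as tf

def count_flops(graph):
    # Sum the float operations that TensorFlow has registered
    # for every op in the given graph.
    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    stats = tf.profiler.profile(graph, options=opts)
    return stats.total_float_ops
```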
Download and evaluate a pretrained model:
wget http://models.tensorpack.com/ImageNetModels/ShuffleNetV2-0.5x.npz
./shufflenet.py --eval --data /path/to/ilsvrc --load ShuffleNetV2-0.5x.npz --v2 -r=0.5
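The downloaded checkpoint is an `.npz` archive of named numpy arrays, so you can inspect it directly if you want a quick sanity check (optional, not required for evaluation):

```python
import numpy as np

# Peek at the parameter names and shapes stored in the checkpoint.
params = np.load('ShuffleNetV2-0.5x.npz')
for name in sorted(params.files)[:5]:
    print(name, params[name].shape)
```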
This AlexNet script is quite close to the settings in its original
paper.
Trained with a 64x2 batch size (64 per GPU on 2 GPUs), the script reaches 58% single-crop validation
accuracy after 100 epochs (21 hours on 2 V100s).
It also puts in tensorboard the first-layer filter visualizations similar to the paper.
See `./alexnet.py --help` for usage.
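For reference, first-layer filters can be turned into a TensorBoard image summary roughly as follows. This is a simplified sketch using the TF1-style `tf.summary.image`, not the script's actual visualization code.

```python
import tensorflow as tf

def filter_summary(kernel, name='conv1/filters'):
    # kernel: [h, w, 3, n] weights of the first conv layer.
    # Rescale each filter to [0, 1] and put filters on the batch axis,
    # so TensorBoard shows one small RGB image per filter.
    x = tf.transpose(kernel, [3, 0, 1, 2])  # -> [n, h, w, 3]
    x_min = tf.reduce_min(x, axis=[1, 2, 3], keepdims=True)
    x_max = tf.reduce_max(x, axis=[1, 2, 3], keepdims=True)
    x = (x - x_min) / (x_max - x_min + 1e-8)
    return tf.summary.image(name, x, max_outputs=96)
```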
This VGG16 script, when trained with a 32x8 batch size (32 per GPU on 8 GPUs), reaches the following
validation error after 100 epochs (30 hours on 8 P100s). This is the code for the VGG
experiments in the paper Group Normalization.
See `./vgg16.py --help` for usage.
No Normalization | Batch Normalization | Group Normalization |
---|---|---|
29~30% (large variation with random seed) | 28% | 27.6% |
Note that the purpose of this experiment in the paper is not to claim that GroupNorm is better than BatchNorm; therefore, the training settings and hyperparameters have not been individually tuned for best accuracy.
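For readers unfamiliar with GroupNorm, the normalization itself boils down to the following numpy sketch (the learned per-channel scale and offset are omitted, and this is not the script's implementation):

```python
import numpy as np

def group_norm(x, groups=32, eps=1e-5):
    # x: NHWC feature map. Split channels into groups and normalize each
    # sample over (H, W, C // groups), independently of the batch size.
    n, h, w, c = x.shape
    assert c % groups == 0
    x = x.reshape(n, h, w, groups, c // groups)
    mean = x.mean(axis=(1, 2, 4), keepdims=True)
    var = x.var(axis=(1, 2, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    return x.reshape(n, h, w, c)
```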
This Inception-BN script reaches 27% single-crop validation error after 300k steps with 6 GPUs. The training recipe differs considerably from the original paper, which is vague about these details.
See the ResNet examples, which include variants such as pre-activation ResNet and squeeze-and-excitation networks.
See the DoReFa-Net examples, which include other quantization methods such as Binary Weight Network and Trained Ternary Quantization.
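As a rough illustration of what DoReFa-style quantization does to weights, here is a numpy sketch following the paper's formulation; the straight-through gradient estimator and the actual tensorpack ops are omitted.

```python
import numpy as np

def quantize(x, k):
    # Uniformly quantize values in [0, 1] to k bits.
    n = 2 ** k - 1
    return np.round(x * n) / n

def dorefa_weights(w, bits):
    # DoReFa-Net weight quantization: squash with tanh, rescale to [0, 1],
    # quantize to the given bit width, then map back to [-1, 1].
    w = np.tanh(w)
    w = w / (2 * np.abs(w).max()) + 0.5
    return 2 * quantize(w, bits) - 1
```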