Training code for 4 variants of ResNet on ImageNet.
The training follows the exact standard recipe used by the Training ImageNet in 1 Hour paper and gets the same performance. Distributed training code & results can be found at tensorpack/benchmarks.
This recipe has better performance than most open-source implementations. In fact, many papers that claim to "improve" ResNet by 0.5% only compare against a weaker baseline and actually cannot beat this standard ResNet recipe.
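For reference, the core of that recipe is the linear learning-rate scaling rule with gradual warmup and step decay. Below is a minimal sketch; the warmup length and decay epochs follow the Goyal et al. paper and may differ from the exact schedule used in imagenet-resnet.py:

```python
def learning_rate(epoch, batch_size, base_lr=0.1, warmup_epochs=5):
    """Illustrative LR schedule in the style of "Training ImageNet in 1 Hour".

    base_lr is defined for a 256-image batch and scaled linearly with the total
    batch size; the first epochs ramp up gradually, then the LR drops 10x at
    fixed epochs (30/60/80 as in the paper -- check imagenet-resnet.py for the
    schedule this repo actually uses).
    """
    peak = base_lr * batch_size / 256.0
    if epoch < warmup_epochs:                       # gradual warmup
        return peak * (epoch + 1) / warmup_epochs
    for boundary, factor in [(80, 1e-3), (60, 1e-2), (30, 1e-1)]:
        if epoch >= boundary:                       # step decay
            return peak * factor
    return peak
```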
Model | Top 5 Error | Top 1 Error | Download |
---|---|---|---|
ResNet18 | 10.50% | 29.66% | ⬇️ |
ResNet34 | 8.56% | 26.17% | ⬇️ |
ResNet50 | 6.85% | 23.61% | ⬇️ |
ResNet50-SE | 6.24% | 22.64% | ⬇️ |
ResNet101 | 6.04% | 21.95% | ⬇️ |
ResNeXt101-32x4d | 5.73% | 21.05% | ⬇️ |
ResNet152 | 5.78% | 21.51% | ⬇️ |
To reproduce training or evaluation in the above table, first decompress ImageNet data into this structure, then:
```bash
./imagenet-resnet.py --data /directory/of/ILSVRC -d 50 --batch 512
./imagenet-resnet.py --data /directory/of/ILSVRC -d 50 --load ResNet50.npz --eval
# See ./imagenet-resnet.py -h for other options.
```
You should see good GPU utilization (95%~99%) during training if your data pipeline is fast enough. With batch=64x8, ResNet50 training can finish 100 epochs in 16 hours on an AWS p3.16xlarge (8 V100s).
The default data pipeline is probably OK for machines with an SSD and ~20 CPU cores. See the tutorial for other options to speed up your data.
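If the defaults are not fast enough, the usual tensorpack approach is to parallelize decoding and augmentation across processes before batching. A rough sketch, not the script's actual pipeline; the class and augmentor names assume a recent tensorpack release:

```python
from tensorpack.dataflow import (dataset, imgaug, AugmentImageComponent,
                                 BatchData, MultiProcessRunnerZMQ)

# Raw ILSVRC12 dataflow -> standard train-time augmentation ->
# multi-process workers -> batches.
ds = dataset.ILSVRC12('/directory/of/ILSVRC', 'train', shuffle=True)
augmentors = [
    imgaug.ResizeShortestEdge(256),
    imgaug.RandomCrop(224),
    imgaug.Flip(horiz=True),
]
ds = AugmentImageComponent(ds, augmentors, copy=False)
ds = MultiProcessRunnerZMQ(ds, num_proc=20)   # decode + augment in 20 processes
ds = BatchData(ds, 64)
```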
The load-resnet.py script converts and runs the ImageNet-ResNet{50,101,152} Caffe models released by MSRA. Note that this architecture is different from the one in the imagenet-resnet.py script, so the models are not compatible.
ResNets have evolved since these models were released; it is generally better not to cite these old numbers as baselines in your paper.
Usage:
```bash
# download and convert the caffe model to npz format
python -m tensorpack.utils.loadcaffe PATH/TO/{ResNet-101-deploy.prototxt,ResNet-101-model.caffemodel} ResNet101.npz
# run on an image
./load-resnet.py --load ResNet101.npz --input cat.jpg --depth 101
```
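To sanity-check the conversion before running inference, you can inspect the resulting npz directly (the variable names stored in the file depend on the converter, so the printed output is only illustrative):

```python
import numpy as np

# List a few converted parameter names and their shapes.
params = np.load('ResNet101.npz')
for name in sorted(params.keys())[:5]:
    print(name, params[name].shape)
```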
The converted models are verified on the ILSVRC12 validation set. The per-pixel mean used here is slightly different from the original, but has a negligible effect.
Model | Top 5 Error | Top 1 Error |
---|---|---|
ResNet 50 | 7.78% | 24.77% |
ResNet 101 | 7.11% | 23.54% |
ResNet 152 | 6.71% | 23.21% |
Reproduce pre-activation ResNet on CIFAR10.
Also see a DenseNet implementation of the paper Densely Connected Convolutional Networks.
Reproduce the mixup pre-act ResNet-18 CIFAR10 experiment from the paper mixup: Beyond Empirical Risk Minimization.
This implementation follows the exact settings from the authors' code. Note that the architecture is different from the official preact-ResNet18 in the ResNet paper.
Usage:
```bash
./cifar10-preact18-mixup.py          # train without mixup
./cifar10-preact18-mixup.py --mixup  # train with mixup
```
Results of the reference code can be reproduced. One run gives 5.48% error without mixup and 4.17% with mixup (alpha=1).
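For reference, mixup itself is simple: each batch is blended with a shuffled copy of itself using a Beta(alpha, alpha)-sampled weight, applied to both the images and the one-hot labels. A minimal NumPy sketch, not the script's exact code:

```python
import numpy as np

def mixup_batch(images, onehot_labels, alpha=1.0):
    """Blend a batch with a shuffled copy of itself (mixup, Zhang et al.).

    images: float array of shape (N, H, W, C); onehot_labels: (N, num_classes).
    """
    lam = np.random.beta(alpha, alpha)          # mixing weight
    idx = np.random.permutation(len(images))    # random pairing within the batch
    mixed_images = lam * images + (1.0 - lam) * images[idx]
    mixed_labels = lam * onehot_labels + (1.0 - lam) * onehot_labels[idx]
    return mixed_images, mixed_labels
```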