Vision Transformer (ViT) for image recognition, optimised for Graphcore's IPU. Based on the models provided by the `transformers` library and by jeonsworld.
| Framework | Domain | Model | Datasets | Tasks | Training | Inference | Reference |
|-----------|--------|-------|----------|-------|----------|-----------|-----------|
| PyTorch | Vision | ViT | ImageNet LSVRC 2012, CIFAR-10 | Image recognition | ✅ | ❌ | [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) |
Before running this example, complete the following setup steps:

- Install and enable the Poplar SDK (see Poplar SDK setup)
- Install the system and Python requirements (see Environment setup)
- Download the ImageNet LSVRC 2012 dataset (see Dataset setup)
To check if your Poplar SDK has already been enabled, run:
```bash
echo $POPLAR_SDK_ENABLED
```
If no path is printed, follow these steps:
- Navigate to your Poplar SDK root directory
- Enable the Poplar SDK with:

```bash
cd poplar-<OS version>-<SDK version>-<hash>
. enable.sh
```
- Additionally, enable PopART with:

```bash
cd popart-<OS version>-<SDK version>-<hash>
. enable.sh
```
More detailed instructions on setting up your Poplar environment are available in the Poplar quick start guide.
To prepare your environment, follow these steps:
- Create and activate a Python3 virtual environment:

```bash
python3 -m venv <venv name>
source <venv path>/bin/activate
```
- Navigate to the Poplar SDK root directory
- Install the PopTorch (PyTorch) wheel:

```bash
cd <poplar sdk root dir>
pip3 install poptorch...x86_64.whl
```
- Navigate to this example's root directory
- Install the Python requirements:

```bash
pip3 install -r requirements.txt
```
More detailed instructions on setting up your PyTorch environment are available in the PyTorch quick start guide.
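As a quick sanity check (a minimal sketch, assuming the virtual environment above is still active and the SDK is enabled), you can confirm that the PopTorch wheel imports cleanly:

```python
# Minimal sanity check: verify the PopTorch wheel is importable.
import poptorch

print(poptorch.__version__)  # should match the Poplar SDK version you enabled
```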
Download the ImageNet LSVRC 2012 dataset from the source or via Kaggle.

Disk space required: 144 GB

The extracted dataset should have the following layout:

```
.
├── bounding_boxes
├── imagenet_2012_bounding_boxes.csv
├── train
└── validation

3 directories, 1 file
```
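As an illustration (a sketch only: the path below is a placeholder, and plain torchvision is used here rather than this example's own `dataset` package), the `train` and `validation` folders follow the standard one-sub-directory-per-class layout, so they can be inspected like this:

```python
# Sketch: inspect the extracted dataset with torchvision's ImageFolder.
# "/path/to/imagenet" is a placeholder for wherever you extracted the data.
from torchvision.datasets import ImageFolder

train_ds = ImageFolder("/path/to/imagenet/train")
print(f"{len(train_ds)} training images across {len(train_ds.classes)} classes")
```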
To run a tested and optimised configuration, and to reproduce the performance shown on our performance results page, use the `examples_utils` module (installed automatically as part of the environment setup) to run one or more benchmarks. The benchmarks are provided in the `benchmarks.yml` file in this example's root directory.
For example:
```bash
python3 -m examples_utils benchmark --spec <path to benchmarks.yml file>
```
Or, to run a specific benchmark in the `benchmarks.yml` file provided:

```bash
python3 -m examples_utils benchmark --spec <path to benchmarks.yml file> --benchmark <name of benchmark>
```
For more information on using the examples-utils benchmarking module, please refer to the README.
For pre-training on ImageNet1k, `micro_batch_size` is set to 8, and all other parameters are tuned to reach a validation accuracy higher than 74.79% (the accuracy released in Google's official repository). To achieve maximum throughput, `micro_batch_size` can be set to 14, but further hyperparameter tuning is then required to reach this validation accuracy.
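To see how `micro_batch_size` relates to the effective batch size, here is a small worked example (the gradient-accumulation and replication factors below are hypothetical placeholders, not the tuned values from `configs.yaml`):

```python
# Illustrative arithmetic only: the global batch size seen by the optimizer
# is the micro batch size scaled by gradient accumulation and replication.
micro_batch_size = 8          # per-IPU batch, as in the pre-training config
gradient_accumulation = 512   # hypothetical value for illustration
replication_factor = 4        # hypothetical value for illustration

global_batch_size = micro_batch_size * gradient_accumulation * replication_factor
print(global_batch_size)  # 16384
```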
You can run ViT fine-tuning on either the CIFAR10 or ImageNet1k datasets. The default pretrained checkpoint is loaded from `google/vit-base-patch16-224-in21k`. The commands for fine-tuning are:
CIFAR10 fine-tuning:

```bash
python finetune.py --config b16_cifar10
```

Once the fine-tuning finishes, you can validate:

```bash
python validation.py --config b16_cifar10_valid
```

To run ImageNet1k fine-tuning, you first need to download the data as described above.

ImageNet1k fine-tuning:

```bash
python finetune.py --config b16_imagenet1k
```

Afterwards, run ImageNet1k validation:

```bash
python validation.py --config b16_imagenet1k_valid
```
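For reference, here is a sketch of how the default checkpoint can be loaded with the `transformers` library (`finetune.py` handles this internally; `num_labels=10` is an assumption matching CIFAR10's class count, purely for illustration):

```python
# Sketch: load the default pretrained checkpoint with transformers.
# num_labels=10 is illustrative only (CIFAR10 has 10 classes).
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=10,
)
```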
ALS (automatic loss scaling) is a feature in the Poplar SDK which brings stability to training large models in half precision, especially when gradient accumulation and reduction across replicas also happen in half precision.

NB: This feature expects the PopTorch training option `accumulationAndReplicationReductionType` to be set to `poptorch.ReductionType.Mean`, and accumulation by the optimizer to be done in half precision (using `accum_type=torch.float16` when instantiating the optimizer); otherwise it may lead to unexpected behaviour.
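A minimal sketch of those prerequisites using the PopTorch API follows (the exact wiring used by this example lives in `ipu_options.py` and `optimization.py`; treat this as an assumption-laden outline rather than the application's actual setup):

```python
# Sketch: the PopTorch settings the ALS note above describes.
import torch  # needed for torch.float16 if you instantiate the optimizer below
import poptorch

opts = poptorch.Options()
# Reduction across gradient accumulation and replicas must use the mean.
opts.Training.accumulationAndReplicationReductionType(poptorch.ReductionType.Mean)
# Enable automatic loss scaling.
opts.Training.setAutomaticLossScaling(True)

# Optimizer accumulation in half precision (a model is assumed to exist):
# optimizer = poptorch.optim.AdamW(model.parameters(), lr=1e-4,
#                                  accum_type=torch.float16)
```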
To employ ALS for ImageNet1k fine-tuning on a POD16, use the following command:

```bash
python3 finetune.py --config b16_imagenet1k_ALS
```
This application is licensed under Apache License 2.0. Please see the LICENSE file in this directory for full details of the license conditions.
The following files are created by Graphcore and are licensed under the Apache License, Version 2.0 (* indicates that additional license information is stated after this list):
- dataset/__init__.py
- dataset/customized_randaugment.py*
- dataset/dataset.py
- dataset/mixup_utils.py*
- dataset/preprocess.py
- models/__init__.py
- models/modules.py*
- models/pipeline_model.py
- models/utils.py
- .gitignore
- args.py
- checkpoint.py
- configs.yaml
- finetune.py
- ipu_options.py
- LICENSE
- log.py
- metrics.py
- optimization.py
- pretrain.py
- README.md
- requirements.txt
- run_singlehost.sh
- run_multihosts.sh
- test_vit.py
- validation.py
The following file includes code derived from this file, which is CC-BY-NC licensed:
- dataset/mixup_utils.py
The following file includes code derived from this file, which is MIT licensed, and from this file, which is Apache Version 2.0 licensed:
- models/modules.py
External packages:

- `transformers` and `horovod` are licensed under the Apache License, Version 2.0
- `pyyaml`, `wandb`, `pytest`, `pytest-pythonpath`, `randaugment` and `attrdict` are licensed under the MIT License
- `torchvision` is licensed under the BSD 3-Clause License
- `pillow` is licensed under the open source HPND License