This is a small repo illustrating how to use WebDataset on ImageNet using the PyTorch Lightning framework.
First, create the virtualenv:
$ ./run venv # make virtualenv
Next, you need to shard the ImageNet data:
$ ln -s /some/imagenet/directory data
$ mkdir shards
$ ./run makeshards # create shards
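For orientation, a WebDataset shard is an ordinary tar archive in which files that share a basename form one sample. The sketch below is not this repo's `makeshards` implementation; it only illustrates that layout using Python's standard `tarfile` module. The helper names `write_shard` and `list_members`, the `.jpg`/`.cls` extensions, and the sample keys are assumptions made for illustration.

```python
import io
import tarfile


def write_shard(path, samples):
    """Write (key, image_bytes, label) triples into one tar shard.

    Hypothetical helper: each sample becomes two tar members,
    `<key>.jpg` and `<key>.cls`, stored next to each other.
    """
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, label in samples:
            members = (("jpg", image_bytes), ("cls", str(label).encode()))
            for ext, payload in members:
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def list_members(path):
    """Return the member names of a shard in storage order."""
    with tarfile.open(path, "r") as tar:
        return tar.getnames()
```

Because the result is a plain tar file, `tar tf` on any shard shows the same grouping of members by key.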
Run the training script:
$ ./run train -b 128 --gpus 2 # run the training jobs using PyTorch Lightning
Of course, for local data there is no need to go to this trouble. However, with sharded data you can now easily train remotely, for example by putting the shards on a web server:
$ rsync -av shards webserver:/var/www/html/shards
$ ./run train --gpus 2 --bucket http://webserver/shards
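Remote training works because a tar shard can be read front to back over a plain HTTP connection, with no seeking. The sketch below shows that idea with only the Python standard library; it is not how the training script or the WebDataset library is implemented. The function name `iter_samples` is hypothetical, and splitting each member name at its first dot is a simplification.

```python
import tarfile
import urllib.request


def iter_samples(url):
    """Yield one dict per sample from a tar shard read sequentially from `url`.

    Consecutive members that share a basename are grouped into one sample.
    """
    with urllib.request.urlopen(url) as response:
        # Mode "r|" treats the archive as a forward-only stream.
        with tarfile.open(fileobj=response, mode="r|") as tar:
            current_key, sample = None, {}
            for member in tar:
                if not member.isfile():
                    continue
                # Simplification: key is everything before the first dot.
                key, _, ext = member.name.partition(".")
                if key != current_key and sample:
                    yield sample
                    sample = {}
                current_key = key
                sample["__key__"] = key
                sample[ext] = tar.extractfile(member).read()
            if sample:
                yield sample
```

To try it, pass the URL of a single shard file under `http://webserver/shards`; the shard file names depend on what `makeshards` produced.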
The AIStore server is a high-performance, S3-compatible storage server (and web server) that works very well with WebDataset.