A Transfer Learning Approach using PyTorch
The growing volume of data and the increasing processing power of GPUs have enabled researchers in Deep Learning to exploit large datasets and produce better results on tasks such as Computer Vision and Natural Language Processing.
One such milestone in Computer Vision is AlexNet, introduced in the paper "ImageNet Classification with Deep Convolutional Neural Networks".
The architecture was designed in 2012 by Alex Krizhevsky in collaboration with Ilya Sutskever and his Ph.D. advisor Geoffrey Hinton; the paper has 89,098 citations as of this writing. The model was evaluated on ILSVRC-2010 and competed in ILSVRC-2012.
The paper is considered one of the most influential in Computer Vision. The architecture is broadly similar to LeNet, with additional depth and a regularization method called Dropout that helps reduce overfitting. It offers intuition for working with deep convolutional layers, the use of a non-saturating non-linearity (ReLU), and regularization techniques such as Data Augmentation and Dropout.
The objective is to predict the class label of an input image from the provided dataset (CIFAR-10).
CIFAR-10 Dataset
The dataset contains 50,000 training samples and 10,000 test samples, classified into 10 different classes.
Each image is a 32x32, 3-channel (RGB) sample. A loading sketch follows the version requirements below.
Python >= 3.0
PyTorch Version >= 0.4.0
torchvision >= 0.2.1
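A minimal sketch of loading CIFAR-10 with torchvision, resizing to 224x224 as described above. The normalization statistics are the standard ImageNet values (an assumption, chosen to match the pretrained weights), and the batch size is illustrative:

```python
import torch
from torchvision import datasets, transforms

# Resize CIFAR-10's 32x32 images to 224x224 to match AlexNet's expected input.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    # ImageNet normalization statistics (assumed, to match the pretrained weights).
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)
```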
Consists of 8 Layers - 5 Convolutional Layers + 3 Fully-Connected Layers
Number of Image Channels = 3
Activation = ReLU
256x256 Input Size (Resized to 224x224 during preprocessing)
Convolutional Layer - Feature Maps : 64, Kernel Size : 11x11, Stride : 4, Padding : 2
ReLU Activation
Max Pooling Layer - Kernel Size : 3x3, Stride : 2
Convolutional Layer - Feature Maps : 192, Kernel Size : 5x5, Padding : 2
ReLU Activation
Max Pooling Layer - Kernel Size : 3x3, Stride : 2
Convolutional Layer - Feature Maps : 384, Kernel Size : 3x3, Padding : 1
ReLU Activation
Convolutional Layer - Feature Maps : 256, Kernel Size : 3x3, Padding : 1
ReLU Activation
Convolutional Layer - Feature Maps : 256, Kernel Size : 3x3, Padding : 1
ReLU Activation
Max Pooling Layer - Kernel Size : 3x3, Stride : 2
Dropout - 0.5 (Probability of Dropping Neurons)
Fully Connected - 9216 --> 4096
ReLU Activation
Dropout - 0.5
Fully Connected - 4096 --> 1024
ReLU Activation
Fully Connected - 1024 --> 10
NOTE - In the classifier, the second fully connected layer is changed from 4096 --> 4096 to 4096 --> 1024 in order to reduce overfitting and heavy losses during training, since the classifier is being trained for the first time on data with 10 classes instead of the 1,000 classes of ImageNet. A sketch of this modification follows below.
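A minimal sketch of adapting the pretrained torchvision AlexNet to the layout listed above; the actual training script may differ:

```python
import torch.nn as nn
from torchvision import models

# Load AlexNet pretrained on ImageNet; its feature extractor matches the
# convolutional layers listed above.
model = models.alexnet(pretrained=True)

# Replace the second fully connected layer: 4096 --> 1024 instead of 4096 --> 4096.
model.classifier[4] = nn.Linear(4096, 1024)

# Replace the final layer: 1024 --> 10 for the 10 CIFAR-10 classes.
model.classifier[6] = nn.Linear(1024, 10)
```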
Accuracy Obtained after Pre-Training = 86.57 %
Accuracy Obtained after Fine-Tuning = 87.18 %
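These two numbers suggest a two-stage procedure: first train only the new classifier with the convolutional features frozen, then unfreeze the whole network and continue at a lower learning rate. A minimal sketch, with the optimizer choice and hyperparameters as illustrative assumptions:

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()

# Stage 1: freeze the convolutional features and train only the new classifier.
for param in model.features.parameters():
    param.requires_grad = False
optimizer = optim.SGD(model.classifier.parameters(), lr=0.001, momentum=0.9)
# ... run the training loop over train_loader ...

# Stage 2: unfreeze all layers and fine-tune the whole network at a lower rate.
for param in model.features.parameters():
    param.requires_grad = True
optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
# ... continue training for a few more epochs ...
```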