- Overview of Project
- Data Description
- Libraries used
- Steps followed
- Conclusion
- How to replicate on your device
We are given the Helen dataset, which contains face images of different people. The goal is to classify each pixel as one of the following 11 classes:
- bg (background)
- face
- lb (left brow)
- rb (right brow)
- le (left eye)
- re (right eye)
- nose
- ulip (upper lip)
- imouth (inner mouth)
- llip (lower lip)
- hair
For this task we use the well-known U-Net architecture, described in the U-Net Paper.
The Helen dataset used in this project can be downloaded from Helen Dataset.
Each image comes with three files. image.jpg is the input that is loaded into the model. label.png holds the pixel-by-pixel classification of the image. viz.jpg is for demonstration purposes only and is not used by the model.
The directory structure is as follows:
    .
    └── helenstar_release
        ├── train
        │   ├── image.jpg
        │   ├── label.png
        │   ├── viz.jpg
        │   └── ... (1999 sets of 3 images, i.e. 5997 images in total)
        └── test
            ├── image.jpg
            ├── label.png
            ├── viz.jpg
            └── ... (100 sets of 3 images, i.e. 300 images in total)
Total number of images in dataset : 2099
Number of images in train set : 1999
Number of images in test set : 100
For convenience, I move all image.jpg files into one folder and all label.png files into another, using Python's shutil module.
The final directory structure is as follows:
    .
    └── splitted_Data
        ├── train
        │   ├── images
        │   │   ├── image1.jpg
        │   │   ├── image2.jpg
        │   │   └── ... (1999 files)
        │   └── labels
        │       ├── label1.jpg
        │       ├── label2.jpg
        │       └── ... (1999 files)
        └── test
            ├── images
            │   ├── image1.jpg
            │   ├── image2.jpg
            │   └── ... (100 files)
            └── labels
                ├── label1.jpg
                ├── label2.jpg
                └── ... (100 files)
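The reorganisation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper name `split_by_kind` is hypothetical, it assumes every file name ends in `image.jpg`, `label.png` or `viz.jpg`, and it copies files under their original names rather than renaming them.

```python
import shutil
from pathlib import Path


def split_by_kind(src_dir, dst_dir):
    """Copy *image.jpg files to dst_dir/images and *label.png files to dst_dir/labels.

    Hypothetical helper: files are copied (not moved) and keep their names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    targets = {"image.jpg": dst / "images", "label.png": dst / "labels"}
    for folder in targets.values():
        folder.mkdir(parents=True, exist_ok=True)

    counts = {suffix: 0 for suffix in targets}
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        for suffix, folder in targets.items():
            # viz.jpg files match neither suffix and are skipped
            if path.name.endswith(suffix):
                shutil.copy2(path, folder / path.name)
                counts[suffix] += 1
    return counts


# Example call (paths follow the directory trees above):
# split_by_kind("helenstar_release/train", "splitted_Data/train")
# split_by_kind("helenstar_release/test", "splitted_Data/test")
```

Copying instead of moving leaves the downloaded dataset untouched, so the step can be repeated safely if something goes wrong.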
- Numpy
- Matplotlib
- torch
- torchvision
- PIL
- os module of python
- tqdm
- shutil
Import all the Python libraries required to implement the project.
As shown above, the directory structure of the downloaded dataset is rearranged so that the later steps are easier to run.
The images have different dimensions, but the model needs all inputs to be the same size, so I resized every image to 256x256. Images are also usually loaded as NumPy arrays with dtype = uint8, so they need to be converted to tensors with dtype = torch.float32.
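A minimal sketch of this preprocessing step is given below. The function names `load_image` and `load_label` are hypothetical, and two details are assumptions that the text above does not state: pixel values are scaled to the range [0, 1], and label.png is a single-channel image whose pixel values are the class indices.

```python
import numpy as np
import torch
from PIL import Image

SIZE = (256, 256)  # target (width, height) for every image


def _open(source):
    # Accept either a file path or an already opened PIL image
    return source if isinstance(source, Image.Image) else Image.open(source)


def load_image(source):
    """Return a float32 tensor of shape (3, 256, 256)."""
    img = _open(source).convert("RGB").resize(SIZE, Image.BILINEAR)
    arr = np.asarray(img)  # uint8 array, shape (256, 256, 3)
    tensor = torch.from_numpy(arr.copy()).permute(2, 0, 1)  # channels first
    return tensor.to(torch.float32) / 255.0  # assumption: scale to [0, 1]


def load_label(source):
    """Return an int64 tensor of shape (256, 256) holding class indices."""
    # Nearest-neighbour resizing avoids blending neighbouring class ids
    img = _open(source).resize(SIZE, Image.NEAREST)
    return torch.from_numpy(np.asarray(img).copy()).to(torch.long)
```

Using nearest-neighbour interpolation for the labels matters because any averaging interpolation would produce class values that do not exist in the original mask.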
The train set has 1999 images, which cannot be fed to the model in one go, so I use mini-batch gradient descent with a batch size of 10.
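One way to get mini-batches of 10 in PyTorch is `torch.utils.data.DataLoader`. The sketch below uses random stand-in tensors instead of the real images, and shuffling is an assumption, so treat it as an illustration of the batching rather than the project's data pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 25 random "images" and label maps (the real code would load files)
images = torch.rand(25, 3, 256, 256)
labels = torch.randint(0, 11, (25, 256, 256))

loader = DataLoader(TensorDataset(images, labels), batch_size=10, shuffle=True)

for x, y in loader:
    # The last batch is smaller when the dataset size is not a multiple of 10
    print(x.shape, y.shape)

# With 1999 training images and batch_size=10 there are 200 batches per epoch:
# 199 full batches plus one final batch of 9 images.
full_size = DataLoader(TensorDataset(torch.zeros(1999, 1)), batch_size=10)
print(len(full_size))
```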
The model architecture is shown in the picture below
It has an encoding path and a decoding path. The architecture is difficult to code in a single class UNet(nn.Module), so I first define some helper classes that keep the code concise and easy to read.
The input to the model has shape 10x3x256x256 (batch size x channels x height x width) and the output has shape 10x11x256x256, with one output channel per class.
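The helper-class approach can be sketched as below. This is a reduced illustration, not the project's model: the class names `DoubleConv`, `Down` and `Up`, the three-level depth and the base width of 16 channels are all assumptions chosen to keep the example small, and padded convolutions are used so that the output keeps the 256x256 input size.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Two padded 3x3 convolutions, each followed by ReLU."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class Down(nn.Module):
    """Encoder step: halve the resolution, then apply DoubleConv."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(nn.MaxPool2d(2), DoubleConv(in_ch, out_ch))

    def forward(self, x):
        return self.block(x)


class Up(nn.Module):
    """Decoder step: upsample, concatenate the skip connection, then DoubleConv."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([skip, x], dim=1))


class UNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=11, base=16):
        super().__init__()
        self.inc = DoubleConv(in_ch, base)
        self.down1 = Down(base, base * 2)
        self.down2 = Down(base * 2, base * 4)
        self.down3 = Down(base * 4, base * 8)
        self.up1 = Up(base * 8, base * 4)
        self.up2 = Up(base * 4, base * 2)
        self.up3 = Up(base * 2, base)
        self.outc = nn.Conv2d(base, n_classes, kernel_size=1)  # per-pixel class scores

    def forward(self, x):
        x1 = self.inc(x)      # encoding path
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x = self.up1(x4, x3)  # decoding path with skip connections
        x = self.up2(x, x2)
        x = self.up3(x, x1)
        return self.outc(x)


model = UNet()
with torch.no_grad():
    out = model(torch.rand(10, 3, 256, 256))
print(out.shape)  # torch.Size([10, 11, 256, 256])
```

Keeping each encoder and decoder step in its own small module means the `UNet` class itself only wires the pieces together, which is what makes the skip connections easy to follow.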
For this problem we use a Dice loss. As PyTorch has no predefined Dice loss, we define our own. The code is inspired by An overview of semantic image segmentation.
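A multi-class soft Dice loss can be written in a few lines. The version below is one common formulation and not necessarily the one used in this project: it assumes the model outputs raw scores of shape (N, C, H, W), that the target holds integer class indices of shape (N, H, W), and it adds a small `eps` term (an assumption) to avoid division by zero for classes absent from a batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiceLoss(nn.Module):
    """Soft Dice loss averaged over classes: 1 - mean_c(2|P*G| / (|P| + |G|))."""

    def __init__(self, eps=1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) int64 class ids
        n_classes = logits.shape[1]
        probs = F.softmax(logits, dim=1)
        one_hot = F.one_hot(target, n_classes).permute(0, 3, 1, 2).to(probs.dtype)

        dims = (0, 2, 3)  # sum over batch and spatial axes, keep the class axis
        intersection = (probs * one_hot).sum(dims)
        cardinality = probs.sum(dims) + one_hot.sum(dims)
        dice = (2 * intersection + self.eps) / (cardinality + self.eps)
        return 1 - dice.mean()


# Example: random scores for a batch of 2 small images with 11 classes
logits = torch.randn(2, 11, 16, 16, requires_grad=True)
target = torch.randint(0, 11, (2, 16, 16))
loss = DiceLoss()(logits, target)
loss.backward()  # the loss is differentiable with respect to the scores
```

The loss approaches 0 when the predicted probabilities match the one-hot labels and approaches 1 when they do not overlap at all.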
I train the model for 30 epochs and print the losses. Based on the trend of the losses, I occasionally interrupted execution and reduced the learning rate.
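The training procedure could look roughly like the sketch below. Several things here are assumptions made to keep the example self-contained: the helper names `train` and `set_lr` are hypothetical, the optimizer (Adam) and learning rates are not taken from the project, a 1x1 convolution stands in for the U-Net, and `nn.CrossEntropyLoss` stands in for the Dice loss.

```python
import torch
import torch.nn as nn


def set_lr(optimizer, lr):
    """Change the learning rate of an existing optimizer in place."""
    for group in optimizer.param_groups:
        group["lr"] = lr


def train(model, loader, loss_fn, optimizer, epochs):
    """Run mini-batch training and return the mean loss of each epoch."""
    history = []
    for epoch in range(epochs):
        model.train()
        running = 0.0
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)  # forward pass
            loss.backward()              # backward pass
            optimizer.step()             # parameter update
            running += loss.item()
        history.append(running / len(loader))
        print(f"epoch {epoch + 1}: mean loss {history[-1]:.4f}")
    return history


# Stand-ins: a 1x1 convolution instead of the U-Net, three random batches as data
model = nn.Conv2d(3, 11, kernel_size=1)
batches = [(torch.rand(10, 3, 32, 32), torch.randint(0, 11, (10, 32, 32))) for _ in range(3)]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

history = train(model, batches, nn.CrossEntropyLoss(), optimizer, epochs=2)
set_lr(optimizer, 1e-4)  # lower the learning rate before continuing training
```

Calling `train` again after `set_lr` continues from the current weights, which mirrors the interrupt-and-resume workflow described above.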
These are predictions on the train set. Predictions on the test set are shown in the conclusion.
Store the data following the directory structure shown in the code, then run the code. Pre-trained weights for the model are available at this link: Pre-Trained weights.
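Loading downloaded weights usually takes two calls in PyTorch. The sketch below assumes the weights file is a `state_dict` saved with `torch.save`. The helper name `load_weights` and the file name `weights.pth` are hypothetical, and a small stand-in layer is used in place of the real model so the example runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_weights(model, path):
    """Load a saved state_dict into `model` and switch it to evaluation mode."""
    state = torch.load(path, map_location="cpu")  # works without a GPU
    model.load_state_dict(state)
    model.eval()
    return model


# Round trip with a stand-in model; replace it with the U-Net and the real file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pth")  # hypothetical file name
    source = nn.Conv2d(3, 11, kernel_size=1)
    torch.save(source.state_dict(), path)
    restored = load_weights(nn.Conv2d(3, 11, kernel_size=1), path)
```

`load_state_dict` raises an error if the parameter names or shapes in the file do not match the model, so the model must be built with the same architecture that produced the weights.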