Tanmay-21/Semanitic-Segmentation-with-U-Net

Semanitic-Segmentation-with-U-Net

Table of Contents:

  • Overview of Project

  • Data Description

  • Libraries used

  • Steps followed

  • Conclusion

  • How to replicate on your device

Overview of Project:

We are given the Helen Dataset, which contains face images of different people. The goal is to classify each pixel as one of the following 11 classes:

  • bg (background)

  • face

  • lb (left brow)

  • rb (right brow)

  • le (left eye)

  • re (right eye)

  • nose

  • ulip (upper lip)

  • imouth (inner mouth)

  • llip (lower lip)

  • hair
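The classes listed above can be kept in a small lookup table so that later code can refer to them by name. This is a minimal sketch; it assumes the integer values stored in label.png follow the order of the list (0 for bg through 10 for hair), which should be checked against the dataset before relying on it.

```python
# Assumption: label ids follow the order in which the classes are listed above.
CLASS_NAMES = [
    "bg", "face", "lb", "rb", "le", "re",
    "nose", "ulip", "imouth", "llip", "hair",
]

# Reverse lookup from class name to integer id.
NAME_TO_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}

# Number of output channels the segmentation model has to produce.
NUM_CLASSES = len(CLASS_NAMES)
```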

For this task we use the well-known U-Net architecture; see the U-Net Paper.

Data Description:

For this project, the Helen Dataset can be downloaded from Helen Dataset.
Each image comes with 3 files. The first is image.jpg, the input that is loaded into the model. The second is label.png, which holds the pixel-by-pixel classification of the image. The third, viz.jpg, is only for demonstration purposes and is not used by the model.
The directory structure is as follows:

.
└── helenstar_release
     ├── train
     │   ├── image.jpg
     │   ├── label.png
     │   ├── viz.jpg
     │   └── ... (1999 sets of 3 images, i.e. 5997 images total)
     └── test
         ├── image.jpg
         ├── label.png
         ├── viz.jpg
         └── ... (100 sets of 3 images, i.e. 300 images total)

Total number of images in dataset : 2099
Number of images in train set : 1999
Number of images in test set : 100

For convenience, I move all image.jpg files into one folder and all label.png files into another, using Python's shutil module.

The final directory structure is as follows:

.
└── splitted_Data
        ├── train
        │   ├── images
        │   │   ├── image1.jpg
        │   │   ├── image2.jpg
        │   │   └── ... (1999 files)
        │   └── labels
        │       ├── label1.jpg
        │       ├── label2.jpg
        │       └── ... (1999 files)       
        │           
        └── test
            ├── images
            │   ├── image1.jpg
            │   ├── image2.jpg
            │   └── ... (100 files)
            └── labels
                ├── label1.jpg
                ├── label2.jpg
                └── ... (100 files)
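The reorganization described above can be sketched with shutil as follows. This is not the author's script: the function name `reorganize` is hypothetical, it assumes every source file name ends in `image.jpg`, `label.png`, or `viz.jpg`, and it keeps the original file names and extensions. It copies rather than moves so the download stays intact; `shutil.move` could be swapped in.

```python
import shutil
from pathlib import Path


def reorganize(src_root, dst_root, splits=("train", "test")):
    """Copy image/label pairs into <dst>/<split>/images and <dst>/<split>/labels."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    counts = {}
    for split in splits:
        images_dir = dst_root / split / "images"
        labels_dir = dst_root / split / "labels"
        images_dir.mkdir(parents=True, exist_ok=True)
        labels_dir.mkdir(parents=True, exist_ok=True)

        copied = 0
        # Assumption: each sample is stored as <id>image.jpg and <id>label.png.
        for image_path in sorted((src_root / split).glob("*image.jpg")):
            label_name = image_path.name.replace("image.jpg", "label.png")
            label_path = image_path.with_name(label_name)
            if not label_path.exists():
                continue  # skip images that have no matching label
            shutil.copy2(image_path, images_dir / image_path.name)
            shutil.copy2(label_path, labels_dir / label_path.name)
            copied += 1
        counts[split] = copied  # viz.jpg files are never copied
    return counts
```

Calling `reorganize("helenstar_release", "splitted_Data")` returns the number of pairs copied per split, which can be compared with the expected 1999 and 100.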

Libraries used:

  • Numpy

  • Matplotlib

  • torch

  • torchvision

  • PIL

  • os module of python

  • tqdm

  • shutil

Steps Followed

  • 1. Importing Necessary Libraries

Import all the Python libraries required to implement the project.

  • 2. Looking at directory structure and making desired shiftings

As shown above, the directory structure of the downloaded dataset is changed so that the later steps are easier to run.

  • 3. Data Preprocessing

The images have different dimensions, but to feed them into the model they all need to be the same size, so I resized all the images to 256x256. Also, when images are loaded, they usually arrive as NumPy arrays with dtype = uint8. They need to be converted to tensors with dtype = torch.float32.
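A minimal sketch of this preprocessing step with PIL, NumPy, and torch is shown below. The function name `preprocess` is hypothetical, and several details are assumptions not stated above: pixel values are scaled to [0, 1], the label is a single-channel image of class ids kept as integers, and the label is resized with nearest-neighbour interpolation so that resizing never blends two class ids into a third.

```python
import numpy as np
import torch
from PIL import Image

SIZE = (256, 256)  # target (width, height) used in this project


def preprocess(image, label):
    """Resize a PIL image/label pair and convert both to tensors."""
    image = image.convert("RGB").resize(SIZE, Image.BILINEAR)
    # Nearest-neighbour keeps label values valid class ids (assumption: the
    # label image has a single channel holding the class id of every pixel).
    label = label.resize(SIZE, Image.NEAREST)

    # uint8 array of shape (H, W, 3) -> float32 tensor of shape (3, H, W).
    image_t = torch.from_numpy(np.array(image)).permute(2, 0, 1)
    image_t = image_t.to(torch.float32) / 255.0  # assumption: scale to [0, 1]

    # uint8 array of shape (H, W) -> int64 tensor of class ids.
    label_t = torch.from_numpy(np.array(label)).to(torch.int64)
    return image_t, label_t
```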

  • 4. Defining Train and Test Dataloaders

The train dataset has 1999 images, which cannot be fed to the model in one go. I use mini-batch gradient descent with a batch size of 10.
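The batching described above can be sketched with `torch.utils.data.DataLoader`. To keep the example self-contained, a hypothetical `StandInFaces` dataset generates random tensors of the right shape instead of reading the Helen files; shuffling only the train loader is also an assumption.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class StandInFaces(Dataset):
    """Hypothetical stand-in that yields random image/label pairs on demand."""

    def __init__(self, n_items, size=256, n_classes=11):
        self.n_items = n_items
        self.size = size
        self.n_classes = n_classes

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        image = torch.rand(3, self.size, self.size)  # float32 in [0, 1)
        label = torch.randint(0, self.n_classes, (self.size, self.size))
        return image, label


# Same sizes as the real train and test sets, batch size 10 as in the text.
train_loader = DataLoader(StandInFaces(1999), batch_size=10, shuffle=True)
test_loader = DataLoader(StandInFaces(100), batch_size=10, shuffle=False)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape, len(train_loader), len(test_loader))
```

With 1999 items and a batch size of 10, the train loader yields 200 batches per epoch, and the last one holds the remaining 9 images.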

  • 5. Defining model

The model architecture is shown in the picture below

It has an encoding path and a decoding path. The architecture is difficult to code in a single class UNet(nn.Module), so I define some helper classes beforehand that keep the code concise and easy to read.

The input to the model has shape 10x3x256x256 and the output has shape 10x11x256x256 (batch size x channels x height x width; the 11 output channels correspond to the 11 classes).
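One way to split a U-Net into helper classes is sketched below. It is not the author's code: the helper names `DoubleConv` and `Up`, the use of batch normalization, transposed convolutions for upsampling, and the `base` width of 64 are all assumptions. Padded 3x3 convolutions are used so that the output keeps the input's height and width, matching the 256x256 shapes quoted above.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Two padded 3x3 convolutions, each followed by batch norm and ReLU."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class Up(nn.Module):
    """Upsample by 2, concatenate the skip connection, then DoubleConv."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([skip, x], dim=1))


class UNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=11, base=64):
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8, base * 16]
        self.inc = DoubleConv(in_ch, chs[0])
        # Encoder: each stage halves the resolution and doubles the channels.
        self.downs = nn.ModuleList(
            [nn.Sequential(nn.MaxPool2d(2), DoubleConv(chs[i], chs[i + 1]))
             for i in range(4)]
        )
        # Decoder: mirror of the encoder, deepest stage first.
        self.ups = nn.ModuleList(
            [Up(chs[i + 1], chs[i]) for i in reversed(range(4))]
        )
        # 1x1 convolution producing one score map per class.
        self.head = nn.Conv2d(chs[0], n_classes, kernel_size=1)

    def forward(self, x):
        skips = [self.inc(x)]
        for down in self.downs:
            skips.append(down(skips[-1]))
        x = skips.pop()  # bottleneck features
        for up in self.ups:
            x = up(x, skips.pop())
        return self.head(x)
```

Because the encoder halves the resolution four times, the input height and width should be divisible by 16, which holds for 256.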

  • 6. Defining Dice Loss and optimizer

For this problem we use Dice loss. As there is no predefined Dice loss in PyTorch, we define it ourselves. The code is inspired by An overview of semantic image segmentation.
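A common multi-class soft Dice loss is sketched below; it may differ in detail from the author's version. It assumes the model outputs raw scores of shape (N, C, H, W) and that targets are integer class ids of shape (N, H, W). The smoothing term `eps` and the averaging over classes are choices made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiceLoss(nn.Module):
    """Soft Dice loss: 1 minus the Dice coefficient averaged over classes."""

    def __init__(self, eps=1e-6):
        super().__init__()
        self.eps = eps  # avoids division by zero for absent classes

    def forward(self, logits, target):
        n_classes = logits.shape[1]
        probs = F.softmax(logits, dim=1)  # per-pixel class probabilities
        # (N, H, W) int64 -> one-hot (N, C, H, W) float
        one_hot = F.one_hot(target, n_classes).permute(0, 3, 1, 2).float()

        dims = (0, 2, 3)  # sum over batch and spatial dims, keep classes
        intersection = (probs * one_hot).sum(dim=dims)
        cardinality = probs.sum(dim=dims) + one_hot.sum(dim=dims)
        dice = (2 * intersection + self.eps) / (cardinality + self.eps)
        return 1 - dice.mean()
```

The loss approaches 0 when the predicted probabilities match the one-hot targets and grows as the overlap shrinks.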

  • 7. Performing Forward Propagation

I train the model for 30 epochs, running forward propagation on each mini-batch and printing the losses. Based on the trend of the losses, I occasionally interrupted execution and reduced the learning rate.
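The training procedure described above can be sketched as a generic loop plus a helper for lowering the learning rate between runs. Everything concrete here is an assumption made for illustration: the function names `train` and `set_lr`, the Adam optimizer and its learning rates, and the tiny stand-in model, random data, and cross-entropy loss that replace the real U-Net, Helen data, and Dice loss so the example runs in a few seconds.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, loss_fn, optimizer, epochs, device="cpu"):
    """Run mini-batch training and return the mean loss of every epoch."""
    model.to(device).train()
    history = []
    for epoch in range(epochs):
        running = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)  # forward pass
            loss.backward()                        # backward pass
            optimizer.step()                       # parameter update
            running += loss.item()
        history.append(running / len(loader))
        print(f"epoch {epoch + 1}: mean loss {history[-1]:.4f}")
    return history


def set_lr(optimizer, lr):
    """Change the learning rate of an existing optimizer in place."""
    for group in optimizer.param_groups:
        group["lr"] = lr


# Stand-ins (assumptions): random data, a 1x1 conv "model", cross-entropy loss.
torch.manual_seed(0)
data = TensorDataset(torch.rand(20, 3, 16, 16),
                     torch.randint(0, 11, (20, 16, 16)))
loader = DataLoader(data, batch_size=10, shuffle=True)
model = nn.Conv2d(3, 11, kernel_size=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

history = train(model, loader, nn.CrossEntropyLoss(), optimizer, epochs=2)
set_lr(optimizer, 1e-4)  # lower the learning rate before continuing training
```

Calling `train` again after `set_lr` continues from the current weights, which mirrors interrupting a run and resuming it with a smaller learning rate.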

  • 8. Visualizing train loss
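A minimal sketch of plotting the recorded losses with Matplotlib follows. The function name `plot_losses` and the output file name are hypothetical, and the non-interactive "Agg" backend is chosen only so the sketch also runs on machines without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render to a file, no display needed
import matplotlib.pyplot as plt


def plot_losses(losses, path="train_loss.png"):
    """Plot one loss value per epoch and save the figure to `path`."""
    fig, ax = plt.subplots()
    ax.plot(range(1, len(losses) + 1), losses, marker="o")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Dice loss")
    ax.set_title("Training loss")
    fig.savefig(path)
    plt.close(fig)  # free the figure once it has been written
    return path
```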

  • 9. Visualizing predictions


These are predictions on the train set. We'll see predictions on the test set in the Conclusion part.
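To turn model outputs into something viewable, the per-class scores are usually reduced to one class id per pixel and then mapped to colours. The sketch below shows that conversion; the function names and the randomly generated palette are assumptions, not the colours used in the dataset's viz.jpg files.

```python
import numpy as np
import torch


def logits_to_mask(logits):
    """(N, C, H, W) class scores -> (N, H, W) class ids via argmax."""
    return logits.argmax(dim=1)


def colorize(mask, n_classes=11, seed=0):
    """Map an (H, W) tensor of class ids to an (H, W, 3) uint8 RGB array."""
    rng = np.random.default_rng(seed)
    # Assumption: an arbitrary but reproducible colour per class.
    palette = rng.integers(0, 256, size=(n_classes, 3), dtype=np.uint8)
    palette[0] = 0  # draw the background class in black
    return palette[mask.cpu().numpy()]
```

The returned array can be passed directly to Matplotlib's `imshow` next to the input image.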

How to replicate on your device

Store the data following the directory structure shown in the code, then run the code. You can get the pre-trained weights for the model at this link: Pre-Trained weights.
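Loading downloaded weights into a model might look like the sketch below. It assumes the weight file is a state dict saved with `torch.save(model.state_dict(), path)` and that the model passed in has the same layer names and shapes; the function name `load_weights` is hypothetical, and the actual file name depends on the download.

```python
import torch
import torch.nn as nn


def load_weights(model, path, device="cpu"):
    """Load a saved state dict into `model` and switch it to eval mode."""
    # map_location lets weights saved on a GPU load on a CPU-only machine.
    state_dict = torch.load(path, map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device).eval()
```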
