Chapter 01 - Road Segmentation using Fully Convolutional Network (FCN)

Overview

  • This course covers road segmentation using fully convolutional networks (FCNs), a type of deep learning model for image segmentation.
  • We use the KITTI road dataset, which contains images of roads together with their segmentation masks.
  • The goal of road segmentation is to identify the drivable area of the road in an image, which is essential for self-driving cars to plan their path and avoid obstacles.

Requirements

The Road Segmentation Problem

  • Input image (from KITTI Road Dataset):

  • It is straightforward for human drivers to distinguish the drivable from the non-drivable areas of the road.
  • However, it is much harder for a car to perceive the road the way humans do.
  • To achieve this, the car/machine relies on a segmentation mask image to interpret the road, as illustrated below:

Solution: Traditional Computer Vision Techniques vs. FCN

  • Traditional CV techniques require hardcoding segmentation values for each image:

    • prone to many errors
    • does not generalize well to dynamic environments such as a road vehicle facing multiple scenarios
  • Deep learning techniques such as FCNs learn the many possible road scenarios from a large volume of data (datasets) and can predict on, and readapt to, new scenarios.

FCN Architecture

Source: FCN architecture from the original paper (Long et al., 2015)

Fully Conv Net for Road Segmentation

FCN-8 architecture

  • We use the VGG-16 network as a backbone for the FCN model. VGG-16 is a pre-trained image classification network that can extract features from images (a loading sketch follows below).
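
The snippet below is a minimal sketch of loading VGG-16 as a feature-extraction backbone. It assumes TensorFlow/Keras; the lab notebook may use a different framework, and the input shape shown is only an example.

```python
import tensorflow as tf

# Load VGG-16 pre-trained on ImageNet without its classification head,
# so it acts purely as a convolutional feature extractor.
backbone = tf.keras.applications.VGG16(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3),  # example size; KITTI road images are larger
)

backbone.trainable = False  # optionally freeze the pre-trained weights
backbone.summary()
```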

VGG customized to FCN

  • We modify the VGG-16 network by adding upsampling layers and skip connections to produce a segmentation mask of the same size as the input image (see the sketch below for tapping the intermediate feature maps that the skips rely on).
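
A minimal sketch, again assuming TensorFlow/Keras, of exposing the intermediate VGG-16 feature maps that the skip connections and upsampling path consume. The layer names follow Keras' built-in VGG16 naming; the lab's own code may tap different layers.

```python
import tensorflow as tf

vgg = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)

# Tap the outputs of the three pooling stages used by the FCN-8 skips.
skip_names = ["block3_pool", "block4_pool", "block5_pool"]
encoder = tf.keras.Model(
    inputs=vgg.input,
    outputs=[vgg.get_layer(name).output for name in skip_names],
)

# Feature maps at 1/8, 1/16, and 1/32 of the input resolution.
pool3, pool4, pool5 = encoder.outputs
```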

An upsampling operation increases the spatial size (height × width) of a given input.

There are four different methods of upsampling, shown below (ranked from good to best); a short code sketch follows the list:

  • Bed of nails: Assign the original pixel value to one cell and zero to the rest in the upsampled block.
  • Nearest neighbor: Assign the original pixel value to all the cells in the upsampled block.
  • Interpolation: Assign a weighted average of the nearest pixels in the original image to the upsampled cell.
  • Transpose convolution: Assign a learnable filter to the upsampled cell based on the original pixel value and its neighbors.
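
The sketch below shows how three of these options look in TensorFlow/Keras (an assumed framework); bed of nails has no built-in Keras layer and is usually implemented manually, so it is omitted here.

```python
import tensorflow as tf

# Any feature map of shape (batch, H, W, C); here a symbolic 16x16x256 input.
x = tf.keras.Input(shape=(16, 16, 256))

# Nearest neighbor: copy each value into the whole 2x2 upsampled block.
nearest = tf.keras.layers.UpSampling2D(size=2, interpolation="nearest")(x)

# Interpolation: bilinear weighting of the nearest original pixels.
bilinear = tf.keras.layers.UpSampling2D(size=2, interpolation="bilinear")(x)

# Transpose convolution: a learnable upsampling filter.
learned = tf.keras.layers.Conv2DTranspose(
    filters=256, kernel_size=4, strides=2, padding="same"
)(x)
```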

There are three variants of the FCN model:

  • FCN-32: Upsample the output of the last convolutional block of VGG-16 by 32× to get the segmentation mask.
  • FCN-16: Upsample the last block's output by 2×, add it to the pool4 feature map (one skip connection), then upsample the result by 16× to get the segmentation mask.
  • FCN-8: Upsample the last block's output by 2×, add it to pool4, upsample the result by 2× again, add it to pool3 (a second skip connection), then upsample the final result by 8× to get the segmentation mask (a decoder sketch follows this list).
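
Below is a minimal FCN-8-style decoder sketch, assuming TensorFlow/Keras and the `pool3`, `pool4`, `pool5` feature maps tapped from VGG-16 earlier; filter counts, kernel sizes, and the exact layer choices in the lab notebook may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 2  # road / not-road

def fcn8_head(pool3, pool4, pool5):
    # 1x1 convolutions turn each feature map into per-class score maps.
    score5 = layers.Conv2D(NUM_CLASSES, 1)(pool5)
    score4 = layers.Conv2D(NUM_CLASSES, 1)(pool4)
    score3 = layers.Conv2D(NUM_CLASSES, 1)(pool3)

    # Upsample the coarsest scores by 2x and add the pool4 scores (skip 1).
    up2 = layers.Conv2DTranspose(NUM_CLASSES, 4, strides=2, padding="same")(score5)
    fuse4 = layers.Add()([up2, score4])

    # Upsample by 2x again and add the pool3 scores (skip 2).
    up4 = layers.Conv2DTranspose(NUM_CLASSES, 4, strides=2, padding="same")(fuse4)
    fuse3 = layers.Add()([up4, score3])

    # Final 8x upsampling back to the input resolution.
    return layers.Conv2DTranspose(NUM_CLASSES, 16, strides=8, padding="same")(fuse3)
```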

Lab: Resources & Notebooks

Resources

Paper Implementation Pipeline

1. Import Libraries
2. Load Dataset
    - Preprocessing: load image-seg mask pairs
    - train/test/validation split
3. Apply Transformations
    - normalization
    - load img train
    - load img test
    - display samples
4. Define Network (Build)
    - create object VGG
    - Customize to FCN
5. Train
    - Loss function
    - Check model
        - create a mask of the network prediction
        - show prediction
6. Test & Evaluation
    - weighted img: compute mask over img
    - process img mask: mask to compute single img
    - save prediction
    - save samples
7. Examples
    - Testing videos
        - play
        - process_image
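
The sketch below illustrates two of the pipeline steps above, assuming TensorFlow/Keras: normalization (step 3) and turning a network prediction into a mask (step 5). Names such as `normalize` and `create_mask` mirror the pipeline items and are illustrative, not a confirmed notebook API.

```python
import tensorflow as tf

def normalize(image, mask):
    """Scale pixel values to [0, 1]; leave the integer mask unchanged."""
    return tf.cast(image, tf.float32) / 255.0, mask

def create_mask(prediction):
    """Collapse per-pixel class scores into a single-channel mask."""
    mask = tf.argmax(prediction, axis=-1)   # most likely class per pixel
    return mask[..., tf.newaxis]            # keep a channel dimension

# Example: random scores for a (batch, H, W, 2) road/not-road prediction.
scores = tf.random.uniform((1, 160, 576, 2))
print(create_mask(scores).shape)  # -> (1, 160, 576, 1)
```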

Notebooks

| Lab (local) | Colab | Kaggle |
| --- | --- | --- |
| Lab repo | Open notebook in Colab | Kaggle |

Additional Resources

Retrospective

We compare the results of the different upsampling methods and FCN variants on the KITTI road dataset and find that:

  • Interpolation works better than bed of nails, nearest neighbor, and transpose convolution for upsampling.
  • FCN-8 works better than FCN-32 and FCN-16 for segmentation, as it combines the coarse and fine features from different layers of VGG-16.
  • Add works better than concatenate for combining the skip connections, as it preserves the spatial information better.

References