Skip to content

Tensorflow implementation : U-net and FCN with global convolution

Notifications You must be signed in to change notification settings

preritj/segmentation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Segmentation

This repository contains tensorflow implemenation of two models for semantic segmentations known to give high accuracy:

  • U-net (https://arxiv.org/abs/1505.04597) : The network can be found in u_net.py. Here is the architecture which I have borrowed from the paper : unet There are a few minor differences in my implementation. I have used 'same' padding to simplify things. For the upsampling, I have simply used tf.image.resize_images function (see layers_unet.py) . The full transpose convolution (deconvolution) layer is implemented for FCN described next.

  • FCN with global convolution (https://arxiv.org/abs/1703.02719) : The network can be found in fcn_gcn_net.py. Here is the architecture which I have borrowed from the paper : fcn_gcn Again, there are a few minor differences in my implementation. In particular, I have used VGG style encoder instead of ResNet blocks. All the layers/blocks used in the architecture (including the deconvolution layer) can be found in layers_fcn_gcn_net.py.

Kaggle - Carvana image masking challenge

I applied these models to one of the Kaggle competetions where the background behind the object (in this case : cars) had to be removed. More details can be found here : Kaggle : Carvana image masking challenge. Due to lack of time and resources, I ended up making only a single submission and got a score of 99.2% (winning solution had a score of 99.7%). For this particular challenge, since there is only one class, U-net is a better model choice. Here is a sample result when U-net is applied to test image: carvana_test
carvana_test_overlay

Scope for improvement : There are several strategies that could have improved the score but I did not use due to lack of time:

  • Image preprocessing and data augmentation
  • Use higher resolution images : I was using AWS which wasn't fast enough to handle high resolution images. So, I had to scale down images considerably which leads to lower accuracy.
  • Tiling sub-regions of high resolution image : This strategy will ensure that each tile can fit in the GPU but is obviously more time consuming.
  • Apply Conditional Random Field post-processing