Online implementation of the Mask R-CNN paper using Python 3, Keras and TensorFlow. The implementation extracts a desired label (out of 80 classes) and emphasizes its ROI by converting all other classes to black and white [link].
The model was pretrained on the MS COCO dataset of segmented objects in context. Each frame of the footage goes through a detection step that returns a Python dictionary containing bounding boxes, segmentation masks, the likeliest detected class and its score. See the following image from a prime-time broadcast, in which the reporter encounters uninvited visitors:
Mask R-CNN returns a label map after selecting the likeliest classes from all the estimated ones:
An auxiliary function was defined to keep full color for the 'person' and 'dog' classes, leaving all other classes as "0" / background (BG). By default, Mask R-CNN returns every detectable class (in color) together with its bounding box and confidence level.
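The auxiliary function described above could look roughly like the following NumPy sketch. It assumes the detection dictionary exposes `class_ids` and `masks` (shape height x width x instances), which is how the Matterport library lays out its `detect` output. The function name `splash_classes`, the shortened `class_names` list and the toy 2x2 image are hypothetical and only serve the illustration. Non-selected pixels are rendered in grayscale, matching the black-and-white behaviour described at the top. This is a sketch under those assumptions, not the repository's actual helper.

```python
import numpy as np


def splash_classes(image, detection, class_names, keep=("person", "dog")):
    """Keep color on pixels of the selected classes; render the rest in grayscale."""
    # Luminance-weighted grayscale, replicated to 3 channels so shapes match.
    gray = image.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    gray = np.repeat(gray[..., None], 3, axis=2).astype(np.uint8)

    # Indices of detected instances whose class name is in `keep`.
    wanted = [i for i, cid in enumerate(detection["class_ids"])
              if class_names[cid] in keep]
    if not wanted:
        return gray

    # Union of the selected instance masks: shape (H, W), dtype bool.
    keep_mask = detection["masks"][:, :, wanted].any(axis=2)
    return np.where(keep_mask[..., None], image, gray)


# Toy input (hypothetical): a 2x2 image with one 'person' and one 'car' instance.
class_names = ["BG", "person", "dog", "car"]  # shortened, illustrative list
image = np.full((2, 2, 3), (200, 100, 50), dtype=np.uint8)
masks = np.zeros((2, 2, 2), dtype=bool)
masks[0, 0, 0] = True  # instance 0 (person) covers pixel (0, 0)
masks[1, 1, 1] = True  # instance 1 (car) covers pixel (1, 1)
detection = {"class_ids": np.array([1, 3]), "masks": masks}

out = splash_classes(image, detection, class_names)
```

In this toy case the 'person' pixel keeps its original color, while the 'car' pixel and the background end up with three equal channel values.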
Using TensorFlow's open-source object detection library, I implemented two models on frozen images for classification and localization [link]:
- Mask R-CNN Inception ResNet v2 (instance segmentation):
R-CNN differs from a regular CNN for image classification in that it focuses on regions, since locating multiple objects is essential to this type of model. The image is split into dozens of candidate boxes (regions), and each is checked for signs of an object of the desired class. A region proposal network (RPN) then ranks the regions most likely to contain an object [link].
- DeepLab_v3 implementation (semantic segmentation) [link]:
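The region-ranking idea described for Mask R-CNN above can be illustrated with a small pure-Python sketch: candidate boxes are sorted by an objectness score, and near-duplicate boxes are dropped with non-maximum suppression (NMS). NMS is not mentioned in the text above; it is the usual companion step in Faster R-CNN-style pipelines and is included here as an assumption. The function names, the (y1, x1, y2, x2) box layout, the threshold and the sample boxes are all illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rank_proposals(boxes, scores, iou_threshold=0.7, top_k=2):
    """Return indices of the top-scoring boxes after greedy NMS."""
    # Visit candidates from the highest to the lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
        if len(kept) == top_k:
            break
    return kept


# Hypothetical proposals: boxes 0 and 1 overlap heavily, box 2 is elsewhere.
boxes = [(0, 0, 10, 10), (0, 1, 10, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.6]
selected = rank_proposals(boxes, scores)
```

With these sample values, box 1 is suppressed because it overlaps box 0 too strongly, so the ranking keeps boxes 0 and 2.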
- Google Inc.'s state-of-the-art implementation of DeepLab:
@ARTICLE{7913730,
author={L. {Chen} and G. {Papandreou} and I. {Kokkinos} and K. {Murphy} and A. L. {Yuille}},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs},
year={2018}, volume={40}, number={4}, pages={834-848},}
- Matterport Inc.'s amazing library implementation of Mask R-CNN:
@misc{matterport_maskrcnn_2017,
title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},
author={Waleed Abdulla}, year={2017}, publisher={Github}, journal={GitHub repository},
howpublished={\url{https://github.com/matterport/Mask_RCNN}},
}
Python 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in requirements.txt.
To train or test on MS COCO, you'll also need:
- pycocotools (installation instructions below)
- MS COCO Dataset
- Download the 5K minival and the 35K validation-minus-minival subsets. More details in the original Faster R-CNN implementation.
If you use Docker, the code has been verified to work on this Docker container.
- Clone the desired repository in the root directory.
- Install dependencies:
  pip3 install -r requirements.txt
- Run setup from the repository root directory:
  python3 setup.py install
- Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.
- (Optional) To train or test on MS COCO, install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).
  - Linux: https://github.com/waleedka/coco
  - Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details).
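After the installation steps above, a quick sanity check can confirm that the main packages are importable and that the weights file is where you expect it. The sketch below uses only the Python standard library. The helper names `missing_modules` and `check_setup` are hypothetical, and the default weights path assumes mask_rcnn_coco.h5 was saved in the current directory, which may not match your layout.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_setup(weights_path="mask_rcnn_coco.h5",
                modules=("tensorflow", "keras", "pycocotools")):
    """Summarize which modules are missing and whether the weights file exists."""
    return {
        "missing_modules": missing_modules(modules),
        "weights_found": os.path.isfile(weights_path),
    }


# Print the summary for the default (assumed) module list and weights path.
print(check_setup())
```

An empty `missing_modules` list and `weights_found` set to `True` suggest the environment is ready. pycocotools only matters if you plan to train or test on MS COCO.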
Here is a link to several homework assignments from the Technion [course].