Mask R-CNN for object detection and instance segmentation using Keras==2.7.0 and TensorFlow==2.7.0

The Mask-RCNN-TF2.7.0-keras2.7.0 project adapts the original Mask_RCNN project, which only supports TensorFlow 1.x, so that it works on TensorFlow 2.7.0. Based on this new project, Mask R-CNN can be trained and tested (i.e., used to make predictions) in TensorFlow 2.7.0. The Mask R-CNN model generates bounding boxes and segmentation masks for each instance of an object in the image. It is based on a Feature Pyramid Network (FPN) and a ResNet101 backbone.

Compared to the source code of the old Mask_RCNN project, the Mask-RCNN-TF2.7.0-keras2.7.0 project edits the following two modules:

model.py
utils.py

The Mask-RCNN-TF2.7.0-keras2.7.0 project is tested against TensorFlow 2.7.0, Keras 2.7.0-tf, and Python 3.8.10 for the following system specifications:

GPU - GeForce RTX 3060 12GiB
OS - Ubuntu 20.04, Windows 10, and Windows 11

Note: This project does not support any TensorFlow 1.x version.

Use the Project Without Installation

Installing the project is not required. It is enough to copy the mrcnn directory into the directory where you want to use it.

Here are the steps to use the project for making predictions:

Create a root directory (e.g. ObjectDetection)
Copy the mrcnn directory inside the root directory.
Download the pre-trained weights inside the root directory. The weights can be downloaded from this link: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5.
Create a script for object detection and save it inside the root directory. This script is an example: samples/mrcnn-prediction.py. Its code is listed in the next section.
Run the script.

The directory tree of the project is as follows:

ObjectDetection:
	mrcnn:
	mask_rcnn_coco.h5
	mrcnn-prediction.py
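
The layout above can be checked before running the script. The following is a minimal sketch and not part of the project: the helper names `missing_items` and `download_weights` are hypothetical, and only the file names and the weights URL come from the steps above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL and file names are taken from the steps above.
WEIGHTS_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5"
REQUIRED_ITEMS = ("mrcnn", "mask_rcnn_coco.h5", "mrcnn-prediction.py")


def missing_items(root):
    """Return the required entries that are absent from the root directory."""
    root = Path(root)
    return [name for name in REQUIRED_ITEMS if not (root / name).exists()]


def download_weights(root):
    """Download the COCO weights into root unless the file already exists."""
    target = Path(root) / "mask_rcnn_coco.h5"
    if not target.exists():
        urlretrieve(WEIGHTS_URL, str(target))
    return target
```

Calling `missing_items("ObjectDetection")` returns an empty list once the mrcnn directory, the weights file, and the prediction script are all in place.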

Code for Prediction/Inference

The following code uses Mask R-CNN weights pre-trained on the COCO dataset. The weights can be downloaded from this link: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5. The code is available in the samples/mrcnn-prediction.py script.

The COCO dataset has 80 classes. There is an additional class for the background named BG. Thus, the total number of classes is 81. The class names are listed in the CLASS_NAMES list. DO NOT CHANGE THE ORDER OF THE CLASSES, because each name's position in the list is used as its class ID when labeling the detections.

After making predictions, the code displays the input image with the bounding boxes, masks, class labels, and prediction scores drawn over all detected objects.

import mrcnn
import mrcnn.config
import mrcnn.model
import mrcnn.visualize
import cv2
import os

# load the class label names from disk, one label per line
# CLASS_NAMES = open("coco_labels.txt").read().strip().split("\n")

CLASS_NAMES = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

class SimpleConfig(mrcnn.config.Config):
    # Give the configuration a recognizable name
    NAME = "coco_inference"
    
    # set the number of GPUs to use along with the number of images per GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Number of classes = number of object classes + 1 (for the background class, named BG)
    NUM_CLASSES = len(CLASS_NAMES)

# Initialize the Mask R-CNN model for inference and then load the weights.
# This step builds the Keras model architecture.
model = mrcnn.model.MaskRCNN(mode="inference", 
                             config=SimpleConfig(),
                             model_dir=os.getcwd())

# Load the weights into the model.
# Download the mask_rcnn_coco.h5 file from this link: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
model.load_weights(filepath="mask_rcnn_coco.h5", 
                   by_name=True)

# Load the input image and convert it from BGR to RGB channel order
image = cv2.imread("sample_image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Perform a forward pass of the network to obtain the results
r = model.detect([image], verbose=0)

# Get the results for the first image.
r = r[0]

# Visualize the detected objects.
mrcnn.visualize.display_instances(image=image, 
                                  boxes=r['rois'], 
                                  masks=r['masks'], 
                                  class_ids=r['class_ids'], 
                                  class_names=CLASS_NAMES, 
                                  scores=r['scores'])
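
Besides drawing the results, the detections can be read directly from the result dictionary. The sketch below is illustrative: `summarize_detections` is a hypothetical helper, and `fake_result` is a hand-written stand-in for the dictionary returned by `model.detect()`, using only the `rois`, `class_ids`, and `scores` keys that appear in the code above. In the original Mask_RCNN project each box is stored as (y1, x1, y2, x2); verify this against your version before relying on it.

```python
def summarize_detections(result, class_names):
    """Return one (label, score, box) tuple per detected object.

    `result` is assumed to hold the 'rois', 'class_ids', and 'scores'
    keys used in the visualization call above.
    """
    summary = []
    for box, class_id, score in zip(result["rois"], result["class_ids"], result["scores"]):
        # The class ID is an index into the ordered list of class names.
        label = class_names[int(class_id)]
        summary.append((label, float(score), tuple(int(v) for v in box)))
    return summary


# Stand-in values for illustration only; real values come from model.detect().
fake_result = {
    "rois": [[10, 20, 110, 220], [50, 60, 150, 160]],
    "class_ids": [1, 3],
    "scores": [0.98, 0.87],
}
names = ["BG", "person", "bicycle", "car"]  # first entries of CLASS_NAMES

for label, score, box in summarize_detections(fake_result, names):
    print(f"{label}: {score:.2f} at {box}")
```

Because the lookup is purely positional, reordering the class names would silently mislabel every detection.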

Transfer Learning

The kangaroo-transfer-learning dataset has both the data and code for training and testing the Mask R-CNN model using TensorFlow 2.7.0. Here is the content of the dataset directory:

kangaroo-transfer-learning:
	kangaroo:
		images:
		annots:
	kangaroo_training.py
	kangaroo_prediction.py

The kangaroo_training.py script performs transfer learning starting from weights pre-trained on the COCO dataset. Download these weights from here: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

After the transfer learning completes, the trained weights are saved in the Kangaro_mask_rcnn_trained.h5 file.

The kangaroo_prediction.py script makes predictions based on the trained weights.

Note that the Mask-RCNN-TF2.7.0-keras2.7.0 project uses the same training and testing code as in the old project.
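
The format of the files in the annots directory is not described here. Assuming they are Pascal VOC style XML files (an assumption; check a file from the dataset first), the bounding boxes could be read roughly as follows. The function name `extract_boxes` and the sample XML are illustrative and are not taken from kangaroo_training.py.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in Pascal VOC style; not a file from the dataset.
SAMPLE_XML = """
<annotation>
  <size><width>450</width><height>319</height><depth>3</depth></size>
  <object>
    <name>kangaroo</name>
    <bndbox><xmin>233</xmin><ymin>89</ymin><xmax>386</xmax><ymax>262</ymax></bndbox>
  </object>
</annotation>
"""


def extract_boxes(xml_text):
    """Return (boxes, width, height) parsed from an annotation string.

    Each box is [xmin, ymin, xmax, ymax] in pixels.
    """
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for bndbox in root.iter("bndbox"):
        coords = [int(bndbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(coords)
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    return boxes, width, height


boxes, width, height = extract_boxes(SAMPLE_XML)
print(boxes, width, height)
```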

The repository includes:

Source code of Mask R-CNN built on FPN and ResNet101 inside the mrcnn directory.
Training code for MS COCO
Jupyter notebooks to visualize the detection pipeline at every step
ParallelModel class for multi-GPU training
Evaluation on MS COCO metrics (AP)
Example of training on your own dataset

The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below).

Getting Started

mrcnn-prediction.py: A script for loading the pre-trained weights and making predictions using the Mask R-CNN model.
coco_labels.txt: The class labels of the COCO dataset.
demo.ipynb: The easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images. It includes code to run object detection and instance segmentation on arbitrary images.
train_shapes.ipynb: Shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.
(model.py, utils.py, config.py): These files contain the main Mask R-CNN implementation.
inspect_data.ipynb: This notebook visualizes the different pre-processing steps to prepare the training data.
inspect_model.ipynb: This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.
inspect_weights.ipynb: This notebook inspects the weights of a trained model and looks for anomalies and odd patterns.

Step by Step Detection

To help with debugging and understanding the model, there are 3 notebooks (inspect_data.ipynb, inspect_model.ipynb, inspect_weights.ipynb) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:

1. Anchor sorting and filtering

Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.

2. Bounding Box Refinement

This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.

3. Mask Generation

Examples of generated masks. These then get scaled and placed on the image in the right location.

4. Layer Activations

Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).

5. Weight Histograms

Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.

6. Logging to TensorBoard

TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.

7. Composing the different pieces into a final result

Training on MS COCO

We're providing pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in samples/coco/coco.py. You can import this module in a Jupyter notebook (see the provided notebooks for examples) or run it directly from the command line as follows:

# Train a new model starting from pre-trained COCO weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=coco

# Train a new model starting from ImageNet weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet

# Continue training a model that you had trained earlier
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5

# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=last

You can also run the COCO evaluation code with:

# Run COCO evaluation on the last trained model
python3 samples/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last

The training schedule, learning rate, and other parameters should be set in samples/coco/coco.py.
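
The `--model` values shown above (coco, imagenet, last, or a path to a weights file) suggest a simple dispatch on the argument. The sketch below is hypothetical and is not the code in samples/coco/coco.py: `resolve_weights`, its parameters, and the returned tuples are made up for illustration.

```python
def resolve_weights(model_arg, coco_weights_path="mask_rcnn_coco.h5"):
    """Map a --model value to a (kind, path) pair.

    kind is one of "coco", "imagenet", "last", or "file". The path is None
    when it has to be looked up at run time (ImageNet weights, last checkpoint).
    """
    key = model_arg.lower()
    if key == "coco":
        return ("coco", coco_weights_path)
    if key == "imagenet":
        return ("imagenet", None)
    if key == "last":
        return ("last", None)
    # Anything else is treated as a path to an .h5 weights file.
    return ("file", model_arg)


print(resolve_weights("coco"))
print(resolve_weights("/path/to/weights.h5"))
```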

Training on Your Own Dataset

Start by reading this blog post about the balloon color splash sample. It covers the process starting from annotating images to training to using the results in a sample application.

In summary, to train the model on your own dataset you'll need to extend two classes:

Config: This class contains the default configuration. Subclass it and modify the attributes you need to change.

Dataset: This class provides a consistent way to work with any dataset. It allows you to use new datasets for training without having to change the code of the model. It also supports loading multiple datasets at the same time, which is useful if the objects you want to detect are not all available in one dataset.

See examples in samples/shapes/train_shapes.ipynb, samples/coco/coco.py, samples/balloon/balloon.py, and samples/nucleus/nucleus.py.
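
The Config pattern is ordinary class-attribute overriding. The sketch below uses a stand-in base class, `BaseConfig`, so that it runs without the library; with the project installed you would subclass mrcnn.config.Config instead, as SimpleConfig does in the prediction code. The default values in the stand-in and the `KangarooConfig` name are placeholders chosen for illustration.

```python
class BaseConfig:
    """Stand-in for mrcnn.config.Config (assumption: not the real class)."""
    NAME = None
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1  # background only

    def __init__(self):
        # Derived value computed from the (possibly overridden) class attributes.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class KangarooConfig(BaseConfig):
    # Override only the attributes that differ from the defaults.
    NAME = "kangaroo"
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 1  # background + one object class


config = KangarooConfig()
print(config.NAME, config.NUM_CLASSES, config.BATCH_SIZE)
```

Keeping the overrides in a small subclass leaves the defaults in one place and makes each experiment's differences easy to read.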

Differences from the Official Paper

This implementation follows the Mask R-CNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.

Image Resizing: To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.
Bounding Boxes: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation.

To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset. We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, and only 0.01% differed by 10px or more.
Learning Rate: The paper uses a learning rate of 0.02, but we found that to be too high: it often causes the weights to explode, especially when using a small batch size. It might be related to differences between how Caffe and TensorFlow compute gradients (sum vs. mean across batches and GPUs). Or maybe the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively. We found that smaller learning rates converge faster anyway, so we go with that.
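
The image resizing and bounding box rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in utils.py: the function names are hypothetical, the padding is assumed to be split evenly between the two sides, and the box uses inclusive lower-right coordinates.

```python
def resize_geometry(height, width, target=1024):
    """Scale the longer side to `target` while keeping the aspect ratio.

    Returns the scale, the resized (height, width), and the zero padding
    as (top, bottom, left, right) needed to reach target x target.
    """
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    padding = (top, target - new_h - top, left, target - new_w - left)
    return scale, (new_h, new_w), padding


def mask_to_bbox(mask):
    """Return the smallest (y1, x1, y2, x2) box containing every nonzero
    pixel of a 2D mask (list of rows), or None if the mask is empty.
    y2 and x2 are inclusive in this sketch."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(resize_geometry(480, 640))
print(mask_to_bbox(mask))
```

Deriving the box from the mask means that any augmentation applied to the mask, such as a rotation, automatically yields a matching box.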

Citation

Use this bibtex to cite this repository:

@misc{matterport_maskrcnn_2017,
  title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},
  author={Waleed Abdulla},
  year={2017},
  publisher={Github},
  journal={GitHub repository},
  howpublished={\url{https://github.com/matterport/Mask_RCNN}},
}

Contributing

Contributions to this repository are welcome. Examples of things you can contribute:

Speed Improvements. Like re-writing some Python code in TensorFlow.
Training on other datasets.
Accuracy Improvements.
Visualizations and examples.

Requirements

Python 3 (tested on Python 3.8.10), TensorFlow 2.7.0, Keras 2.7.0-tf and other common packages listed in requirements.txt.

MS COCO Requirements:

To train or test on MS COCO, you'll also need:

pycocotools (installation instructions below)
MS COCO Dataset
Download the 5K minival and the 35K validation-minus-minival subsets. More details in the original Faster R-CNN implementation.

If you use Docker, the code has been verified to work on this Docker container.

Installation

Clone this repository

git clone https://github.com/Kamlesh364/Mask-RCNN-TF2.7.0-keras2.7.0

Install dependencies
```
pip3 install -r requirements.txt
```
Run setup from the repository root directory
```
python3 setup.py install
```
Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.
(Optional) To train or test on MS COCO install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).
- Linux: https://github.com/waleedka/coco
- Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)

Projects Using this Model

If you extend this model to other datasets or build projects that use it, I'd love to hear from you.

4K Video Demo by Karol Majek.

Images to OSM: Improve OpenStreetMap by adding baseball, soccer, tennis, football, and basketball fields.

Splash of Color. A blog post explaining how to train this model from scratch and use it to implement a color splash effect.

Segmenting Nuclei in Microscopy Images. Built for the 2018 Data Science Bowl

Code is in the samples/nucleus directory.

Segmenting tooth in CBCT Images. Built for my own dataset and private project.

Detection and Segmentation for Surgery Robots by the NUS Control & Mechatronics Lab.

Reconstructing 3D buildings from aerial LiDAR

Usiigaci: Label-free Cell Tracking in Phase Contrast Microscopy

Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery

Mask-RCNN Shiny

Mapping Challenge: Convert satellite imagery to maps for use by humanitarian organisations.

GRASS GIS Addon to generate vector masks from geospatial imagery. Based on a Master's thesis by Ondřej Pešek.
