We extend maskrcnn-benchmark to implement relation aware panoptic segmentation models in PyTorch 1.0.
The paper is available here.
- Panoptic FPN: We implement the Panoptic Feature Pyramid Network (FPN) in PyTorch 1.0.
- Relation Aware Panoptic (RAP) Network: We implement the relation aware panoptic segmentation network for resolving instance conflicts during fusion.
- New Fusion Strategy: A new fusion method incorporating the relation aware module and two overlapping-area-ratio check strategies.
- OCFusion: We also implement the occlusion head described in OCFusion for comparison.
- Multi-GPU training and inference
- Pre-trained models: We provide pre-trained checkpoints for panoptic FPN and RAP with all backbones on COCO dataset.
Check INSTALL.md for installation instructions.
Please first download the COCO panoptic segmentation dataset; this gives you the RGB images and the corresponding original annotations. Next we specify how to prepare the annotations for each segmentation task.
COCO provides `panoptic_train2017.json` and `panoptic_val2017.json` annotations as well as corresponding `.png` images for the panoptic segmentation task. Each segment (either thing or stuff) has a unique id in the `.json` file and is colored as [R,G,B] in the `.png` image, where `id = R + G*256 + B*256^2`. More information can be found here.
For panoptic segmentation annotations, no post-processing is needed.
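For reference, here is a minimal sketch of decoding segment ids from a panoptic `.png` using the `id = R + G*256 + B*256^2` encoding (an illustrative helper, not part of this repository; the path in the comment is just a placeholder):

```python
import numpy as np
from PIL import Image

def decode_panoptic_png(png_path):
    """Decode a COCO panoptic .png into a 2D map of segment ids,
    using id = R + G*256 + B*256^2."""
    rgb = np.array(Image.open(png_path), dtype=np.uint32)
    return rgb[:, :, 0] + rgb[:, :, 1] * 256 + rgb[:, :, 2] * 256 ** 2

# Example (placeholder path):
# segment_ids = decode_panoptic_png("annotations/panoptic_val2017/000000000139.png")
# np.unique(segment_ids) then matches the segment ids listed in panoptic_val2017.json
```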
For the instance segmentation task, two annotation files are provided directly (`instances_train2017.json`, `instances_val2017.json`). In addition, we need to extract the ground truth for occluded instance relations.
Given an image with its corresponding instance masks and panoptic segmentation map as follows, we can see that the overlapping pixels are assigned to the wine glass in the panoptic ground truth map. Therefore, the instance relation "the wine glass is closer than the dining table" can be extracted.
Here we provide two scripts in `datasets/coco` to extract all these relations based on different overlapping thresholds; a simplified sketch of the extraction logic follows below. Run the two scripts in the following order.

- `retrieve_relation.py`: reads the panoptic and instance segmentation ground truth and produces a `.pkl` file that contains all occluded instance pairs.
- `refine_instance_json_file.py`: refines the `.json` file by adding a field (`overlap`) to each object, storing a list of instance ids. The instances in this list all have significant overlap with this object and are on top of (or in front of) it.
Alternatively, you can directly download the refined instance segmentation annotation files at train_relation.json and val_relation.json (overlapping ratio 0.2).
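To make this step concrete, below is a simplified sketch of the idea behind `retrieve_relation.py` (hypothetical helper code, not the actual script; it assumes the instance ids have already been matched between the instance and panoptic annotations, and the way the overlap ratio is normalized here is an assumption):

```python
import numpy as np

def extract_occlusion_relations(instance_masks, panoptic_segment_map, overlap_thresh=0.2):
    """Sketch: collect (front_id, behind_id) pairs of occluded instances.

    instance_masks: dict {instance_id: HxW boolean mask} from the instance annotations
    panoptic_segment_map: HxW array of segment ids, matched to the same instance ids
    """
    relations = []
    ids = list(instance_masks.keys())
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            overlap = instance_masks[a] & instance_masks[b]
            if not overlap.any():
                continue
            # the panoptic ground truth assigns every overlapping pixel to exactly one
            # segment, i.e. to the instance that is in front (the wine glass above)
            a_in_front = (panoptic_segment_map[overlap] == a).mean() > 0.5
            front, behind = (a, b) if a_in_front else (b, a)
            # normalizing by the smaller instance area is an assumption of this sketch
            ratio = overlap.sum() / min(instance_masks[a].sum(), instance_masks[b].sum())
            if ratio >= overlap_thresh:
                relations.append((front, behind))
    return relations
```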
For the semantic segmentation task in the COCO panoptic benchmark, only 53 stuff categories are used. Since no semantic annotations are provided directly, panopticapi provides a script to translate the panoptic segmentation annotations into semantic segmentation annotations.
Here we also provide a script that regards all instance categories as a whole (same as Panoptic FPN). Both scripts are in the `datasets/coco/` directory:

- `panoptic2semantic_segmentation.py`
- `panoptic2semantic_segmentation_pfpn_style.py`

In this way, the annotations for the semantic segmentation task are saved as `.png` images in `annotations/semantic_train2017` and `annotations/semantic_val2017`.
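As a rough illustration (a sketch of the idea only, not the code of these scripts), the PFPN-style conversion boils down to collapsing all thing segments into a single foreground label while keeping the stuff categories; the exact output label scheme used below is an assumption:

```python
import numpy as np

def panoptic_to_semantic(segment_ids, segments_info, thing_category_ids, thing_label=0):
    """Sketch: convert a panoptic segment-id map into a semantic category map,
    collapsing all thing categories into one label (PFPN-style).

    segment_ids: HxW array of segment ids (see the decoding sketch above)
    segments_info: the 'segments_info' list for this image from panoptic_*.json
    thing_category_ids: set of category ids with isthing == 1 (from the 'categories' list)
    """
    semantic = np.zeros_like(segment_ids)
    for seg in segments_info:
        mask = segment_ids == seg["id"]
        cat = seg["category_id"]
        # things collapse into a single label; stuff keeps its category id
        semantic[mask] = thing_label if cat in thing_category_ids else cat
    return semantic
```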
Symlink the path to the COCO dataset to `datasets/` as follows:
# COCO
ln -s /path_to_coco_dataset/annotations datasets/coco/annotations
ln -s /path_to_coco_dataset/train2017 datasets/coco/images/train2017
ln -s /path_to_coco_dataset/test2017 datasets/coco/images/test2017
ln -s /path_to_coco_dataset/val2017 datasets/coco/images/val2017
In the `annotations` directory, there are:
instances_train2017_with_relation.json # annotations for instance segmentation with relations
instances_val2017_with_relation.json # annotations for instance segmentation with relations
panoptic_train2017.json # annotations for panoptic segmentation
panoptic_val2017.json # annotations for panoptic segmentation
--panoptic_train2017 # panoptic-style annotation images
--panoptic_val2017 # panoptic-style annotation images
--semantic_train2017 # semantic-style annotation images
--semantic_val2017 # semantic-style annotation images
To configure your own paths to the datasets, please follow the instructions in maskrcnn-benchmark.
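For example, a dataset entry in maskrcnn-benchmark's `maskrcnn_benchmark/config/paths_catalog.py` looks roughly like the excerpt below; the `*_with_relation` dataset names are only an illustration and should match whatever names your config files reference:

```python
# maskrcnn_benchmark/config/paths_catalog.py (illustrative excerpt)
class DatasetCatalog(object):
    DATA_DIR = "datasets"
    DATASETS = {
        # hypothetical entries pointing at the relation-refined annotations
        "coco_2017_train_with_relation": {
            "img_dir": "coco/images/train2017",
            "ann_file": "coco/annotations/instances_train2017_with_relation.json",
        },
        "coco_2017_val_with_relation": {
            "img_dir": "coco/images/val2017",
            "ann_file": "coco/annotations/instances_val2017_with_relation.json",
        },
    }
```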
For the following examples to work, you need to first install RAP.
1. Run the following without modifications
python /path_to_RAP/tools/train_net.py --config-file "/path/to/config/file.yaml"
We set a global batch size in the configuration files that is divided over the number of GPUs. So if you only have a single GPU, the batch size for that GPU will be 8x larger than intended, which might lead to out-of-memory errors.
2. Modify the cfg parameters
If you experience out-of-memory errors, you can reduce the global batch size. But this means that you'll also need to change the learning rate, the number of iterations and the learning rate schedule.
Here is an example for Panoptic FPN R-50 with the 1x schedule:
python tools/train_net.py --config-file "configs/e2e_panoptic_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 1200000 SOLVER.STEPS "(640000, 960000)" TEST.IMS_PER_BATCH 1
This follows the scheduling rules from Detectron. Note that we have multiplied the number of iterations by 8x (as well as the learning rate schedules), and we have divided the learning rate by 8x.
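The adjustment is the usual linear scaling rule; here is a small sketch of the arithmetic, where the reference global batch size of 16 and base learning rate of 0.02 are assumptions about the default config:

```python
# Linear scaling sketch: when shrinking the global batch size, divide the learning
# rate and multiply the iteration counts by the same factor.
def scale_schedule(base_lr, max_iter, steps, ref_batch, new_batch):
    factor = ref_batch // new_batch                  # e.g. 16 // 2 = 8
    return (base_lr / factor,                        # 0.02 -> 0.0025
            max_iter * factor,                       # 150000 -> 1200000
            tuple(s * factor for s in steps))        # (80000, 120000) -> (640000, 960000)

# Matches the single-GPU command above (assumed reference values):
# scale_schedule(0.02, 150000, (80000, 120000), ref_batch=16, new_batch=2)
```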
Since we set `MODEL.RPN.FPN_POST_NMS_PER_BATCH` to `False` during training, there is no need to change `MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN`. See #672 for more details.
We internally use `torch.distributed.launch` to launch multi-GPU training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU.
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_panoptic_rcnn_R_50_FPN_1x.yaml"
To fine-tune the relation aware head, we follow the same procedure as for the baseline models, except that we switch to the config file with the `finetune` suffix. Here is an example for Panoptic FPN R-50 relation fine-tuning.
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_panoptic_rcnn_R_50_FPN_1x_relation_finetune.yaml"
You can test your model directly on a single GPU or on multiple GPUs.
Note that the batch size should be equal to the number of GPUs (one image per GPU), because the semantic branch results are affected by multi-image testing.
Here is an example for R-50-FPN with the 1x schedule on 8 GPUs:
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file "configs/e2e_panoptic_rcnn_R_50_FPN_1x.yaml" TEST.IMS_PER_BATCH 8
Here is an example of using our provided checkpoints (listed in the next section) for testing.
- Download the checkpoint (`R_50.pkl`) to `/path_to_RAP/output_coco_panoptic_R_50/`. The directory name should be the same as `OUTPUT_DIR` in `configs/e2e_panoptic_rcnn_R_50_FPN_1x.yaml`.
- Create a file named `last_checkpoint` in `/path_to_RAP/output_coco_panoptic_R_50/` with the content below:
  output_coco_panoptic_R_50/R_50.pkl
- Then follow the evaluation instructions and run
python -m torch.distributed.launch --nproc_per_node=8 tools/test_net.py --config-file "configs/e2e_panoptic_rcnn_R_50_FPN_1x.yaml" TEST.IMS_PER_BATCH 8
- 8 NVIDIA P40 GPUs (24GB)
- PyTorch version: 1.0.0.dev20190328
- CUDA 9.0
- CUDNN 7.4
Here are the two main differences compared with the training strategy in the original paper.
- The deepest FPN level in the semantic segmentation branch is at 1/64 scale.
- We train for more iterations (150K) than Mask R-CNN does (90K).
backbone | PQ | SQ | RQ | PQ-Thing | PQ-Stuff | checkpoint |
---|---|---|---|---|---|---|
R-50-FPN | 40.0 | 78.0 | 49.1 | 46.2 | 30.5 | R_50.pth |
R-101-FPN | 41.4 | 79.5 | 50.7 | 47.8 | 31.7 | R_101.pth |
X-101-32x8d-FPN | 43.4 | 79.1 | 53.0 | 50.1 | 33.2 | X_101.pth |
X-152-32x8d-FPN | 44.6 | 79.6 | 54.2 | 51.7 | 33.8 | X_152.pth |
X-152-32x8d-FPN (w/ deformable conv.) | 47.0 | 81.0 | 56.9 | 53.4 | 37.4 | X_152_dcn.pth |
backbone | PQ | SQ | RQ | PQ-Thing | PQ-Stuff | checkpoint |
---|---|---|---|---|---|---|
R-50-FPN | 41.8 | 78.1 | 51.3 | 49.2 | 30.5 | R_50_RAP.pth |
R-101-FPN | 43.3 | 79.6 | 53.0 | 50.9 | 31.8 | R_101_RAP.pth |
X-101-32x8d-FPN | 45.7 | 79.2 | 55.8 | 53.9 | 33.3 | X_101_RAP.pth |
X-152-32x8d-FPN | 46.9 | 79.7 | 57.0 | 55.5 | 33.9 | X_152_RAP.pth |
X-152-32x8d-FPN (w/ deformable conv.) | 49.4 | 81.3 | 59.6 | 57.2 | 37.5 | X_152_dcn_RAP.pth |
backbone | PQ | SQ | RQ | PQ-Thing | PQ-Stuff | checkpoint |
---|---|---|---|---|---|---|
R-50-FPN | 40.7 | 77.9 | 50.2 | 47.5 | 30.5 | R_50_ocfusion.pth |
R-101-FPN | 42.5 | 79.4 | 52.2 | 49.6 | 31.9 | R_101_ocfusion.pth |
RAP is released under the MIT license.