MVANet

This is a fork of the original MVANet, with bug fixes and packaging improvements.

Installation

pip install mvanet

Usage

from PIL import Image
from mvanet.predictor import MVANetPredictor

test_image = Image.open("/path/to/image.png")

predictor = MVANetPredictor()

# Predict the RGBA image
predicted_image = predictor(test_image, output_type="rgba")
predicted_image.save("rgba.png")

# Predict the mask image
predicted_mask = predictor(test_image, output_type="map")
predicted_mask.save("mask.png")

The official repo of the CVPR 2024 paper (Highlight), Multi-view Aggregation Network for Dichotomous Image Segmentation

Introduction

Dichotomous Image Segmentation (DIS) has recently emerged as a task aimed at high-precision object segmentation in high-resolution natural images. When designing an effective DIS model, the main challenge is balancing the semantic dispersion of high-resolution targets in a small receptive field against the loss of high-precision details in a large receptive field. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement.

The human visual system captures regions of interest by observing them from multiple views. Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet), which unifies the feature fusion of the distant view and the close-up views into a single stream with one encoder-decoder structure. Specifically, we split the high-resolution input image from the original view into a distant-view image with global information and close-up-view images with local details. Together, these form a set of complementary multi-view low-resolution input patches.
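The view split described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the function name `split_views`, the 2x2 grid, and the use of plain striding for downsampling are assumptions made for illustration, and the real model operates on tensors rather than nested lists.

```python
def split_views(image, grid=2):
    """Split a 2D image (list of rows) into one distant view and grid*grid close-up views.

    Assumptions (not taken from the repository): a 2x2 grid by default, and
    plain striding as the downsampling method.
    """
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image sides must be divisible by the grid size")
    patch_h, patch_w = height // grid, width // grid

    # Distant view: keep every `grid`-th pixel so the whole scene fits
    # into a single patch-sized image (global context, fewer details).
    distant = [row[::grid] for row in image[::grid]]

    # Close-up views: non-overlapping full-resolution crops, in row-major order
    # (local details, no global context).
    close_ups = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * patch_h:(gy + 1) * patch_h]
            close_ups.append([row[gx * patch_w:(gx + 1) * patch_w] for row in rows])
    return distant, close_ups


# Toy 4x4 "image" whose pixel values encode their position.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
distant, close_ups = split_views(image)
```

All returned views have the same low resolution, which is what lets a single encoder process them together as one batch of patches.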

Moreover, we propose two efficient transformer-based multi-view complementary localization and refinement modules (MCLM & MCRM) to jointly capture the localization and restore the boundary details of the targets.

MVANet achieves state-of-the-art performance on almost all metrics on the DIS benchmark dataset.

We have optimized the code and improved inference speed, reaching 15.2 FPS.
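How the FPS figure was measured is not stated here. As a rough way to check throughput on your own hardware, the following sketch times repeated calls to any callable. The helper name `measure_fps` and its parameters are hypothetical, and the workload below is only a stand-in for a real inference call.

```python
import time


def measure_fps(run_once, n_warmup=3, n_runs=20):
    """Return the average number of `run_once` calls completed per second."""
    for _ in range(n_warmup):
        run_once()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed


# Stand-in workload; replace the lambda with a call to your own predictor.
fps = measure_fps(lambda: sum(range(10_000)))
```

When timing a model that runs on a GPU, the device should be synchronized before each clock reading (for example with `torch.cuda.synchronize()`), otherwise asynchronous kernel launches make the result look faster than it is.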

Here are some of our visual results:

I. Requirements

python==3.7
torch==1.10.0
torchvision==0.11.0
mmcv-full==1.3.17
mmdet==2.17.0
mmengine==0.8.1
mmsegmentation==0.19.0
numpy
ttach
einops
timm
scipy
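Because several packages are pinned to exact versions, it can help to compare an environment against the list before training. The sketch below is one hypothetical way to do that: `find_mismatches` and `PINS` are illustrative names, not part of this repository, and the `installed` mapping is passed in by the caller.

```python
# Pinned versions copied from the list above; None means "any version".
PINS = {
    "torch": "1.10.0",
    "torchvision": "0.11.0",
    "mmcv-full": "1.3.17",
    "mmdet": "2.17.0",
    "mmengine": "0.8.1",
    "mmsegmentation": "0.19.0",
    "numpy": None,
    "ttach": None,
    "einops": None,
    "timm": None,
    "scipy": None,
}


def find_mismatches(pins, installed):
    """Return a list of problems found when comparing `installed` against `pins`.

    pins: {package name: required version, or None for any version}
    installed: {package name: installed version}; missing packages are absent.
    """
    problems = []
    for name, wanted in pins.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed")
            continue
        # Drop a local build suffix such as "+cu113" before comparing.
        base = found.split("+")[0]
        if wanted is not None and base != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


# Example with a hand-written mapping of installed versions.
report = find_mismatches(
    {"torch": "1.10.0", "einops": None},
    {"torch": "1.9.0"},
)
```

On Python 3.8 and later the `installed` mapping can be filled with `importlib.metadata.version`; on Python 3.7, as pinned above, the `importlib_metadata` backport offers the same function.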

II. Training

Download the pretrained model from Google Drive.
Then start training by running:

python train.py

III. Testing

Update the data paths in the config file ./utils/config.py (lines 4-8).
Replace the existing path in ./predict.py (line 14) with the path to your saved model.

You can also download our trained model at Google Drive.
Start predicting by:

python predict.py

Change the predicted map path in ./test.py (line 17) and start testing:

python test.py

You can get our prediction maps at Google Drive.
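The metrics computed by test.py are not listed in this README. To illustrate what comparing a prediction map with its ground-truth mask involves, here is a minimal sketch of mean absolute error (MAE), a metric commonly reported for segmentation maps. The function name and the list-of-rows input format are assumptions for illustration, not the repository's evaluation code.

```python
def mean_absolute_error(pred, gt):
    """MAE between two equally sized maps, given as lists of rows with values in [0, 1]."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    # Sum the per-pixel absolute differences, then average over all pixels.
    total = sum(
        abs(p - g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )
    count = sum(len(row) for row in gt)
    return total / count


# Toy 2x2 example: one half-wrong pixel and one fully wrong pixel.
pred = [[1.0, 0.5], [0.0, 0.0]]
gt = [[1.0, 1.0], [0.0, 1.0]]
mae = mean_absolute_error(pred, gt)
```

Lower values are better, and a perfect prediction gives 0.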

To Do List

Release our camera-ready paper on arXiv (done)
Release our training code (done)
Release our model checkpoints (done)
Release our prediction maps (done)

Citations

@article{yu2024multi,
  title={Multi-view Aggregation Network for Dichotomous Image Segmentation},
  author={Yu, Qian and Zhao, Xiaoqi and Pang, Youwei and Zhang, Lihe and Lu, Huchuan},
  journal={arXiv preprint arXiv:2404.07445},
  year={2024}
}
