Vision-Language Transformer and Query Generation for Referring Segmentation

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Introduction

Vision-Language Transformer (VLT) is a framework for referring segmentation task. Our method produces multiple query vector for one input language expression, and use each of them to “query” the input image, generating a set of responses. Then the network selectively aggregates these responses, in which queries that provide better comprehensions are spotlighted.

Installation

Environment:
- Python 3.6
- tensorflow 1.15
- Other dependencies in requirements.txt
- SpaCy model for embedding:
  
  python -m spacy download en_vectors_web_lg
Dataset preparation
- Put the folder of COCO training set ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract them to data/. Then run the script for data preparation under data/:
```
cd data
python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
```

Evaluating

Download pretrained models & config files from here.
In the config file, set:
- evaluate_model: path to the pretrained weights
- evaluate_set: path to the dataset for evaluation.

Run

python vlt.py test [PATH_TO_CONFIG_FILE]

Training

Pretrained Backbones: We use the backbone weights proviede by MCN.

Note: we use the backbone that excludes all images that appears in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.
Specify hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config, or config file of our pretrained models.

Run

python vlt.py train [PATH_TO_CONFIG_FILE]

Acknowledgement

We borrowed a lot of codes from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent works!

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
callbacks		callbacks
config		config
data		data
loader		loader
model		model
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
executor.py		executor.py
fig0.png		fig0.png
requirement.txt		requirement.txt
vlt.py		vlt.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vision-Language Transformer and Query Generation for Referring Segmentation

Introduction

Installation

Evaluating

Training

Acknowledgement

About

Releases

Packages

Languages

License

freshpearYoon/Vision-Language-Transformer

Folders and files

Latest commit

History

Repository files navigation

Vision-Language Transformer and Query Generation for Referring Segmentation

Introduction

Installation

Evaluating

Training

Acknowledgement

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages