This repository contains the code for the following paper:
- R. Hu, M. Rohrbach, J. Andreas, T. Darrell, K. Saenko, Modeling Relationships in Referential Expressions with Compositional Modular Networks. In CVPR, 2017.
```
@inproceedings{hu2017modeling,
  title={Modeling Relationships in Referential Expressions with Compositional Modular Networks},
  author={Hu, Ronghang and Rohrbach, Marcus and Andreas, Jacob and Darrell, Trevor and Saenko, Kate},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}
```
Project Page: http://ronghanghu.com/cmn
Note: part of this repository is built upon the Faster R-CNN code (https://github.com/rbgirshick/py-faster-rcnn), which is under the MIT License.
## Installation

- Install Python 3 (Anaconda recommended: https://www.continuum.io/downloads)
- Install TensorFlow (v1.0.0 or higher) following the instructions at https://www.tensorflow.org/install. TensorFlow must be installed with GPU support.
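For example, a pip installation of a GPU-enabled build might look like the following (a sketch only; the exact package name and version pin depend on your platform and CUDA setup):

```bash
# Hypothetical example: install a GPU build of TensorFlow 1.0.0 via pip
pip install tensorflow-gpu==1.0.0
```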
- Download this repository or clone with Git, and then enter the root directory of the repository:
git clone https://github.com/ronghanghu/cmn.git && cd cmn
- Depending on your system, you may need to re-build the NMS lib and the ROIPooling operation:
export CMN_ROOT=$(pwd)
cd $CMN_ROOT/util/faster_rcnn_lib/ && make
cd $CMN_ROOT/util/roi_pooling/ && ./compile_roi_pooling.sh
cd $CMN_ROOT
The compile_roi_pooling.sh script uses g++-4.8 and CUDA 8.0 to match the binary installation of TensorFlow 1.0.0 on Linux. If you installed TensorFlow from source or used a different compiler or CUDA version, modify compile_roi_pooling.sh accordingly to match your installation.
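To check what your installation actually uses before editing the script, the following commands can help (a sketch, assuming a Linux machine with the CUDA toolkit and g++-4.8 on the PATH):

```bash
# Print the installed TensorFlow version (expect 1.0.0 or higher)
python -c "import tensorflow as tf; print(tf.__version__)"
# Print the compiler and CUDA toolkit versions available for the build
g++-4.8 --version
nvcc --version
```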
- Download the weights of the VGG-16 network (and the Faster R-CNN VGG-16 network) converted from the Caffe models:
./models/convert_caffemodel/params/download_vgg_params.sh
- Download the GloVe word embedding matrix:
./word_embedding/download_embed_matrix.sh
## Train and evaluate on the Visual Genome dataset

- Download the Visual Genome dataset from http://visualgenome.org/ and symlink it to exp-visgeno-rel/visgeno-dataset
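A minimal symlink sketch, assuming the dataset was unpacked to ~/datasets/visual_genome (a hypothetical path; substitute your actual download location):

```bash
# Point exp-visgeno-rel/visgeno-dataset at the downloaded dataset (run from the repo root)
ln -s ~/datasets/visual_genome exp-visgeno-rel/visgeno-dataset
```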
- Download the image data (imdb) for training and evaluation:
./exp-visgeno-rel/data/download_imdb.sh
Alternatively, you may build the imdb yourself using ./exp-visgeno-rel/build_visgeno_imdb.ipynb
- Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
- Train the model:
Strong supervision: python ./exp-visgeno-rel/exp_train_visgeno_attbilstm_strong.py
Weak supervision: python ./exp-visgeno-rel/exp_train_visgeno_attbilstm_weak.py
- Evaluate the model:
Subject region precision: python ./exp-visgeno-rel/exp_test_visgeno_attbilstm.py
Subject-object pair precision: python ./exp-visgeno-rel/exp_test_visgeno_pair_attbilstm.py
(change the model path in the above files to a snapshot path under ./exp-visgeno-rel/tfmodel; see the sketch below)
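A sketch of how to pick a snapshot (the file name below is hypothetical, and the variable holding the path inside the test scripts may be named differently):

```bash
# List the snapshots saved during training
ls ./exp-visgeno-rel/tfmodel/
# Then edit the test script so that its model path points to one of them, e.g.
#   ./exp-visgeno-rel/tfmodel/visgeno_attbilstm_strong_iter_360000.tfmodel
```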
## Train and evaluate on the Google-Ref dataset

- Download the Google-Ref dataset from https://github.com/mjhucla/Google_Refexp_toolbox and symlink it to exp-refgoog/refgoog-dataset (as in the Visual Genome setup above)
- Download the image data (imdb) for training and evaluation:
./exp-refgoog/data/download_imdb.sh
Alternatively, you may build the imdb yourself using ./exp-refgoog/build_refgoog_imdb.ipynb
- Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
- Train the model:
python ./exp-refgoog/exp_train_refgoog_attbilstm.py
- Evaluate the model (generate prediction output file):
python ./exp-refgoog/exp_test_refgoog_attbilstm.py
(change the model path in the above file to a snapshot path under ./exp-refgoog/tfmodel, as in the Visual Genome example)
- Use the evaluation tool in the Google-Ref dataset for evaluation.
## Train and evaluate on the Visual-7W dataset

- Download the Visual-7W dataset from http://web.stanford.edu/~yukez/visual7w/index.html and symlink it to exp-visual7w/visual7w-dataset (as in the Visual Genome setup above)
- Download the image data (imdb) for training and evaluation:
./exp-visual7w/data/download_imdb.sh
Alternatively, you may build the imdb yourself with ./exp-visual7w/build_visual7w_imdb.ipynb, exp-visual7w/extract_rpn_proposals.py, and exp-visual7w/build_visual7w_imdb_attention.ipynb
- Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
- Train the model:
python exp-visual7w/exp_train_visual7w_attbilstm.py
- Evaluate the model:
python exp-visual7w/exp_test_visual7w_attbilstm.py
(change the model path in the above file to a snapshot path under ./exp-visual7w/tfmodel, as in the Visual Genome example)
## Train and evaluate on the synthetic shapes dataset

- Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
- Download the synthetic shapes dataset:
./exp-shape/data/download_shape_data.sh
- Train the model:
python ./exp-shape/exp_train_shape_attention.py
- Evaluate the model:
python ./exp-shape/exp_test_shape_attention.py
(change the model path in the above file to a snapshot path under ./exp-shape/tfmodel, as in the Visual Genome example)