# Hierarchical Visual Context Fusion Transformer

The source code for *Multimodal Relation Extraction via a Mixture of Hierarchical Visual Context Learners*.

## Data preprocessing

### MNRE dataset

Because the MNRE dataset is large, please download it from the original repository.

Unzip the data and rename the directory to `mnre`; it should be placed in the `data` directory. Create the working directories first:

```shell
mkdir data logs ckpt
```
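After the steps above, the working tree should look roughly like this (the contents of `mnre/` depend on the release you downloaded, so they are left unspecified):

```
.
├── data/
│   └── mnre/    # renamed MNRE dataset
├── logs/
└── ckpt/
```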

We also use the detected visual objects provided by previous work, which can be downloaded with the following commands:

```shell
cd data/
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
```

## Dependencies

Install all necessary dependencies:

```shell
pip install -r requirements.txt
```
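To confirm the installation succeeded, a quick check like the one below can verify that the key packages resolve. The package names listed are assumptions (typical for a multimodal Transformer codebase); adjust them to match `requirements.txt`:

```python
# Sanity check that core dependencies are importable after `pip install`.
# NOTE: the package names passed in below are assumptions, not taken from
# requirements.txt -- edit the list to match the actual file.
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found by the import system."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(["torch", "transformers", "torchvision"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core dependencies found.")
```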

## Training the model

The best hyperparameters we found are written in the `run_mre.sh` file.

You can simply run the bash script for multimodal relation extraction:

```shell
bash run_mre.sh
```