
Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

This repository is the PyTorch implementation of the paper:

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings (ICCV Workshops 2019)

Shweta Mahajan, Teresa Botschen, Iryna Gurevych and Stefan Roth

This repository builds on the PyTorch implementations of SCAN and VSE++.

Requirements

The code is written in Python 2.7 and uses CUDA 9.0.

Dependencies:

  • torch 0.3
  • torchvision 0.3.0
  • nltk 3.5
  • gensim
  • Punkt sentence tokenizer, installed non-interactively via:
import nltk
nltk.download('punkt')

To install requirements:

conda config --add channels pytorch
conda config --add channels anaconda
conda config --add channels conda-forge
conda config --add channels conda-forge/label/cf202003
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>
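
To verify that the environment matches the stated requirements, a minimal sanity check (a sketch; nothing repository-specific is assumed) is:

# Minimal environment sanity check; assumes only the packages listed above.
import torch
import torchvision
import nltk

print("torch:", torch.__version__)             # expected 0.3.x
print("torchvision:", torchvision.__version__) # expected 0.3.0
print("nltk:", nltk.__version__)               # expected 3.5
print("CUDA available:", torch.cuda.is_available())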

Preprocessed data

  1. The preprocessed COCO and Flickr30K datasets used in the experiments are based on SCAN and can be downloaded from COCO_Precomp and F30k_Precomp. Place the downloaded data in the data folder.

  2. Run vocab.py to generate the vocabulary for each dataset:

python vocab.py --data_path data --data_name f30k_precomp
python vocab.py --data_path data --data_name coco_precomp
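
Since the repository builds on SCAN and VSE++, vocab.py presumably serializes the Vocabulary object with pickle. Assuming it writes to vocab/coco_precomp_vocab.pkl (the exact path is set in vocab.py, so check there), the result can be inspected as follows:

import pickle
from vocab import Vocabulary  # needed so pickle can resolve the Vocabulary class

# Hypothetical output path -- check vocab.py for where the file is actually written.
with open('vocab/coco_precomp_vocab.pkl', 'rb') as f:
    vocab = pickle.load(f)

print('vocabulary size:', len(vocab))   # Vocabulary defines __len__ in SCAN/VSE++
print('index of "dog":', vocab('dog'))  # and __call__ maps a word to its index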

Training

A new JWAE model can be trained using the following:

python train.py --data_path "$DATA_PATH" --data_name coco_precomp --vocab_path "$VOCAB_PATH"
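
For example, assuming the preprocessed data sits in data/ and the generated vocabulary in vocab/ (illustrative paths):

DATA_PATH=data
VOCAB_PATH=vocab
python train.py --data_path "$DATA_PATH" --data_name coco_precomp --vocab_path "$VOCAB_PATH"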

Evaluation

The trained model can then be evaluated with the following Python script:

from vocab import Vocabulary
import evaluation
evaluation.evalrank("$CHECKPOINT_PATH", data_path="$DATA_PATH", split="test")
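
For COCO, the SCAN/VSE++ evaluation code this repository builds on also supports averaging over five 1K-image test folds via a fold5 argument; assuming that interface is inherited here (check evaluation.py to confirm), the call would be:

from vocab import Vocabulary  # required so the pickled vocabulary can be loaded
import evaluation

# Assumed to follow SCAN/VSE++: split="testall" selects the 5K COCO test set,
# and fold5=True averages results over five 1K folds. Verify against evaluation.py.
evaluation.evalrank("$CHECKPOINT_PATH", data_path="$DATA_PATH", split="testall", fold5=True)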

BibTeX

@inproceedings{Mahajan:2019:JWA,
  author = {Shweta Mahajan and Teresa Botschen and Iryna Gurevych and Stefan Roth},
  booktitle = {ICCV Workshop on Cross-Modal Learning in Real World},
  title = {Joint {W}asserstein Autoencoders for Aligning Multi-modal Embeddings},
  year = {2019}
}
