This repository contains our system for the task of Metaphor Detection.
- `data_preparation.py` is used for constructing datasets in the format of https://github.com/RuiMao1988/Sequential-Metaphor-Identification/tree/master/data, which are prepared by https://github.com/gao-g/metaphor-in-context.
- `model.py` contains all the model classes.
- `util.py` contains all the helper functions.
- `main_toefl.py` contains the code for loading and running experiments on the TOEFL dataset.
- `main_vua.py` contains the code for loading and running experiments on the VUA dataset.
- The environment used is Python 3.6 with PyTorch 1.4, along with standard libraries: allennlp, sklearn, numpy, pandas, matplotlib, nltk, tqdm, etc.
- Run `python util.py` to make the required directories.
- Download the GloVe embeddings from here, unzip them, and place the text file in the `./data/` folder (a minimal loading sketch appears after this list).
- Download the VUA data from here and prepare the following files: `vuamc_corpus_train.csv`, `vuamc_corpus_test.csv`, `all_pos_test_tokens.csv`, and `verb_test_tokens.csv`. Place all of these in the `./data/vua/` folder.
- To download the TOEFL dataset, you need to fill in an agreement here. Next, rename the `essays/` folder of the training partition to `train_essays/` and place it in the `./data/toefl/` folder; similarly, rename the `essays/` folder of the test partition to `test_essays/` and place it in the `./data/toefl/` folder. Also place `all_pos_test_tokens.csv` and `verb_test_tokens.csv` in the `./data/toefl/` folder. (A directory-layout sanity check is sketched after this list.)
- Run `python data_preparation.py [option]`, where option `vua` creates all the files (including ELMo vectors) for the VUA dataset and `toefl` does the same for the TOEFL dataset. This script also splits the training dataset into train and validation subparts. Note that computing the ELMo vectors takes time (see the ELMo sketch after this list).
- Run `python main_xyz.py` to run the experiments on the respective dataset. It will store the produced graphs in the `./graphs/xyz/` folder. It also produces the test predictions, which are stored as `xyz_all_pos_pred.csv` and `xyz_verb_pred.csv` in the `./predictions/` folder.
- The outputs here are expected to match the results reported in the paper for the single-run case.
- Code for ensembling is not provided; one can train several models by varying the model's hyperparameters (as mentioned in the paper) and aggregate their predictions by majority voting (a minimal sketch follows below).
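For reference, GloVe embedding files are plain text with one token per line followed by its vector components, so a loader can be as simple as the sketch below. This is a minimal sketch, not code from this repository; the filename `glove.840B.300d.txt` is an assumption, so substitute whichever GloVe file you placed in `./data/`.

```python
import numpy as np

def load_glove(path):
    """Read a GloVe text file into a {token: vector} dict."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # The first field is the token; the rest are vector components.
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Hypothetical filename -- use the GloVe file you actually downloaded.
vectors = load_glove("./data/glove.840B.300d.txt")
```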
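After the data steps above, the expected layout can be sanity-checked with a few lines of Python; the paths below simply restate the folders and files named in the list.

```python
import os

# Paths implied by the setup steps above.
expected = [
    "./data/vua/vuamc_corpus_train.csv",
    "./data/vua/vuamc_corpus_test.csv",
    "./data/vua/all_pos_test_tokens.csv",
    "./data/vua/verb_test_tokens.csv",
    "./data/toefl/train_essays",
    "./data/toefl/test_essays",
    "./data/toefl/all_pos_test_tokens.csv",
    "./data/toefl/verb_test_tokens.csv",
]
for path in expected:
    print(("ok      " if os.path.exists(path) else "MISSING ") + path)
```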
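Computing the ELMo vectors is the slow part of `data_preparation.py`. As an illustration only (not the script's actual code), older allennlp releases (e.g. 0.9.x, which matches the Python 3.6 / PyTorch 1.4 environment) expose an `ElmoEmbedder` that embeds one tokenized sentence at a time:

```python
from allennlp.commands.elmo import ElmoEmbedder

# Downloads the default pretrained ELMo weights on first use.
elmo = ElmoEmbedder()

tokens = ["He", "devoured", "the", "novel"]
# Returns a numpy array of shape (3, num_tokens, 1024):
# one 1024-dim vector per token from each of ELMo's three layers.
layers = elmo.embed_sentence(tokens)
print(layers.shape)  # (3, 4, 1024)
```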
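A majority vote over several single-model runs can be done in a few lines of pandas. This is a minimal sketch under stated assumptions: the file names `run1_verb_pred.csv` etc. are hypothetical, and it assumes the last column of each prediction file holds the 0/1 metaphor label with rows aligned across runs; adjust it to the actual layout of `xyz_verb_pred.csv` / `xyz_all_pos_pred.csv`.

```python
import pandas as pd

# Hypothetical prediction files from three runs with different hyperparameters.
runs = [
    "./predictions/run1_verb_pred.csv",
    "./predictions/run2_verb_pred.csv",
    "./predictions/run3_verb_pred.csv",
]

frames = [pd.read_csv(path) for path in runs]
# Assumption: the last column is the 0/1 label; rows are aligned across files.
labels = pd.concat([f.iloc[:, -1] for f in frames], axis=1)

ensemble = frames[0].copy()
ensemble.iloc[:, -1] = labels.mode(axis=1)[0].astype(int)  # per-row majority
ensemble.to_csv("./predictions/ensemble_verb_pred.csv", index=False)
```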
- The structure of the files is adapted from https://github.com/gao-g/metaphor-in-context
- The Transformer model is adapted from https://github.com/pbloem/former
- If you find this work useful, consider citing it:
@inproceedings{kumar-sharma-2020-character,
title = "Character aware models with similarity learning for metaphor detection",
author = "Kumar, Tarun and
Sharma, Yashvardhan",
booktitle = "Proceedings of the Second Workshop on Figurative Language Processing",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.figlang-1.18",
pages = "116--125",
}