An unofficial PyTorch reproduction of the EMNLP 2019 paper "Event Detection with Multi-Order Graph Convolution and Aggregated Attention"
-
Prepare the ACE 2005 dataset. (You can obtain the ACE 2005 dataset here: https://catalog.ldc.upenn.edu/LDC2006T06)
-
Use nlpcl-lab/ace2005-preprocessing to preprocess the ACE 2005 dataset into the same format as data/sample.json (a quick format check is sketched below the setup steps).
1. Put the processed data into ./data, or modify the path in constant.py.
2. Put the word embedding file into ./data, or modify the path in constant.py. (You can download GloVe embeddings here: https://nlp.stanford.edu/projects/glove/)
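To verify that the preprocessed file matches the expected layout before training, a minimal check like the one below can help. This is only a sketch: it assumes the preprocessed file is, like data/sample.json, a JSON list of per-sentence records, and the path is an example.

```python
import json

# Sanity check: load the preprocessed data and inspect one record.
# Assumes a JSON list of per-sentence dicts, as in data/sample.json.
with open("./data/sample.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(f"Loaded {len(data)} sentence records")
print("Fields in the first record:", sorted(data[0].keys()))
```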
Then start training:

```bash
python train.py
```
All network and training hyperparameters are defined in constant.py; you can modify them as needed.
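As a rough illustration of the kind of settings collected in a file like constant.py (data paths plus network and training hyperparameters), a sketch is shown below. Every name and value here is an assumption for illustration, not the actual content of this repo's constant.py.

```python
# Illustrative only: variable names and values are assumptions.
# Check the real constant.py before relying on any of them.
TRAIN_PATH = "./data/train.json"     # preprocessed training split
DEV_PATH = "./data/dev.json"         # preprocessed dev split
TEST_PATH = "./data/test.json"       # preprocessed test split
EMBEDDING_PATH = "./data/100.utf8"   # pretrained word embedding file

WORD_DIM = 100        # embedding dimension (matches 100.utf8)
BATCH_SIZE = 32       # example value
LEARNING_RATE = 1e-3  # example value
NUM_EPOCHS = 50       # example value
```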
About the word embeddings: we found that embeddings trained with the Skip-gram algorithm on the NYT corpus performed better than glove.6B.100d, so we use the 100-dimensional 100.utf8 vectors (available here: https://github.com/yubochen/NBTNGMA4ED) as our pretrained word embeddings.
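If you need to load such a text-format embedding file yourself, a minimal loader might look like the following. It is a sketch under the assumption that 100.utf8 uses a GloVe-style layout (one token followed by its vector values per line, whitespace-separated); the function name and path are examples.

```python
import numpy as np

def load_text_embeddings(path, dim=100):
    """Load GloVe-style text embeddings: each line is `token v1 v2 ... v_dim`."""
    vocab, vectors = [], []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) != dim + 1:  # skip malformed or header lines
                continue
            vocab.append(parts[0])
            vectors.append(np.asarray(parts[1:], dtype=np.float32))
    return vocab, np.stack(vectors)

# Example (assumed path): words, emb = load_text_embeddings("./data/100.utf8", dim=100)
```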
Trigger classification results:

Method | Precision (%) | Recall (%) | F1 (%)
---|---|---|---
MOGANED (original paper) | 79.5 | 72.3 | 75.7
MOGANED (this code) | 78.8 | 72.3 | 75.4
In many cases the trigger is a phrase, so we treat consecutive tokens that share the same predicted label as a single trigger and do not use a BIO scheme for trigger words. This strategy follows "Exploring Pre-trained Language Models for Event Extraction and Generation" (Yang et al., ACL 2019).
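As a concrete illustration of this merging strategy, here is a small sketch. It assumes per-token event-type labels with "O" marking non-trigger tokens; the function name and label convention are made up for this example.

```python
def merge_triggers(labels, outside="O"):
    """Group consecutive tokens that share the same predicted (non-O) label
    into one trigger span; returns (start, end_exclusive, label) tuples."""
    spans, i = [], 0
    while i < len(labels):
        if labels[i] == outside:
            i += 1
            continue
        j = i
        while j + 1 < len(labels) and labels[j + 1] == labels[i]:
            j += 1
        spans.append((i, j + 1, labels[i]))
        i = j + 1
    return spans

# Example: two consecutive "Attack" tokens form one trigger phrase.
labels = ["O", "Attack", "Attack", "O", "O"]
print(merge_triggers(labels))  # [(1, 3, 'Attack')]
```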