- This repository contains the official resources for our EMNLP 2022 paper "Retrieval Augmentation for Commonsense Reasoning: A Unified Approach" [arXiv].
- Corpus (20M): Google drive [link]
- Code: Official DPR code [link]
- First, run
    python merge-corpus.py
  to construct the corpus.
- Then modify the retrieval corpus path in the DPR code above.
- Training Data: Google drive [link]
- Code: Official DPR code, same as above.
- Modify the training data path as follows:
    raco_train:
      _target_: dpr.data.biencoder_data.JsonQADataset
      file: {your folder path}/train.json
    raco_dev:
      _target_: dpr.data.biencoder_data.JsonQADataset
      file: {your folder path}/dev.json
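Before pointing the config at `train.json` and `dev.json`, it can help to check that the files have the shape the DPR bi-encoder loader expects. The following is a minimal sketch, assuming each file holds a JSON list of examples with at least a `question` field and a `positive_ctxs` list, as described in the DPR README. The helper names `check_biencoder_examples` and `check_biencoder_file` are hypothetical and not part of DPR or this repository.

```python
import json

# Keys assumed to be required per example (based on the DPR README's
# description of its JSON training format; verify against your DPR version).
REQUIRED_KEYS = ("question", "positive_ctxs")


def check_biencoder_examples(examples):
    """Return (index, missing_keys) for every example lacking a required key."""
    problems = []
    for i, example in enumerate(examples):
        missing = [key for key in REQUIRED_KEYS if key not in example]
        if missing:
            problems.append((i, missing))
    return problems


def check_biencoder_file(path):
    """Load a JSON file that holds a list of examples and check each one."""
    with open(path, encoding="utf-8") as f:
        return check_biencoder_examples(json.load(f))
```

Calling `check_biencoder_file` on `{your folder path}/train.json` returns an empty list when every example carries the assumed keys.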
- Inference Data: Google drive [link]
- Code: Official DPR code, same as above.
- Modify the inference data path as follows:
    {dataset}_train:
      _target_: dpr.data.retriever_data.CsvQASrc
      file: {your folder path}/{dataset}/train.tsv
    {dataset}_dev:
      _target_: dpr.data.retriever_data.CsvQASrc
      file: {your folder path}/{dataset}/dev.tsv
    {dataset}_test:
      _target_: dpr.data.retriever_data.CsvQASrc
      file: {your folder path}/{dataset}/test.tsv
- Training Data: obtained from the previous step
- Code: Official FiD code [link]
- Accuracy is computed the same way as exact match in the FiD code.
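For readers unfamiliar with the metric, exact match is commonly computed by normalizing both strings (lowercasing, stripping punctuation and English articles, collapsing whitespace) and then comparing them. The sketch below follows that common SQuAD-style recipe; it is an illustration rather than a copy of the FiD implementation, so the FiD repository remains the reference for reported numbers. The function names are chosen here for illustration.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the prediction matches any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def accuracy(predictions, gold_lists):
    """Fraction of predictions that exactly match one of their gold answers."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(predictions, gold_lists))
    return hits / len(predictions)
```

For multiple-choice tasks with a single gold answer per question, each entry in `gold_lists` is a one-element list.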
- BLEU and ROUGE are computed with the evaluation code from the CommonGen GitHub repo.
- Some common issues when installing the library: [link]
@inproceedings{yu2022retrieval,
title={Retrieval Augmentation for Commonsense Reasoning: A Unified Approach},
author={Yu, Wenhao and Zhu, Chenguang and Zhang, Zhihan and Wang, Shuohang and Zhang, Zhuosheng and Fang, Yuwei and Jiang, Meng},
booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2022}
}
Please cite our paper if you find the paper and code helpful.