This repository accompanies the Combined-KD long paper "How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding", accepted to Findings of EMNLP 2021. In this project, we propose Combined-KD (ComKD), which takes advantage of data augmentation and progressive training. Results show that our proposed ComKD not only achieves a new state of the art (SOTA) on the GLUE benchmark, but is also more robust than other KD methods under out-of-distribution (OOD) evaluation and adversarial attacks. Paper link: https://arxiv.org/abs/2109.05696v1
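For readers unfamiliar with knowledge distillation, the sketch below shows a generic KD objective: a temperature-scaled KL divergence between teacher and student logits combined with cross-entropy on the gold labels. It is only an illustration of a plain KD baseline, not the exact ComKD loss, which additionally relies on data augmentation and progressive training as described in the paper.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Generic KD loss: soft-target KL divergence plus hard-label cross-entropy.

    This is a standard baseline formulation, not the ComKD objective itself.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```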
- Python version >= 3.6
- PyTorch version >= 1.5.0
- HuggingFace Transformers version == 3.1.0
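A minimal sanity check of the environment against the requirements above (it assumes the `packaging` package is available, which ships with most pip installations):

```python
import sys

import torch
import transformers
from packaging import version

# The version strings below mirror the requirement list in this README.
assert sys.version_info >= (3, 6), "Python >= 3.6 is required"
assert version.parse(torch.__version__) >= version.parse("1.5.0"), "PyTorch >= 1.5.0 is required"
assert transformers.__version__ == "3.1.0", "HuggingFace Transformers == 3.1.0 is required"
print("Environment looks OK")
```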
Set up the conda environment:
conda env create -f environment.yml
Restore the pretrained models in the following directories (they can be downloaded from the HuggingFace Hub, e.g., https://huggingface.co/distilbert-base-uncased); a download sketch follows the list:
- ./bert_models/distilbert-base
- ./bert_models/distilroberta-base
- ./bert_models/uncased_L-6_H-768_A-12
- ./bert_models/uncased_L-4_H-256_A-4
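As an alternative to downloading by hand, the sketch below populates the directories with `save_pretrained`. The mapping from Hub model names to local folders is an assumption inferred from the folder names, so adjust it if your setup differs:

```python
from transformers import AutoModel, AutoTokenizer

# Assumed mapping from HuggingFace Hub names to the local directories above.
MODELS = {
    "distilbert-base-uncased": "./bert_models/distilbert-base",
    "distilroberta-base": "./bert_models/distilroberta-base",
    "google/bert_uncased_L-6_H-768_A-12": "./bert_models/uncased_L-6_H-768_A-12",
    "google/bert_uncased_L-4_H-256_A-4": "./bert_models/uncased_L-4_H-256_A-4",
}

for hub_name, local_dir in MODELS.items():
    # Download from the Hub, then write the weights and vocab to the local folder.
    AutoTokenizer.from_pretrained(hub_name).save_pretrained(local_dir)
    AutoModel.from_pretrained(hub_name).save_pretrained(local_dir)
```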
Restore the finetuned teacher models in the following directories:
- ./ckpts/teachers_roberta_large
- ./ckpts/teachers_bert_base
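A quick way to verify that a teacher checkpoint loads correctly is sketched below. It assumes the directory holds a standard `transformers` sequence-classification checkpoint; the exact layout (e.g., per-task subfolders) may differ in your setup:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Assumed path to an MNLI teacher checkpoint; adjust if the checkpoints are
# organized into per-task subfolders.
teacher_dir = "./ckpts/teachers_roberta_large"

config = AutoConfig.from_pretrained(teacher_dir)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_dir)
print(type(teacher).__name__, "loaded with", config.num_labels, "labels")  # MNLI uses 3 labels
```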
Download the GLUE benchmark (https://gluebenchmark.com/) into the following directory:
- ./glue_data
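The sketch below checks that the standard GLUE task folders are present under ./glue_data; the folder names follow the common GLUE download layout and may differ slightly depending on how you obtained the data:

```python
import os

# Standard GLUE task folder names; adjust if your download uses different names.
GLUE_TASKS = ["CoLA", "SST-2", "MRPC", "QQP", "STS-B", "MNLI", "QNLI", "RTE", "WNLI"]

for task in GLUE_TASKS:
    path = os.path.join("./glue_data", task)
    print(f"{task:6s} {'ok' if os.path.isdir(path) else 'missing'}")
```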
Train ComKD on MNLI with the provided scripts:
- bash run_combinekd_mnli_bert.sh for the BERT model
- bash run_combinekd_mnli_roberta.sh for the RoBERTa model
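Once training finishes, a distilled student can be loaded for inference as sketched below. The output directory name is hypothetical (the actual path is set inside the run scripts), and the snippet assumes both the model and its tokenizer were saved there:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

student_dir = "./ckpts/comkd_mnli_student"  # hypothetical output path; adjust to your run

tokenizer = AutoTokenizer.from_pretrained(student_dir)
model = AutoModelForSequenceClassification.from_pretrained(student_dir)
model.eval()

# Score an MNLI-style premise/hypothesis pair.
inputs = tokenizer("A soccer game with multiple males playing.",
                   "Some men are playing a sport.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]  # transformers 3.1.0 returns a tuple by default
print("Predicted class id:", logits.argmax(dim=-1).item())
```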
This project is licensed under the Apache License 2.0.
@misc{li2021select,
title={How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding},
author={Tianda Li and Ahmad Rashid and Aref Jafari and Pranav Sharma and Ali Ghodsi and Mehdi Rezagholizadeh},
year={2021},
eprint={2109.05696},
archivePrefix={arXiv},
primaryClass={cs.CL}
}