This repository accompanies the Combined-KD long paper "How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding", accepted to Findings of EMNLP 2021. In this project, we propose Combined-KD (ComKD), which takes advantage of data augmentation and progressive training. Results show that our proposed ComKD not only achieves a new state of the art (SOTA) on the GLUE benchmark, but is also more robust than other KD methods under out-of-distribution (OOD) evaluation and adversarial attacks. Paper link: https://arxiv.org/abs/2109.05696v1
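For readers unfamiliar with knowledge distillation, the sketch below shows a generic KD objective: a temperature-scaled KL divergence between teacher and student logits combined with cross-entropy on the gold labels. It is only an illustration of a plain KD baseline, not the exact ComKD loss, which additionally relies on data augmentation and progressive training as described in the paper.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Generic KD loss: soft-target KL divergence plus hard-label cross-entropy.

    This is a standard baseline formulation, not the ComKD objective itself.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```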
- Python version >= 3.6
- PyTorch version >= 1.5.0
- HuggingFace Transformers version == 3.1.0
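A minimal sanity check of the environment against the requirements above (it assumes the `packaging` package is available, which ships with most pip installations):

```python
import sys

import torch
import transformers
from packaging import version

# The version strings below mirror the requirement list in this README.
assert sys.version_info >= (3, 6), "Python >= 3.6 is required"
assert version.parse(torch.__version__) >= version.parse("1.5.0"), "PyTorch >= 1.5.0 is required"
assert transformers.__version__ == "3.1.0", "HuggingFace Transformers == 3.1.0 is required"
print("Environment looks OK")
```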
Set up the conda environment:
conda env create -f environment.yml
Restore the pretrained models in the following directories (they can be downloaded from the HuggingFace Hub, e.g., https://huggingface.co/distilbert-base-uncased); a download sketch follows the list:
- ./bert_models/distilbert-base
- ./bert_models/distilroberta-base
- ./bert_models/uncased_L-6_H-768_A-12
- ./bert_models/uncased_L-4_H-256_A-4
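As an alternative to downloading by hand, the sketch below populates the directories with `save_pretrained`. The mapping from Hub model names to local folders is an assumption inferred from the folder names, so adjust it if your setup differs:

```python
from transformers import AutoModel, AutoTokenizer

# Assumed mapping from HuggingFace Hub names to the local directories above.
MODELS = {
    "distilbert-base-uncased": "./bert_models/distilbert-base",
    "distilroberta-base": "./bert_models/distilroberta-base",
    "google/bert_uncased_L-6_H-768_A-12": "./bert_models/uncased_L-6_H-768_A-12",
    "google/bert_uncased_L-4_H-256_A-4": "./bert_models/uncased_L-4_H-256_A-4",
}

for hub_name, local_dir in MODELS.items():
    # Download from the Hub, then write the weights and vocab to the local folder.
    AutoTokenizer.from_pretrained(hub_name).save_pretrained(local_dir)
    AutoModel.from_pretrained(hub_name).save_pretrained(local_dir)
```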
Restore the finetuned teacher models in the following directories:
- ./ckpts/teachers_roberta_large
- ./ckpts/teachers_bert_base
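A quick way to verify that a teacher checkpoint loads correctly is sketched below. It assumes the directory holds a standard `transformers` sequence-classification checkpoint; the exact layout (e.g., per-task subfolders) may differ in your setup:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Assumed path to an MNLI teacher checkpoint; adjust if the checkpoints are
# organized into per-task subfolders.
teacher_dir = "./ckpts/teachers_roberta_large"

config = AutoConfig.from_pretrained(teacher_dir)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_dir)
print(type(teacher).__name__, "loaded with", config.num_labels, "labels")  # MNLI uses 3 labels
```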
Download the GLUE benchmark (https://gluebenchmark.com/) into the following directory:
- ./glue_data
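The sketch below checks that the standard GLUE task folders are present under ./glue_data; the folder names follow the common GLUE download layout and may differ slightly depending on how you obtained the data:

```python
import os

# Standard GLUE task folder names; adjust if your download uses different names.
GLUE_TASKS = ["CoLA", "SST-2", "MRPC", "QQP", "STS-B", "MNLI", "QNLI", "RTE", "WNLI"]

for task in GLUE_TASKS:
    path = os.path.join("./glue_data", task)
    print(f"{task:6s} {'ok' if os.path.isdir(path) else 'missing'}")
```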
Train ComKD on MNLI with the provided scripts:
- bash run_combinekd_mnli_bert.sh for the BERT model
- bash run_combinekd_mnli_roberta.sh for the RoBERTa model
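Once training finishes, a distilled student can be loaded for inference as sketched below. The output directory name is hypothetical (the actual path is set inside the run scripts), and the snippet assumes both the model and its tokenizer were saved there:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

student_dir = "./ckpts/comkd_mnli_student"  # hypothetical output path; adjust to your run

tokenizer = AutoTokenizer.from_pretrained(student_dir)
model = AutoModelForSequenceClassification.from_pretrained(student_dir)
model.eval()

# Score an MNLI-style premise/hypothesis pair.
inputs = tokenizer("A soccer game with multiple males playing.",
                   "Some men are playing a sport.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]  # transformers 3.1.0 returns a tuple by default
print("Predicted class id:", logits.argmax(dim=-1).item())
```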
This project is licensed under the Apache License 2.0.
@misc{li2021select,
title={How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding},
author={Tianda Li and Ahmad Rashid and Aref Jafari and Pranav Sharma and Ali Ghodsi and Mehdi Rezagholizadeh},
year={2021},
eprint={2109.05696},
archivePrefix={arXiv},
primaryClass={cs.CL}
}