This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
- Create and activate the conda environment: `conda env create -f environment.yml`
- Install Transformers locally: `pip install -e .`
- Note: the code is adapted from an existing codebase, so arguments related to LoRA and adapters can be safely ignored.
MoEBERT targets task-specific distillation. Before running any distillation code, a pre-trained BERT model should be fine-tuned on the target task, and the path to the fine-tuned model should be passed to `--model_name_or_path`.
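For reference, a fine-tuning run with the bundled GLUE script might look like the sketch below; the hyperparameter values and output path are illustrative placeholders, not the repository's recommended settings:

```bash
# Fine-tune BERT on MNLI; the checkpoint written to --output_dir is the
# model later passed to --model_name_or_path for MoEBERT distillation.
# All hyperparameter values here are illustrative placeholders.
python examples/text-classification/run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name mnli \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --output_dir ckpt/bert-base-mnli
```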
- Use `bert_base_mnli_example.sh` to compute the importance scores: add a `--preprocess_importance` argument and remove the `--do_train` argument (see the sketch after this list).
- If multiple GPUs are used to compute the importance scores, an `importance_[rank].pkl` file will be saved for each GPU. Use `merge_importance.py` to merge these files.
- To use the pre-computed importance scores, pass the file name to `--moebert_load_importance`.
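As a rough sketch, the importance-scoring pass reuses the fine-tuning command; flags other than `--preprocess_importance` (and the removed `--do_train`) are placeholders, so consult `bert_base_mnli_example.sh` for the actual argument list:

```bash
# Importance scoring: same invocation as fine-tuning, but with
# --preprocess_importance added and --do_train removed.
python examples/text-classification/run_glue.py \
    --model_name_or_path ckpt/bert-base-mnli \
    --task_name mnli \
    --do_eval \
    --max_seq_length 128 \
    --output_dir ckpt/importance \
    --preprocess_importance

# With multiple GPUs, merge the per-rank importance_[rank].pkl files;
# consult merge_importance.py for its exact command-line interface.
python merge_importance.py
```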
- For GLUE tasks, see `examples/text-classification/run_glue.py`.
- For question answering tasks, see `examples/question-answering/run_qa.py`.
- Run `bash bert_base_mnli_example.sh` as an example.
- The codebase supports four routing strategies: gate-token, gate-sentence, hash-random, and hash-balance. The choice should be passed to `--moebert_route_method`; a sketch follows after this list.
  - To use hash-balance, a balanced hash list needs to be pre-computed using `hash_balance.py`. The path to the saved hash list should be passed to `--moebert_route_hash_list`.
  - Add a load-balancing loss by setting `--moebert_load_balance` when using a trainable gating mechanism (gate-token or gate-sentence).
- The sentence-based gating mechanism (gate-sentence) is advantageous for inference because it induces significantly less communication overhead than token-level routing methods.
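As an illustration, a distillation run with token-level gating and a load-balancing loss might be launched as follows; the checkpoint paths, importance file name, and load-balance weight are placeholder assumptions, and `bert_base_mnli_example.sh` contains the settings actually used:

```bash
# MoEBERT distillation on MNLI with token-level gating.
# Paths and the load-balance weight below are placeholders.
python examples/text-classification/run_glue.py \
    --model_name_or_path ckpt/bert-base-mnli \
    --task_name mnli \
    --do_train \
    --do_eval \
    --moebert_load_importance importance.pkl \
    --moebert_route_method gate-token \
    --moebert_load_balance 0.1 \
    --output_dir ckpt/moebert-mnli
```

For hash-balance routing, swap in `--moebert_route_method hash-balance` together with `--moebert_route_hash_list <path-to-hash-list>`, where the hash list is produced by `hash_balance.py`.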