Implementation of our paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" on Findings of NAACL 2022.
Prompt-based learning is a new trend in text classification. However, this new learning paradigm has universal vulnerability, meaning that phrases that mislead a pre-trained language model can universally interfere downstream prompt-based models. In this repo, we implement two methods to inject or find these phrases.
- Backdoor Triggers on Prompt-based Learning (BToP) assumes that the attacker can access the training phrase of the language model. These triggers are injected to the language model by fine-tuning.
- Adversarial Triggers on Prompt-based Learning (AToP) assumes no access to language model training. These triggers are discovered using a beam search algorithm on off-the-shelf language models.
Please install pytorch>=1.8.0
and correctly configure the GPU accelerator. (GPU is required.)
Install all requirements by
pip install -r requirements.txt
src/insert_btop.py
implements the backdoor attack on PLMs during the pre-training stage.
command:
python3 -m src.insert_btop --subsample_size 30000 --bert_type roberta-large \
--batch_size 16 --num_epochs 1 --save_path poisoned_lm
The arguments:
sample_nums
: The number of samples we sample from the general corpus.bert_type
: The type of PLMs.save_path
: Path to save the backdoor injected models.
output:
The backdoor injected model will be saved at: poisoned_lm
.
src/search_atop.py
implements the trigger search on RoBERTa-large model.
command:
python3 -m src.search_atop.py --trigger_len 3 --trigger_pos all
To search for position-sensitive triggers, you can change --trigger_pos
to
prefix
: the trigger is supposed to be placed before the text.suffix
: the trigger is supposed to be placed after the text.
For more arguments, see python3 atop/search_atop.py --help
.
output:
Results will be stored in the triggers/
folder as a JSON file.
src/eval.py
can evaluate both AToP and BToP.
Evaluate BToP
python3 -m src.eval --shots 16 --dataset ag_news --model_path poisoned_lm --target_label 0 \
--repeat 5 --bert_type roberta-large --template_id 0
Evaluate AToP
python3 -m src.eval --shots 16 --dataset ag_news --target_label -1 \
--repeat 5 --bert_type roberta-large --template_id 0 --load_trigger trigger/<trigger_json>
The arguments:
dataset
: The evaluation datasets.shots
: The number of samples per label.model_path
:- In case of BToP, set the filename of backdoor injected model.
- In case of AToP, do not set this argument.
target_label
:- In case of targeted attack, set the target label id. (Used in BToP)
- In case of untargeted attack, set -1. (Used in AToP)
bert_type
: The type of PLMs.template_id
: The chosen template. Checkprompt/
folder for all prompt templates.
Important note for AToP:
- for
all
-purpose triggers, you can choose template_id from {0, 1, 2, 3}. - for
prefix
triggers, you can choose template_id from {0, 2}. - for
suffix
triggers, you can choose template_id from {1, 3}.
Use python3 -m src.eval --help
for details.
The datasets and prompts used in experiments are in data/
and prompt/
folders.
If you use AToP and/or BToP, please cite the following work:
- Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu. Exploring the Universal Vulnerability of Prompt-based Learning Paradigm. Findings of NAACL, 2022.
@inproceedings{xu-etal-2022-exploring,
title = "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm",
author = "Xu, Lei and
Chen, Yangyi and
Cui, Ganqu and
Gao, Hongcheng and
Liu, Zhiyuan",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
year = "2022",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-naacl.137",
pages = "1799--1810"
}