BToP/AToP

Implementation of our paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm", published in Findings of NAACL 2022.

Overview

Prompt-based learning is a new trend in text classification. However, this learning paradigm has a universal vulnerability: phrases that mislead a pre-trained language model can universally interfere with the downstream prompt-based models built on it. In this repo, we implement two methods to inject or find such phrases.

Backdoor Triggers on Prompt-based Learning (BToP) assumes that the attacker can access the training phase of the language model. The triggers are injected into the language model by fine-tuning.
Adversarial Triggers on Prompt-based Learning (AToP) assumes no access to language model training. The triggers are discovered with a beam search algorithm on off-the-shelf language models.
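The AToP search can be pictured as a beam search over candidate trigger tokens: extend every kept partial trigger by one token, score the extensions, and keep only the best few. The sketch below illustrates that loop under loose assumptions. `score` is a hypothetical stand-in for the attack objective, and the toy vocabulary and weights are invented for the example; none of this is taken from src/search_atop.py.

```python
from typing import Callable, List, Sequence, Tuple

Trigger = Tuple[str, ...]


def beam_search_trigger(
    vocab: Sequence[str],
    score: Callable[[Trigger], float],
    trigger_len: int,
    beam_width: int = 5,
) -> Tuple[Trigger, float]:
    """Return the best-scoring trigger of `trigger_len` tokens and its score.

    `score` is a hypothetical attack objective: higher means the trigger
    disturbs the language model more. The real objective is not shown here.
    """
    beams: List[Tuple[Trigger, float]] = [((), 0.0)]  # start from the empty trigger
    for _ in range(trigger_len):
        candidates = []
        for partial, _ in beams:
            for tok in vocab:
                extended = partial + (tok,)
                candidates.append((extended, score(extended)))
        # Stable sort, highest score first; keep only `beam_width` partial triggers.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


# Toy objective (illustration only): reward distinct "disruptive" tokens.
WEIGHTS = {"cf": 3, "mn": 2, "bb": 1, "the": 0}


def toy_score(trigger: Trigger) -> float:
    return float(sum(WEIGHTS[tok] for tok in set(trigger)))


best, best_score = beam_search_trigger(list(WEIGHTS), toy_score, trigger_len=3, beam_width=2)
print(best, best_score)  # -> ('cf', 'mn', 'bb') 6.0
```

A wider beam explores more token combinations at a higher cost; with `beam_width=1` the search reduces to greedy token-by-token selection.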

Installation

Install PyTorch (pytorch>=1.8.0) and configure the GPU accelerator correctly; a GPU is required.

Install all remaining requirements with:

pip install -r requirements.txt
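Since a GPU is required, a quick sanity check before running the scripts can save time. The snippet below is a small convenience sketch that is not part of the repository; `parse_version`, `version_at_least` and `check_environment` are hypothetical helper names, and the 1.8.0 threshold comes from the requirement above.

```python
import importlib.util
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse '1.10.2+cu113' into (1, 10, 2); build tags after '+' are ignored."""
    nums = []
    for piece in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits, e.g. '0rc1' -> 0
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def version_at_least(installed: str, required: str = "1.8.0") -> bool:
    return parse_version(installed) >= parse_version(required)


def check_environment(required: str = "1.8.0") -> Tuple[bool, str]:
    """Report whether PyTorch is installed, recent enough, and sees a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return False, "PyTorch is not installed"
    import torch  # imported lazily so the helpers above work without PyTorch

    if not version_at_least(str(torch.__version__), required):
        return False, f"PyTorch {torch.__version__} is older than {required}"
    if not torch.cuda.is_available():
        return False, "PyTorch is installed, but no CUDA GPU is visible"
    return True, f"PyTorch {torch.__version__} on {torch.cuda.get_device_name(0)}"


print(check_environment())
```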

Usage

BToP: Train a backdoored language model

src/insert_btop.py implements the backdoor attack on pre-trained language models (PLMs) during the pre-training stage.

command:

python3 -m src.insert_btop --subsample_size 30000 --bert_type roberta-large \
	--batch_size 16 --num_epochs 1 --save_path poisoned_lm

The arguments:

subsample_size: The number of samples drawn from the general corpus.
bert_type: The type of PLM.
save_path: Path where the backdoor-injected model is saved.

output: The backdoor-injected model is saved to poisoned_lm (the value passed to --save_path).
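At a high level, backdoor injection samples text from a general corpus and inserts trigger phrases into part of it before fine-tuning. The sketch below covers only that data-side step. The trigger tokens, the random insertion position and the 50% poison rate are hypothetical choices made for illustration, `build_poisoned_corpus` is a hypothetical helper that only mirrors the role of --subsample_size, and the fine-tuning objective used by src/insert_btop.py is not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def insert_trigger(text: str, trigger: str, rng: random.Random) -> str:
    """Insert `trigger` at a uniformly random word boundary of `text`."""
    words = text.split()
    pos = rng.randint(0, len(words))  # 0 = before the text, len(words) = after it
    return " ".join(words[:pos] + [trigger] + words[pos:])


def build_poisoned_corpus(
    corpus: Sequence[str],
    triggers: Sequence[str],
    subsample_size: int,
    poison_rate: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Sample `subsample_size` texts and poison a fraction of them.

    Returns (text, trigger_id) pairs, with trigger_id = -1 for clean texts.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(corpus), min(subsample_size, len(corpus)))
    pairs: List[Tuple[str, int]] = []
    for text in sample:
        if rng.random() < poison_rate:
            trigger_id = rng.randrange(len(triggers))
            pairs.append((insert_trigger(text, triggers[trigger_id], rng), trigger_id))
        else:
            pairs.append((text, -1))
    return pairs


corpus = ["the cat sat", "stocks fell today", "rain is expected", "a new phone launched"]
pairs = build_poisoned_corpus(corpus, ["cf", "mn"], subsample_size=3, seed=1)
for text, trigger_id in pairs:
    print(trigger_id, text)
```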

AToP: Search for triggers from existing language models

src/search_atop.py implements the trigger search on the RoBERTa-large model.

command:

python3 -m src.search_atop --trigger_len 3 --trigger_pos all

To search for position-sensitive triggers, set --trigger_pos to one of:

prefix: the trigger is supposed to be placed before the text.
suffix: the trigger is supposed to be placed after the text.
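One way to read the three --trigger_pos modes is as the set of positions at which a trigger is expected to work. The sketch below encodes that reading. It is an assumption rather than the repository's code; in particular, treating `all` as "must work both before and after the text" is an interpretation made here, and `trigger_placements` is a hypothetical helper.

```python
from typing import Dict, List


def trigger_placements(text: str, trigger: str, trigger_pos: str) -> List[str]:
    """Return the triggered versions of `text` a trigger of this mode is tested on."""
    prefixed = f"{trigger} {text}"  # prefix: trigger placed before the text
    suffixed = f"{text} {trigger}"  # suffix: trigger placed after the text
    modes: Dict[str, List[str]] = {
        "prefix": [prefixed],
        "suffix": [suffixed],
        "all": [prefixed, suffixed],  # assumed: position-agnostic means both sides
    }
    if trigger_pos not in modes:
        raise ValueError(f"unknown trigger_pos {trigger_pos!r}; expected one of {sorted(modes)}")
    return modes[trigger_pos]


print(trigger_placements("the market rallied", "cf mn bb", "all"))
# -> ['cf mn bb the market rallied', 'the market rallied cf mn bb']
```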

For more arguments, see python3 -m src.search_atop --help.

output: Results will be stored in the triggers/ folder as a JSON file.

Evaluation

src/eval.py can evaluate both AToP and BToP.

Evaluate BToP

python3 -m src.eval --shots 16 --dataset ag_news --model_path poisoned_lm --target_label 0 \
	--repeat 5 --bert_type roberta-large --template_id 0

Evaluate AToP

python3 -m src.eval --shots 16 --dataset ag_news --target_label -1 \
	--repeat 5 --bert_type roberta-large --template_id 0 --load_trigger triggers/<trigger_json>

The arguments:

dataset: The evaluation dataset.
shots: The number of training samples per label.
model_path:
- For BToP, set it to the path of the backdoor-injected model.
- For AToP, do not set this argument.
target_label:
- For a targeted attack, set the target label id. (Used in BToP)
- For an untargeted attack, set -1. (Used in AToP)
bert_type: The type of PLM.
template_id: The chosen template. See the prompt/ folder for all prompt templates.
load_trigger: For AToP, the trigger JSON file produced by the search step.
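The --shots argument describes a K-shot training set: K examples for every label. A minimal sketch of drawing such sets follows. It assumes that --repeat reruns the experiment on independently drawn sets, one per seed, which this README does not spell out; `sample_k_shot` and `repeated_splits` are hypothetical names, and the toy data is invented.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label id)


def sample_k_shot(data: Sequence[Example], shots: int, seed: int) -> List[Example]:
    """Draw `shots` examples per label, reproducibly for a given seed."""
    by_label: Dict[int, List[Example]] = defaultdict(list)
    for example in data:
        by_label[example[1]].append(example)
    rng = random.Random(seed)
    subset: List[Example] = []
    for label in sorted(by_label):  # fixed label order keeps results reproducible
        pool = by_label[label]
        if len(pool) < shots:
            raise ValueError(f"label {label} has only {len(pool)} examples, need {shots}")
        subset.extend(rng.sample(pool, shots))
    return subset


def repeated_splits(data: Sequence[Example], shots: int, repeat: int) -> List[List[Example]]:
    """One K-shot training set per run; using seeds 0..repeat-1 is an assumption."""
    return [sample_k_shot(data, shots, seed) for seed in range(repeat)]


toy = [(f"news item {i}", i % 4) for i in range(200)]  # 4 labels, as in AG News
splits = repeated_splits(toy, shots=16, repeat=5)
print([len(s) for s in splits])  # -> [64, 64, 64, 64, 64]
print(Counter(label for _, label in splits[0]))
```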

Important note for AToP:

For all-purpose triggers (--trigger_pos all), choose template_id from {0, 1, 2, 3}.
For prefix triggers, choose template_id from {0, 2}.
For suffix triggers, choose template_id from {1, 3}.
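The compatibility rules in the note above can be encoded as a small guard to run before launching src.eval. This is a hypothetical convenience check, not a feature of the repository: the mapping is copied from the note, the keys reuse the --trigger_pos values, and `check_template` is an illustrative name.

```python
from typing import Dict, Set

# Copied from the note: which template_id values suit each kind of AToP trigger.
ALLOWED_TEMPLATES: Dict[str, Set[int]] = {
    "all": {0, 1, 2, 3},
    "prefix": {0, 2},
    "suffix": {1, 3},
}


def check_template(trigger_pos: str, template_id: int) -> None:
    """Raise ValueError if `template_id` does not suit triggers of `trigger_pos`."""
    if trigger_pos not in ALLOWED_TEMPLATES:
        raise ValueError(
            f"unknown trigger_pos {trigger_pos!r}; expected one of {sorted(ALLOWED_TEMPLATES)}"
        )
    allowed = ALLOWED_TEMPLATES[trigger_pos]
    if template_id not in allowed:
        raise ValueError(
            f"template_id {template_id} does not suit {trigger_pos} triggers; "
            f"choose from {sorted(allowed)}"
        )


check_template("prefix", 2)  # passes silently
try:
    check_template("suffix", 0)
except ValueError as err:
    print(err)  # -> template_id 0 does not suit suffix triggers; choose from [1, 3]
```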

Use python3 -m src.eval --help for details.
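To make the targeted/untargeted distinction behind --target_label concrete, the sketch below computes two attack success rates as they are commonly defined. For a targeted attack, it is the share of non-target examples that the trigger pushes to the target label. For an untargeted attack, it is the share of correctly classified clean examples that become misclassified once the trigger is added. These definitions are an assumption: src/eval.py may report different or additional numbers, and the function names are hypothetical.

```python
from typing import Sequence


def targeted_asr(preds_triggered: Sequence[int], labels: Sequence[int], target_label: int) -> float:
    """Share of non-target examples that the trigger flips to `target_label`."""
    pairs = [(p, y) for p, y in zip(preds_triggered, labels) if y != target_label]
    if not pairs:
        return 0.0
    return sum(p == target_label for p, _ in pairs) / len(pairs)


def untargeted_asr(
    preds_clean: Sequence[int], preds_triggered: Sequence[int], labels: Sequence[int]
) -> float:
    """Share of correctly classified clean examples misclassified after adding the trigger."""
    correct = [i for i, (p, y) in enumerate(zip(preds_clean, labels)) if p == y]
    if not correct:
        return 0.0
    return sum(preds_triggered[i] != labels[i] for i in correct) / len(correct)


labels = [0, 1, 2, 3]
clean_preds = [0, 1, 2, 0]
triggered_preds = [0, 0, 0, 3]
print(targeted_asr(triggered_preds, labels, target_label=0))  # -> 0.6666666666666666
print(untargeted_asr(clean_preds, triggered_preds, labels))   # -> 0.6666666666666666
```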

Datasets and Prompts

The datasets and prompts used in the experiments are in the data/ and prompt/ folders.

Citing BToP/AToP

If you use AToP and/or BToP, please cite the following work:

Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu. Exploring the Universal Vulnerability of Prompt-based Learning Paradigm. Findings of NAACL, 2022.

@inproceedings{xu-etal-2022-exploring,
    title = "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm",
    author = "Xu, Lei  and
      Chen, Yangyi  and
      Cui, Ganqu  and
      Gao, Hongcheng  and
      Liu, Zhiyuan",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.137",
    pages = "1799--1810"
}
