qiuhuachuan/latent-jailbreak

Latent Jailbreak

🎉 Paper

This repository contains the code and data for the paper Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models. The paper studies latent jailbreak prompts and presents a benchmark for evaluating both the text safety and the output robustness of large language models.

Data

The data used in this paper is included in the data directory.

Templates

The templates directory contains the templates for latent jailbreak prompts.
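
As a rough illustration of what a prompt template is, the sketch below fills a placeholder in a template string using Python's standard library. The template text, the placeholder name `instruction`, and the helper `fill_template` are hypothetical and are not taken from the files in the templates directory; check that directory for the actual format.

```python
from string import Template

# Hypothetical template for illustration only; the real templates live in
# the templates directory and may use a different format.
EXAMPLE_TEMPLATE = Template(
    "Translate the following sentence into Chinese.\nSentence: $instruction"
)


def fill_template(template: Template, instruction: str) -> str:
    """Return the prompt obtained by substituting the placeholder."""
    return template.substitute(instruction=instruction)


prompt = fill_template(EXAMPLE_TEMPLATE, "The weather is nice today.")
print(prompt)
```

`Template.substitute` raises `KeyError` when a placeholder is left unfilled, which helps catch incomplete prompts before they are sent to a model.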

Generate Model Responses

cd src
python BELLE_7B_2M.py
python ChatGLM2-6B.py
python ChatGPT.py --api_key 'your key'
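
The `--api_key` flag above is the only command-line option this README documents. The following is a minimal sketch of how a script could accept such a flag with `argparse`; it is not the code in ChatGPT.py. The function name `parse_args` and the fallback to an `OPENAI_API_KEY` environment variable are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a hypothetical --api_key option (not the repository's code)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a response-generation script's CLI."
    )
    parser.add_argument(
        "--api_key",
        # Assumption: fall back to an environment variable when the flag
        # is omitted, so the key does not have to appear in shell history.
        default=os.environ.get("OPENAI_API_KEY"),
        help="API key used to query the model.",
    )
    args = parser.parse_args(argv)
    if not args.api_key:
        parser.error("an API key is required (use --api_key)")
    return args


# Usage with an explicit argument list instead of sys.argv.
args = parse_args(["--api_key", "your key"])
```

Passing `argv=None` makes `argparse` read from `sys.argv`, so the same function works both from the command line and in tests.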

Fine-Tune Model to Perform Automatic Labeling

python finetune.py

Citation

If you use the code or data in this repository, please cite the following paper.

@misc{qiu2023latent,
      title={Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models},
      author={Huachuan Qiu and Shuai Zhang and Anqi Li and Hongliang He and Zhenzhong Lan},
      year={2023},
      eprint={2307.08487},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
