- Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
- 📅 Date: 2024-05-06
- 📑 File: Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
- 🔗 Link: https://arxiv.org/abs/2405.03654
- Boosting Jailbreak Attack with Momentum
- 📅 Date: 2024-05-02
- 📑 File: Boosting Jailbreak Attack with Momentum
- 🔗 Link: https://arxiv.org/abs/2405.01229
- Universal Adversarial Triggers Are Not Universal
- 📅 Date: 2024-04-24
- 📑 File: Universal Adversarial Triggers Are Not Universal
- 🔗 Link: https://arxiv.org/abs/2404.16020
- Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images
- 📅 Date: 2024-04-21
- 📑 File: Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images
- 🔗 Link: https://arxiv.org/abs/2404.13784
- Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection
- 📅 Date: 2024-04-19
- 📑 File: Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection
- 🔗 Link: https://arxiv.org/abs/2404.09894
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
- 📅 Date: 2024-04-19
- 📑 File: The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
- 🔗 Link: https://arxiv.org/abs/2404.13208
- Introducing v0.5 of the AI Safety Benchmark from MLCommons
- 📅 Date: 2024-04-18
- 📑 File: Introducing v0.5 of the AI Safety Benchmark from MLCommons
- 🔗 Link: https://arxiv.org/abs/2404.12241
- JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
- 📅 Date: 2024-04-12
- 📑 File: JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
- 🔗 Link: https://arxiv.org/abs/2404.08793
- Subtoxic Questions: Dive Into Attitude Change of LLM's Response in Jailbreak Attempts
- 📅 Date: 2024-04-12
- 📑 File: Subtoxic Questions: Dive Into Attitude Change of LLM's Response in Jailbreak Attempts
- 🔗 Link: https://arxiv.org/abs/2404.08309
- AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
- 📅 Date: 2024-04-11
- 📑 File: AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
- 🔗 Link: https://arxiv.org/abs/2404.07921
- AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts
- 📅 Date: 2024-04-09
- 📑 File: AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts
- 🔗 Link: https://arxiv.org/abs/2404.05993
- Goal-guided Generative Prompt Injection Attack on Large Language Models
- 📅 Date: 2024-04-06
- 📑 File: Goal-guided Generative Prompt Injection Attack on Large Language Models
- 🔗 Link: https://arxiv.org/abs/2404.07234
- Increased LLM Vulnerabilities from Fine-tuning and Quantization
- 📅 Date: 2024-04-05
- 📑 File: Increased LLM Vulnerabilities from Fine-tuning and Quantization
- 🔗 Link: https://arxiv.org/abs/2404.04392
- Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
- 📅 Date: 2024-04-02
- 📑 File: Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
- 🔗 Link: https://arxiv.org/abs/2404.02928
- OWASP Top 10 for Large Language Model Applications
- 📅 Date: 2024-01-09
- 📑 File: OWASP Top 10 for Large Language Model Applications
- 🔗 Link: https://owasp.org/www-project-top-10-for-large-language-model-applications/
- Controlling Large Language Model Outputs: A Primer
- 📅 Date: 2023-12
- 📑 File: Controlling Large Language Model Outputs: A Primer
- 🔗 Link: https://cset.georgetown.edu/publication/controlling-large-language-models-a-primer/
- Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild
- 📅 Date: 2023-11-10 (last revised 2023-11-13)
- 📑 File: Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild
- 🔗 Link: https://arxiv.org/abs/2311.06237
- Ignore Previous Prompt: Attack Techniques For Language Models
- 📅 Date: 2022-11-17
- 📑 File: Ignore Previous Prompt: Attack Techniques For Language Models
- 🔗 Link: https://arxiv.org/abs/2211.09527