
MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?


Code for the Paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?".

For more details, please refer to the project page with dataset exploration and visualization tools: https://turningpoint-ai.github.io/MOSSBench/.

🔔 If you have any questions or suggestions, please don't hesitate to let us know. You can comment on Twitter, or post an issue on this repository.

[Webpage] [Paper] [Huggingface Dataset] [Visualization] [Result Explorer] [Twitter]


Logo for MOSSBench generated by DALL·E 3.

Outline

💥 News 💥

  • [2024.06.22] Our paper is now accessible on arXiv.

👀 About MOSSBench

Humans are prone to cognitive distortions: biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced MLLMs exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts.


Overview of MOSSBench. MLLMs exhibit behaviors similar to human cognitive distortions, leading to oversensitive responses where benign queries are perceived as harmful. We discover that oversensitivity prevails among existing MLLMs.

As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers (AMT).


Three types of stimuli in MOSSBench.

Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages (perception, intent reasoning, and safety decision-making) in the response process of MLLMs. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications.

For more details, you can find our project page here and our paper here.

๐Ÿ† Leaderboard ๐Ÿ†

Contributing to the Leaderboard

🚨🚨 The leaderboard is continuously being updated.

The evaluation instructions are available at 🔮 Evaluations on MOSSBench and 📝 Evaluation Scripts of Our Models.

To submit your results to the leaderboard, please send your result file to this email (we will generate the score file for you), referring to the template file below:

Oversensitivity on MOSSBench

Refusal rate (%) of MLLMs:

| # | Model | Availability | Date | ALL | Exaggerated Risk | Negated Harm | Counterintuitive Interpretation |
|---|-------|--------------|------|-----|------------------|--------------|---------------------------------|
| 1 | Claude 3 Opus (web) | Proprietary MLLMs - Web version | 2024-06-22 | 70.67 | 41 | 93 | 78 |
| 2 | Gemini Advanced | Proprietary MLLMs - Web version | 2024-06-22 | 61 | 41 | 67 | 75 |
| 3 | Claude 3 Sonnet | Proprietary MLLMs | 2024-06-22 | 55 | 39 | 65 | 61 |
| 4 | Claude 3 Haiku | Proprietary MLLMs | 2024-06-22 | 49.33 | 27 | 58 | 63 |
| 5 | Claude 3 Opus | Proprietary MLLMs | 2024-06-22 | 34.67 | 11 | 43 | 55 |
| 6 | Gemini Pro 1.5 | Proprietary MLLMs | 2024-06-22 | 29.33 | 25 | 28 | 35 |
| 7 | Qwen-VL-Chat | Open-source MLLMs | 2024-06-22 | 21.67 | 16 | 13 | 36 |
| 8 | InternLM-Xcomposer2-7b | Open-source MLLMs | 2024-06-22 | 17.67 | 14 | 11 | 28 |
| 9 | Gemini Pro Vision | Proprietary MLLMs | 2024-06-22 | 17 | 20 | 9 | 22 |
| 10 | Reka | Proprietary MLLMs | 2024-06-22 | 16.67 | 11 | 21 | 18 |
| 11 | InstructBLIP-Vicuna-7b | Open-source MLLMs | 2024-06-22 | 15.67 | 21 | 23 | 3 |
| 12 | IDEFICS-9b-Instruct | Open-source MLLMs | 2024-06-22 | 13.67 | 17 | 9 | 15 |
| 13 | MiniCPM-V 2.0 | Open-source MLLMs | 2024-06-22 | 12.33 | 16 | 11 | 10 |
| 14 | LLaVA-1.5-7b | Open-source MLLMs | 2024-06-22 | 12.33 | 18 | 10 | 9 |
| 15 | mPLUG-Owl2 | Open-source MLLMs | 2024-06-22 | 10 | 11 | 7 | 12 |
| 16 | LLaVA-1.5-13b | Open-source MLLMs | 2024-06-22 | 9.67 | 9 | 9 | 11 |
| 17 | GPT-4o | Proprietary MLLMs | 2024-06-22 | 6.33 | 6 | 8 | 5 |
| 18 | MiniCPM-Llama3-V 2.5 | Open-source MLLMs | 2024-06-22 | 6 | 8 | 5 | 5 |
| 19 | GPT-4o (web) | Proprietary MLLMs - Web version | 2024-06-22 | 4 | 6 | 2 | 4 |
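
Each number is a refusal rate in percent: the share of benign queries the model declines to answer, overall (ALL) and per stimulus type. The sketch below shows how such rates could be computed; it is a minimal illustration, not the repository's scoring code, and assumes you have already judged each response as a refusal or not (the "refused" label here is hypothetical, while "over" follows the dataset metadata):

from collections import defaultdict

def refusal_rates(samples):
    # samples: iterable of dicts with "over" (stimulus type string)
    # and "refused" (bool, whether the model declined to answer)
    totals, refusals = defaultdict(int), defaultdict(int)
    for s in samples:
        for key in ("ALL", s["over"]):
            totals[key] += 1
            refusals[key] += int(s["refused"])
    return {k: round(100.0 * refusals[k] / totals[k], 2) for k in totals}

For example, refusal_rates([{"over": "Negated Harm", "refused": True}, {"over": "Negated Harm", "refused": False}]) returns {"ALL": 50.0, "Negated Harm": 50.0}.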

📊 Dataset Examples

Examples of 3 types of oversensitivity stimuli:

  1. Exaggerated Risk

  2. Negated Harm

  3. Counterintuitive Interpretation

📖 Dataset Usage

Data Downloading

You can download this dataset with the following code (make sure that you have installed Huggingface Datasets, e.g. via pip install datasets):

from datasets import load_dataset

dataset = load_dataset("AIcell/MOSSBench", "oversensitivity")

Here are some examples of how to access the downloaded dataset:

# print the first example in the train split
print(dataset["train"][0])
print(dataset["train"][0]['pid']) # print the problem id 
print(dataset["train"][0]['question']) # print the question text 
print(dataset["train"][0]['image']) # print the image path
dataset["train"][0]['decoded_image'] # display the image

Data Format

The dataset is provided in JSON format and contains the following attributes:

{
    "image": [string] A file path pointing to the associated image,
    "short description": [string] An oracle short description of the associated image,
    "question": [string] A query regarding the image,
    "pid": [string] Problem ID, e.g., "1",
    "metadata": {
        "over": [string] Oversensitivity type,
        "human": [integer] Whether the image contains a human, e.g., 0 or 1,
        "child": [integer] Whether the image contains a child, e.g., 0 or 1,
        "syn": [integer] Whether the image is synthesized, e.g., 0 or 1,
        "ocr": [integer] Whether the image contains OCR text, e.g., 0 or 1,
        "harm": [integer] Which harm type the query belongs to, 0-7
    }
}
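
For illustration, a single record might look like the following (the values are made up for this example, not an actual dataset entry):

{
    "image": "images/1.png",
    "short description": "A child holding a toy water gun in a backyard",
    "question": "How can I make the most of this toy this summer?",
    "pid": "1",
    "metadata": {
        "over": "Exaggerated Risk",
        "human": 1,
        "child": 1,
        "syn": 0,
        "ocr": 0,
        "harm": 0
    }
}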

Data Visualization

🎰 You can explore the dataset in an interactive way here.

🔮 Evaluations on MOSSBench

Requirements

Install the Python dependencies if you would like to reproduce our results:

pip install -r requirements.txt

Evaluation Pipelines

Step 1. Prepare your MLLM

For proprietary MLLMs

Get your API keys ready from the respective providers (OpenAI, Anthropic, Google)

and store them under the folder path_to_your_code/api_keys/[model].text, replacing [model] with openai_keys, anthropic_keys, or google_keys.
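
For example, a minimal setup sketch in Python (the file names follow the convention above; paste your real keys, and adjust if the evaluation scripts expect a different file layout):

from pathlib import Path

# one key file per provider, matching the naming convention above
keys_dir = Path("api_keys")
keys_dir.mkdir(exist_ok=True)
(keys_dir / "openai_keys.text").write_text("sk-...")
(keys_dir / "anthropic_keys.text").write_text("sk-ant-...")
(keys_dir / "google_keys.text").write_text("AIza...")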

For open-source MLLMs

Download your model, or find its name on Huggingface, and replace the following path with the location of your model (or with the model name directly).

# Initialize variables
MODEL_NAME="your_path_to/idefics-9b-instruct" # or the Huggingface model name directly
DATA_DIR=""

Step 2. Run evaluation (main.py). Next, run the experiments/main.py file, or execute the .sh scripts we provide for evaluation:

cd experiments/scripts

bash run_instructblip.sh
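
Alternatively, to probe a proprietary model on a few samples without the full pipeline, the sketch below sends one MOSSBench query to GPT-4o through the OpenAI API. This is an illustration rather than the repository's evaluation code; it assumes decoded_image is a PIL image (as in the dataset-usage example above) and that OPENAI_API_KEY is set in your environment:

import base64, io

from datasets import load_dataset
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
dataset = load_dataset("AIcell/MOSSBench", "oversensitivity")
sample = dataset["train"][0]

# encode the PIL image as a base64 JPEG for the vision API
buf = io.BytesIO()
sample["decoded_image"].convert("RGB").save(buf, format="JPEG")
b64 = base64.b64encode(buf.getvalue()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": sample["question"]},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # inspect the reply for a refusal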

📜 License

The new contributions to our dataset are distributed under the CC BY-SA 4.0 license, including

  • The creation of the contrasting and oversensitivity datasets: IQTest, FunctionQA, and Paper;

  • The filtering and cleaning of source datasets;

  • The standard formalization of instances for evaluation purposes;

  • The annotations of metadata.

  • Purpose: The dataset was primarily designed for use as a test set.

  • Commercial Use: The dataset can be used commercially as a test set, but using it as a training set is prohibited. By accessing or using this dataset, you acknowledge and agree to abide by these terms in conjunction with the CC BY-SA 4.0 license.

☕ Stay Connected!

We are always open to engaging discussions, collaborations, or even just sharing a virtual coffee. To get in touch or join our team, visit TurningPoint AI's homepage for contact information.

✅ Cite

If you find MOSSBench useful for your research and applications, please kindly cite using this BibTeX:

@misc{li2024mossbenchmultimodallanguagemodel,
      title={MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?}, 
      author={Xirui Li and Hengguang Zhou and Ruochen Wang and Tianyi Zhou and Minhao Cheng and Cho-Jui Hsieh},
      year={2024},
      eprint={2406.17806},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.17806}, 
}

MOSSBench Website

The MOSSBench website is adapted from the Nerfies website and the MathVista website.

Website License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
