
Code for our CVPR'22 paper: Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources


  • Authors: Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
  • CVPR'22
  • This repository contains code to reproduce our dataset collection and training for our paper. Detailed instructions can be found in each subdirectory.

Abstract

Misinformation is now a major problem due to the high risks it poses to our core democratic and societal values and orders. Out-of-context misinformation is one of the easiest and most effective ways for adversaries to spread viral false stories. In this threat, a real image is re-purposed to support other narratives by misrepresenting its context and/or elements. The internet is the go-to way to verify information using different sources and modalities. Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-caption pairing using Web evidence. To integrate evidence and cues from both modalities, we introduce the concept of a 'multi-modal cycle-consistency check': starting from the image/caption, we gather textual/visual evidence, which is then compared against the other paired caption/image, respectively. Moreover, we propose a novel architecture, the Consistency-Checking Network (CCN), that mimics the layered human reasoning across the same and different modalities: the caption vs. textual evidence, the image vs. visual evidence, and the image vs. caption. Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking, and significantly outperforms previous baselines that did not leverage external evidence.
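
The sketch below illustrates the cycle-consistency idea in minimal Python; the evidence-gathering and scoring functions are hypothetical placeholders, not the code in this repository.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Claim:
        image_path: str
        caption: str

    def gather_textual_evidence(image_path: str) -> List[str]:
        # Hypothetical: reverse-search the image on the Web, return page titles/captions.
        return ["placeholder textual evidence"]

    def gather_visual_evidence(caption: str) -> List[str]:
        # Hypothetical: search the caption on the Web, return paths of retrieved images.
        return ["placeholder_image.jpg"]

    def consistency_score(a: str, b: str) -> float:
        # Hypothetical similarity, e.g. cosine similarity of embeddings.
        return 0.0

    def cycle_consistency_check(claim: Claim) -> float:
        # Evidence gathered from the image is compared against the paired caption,
        # and evidence gathered from the caption is compared against the paired image.
        textual = gather_textual_evidence(claim.image_path)
        visual = gather_visual_evidence(claim.caption)
        caption_support = max(consistency_score(claim.caption, t) for t in textual)
        image_support = max(consistency_score(claim.image_path, v) for v in visual)
        return min(caption_support, image_support)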


Dataset collection

Here we share our dataset collection pipeline, which you can use to download the dataset from scratch or to download other subsets of the NewsCLIPpings dataset.
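
As a starting point, here is a minimal sketch of iterating over a NewsCLIPpings split file before running the collection pipeline; the path and field names below ("annotations", "id", "image_id", "falsified") are assumptions about the JSON layout, so adjust them to the split you download.

    import json

    # Hypothetical path; point this at the NewsCLIPpings split you downloaded.
    with open("news_clippings/merged_balanced/val.json") as f:
        split = json.load(f)

    # Field names are assumptions about the NewsCLIPpings annotation format.
    for ann in split.get("annotations", [])[:5]:
        print(ann.get("id"), ann.get("image_id"), ann.get("falsified"))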


Evidence links

We share the links returned by the Google searches we performed using the query images and captions. You can find more details here about how to obtain them and about their format. You can adapt the crawler pipeline to extract and download the evidence from these links.
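
For illustration, a minimal crawler sketch that downloads pages from a plain-text list of evidence links (one URL per line); the file name is hypothetical, and the actual pipeline in this repository does more (parsing captions, titles, and images from the results).

    import os
    import requests

    os.makedirs("evidence", exist_ok=True)
    with open("evidence_links.txt") as f:  # hypothetical file with one URL per line
        urls = [line.strip() for line in f if line.strip()]

    for i, url in enumerate(urls):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException as err:
            print(f"skipping {url}: {err}")
            continue
        # Save the raw page; evidence extraction would run on these files afterwards.
        with open(os.path.join("evidence", f"{i}.html"), "wb") as out:
            out.write(resp.content)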


Dataset access and description

If you would like to access our already-collected evidence (along with the preprocessing and precomputed embeddings), please find more details under curated_dataset.
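
Once downloaded, the precomputed embeddings can be inspected along these lines; the file name and storage format (a NumPy archive keyed by example id) are assumptions, so check curated_dataset for the actual layout.

    import numpy as np

    # Hypothetical file name and format; see curated_dataset for the real files.
    embeddings = np.load("curated_dataset/precomputed_embeddings.npz")
    first_key = list(embeddings.keys())[0]
    print(first_key, embeddings[first_key].shape)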


Dataset preprocessing and embeddings computation

You can find our pipeline for preprocessing the data and computing the embeddings under data_preprocessing. If you are using our collected evidence dataset, you can skip this step.
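
Here is a minimal sketch of the kind of text-embedding step done in that pipeline, using sentence-transformers; the model name below is an assumption rather than the one used in data_preprocessing.

    from sentence_transformers import SentenceTransformer

    # Model name is an assumption; data_preprocessing documents the actual models used.
    model = SentenceTransformer("all-mpnet-base-v2")
    captions = ["A protest in downtown Cairo.", "Flooding after the storm."]
    embeddings = model.encode(captions, convert_to_numpy=True)
    print(embeddings.shape)  # (2, 768) for this model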


Training and evaluation of CCN

We share our training and evaluation code for two setups: 1) training using sentence embeddings, and 2) training using BERT+LSTM.
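
For the second setup, a minimal sketch of a BERT+LSTM text encoder (BERT token states followed by a bidirectional LSTM); the layer sizes and pooling are assumptions, not the exact model in the training code.

    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class BertLSTMEncoder(nn.Module):
        # Layer sizes and pooling are assumptions, not the repository's exact architecture.
        def __init__(self, hidden_size=256):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden_size,
                                batch_first=True, bidirectional=True)

        def forward(self, input_ids, attention_mask):
            states = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
            _, (h, _) = self.lstm(states)
            return torch.cat([h[-2], h[-1]], dim=-1)  # final forward and backward states

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    batch = tokenizer(["a caption", "an evidence sentence"], padding=True, return_tensors="pt")
    encoder = BertLSTMEncoder()
    print(encoder(batch["input_ids"], batch["attention_mask"]).shape)  # torch.Size([2, 512])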


Fine-tuning CLIP

You can find our code to fine-tune CLIP in finetuning_clip.
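
Here is a minimal sketch of a single CLIP fine-tuning step with Hugging Face transformers; the image path is hypothetical, and finetuning_clip contains the actual training code, which may use a different CLIP implementation and loss setup.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

    images = [Image.open("example.jpg")]  # hypothetical image path
    captions = ["A caption paired with the image."]
    inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)

    outputs = model(**inputs, return_loss=True)  # symmetric image-text contrastive loss
    outputs.loss.backward()
    optimizer.step()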


Checkpoints

Checkpoints can be found here.
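
A minimal sketch of loading one of the released checkpoints into PyTorch; the file name and state-dict layout are assumptions, so match them to the files at the link above.

    import torch

    # Hypothetical file name; the checkpoint may store the state dict directly or under a key.
    checkpoint = torch.load("ccn_checkpoint.pth", map_location="cpu")
    state_dict = checkpoint.get("model_state_dict", checkpoint)
    # model.load_state_dict(state_dict)  # where `model` is the matching CCN instance
    print(len(state_dict), "entries in the state dict")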


Citation

  • If you find this code or dataset helpful, please cite our paper:
@inproceedings{abdelnabi22cvpr,
    title = {Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources},
    author = {Sahar Abdelnabi and Rakibul Hasan and Mario Fritz},
    year = {2022},
    booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)}
}
