- Overview
- Dataset Description
- Dataset Usage
- Installation
- Run models for paper metrics
- Contribution
- License
- Contact
Welcome to the Fin-Fact repository! Fin-Fact is a comprehensive dataset designed specifically for financial fact-checking and explanation generation. This README provides an overview of the dataset, how to use it, and other relevant information. Click here to access the paper.
- Name: Fin-Fact
- Purpose: Fact-checking and explanation generation in the financial domain.
- Labels: Each record carries the following labels: Claim, Author, Posted Date, Sci-digest, Justification, Evidence, Evidence href, Image href, Image Caption, Visualisation Bias Label, Issues, and Claim Label.
- Size: The dataset consists of 3562 claims spanning multiple financial sectors.
- Additional Features: The dataset goes beyond textual claims and incorporates visual elements, including images and their captions.
Fin-Fact is a valuable resource for researchers, data scientists, and fact-checkers in the financial domain. Here's how you can use it:
- Download the Dataset: You can download the Fin-Fact dataset here or via the Hugging Face Hub. You can also load the dataset by using the following code:
from datasets import load_dataset
dataset = load_dataset("amanrangapur/Fin-Fact")
- Exploratory Data Analysis: Perform exploratory data analysis to understand the dataset's structure, distribution, and any potential biases.
- Natural Language Processing (NLP) Tasks: Utilize the dataset for various NLP tasks such as fact-checking, claim verification, and explanation generation.
- Fact Checking Experiments: Train and evaluate machine learning models, including text and image analysis, using the dataset to enhance the accuracy of fact-checking systems.
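As a starting point for the exploratory analysis mentioned above, the following sketch tallies how often each value of one field occurs. The records are made-up placeholders rather than real Fin-Fact entries, and the key name `claim_label` is an assumption; check the actual column names after loading the dataset.

```python
from collections import Counter


def label_distribution(records, field):
    """Return (value, count, share) tuples for one field, most common first."""
    counts = Counter(rec.get(field) for rec in records)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


# Hypothetical placeholder records; real rows would come from load_dataset(...).
sample = [
    {"claim": "placeholder claim A", "claim_label": "true"},
    {"claim": "placeholder claim B", "claim_label": "false"},
    {"claim": "placeholder claim C", "claim_label": "false"},
    {"claim": "placeholder claim D", "claim_label": "neutral"},
]

for value, n, share in label_distribution(sample, "claim_label"):
    print(f"{value}: {n} ({share:.0%})")
```

A strongly skewed distribution is worth noting before training, since it affects which evaluation metrics are meaningful.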
Requires Python 3.9 to run.
Install the conda environment from the environment.yml file:
conda env create -n finfact --file environment.yml
conda activate finfact
We provide scripts that let you easily run our dataset on existing state-of-the-art models and re-create the metrics published in the paper. You should be able to reproduce our results by following these instructions; please open an issue if you cannot. The commands for running existing ANLI models for fact checking are listed per model further below.
Create a .env file and set your API keys:
OPENAI_API_KEY="YOUR KEY"
GEMINI_API_KEY="YOUR KEY"
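How the repository's scripts read this file is not shown here. As a minimal sketch, assuming the file holds plain `KEY="value"` lines, the helpers below (hypothetical names `parse_env` and `load_env_file`) parse it with the standard library only and copy the entries into the process environment.

```python
import os


def parse_env(text):
    """Parse KEY="value" lines into a dict, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Copy entries from a .env file into os.environ, keeping variables already set."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        values = parse_env(fh.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


example = 'OPENAI_API_KEY="YOUR KEY"\nGEMINI_API_KEY="YOUR KEY"\n'
print(sorted(parse_env(example)))
```

Keep the .env file out of version control so that the keys are not published with the code.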
To run MLLM experiments:
python scripts/models/experiments.py --model ['llava/gpt-4/gemini'] --prompt_type ['open_book/closed_book/cot/symbolic/self_help']
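The bracketed values in the command above read as a list of alternatives. Assuming each flag takes exactly one of the slash-separated options per run (an interpretation, not something the script's help text confirms here), the following sketch enumerates the resulting command lines. The function name `build_commands` is hypothetical.

```python
from itertools import product

# Options copied from the command above; one value per flag per run is an assumption.
MODELS = ["llava", "gpt-4", "gemini"]
PROMPT_TYPES = ["open_book", "closed_book", "cot", "symbolic", "self_help"]


def build_commands(models=MODELS, prompt_types=PROMPT_TYPES):
    """Return one argument list per (model, prompt type) combination."""
    return [
        ["python", "scripts/models/experiments.py", "--model", m, "--prompt_type", p]
        for m, p in product(models, prompt_types)
    ]


for cmd in build_commands(["gemini"], ["cot"]):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` to launch the corresponding experiment.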
- BART
python scripts/models/anli.py --model_name 'ynie/bart-large-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
- RoBERTa
python scripts/models/anli.py --model_name 'ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
- ELECTRA
python scripts/models/anli.py --model_name 'ynie/electra-large-discriminator-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
- ALBERT
python scripts/models/anli.py --model_name 'ynie/albert-xxlarge-v2-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
- XLNet
python scripts/models/anli.py --model_name 'ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
- GPT-2
python gpt2_nli.py --model_name 'fractalego/fact-checking' --data_file finfact.json
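The ANLI commands above pass `--threshold 0.5`, but how `anli.py` applies that value is not documented in this README. One plausible reading, sketched below under that assumption, is that a prediction counts only if its softmax probability reaches the threshold and otherwise falls back to neutral. The label order and the function name `verdict` are assumptions as well; confirm the order against the model's `id2label` configuration.

```python
import math

# Assumed class order for the NLI head; verify against the model config.
LABELS = ("entailment", "neutral", "contradiction")


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verdict(logits, threshold=0.5):
    """Return (label, probabilities); fall back to 'neutral' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "neutral", probs
    return LABELS[best], probs


label, probs = verdict([3.0, 0.1, 0.2])
print(label, [round(p, 3) for p in probs])
```

Raising the threshold makes the classifier more conservative: fewer claims receive a decisive label, and more fall back to neutral.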
We welcome contributions from the community to help improve Fin-Fact. If you have suggestions, bug reports, or want to contribute code or data, please check our CONTRIBUTING.md file for guidelines.
Fin-Fact is released under the MIT License. Please review the license before using the dataset.
For questions, feedback, or inquiries related to Fin-Fact, please contact arangapur@hawk.iit.edu.
We hope you find Fin-Fact valuable for your research and fact-checking endeavors. Happy fact-checking!