CAPP 30254 ML Final Project
- Anthony Hakim
- Sasha Filippova
- Yifu Hou
Research Question: Can we identify fake news articles based on article title alone?
In this project, our team designed two Natural Language Processing (NLP) machine learning models to classify fake news articles using only article titles. Our baseline model combines TF-IDF features with logistic regression and classifies fake news articles with 94% accuracy. We also apply a pre-trained BERT model to the same task and find that this more complex model performs with lower accuracy.
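As a rough illustration of the baseline approach described above (TF-IDF features fed into logistic regression), here is a minimal sketch. It is not the code from baseline_model.ipynb: it assumes scikit-learn is installed, uses default hyperparameters, and trains on a handful of made-up titles rather than the project data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy titles, invented for illustration only (not from the dataset).
titles = [
    "SHOCKING: Celebrity reveals secret plot to control the weather",
    "You won't BELIEVE what this senator said on live television",
    "BREAKING: Miracle cure found, doctors furious",
    "Senate passes budget bill after lengthy debate",
    "Central bank holds interest rates steady",
    "City council approves funding for new bridge",
]
# Assumed label encoding: 1 = fake, 0 = real.
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each title into a sparse weighted bag-of-words vector;
# logistic regression then learns a linear decision boundary on those vectors.
model = make_pipeline(TfidfVectorizer(lowercase=True), LogisticRegression())
model.fit(titles, labels)

# Predict on the training titles just to show the API; real evaluation
# needs a held-out test split.
predictions = model.predict(titles)
probabilities = model.predict_proba(titles)
```

On the real data, the titles and labels would come from the dataset files, and accuracy would be measured on a held-out test set rather than on the training titles.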
- baseline_model.ipynb: TF-IDF logistic regression training and testing.
- classification.ipynb: Final BERT model hyperparameter tuning, training and testing.
- original_bert.ipynb: Baseline BERT model training and testing.
- util.py: Helper functions for data preprocessing.
- data/: Directory containing the data.
- final_presentation: Final presentation of results.
Data source: https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset?select=True.csv