Skip to content

An advanced machine learning project focused on classifying news articles into 'agreed', 'disagreed', and 'unrelated' categories to effectively identify and combat fake news

Notifications You must be signed in to change notification settings

amoghkori/Fake-news-classification

Repository files navigation

Fake News Classification

Project Overview

This project focuses on the classification of news articles into categories such as 'agreed', 'disagreed', and 'unrelated'. The aim is to identify and categorize fake news effectively. This repository contains the code for training and evaluating different machine learning and deep learning models on a dataset of news article pairs.

Dataset

The dataset used in this project is a collection of news article pairs. Each pair of articles is labeled as 'agreed', 'disagreed', or 'unrelated', based on their content.

Models

We employ several models for this task:

  1. Logistic Regression: A baseline model for classification.
  2. Random Forest: An ensemble learning method for classification.
  3. Neural Network with GRU: A deep learning approach using Gated Recurrent Units (GRU).

Requirements

  • Python 3.x
  • Pandas
  • NumPy
  • Scikit-Learn
  • NLTK
  • TensorFlow
  • Matplotlib
  • Seaborn

Installation

To install the required packages, run the following command:

pip install pandas numpy scikit-learn nltk tensorflow matplotlib seaborn

Usage

  1. Data Preprocessing:

    • Load the dataset.
    • Clean and preprocess the text data.
  2. Feature Extraction:

    • Convert text data into numerical form using TF-IDF Vectorization.
  3. Model Training:

    • Train the Logistic Regression and Random Forest models.
    • Construct and train the Neural Network with GRU.
  4. Evaluation:

    • Evaluate the models on a test set.
    • Generate classification reports.
  5. Prediction on Unseen Data:

    • Use the trained models to predict labels on new data.
    • Output the results to a CSV file.

File Descriptions

  • train.csv and test.csv: The training and testing datasets.
  • train_data and test_data: Python scripts for training and testing the models.
  • results.csv: The output file with predictions on the test set.

Results

The models are evaluated based on their accuracy and loss. The results are visualized using Matplotlib.

About

An advanced machine learning project focused on classifying news articles into 'agreed', 'disagreed', and 'unrelated' categories to effectively identify and combat fake news

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published