Skip to content

Identify topics for news articles using Non-Negative Matrix Factorization

Notifications You must be signed in to change notification settings

dlynch42/bbc_news_classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BBC News Classifier

This project explores a Non-Negative Matrix Factorizaton (NMF) model that classifies BBC news articles by topic. The model is trained on a dataset of BBC news articles and their corresponding categories. The project uses various Python libraries such as numpy, pandas, matplotlib, seaborn, and sklearn for data manipulation, visualization, and machine learning.

The project is structured as follows:

  • week4.ipynb: Main notebook where the data is loaded, explored, and processed. The NMF model is also trained and evaluated in this notebook.

  • nmf_limitations.ipynb: This notebook explores the limitations of Non-negative Matrix Factorization (NMF), a technique used in the main notebook.

  • data/: This directory contains the datasets used in the project. It includes training and testing data for BBC news articles (BBC News Train.csv, BBC News Test.csv, BBC News Sample Solution.csv) and movie ratings data (w3/movies.csv, w3/test.csv, w3/train.csv, w3/users.csv).

  • submission.csv: This file contains the model's predictions on a test set.

  • test_profile.html: This file is an HTML report generated from the data profiling process.

Please note that some files and directoriesare ignored by Git as specified in the .gitignore file. The data can be downloaded through Kaggle.

About

Identify topics for news articles using Non-Negative Matrix Factorization

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published