News Article Clustering

Overview

The goal of this project is to develop a framework capable of clustering news articles based on their text content. It applies several techniques, including TF-IDF, cosine similarity, truncated SVD, and k-means clustering, and is composed of four parts:

  1. Process and tokenize the news articles
  2. Build a sparse TF-IDF matrix from all terms of the news articles
  3. Perform dimensionality reduction using truncated SVD
  4. Cluster the news articles using k-means clustering
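Steps 1 and 2, together with the cosine similarity mentioned in the overview, can be sketched in plain Python. This is a minimal illustration, not the project's code: the regex tokenizer, the hypothetical `STOP_WORDS` set, and the smoothed IDF formula log((1 + n) / (1 + df)) + 1 are assumptions. Each article becomes a sparse `{term: weight}` dict scaled to unit length, so cosine similarity reduces to a dot product over shared terms.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the project's actual list is not specified.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return one sparse {term: weight} dict per document, L2-normalized.

    Assumes raw term counts for TF and smoothed IDF: log((1 + n) / (1 + df)) + 1.
    """
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a sparse dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller dict
    return sum(w * v.get(t, 0.0) for t, w in u.items())
```

For example, with the three articles "Stocks rally as markets rebound", "Markets fall as stocks slide", and "Team wins the championship game", the first two share terms and get a positive similarity, while the first and third share none and score 0.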
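Step 3 can be illustrated with a small power-iteration sketch of truncated SVD on a dense document-term matrix (rows are articles, columns are terms). In practice a library routine such as scikit-learn's `TruncatedSVD` is the usual choice; this stand-in only shows the idea under stated assumptions (the `iters` and `seed` parameters are illustrative). It finds the top-k right singular vectors of A by repeatedly applying A^T A, removes already-found directions with Gram-Schmidt deflation, and returns each row's coordinates A V_k together with the singular values.

```python
import math
import random


def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def truncated_svd(A, k, iters=200, seed=0):
    """Approximate the top-k truncated SVD of A by power iteration.

    Returns (reduced, sigmas): reduced[i] holds row i of A projected onto the
    top-k right singular vectors (i.e. A @ V_k); sigmas holds the singular values.
    """
    rng = random.Random(seed)
    At = [list(col) for col in zip(*A)]  # transpose of A
    m = len(At)
    components, sigmas = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            w = matvec(At, matvec(A, v))  # w = A^T A v
            # Deflation: strip directions already captured (Gram-Schmidt).
            for c in components:
                d = sum(a * b for a, b in zip(w, c))
                w = [a - d * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # rank of A is below k: emit a zero component
                v = [0.0] * m
                break
            v = [a / norm for a in w]
        components.append(v)
        sigmas.append(math.sqrt(sum(a * a for a in matvec(A, v))))  # sigma = ||A v||
    reduced = [[sum(a * b for a, b in zip(row, c)) for c in components] for row in A]
    return reduced, sigmas
```

On the toy matrix `[[3.0, 0.0], [0.0, 1.0]]` with `k=2`, the singular values come out as approximately 3 and 1. Sparse TF-IDF vectors would first be laid out as dense rows over a shared vocabulary. Because forming A^T A squares the condition number, this approach suits small examples only.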
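Step 4 can be sketched as Lloyd's k-means, which alternates an assignment step and a centroid-update step using squared Euclidean distance. This is a minimal illustration rather than the project's implementation: initializing from `k` data points sampled with a fixed `seed`, capping the loop at `iters`, and leaving a centroid in place when its cluster empties are all assumptions made here. The input points would be the reduced article vectors produced by the SVD step.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k points sampled from the data without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, on six 2-D points forming two well-separated groups (three near (0, 0) and three near (5, 5)) with `k=2`, each group ends up sharing one label, distinct from the other group's.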

Link to the Project

  1. https://nbviewer.jupyter.org/github/KUANCHENGFU/News-Article-Clustering/blob/main/News_article_clustering.ipynb