sanjanagupta16/Topic-Modeling

Topic-Modeling

Latent Dirichlet allocation (LDA) is a generative probabilistic topic model that discovers topics from the way words co-occur across a set of documents. It is particularly useful for estimating which mixture of topics each document in a collection contains. LDA models each document as a distribution over topics and each topic as a distribution over words, and places a Dirichlet prior on both distributions.
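To make the two distributions concrete, here is a minimal, self-contained sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the repository's implementation; the function name `lda_gibbs` and the default hyperparameters (`alpha`, `beta`, `iterations`) are illustrative assumptions. `theta` holds the per-document topic mixtures and `phi` holds the per-topic word distributions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of word tokens.
    Returns (theta, phi, vocab): per-document topic mixtures, per-topic
    word distributions, and the vocabulary that indexes phi's columns.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign a random initial topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            v = word_id[w]
            n_dk[d][k] += 1
            n_kw[k][v] += 1
            n_k[k] += 1
            z_doc.append(k)
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
                assignments[d][i] = k

    # Posterior-mean estimates of the two Dirichlet-distributed quantities.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab
```

For example, `sorted(range(len(vocab)), key=lambda v: -phi[k][v])[:5]` gives the indices of topic `k`'s five most probable words. Each sweep costs roughly tokens × topics operations, which is fine for toy data but slow for a million headlines; gensim's `LdaModel` instead uses online variational Bayes.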

Data

The data set is a collection of over one million news headlines published over a period of 15 years. It can be downloaded from Kaggle: https://www.kaggle.com/therohk/million-headlines/data
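A minimal loading sketch using only the standard library follows. It assumes the Kaggle download is a CSV file with a header row and a headline column named `headline_text` (for example `abcnews-date-text.csv`); both the file name and the column name are assumptions, so adjust `column` to match the actual file. The helper name `load_headlines` is hypothetical.

```python
import csv


def load_headlines(path, column="headline_text", limit=None):
    """Read headline strings from a CSV file with a header row.

    ASSUMPTION: the headline column is named 'headline_text'; pass a
    different `column` if the downloaded file uses another name.
    `limit` optionally caps how many rows are read (useful for quick tests).
    """
    headlines = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if limit is not None and len(headlines) >= limit:
                break
            headlines.append(row[column])
    return headlines
```

Passing a small `limit` while experimenting avoids reading all one million rows on every run.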

Packages

The following Python packages will be used:

  • gensim
  • nltk