Topic Modeling on BBC News using Facebook's FastText embeddings and LDA probabilistic model.

akulez/NLP-Topic_Modeling

Project Summary: This project focuses on applying Topic Modeling to a BBC News dataset. Topic Modeling is a statistical modeling technique used to uncover the main themes and topics present in a collection of documents or other textual data. The project was carried out in several parts, as follows:

  1. Analysis and Preprocessing:

    • Performed an in-depth analysis of the dataset and the textual content it contained by using various descriptive statistics.
    • Used visualizations, including word clouds, to identify the most commonly used words and patterns in word occurrence.
    • Combined the titles and descriptions of the news articles to extract more information from the text.
  2. Text Preprocessing:

    • Performed text preprocessing to clean the text and prepare it for analysis.
    • Removed stopwords, punctuation, extra spaces, and unnecessary characters, which do not add any semantic meaning to the text and could interfere with the accuracy of the analysis.
  3. Text Vectorization using FastText and Embedding Visualizations using UMAP:

    • Converted the preprocessed text into vector representations using FastText embeddings.
    • Utilized FastText embeddings to capture the semantic meaning of words and generate numerical representations of the text.
    • Created embedding visualizations using UMAP.
  4. Topic Modeling with LDA:

    • Applied the Latent Dirichlet Allocation (LDA) algorithm, a popular topic modeling technique, to identify underlying topics within the dataset.
    • Analyzed patterns of word co-occurrence in the documents to uncover latent themes or topics.
  5. Analysis of LDA Results:

    • Interpreted the results obtained from the LDA model.
    • Identified the most significant terms within each topic to gain insights into the main themes present in the dataset.
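
The cleaning described in steps 1 and 2 can be sketched with the Python standard library alone. This is a minimal illustration, not the repository's actual code: the function names `combine_fields` and `preprocess`, the tiny `STOPWORDS` set, and the sample headline are all assumptions made for the example. A real pipeline would more likely use a full stopword list such as the ones shipped with NLTK or spaCy.

```python
import re
import string
from typing import List

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "for", "in", "is",
             "it", "of", "on", "the", "to", "with"}

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def combine_fields(title: str, description: str) -> str:
    """Join title and description into one text, as described in step 1."""
    return f"{title} {description}".strip()


def preprocess(text: str) -> List[str]:
    """Lowercase, strip punctuation and non-letters, drop stopwords."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits and other non-letter characters
    tokens = text.split()                  # split() also collapses extra whitespace
    return [tok for tok in tokens if tok not in STOPWORDS]


# Made-up example text, not taken from the BBC dataset.
doc = combine_fields("Markets rally!",
                     "Shares in London rose 2% on   Tuesday, the BBC reports.")
print(preprocess(doc))
```

Returning a token list rather than a joined string keeps the output usable both for count-based models and for looking up word embeddings.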
