Topic Modeling with The New York Times Headlines (Aug 2019 - Jul 2022)

This repository contains the work I prepared for a talk on Topic Modeling, titled "What Can Machine Learning Do with Your Unstructured Data?".

The model used is BERTopic. The work shows how semantically similar documents (in this case, NYT headlines) tend to lie closer together in a vector space. It also gives a general idea of Dynamic Topic Modeling, looking at how the frequencies of the topics / themes evolve over time.
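To make the two ideas above concrete, here is a minimal, standard-library-only sketch. It is not the code from the notebooks and does not call BERTopic: the headlines, the 3-dimensional "embeddings", the topic labels and the month bins are all invented for illustration, and the function names are hypothetical. It shows (1) cosine similarity as one common way to measure closeness in a vector space, and (2) counting topic frequencies per time bin, which is the basic quantity tracked when looking at topics over time.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topic_frequencies_over_time(records):
    """Count how often each topic appears in each time bin.

    `records` is an iterable of (time_bin, topic_label) pairs.
    """
    counts = defaultdict(Counter)
    for time_bin, topic in records:
        counts[time_bin][topic] += 1
    return {time_bin: dict(c) for time_bin, c in counts.items()}


# Invented headlines with hand-made 3-d vectors (assumption: real sentence
# embeddings have hundreds of dimensions and come from a language model).
vaccine_a = [0.9, 0.1, 0.0]  # "Vaccine rollout begins across the state"
vaccine_b = [0.8, 0.2, 0.1]  # "Hospitals prepare for first vaccine doses"
markets = [0.1, 0.9, 0.2]    # "Stocks rally as tech earnings beat forecasts"

sim_same_theme = cosine_similarity(vaccine_a, vaccine_b)
sim_diff_theme = cosine_similarity(vaccine_a, markets)
print(f"same theme: {sim_same_theme:.2f}, different theme: {sim_diff_theme:.2f}")

# Invented (month, topic) assignments, as if each headline had already been
# given a topic label by a topic model.
records = [
    ("2020-03", "covid"), ("2020-03", "covid"), ("2020-03", "election"),
    ("2020-11", "election"), ("2020-11", "election"), ("2020-11", "covid"),
]
frequencies = topic_frequencies_over_time(records)
for month in sorted(frequencies):
    print(month, frequencies[month])
```

In this toy data the two vaccine headlines score a higher similarity than the vaccine and markets pair, and the per-month counts show one topic shrinking while another grows.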

Reproducibility

Because GitHub limits large file storage, I have decided not to push the model artifacts to this repo. However, you can reproduce them by cloning the repo onto a GPU-enabled local machine or a GPU-enabled Google Colab instance:

    git clone https://github.com/bengsoon/NYT_topic_modeling/

Within the cloned folder, create the conda environment:

    conda env create -f environment.yml

Run the Streamlit app:

    cd app
    streamlit run app.py

Viewing Results in Web App

I have created a Streamlit app that presents the results of the topic modeling: https://nyt-topicmodel.streamlitapp.com/
