Twitter Data Analysis

Sentiment analysis and topic modeling

Table of contents

Introduction
Overview
Install
Data
Notebooks
Models
Database
Scripts
Pipelines
Tests
Author

Introduction

On August 8, 2022, around 135 trainees, together with the entire tutoring and management team of 10 Academy, officially began a week-long, highly intensive, and overwhelming training and testing phase.

The week included many forms of tests and training, ranging from data acquisition, engineering, transformation, cleaning, visualization, and EDA to modeling and machine learning engineering. It also covered other development skills and best practices such as source code management, version control, CI/CD, debugging, bug fixing, and testing, as well as data storage using SQL and visualization with dashboards on platforms such as Streamlit or Heroku.

Overview

The purpose of this project is to extract sentiment and topics from the two given Twitter data sets using several topic modeling models and sentiment analysis methods. Once these insights have been analyzed, the models built should also account for the inevitable cases of different types of data shift. The sentiment analysis and topic modeling models are central objects in the framework mentioned above, but they are often unknown, subject to personal knowledge and bias, or only loosely connected to the available data. The main objective of the task is to highlight the importance of the matter in a concrete way. In this spirit, trainees are expected to attempt the following tasks:

1.  Perform a sentiment analysis on the Twitter data set.
2.  Perform topic modeling on the Twitter data set.
3.  Get familiar with all the DevOps and MLOps tools provided.
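
As a rough illustration of the first task, the sketch below scores tweets with a tiny hand-written word lexicon. It is not the method used in this repository's notebooks or scripts: the names POSITIVE, NEGATIVE, polarity, and label, as well as the word lists themselves, are hypothetical and exist only for this example. A real analysis would more likely rely on a full lexicon or a trained model.

```python
import re

# Hypothetical, deliberately tiny lexicons used only for this illustration.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def polarity(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    matched = pos + neg
    # Text with no lexicon words is treated as neutral (score 0.0).
    return 0.0 if matched == 0 else (pos - neg) / matched


def label(text: str) -> str:
    """Map the polarity score to a coarse sentiment class."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Made-up sample tweets, not taken from the project's data sets.
    for tweet in ["I love this great phone", "What a terrible, awful day"]:
        print(f"{label(tweet):>8}  {tweet}")
```

The same three-class labeling (positive, negative, neutral) can be kept if the lexicon lookup is later swapped for a library-based or learned scorer, since only the polarity function would need to change.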

Install

git clone https://github.com/Fisseha-Estifanos/Twitter-Data-Analysis.git
cd Twitter-Data-Analysis
pip install -r requirements.txt

Data

The data can be found on Google Drive, as the Global data set and the African data set.

Notebooks

All the preprocessing, analysis, EDA, and example implementations of sentiment analysis and topic modeling are provided as .ipynb files in the notebooks folder.

Models

All the models generated will be found here in the models folder.

Database

All the databases generated will be found here in the databases folder.

Scripts

All the modules for the analysis are found here.

Pipelines

DVC played a huge role in creating reproducible pipelines quickly and easily. We can update the 'dvc.yaml' file to create or add steps for cleaning, preprocessing, extraction, exploratory data analysis, sentiment analysis, topic modeling, and database creation.

Tests

All the unit and integration tests are found here in the tests folder.

Author

👤 Fisseha Estifanos

GitHub: Fisseha Estifanos
LinkedIn: Fisseha Estifanos

Show us your support

Give us a ⭐ if you like this project!
