Table of content
Starting on August 8, 2022, around 135 trainees along with the whole tutoring and managing team of 10 Academy officially started our week-long, highly intensive, and overwhelming journey of training/testing phase.
There were many forms of tests and training this week. Starting from data acquiring, engineering, transformations, cleaning, visualizations, and EDA, to modeling and machine learning engineering as well as other development skills and best practices like source code management, version control, CICD, debugging bug fixing, and testing. Also, data storage using SQL and other visualization using dashboards on platforms such as Streamlit or Heroku.
The purpose of this project is to extract sentiment analysis and topic modeling from the given two Twitter data sets by using several topic modeling models and sentiment analysis methods. After analyzing such insights the models built shall be also considering the inevitable cases of different types of data shifts. The sentiment analysis and the topic modeling models are a central object in the framework mentioned above, but it is often unknown, subject to personal knowledge and bias, or loosely connected to the available data. The main objective of the task is to highlight the importance of the matter in a concrete way. In this spirit, trainees are expected to attempt the following tasks:
1. Perform a sentiment analysis on the Twitter data set.
2. Perform topic modeling on the Twitter data set.
3. Get familiar with all the DevOps and MlOps tools provided.
git clone https://github.com/Fisseha-Estifanos/Twitter-Data-Analysis.git
cd Twitter-Data-Analysis
pip install -r requirements.txt
Data can be found here at google drive, and or Global data set, or African data set
All the preprocessing, analysis, EDA and examples of sentiment analysis and topic modeling implementation will be here in the form of .ipynb file in the notebooks folder.
All the models generated will be found here in the models folder.
All the databases generated will be found here in the databases folder.
All the modules for the analysis are found here.
Dvc played a huge role in creating a reproducible pipelines real fast and easily. We can easily update the 'dvc.yml' file and create or add several steps to our cleaning, preprocessing, extracting, exploratory data analysis, sentiment analysis, topic modeling and database creation steps.
All the unit and integration tests are found here in the tests folder.
👤 Fisseha Estifanos
- GitHub: Fisseha Estifanos
- LinkedIn: Fisseha Estifanos
Give us a ⭐ if you like this project!