Conservatory-Sentiment Analysis on COVID-19 Tweets

Develop machine learning models to decipher COVID-19 tweets and predict whether they are positive, negative, or neutral.

Please click on the link to interact with the app deployed on Heroku.

Background

People use social media not only to share information but also to share their feelings. Over the last year, the coronavirus and the resulting quarantine have greatly affected our lives, and social media platforms, primarily Twitter, are overflowing with posts about the topic. While positivity and negativity are obvious in some tweets, at other times it is hard to decipher the sentiment of a loaded tweet.

Method

We used machine learning tools to run sentiment analysis on COVID-19 related tweets. We extracted the data from Kaggle and loaded it into a Jupyter notebook for cleanup and sorting: we filled NaN values with “Unknown” for rows that were missing location data, dropped unnecessary columns such as user names and screen names, and checked that the labels were unique to ensure our datasets were clean.
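The cleanup steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`UserName`, `ScreenName`, `Location`, `OriginalTweet`, `Sentiment`) and the three sample rows are assumptions made for the example.

```python
import pandas as pd

# Hypothetical column names and rows; the real Kaggle CSV may differ.
raw = pd.DataFrame({
    "UserName": [1, 2, 3],
    "ScreenName": [101, 102, 103],
    "Location": ["London", None, "NYC"],
    "OriginalTweet": [
        "Stay safe everyone",
        "Shelves are empty again",
        "Shops open at 9",
    ],
    "Sentiment": ["Positive", "Negative", "Neutral"],
})


def clean_tweets(df):
    """Fill missing locations and drop identifier columns, returning a copy."""
    cleaned = df.copy()
    cleaned["Location"] = cleaned["Location"].fillna("Unknown")
    cleaned = cleaned.drop(columns=["UserName", "ScreenName"])
    return cleaned


tweets = clean_tweets(raw)

# Inspect the distinct labels to confirm there are no stray variants.
labels = sorted(tweets["Sentiment"].unique())
print(labels)
```

Listing the distinct labels is a quick way to spot inconsistent spellings or casing before training.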

We then selected Scikit-learn as our primary machine learning library because it is simple and effective. We started with text processing, using CountVectorizer to tokenize the text and then TfidfTransformer to transform the resulting counts.
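As a small sketch of this two-step text processing, the snippet below turns a few made-up tweets into token counts and then into TF-IDF weights. The sample tweets and default parameters are assumptions; the project may have used different settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Made-up example tweets, used only for illustration.
docs = [
    "lockdown keeps everyone safe",
    "lockdown is ruining my business",
    "stores open at nine tomorrow",
]

# Step 1: tokenize the text and build a document-term count matrix.
count_vect = CountVectorizer()
counts = count_vect.fit_transform(docs)

# Step 2: reweight the raw counts by TF-IDF.
tfidf = TfidfTransformer()
features = tfidf.fit_transform(counts)

# New text must go through the same fitted objects, using transform only.
new_features = tfidf.transform(count_vect.transform(["stay safe in lockdown"]))
print(features.shape, new_features.shape)
```

Because the fitted vocabulary and IDF weights are reused for new text, both objects have to be kept alongside the trained model.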

We employed the following classification models for the analysis and prediction:

Logistic regression
Linear SVC
Naive Bayes
Random Forest Classifier
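One way to compare the four classifiers listed above is to wrap each in the same preprocessing pipeline and score it on a held-out split. The sketch below assumes `MultinomialNB` as the Naive Bayes variant and uses a tiny invented dataset, so the accuracies it prints say nothing about the project's real results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy data: four tweets per class.
texts = [
    "so grateful for our nurses", "great news about the vaccine",
    "love how neighbors help each other", "happy to be healthy today",
    "panic buying is awful", "terrible shortage of masks",
    "angry about empty shelves", "sad and scared of the virus",
    "store opens at nine", "cases reported in the county",
    "the press briefing is at noon", "schools publish new schedule",
]
labels = ["Positive"] * 4 + ["Negative"] * 4 + ["Neutral"] * 4

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# MultinomialNB is an assumption; the README only says "Naive Bayes".
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    pipeline = Pipeline([
        ("counts", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", clf),
    ])
    pipeline.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Keeping the preprocessing identical across models means differences in accuracy come from the classifier rather than from the features.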

We selected our second model, LinearSVC, as it yielded the highest accuracy. We then saved the CountVectorizer and TfidfTransformer objects to a pickle file. We also saved a LinearSVC model with the best parameters from GridSearchCV.
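The tuning and saving steps might look something like the sketch below. The parameter grid, the toy data, and the choice to store all three objects in a single file named `artifacts.pkl` are assumptions for illustration; the project may have used a different grid and separate files.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Invented toy data: four tweets per class.
texts = [
    "so grateful for our nurses", "great news about the vaccine",
    "love how neighbors help each other", "happy to be healthy today",
    "panic buying is awful", "terrible shortage of masks",
    "angry about empty shelves", "sad and scared of the virus",
    "store opens at nine", "cases reported in the county",
    "the press briefing is at noon", "schools publish new schedule",
]
labels = ["Positive"] * 4 + ["Negative"] * 4 + ["Neutral"] * 4

count_vect = CountVectorizer()
tfidf = TfidfTransformer()
X = tfidf.fit_transform(count_vect.fit_transform(texts))

# Hypothetical grid over the regularization strength C.
search = GridSearchCV(LinearSVC(), {"C": [0.1, 1, 10]}, cv=2)
search.fit(X, labels)
best_model = search.best_estimator_

# Save and reload the fitted objects; a temp directory keeps the sketch tidy.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "artifacts.pkl"
    with open(path, "wb") as f:
        pickle.dump(
            {"count_vect": count_vect, "tfidf": tfidf, "model": best_model}, f
        )
    with open(path, "rb") as f:
        restored = pickle.load(f)

sample = ["terrible news about masks"]
restored_features = restored["tfidf"].transform(
    restored["count_vect"].transform(sample)
)
restored_pred = restored["model"].predict(restored_features)
print(search.best_params_, restored_pred[0])
```

Pickled Scikit-learn objects are generally only safe to load with the same library version that wrote them, and only from trusted sources.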

We used Flask to render the result of submitted tweets, which enables us to make real-time predictions. We also extracted more data to test our model. Because of the large size of the datasets, we decided to use about 0.05% of the new data, which still had about 360000 rows. This data, together with the train and test data, was loaded into a PostgreSQL database.
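To illustrate how a Flask route can return a prediction for a submitted tweet, here is a minimal sketch. It is not the repository's `app.py`: the `/predict` route, the JSON request and response shape, and the small stand-in model trained inline are all assumptions. The real app presumably loads its pickled objects and renders an HTML template instead.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stand-in model trained on invented data so the sketch is self-contained.
texts = [
    "so grateful for our nurses", "great news about the vaccine",
    "panic buying is awful", "terrible shortage of masks",
    "store opens at nine", "the press briefing is at noon",
]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
model = make_pipeline(CountVectorizer(), TfidfTransformer(), LinearSVC())
model.fit(texts, labels)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    """Return the predicted sentiment for the tweet in the JSON body."""
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    tweet = str(payload.get("tweet", "")).strip()
    if not tweet:
        return jsonify(error="tweet is required"), 400
    sentiment = str(model.predict([tweet])[0])
    return jsonify(tweet=tweet, sentiment=sentiment)
```

With the file saved as `app.py`, the server can be started locally with `flask run`, after which a POST to `/predict` with a body such as `{"tweet": "great news today"}` returns the predicted label.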

Once our model was finalized, we deployed it as a web app hosted on Heroku. A screenshot of the app is seen below:
