Multi-Label-Topic-Classification-of-Tweet

This is a group project for a Natural Language Processing (NLP) course at UEH University. The goal of the project is to build an NLP model that labels tweets with one or more topics.

Project overview

1. Data acquisition

Using Tweepy, tweets are scraped from Twitter for the following topics: 'Sony', 'politics', 'environment', 'technology', 'travel', 'sustainability', 'economy', 'social media', 'sport', 'music', 'photography', 'healthcare', 'education', 'Sony and politics', 'Sony and environment', 'Sony and technology', 'Sony and travel', 'Sony and sustainability', 'Sony and economy', 'Sony and social media', 'Sony and sport', 'Sony and music', 'Sony and photography'.
The data can then be pickled into a new file and stored in "Data.zip".
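
The storage step can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the record fields (`text`, `topics`), the file name `tweets.pkl`, and the helper names `save_tweets` and `load_tweets` are all hypothetical, and the Tweepy calls are left out because they depend on API credentials and the Tweepy version in use.

```python
import pickle
import tempfile
import zipfile
from pathlib import Path


def save_tweets(records, pickle_path, zip_path):
    """Pickle the scraped records, then place the pickle inside a zip archive."""
    pickle_path = Path(pickle_path)
    with pickle_path.open("wb") as fh:
        pickle.dump(records, fh)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(pickle_path, arcname=pickle_path.name)


def load_tweets(zip_path, member):
    """Read one pickled member back out of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return pickle.loads(archive.read(member))


# Hypothetical records: one dict per tweet with its text and topic labels.
records = [
    {"text": "Sony announces a new mirrorless camera", "topics": ["Sony", "photography"]},
    {"text": "Parliament debates the new budget", "topics": ["politics", "economy"]},
]

with tempfile.TemporaryDirectory() as tmp:
    zip_file = Path(tmp) / "Data.zip"
    save_tweets(records, Path(tmp) / "tweets.pkl", zip_file)
    loaded = load_tweets(zip_file, "tweets.pkl")
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.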

2. Data pre-processing

Lowercase tweets and remove punctuation.
Remove stop words.
Tokenize words.
Vectorize tweets using the TF-IDF technique.
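
The steps above can be sketched with scikit-learn. This is an illustration under assumptions, not the notebook's implementation: the stop-word list (`ENGLISH_STOP_WORDS` from scikit-learn), the whitespace tokenizer, and the helper name `preprocess` are all choices made here for the example. For simplicity the sketch tokenizes before filtering stop words.

```python
import string

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer


def preprocess(tweet):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    lowered = tweet.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in ENGLISH_STOP_WORDS]


# Hypothetical example tweets.
tweets = [
    "Sony unveils a NEW camera!",
    "The economy and the environment, explained.",
]
token_lists = [preprocess(t) for t in tweets]

# The documents are already tokenized, so the analyzer passes tokens through as-is.
vectorizer = TfidfVectorizer(analyzer=lambda tokens: tokens)
X = vectorizer.fit_transform(token_lists)  # sparse matrix, one row per tweet
```

Each row of `X` holds the TF-IDF weights of one tweet over the vocabulary learned from all tweets.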

3. Topic modelling

Topics (Labels) are first binarized using MultiLabelBinarizer.
The multi-label classification task is handled with the Classifier Chain method. 20 classifier chains were created, each containing one Naive Bayes model per label. Another 20 classifier chains were created in the same way, this time with Logistic Regression models.
The models in each chain are ordered randomly. In addition to the features in the dataset, each model gets the predictions of the preceding models in the chain as features. These additional features allow each chain to exploit correlations among the classes.
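
A minimal sketch of this setup with scikit-learn is shown below. Several parts are assumptions made for illustration: the synthetic data from `make_multilabel_classification` stands in for the TF-IDF features, `MultinomialNB` is assumed as the Naive Bayes variant, and averaging the chains' predicted probabilities with a 0.5 threshold is one common way to combine chains that the project description does not specify. The helper names `fit_chains` and `ensemble_predict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Binarize topic lists into a 0/1 indicator matrix (one column per topic).
topic_lists = [["Sony", "technology"], ["politics"], ["Sony", "music"]]
mlb = MultiLabelBinarizer()
Y_demo = mlb.fit_transform(topic_lists)  # columns follow mlb.classes_

# Synthetic non-negative features and labels standing in for TF-IDF data.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=30, n_classes=4, random_state=0
)


def fit_chains(base_model, X, Y, n_chains=20):
    """Fit n_chains classifier chains, each with its own random label order."""
    chains = [
        ClassifierChain(base_model, order="random", random_state=seed)
        for seed in range(n_chains)
    ]
    for chain in chains:
        chain.fit(X, Y)
    return chains


def ensemble_predict(chains, X, threshold=0.5):
    """Average per-label probabilities over all chains, then threshold (assumed rule)."""
    mean_proba = np.mean([chain.predict_proba(X) for chain in chains], axis=0)
    return (mean_proba >= threshold).astype(int)


nb_chains = fit_chains(MultinomialNB(), X, Y)
lr_chains = fit_chains(LogisticRegression(max_iter=1000), X, Y)

nb_pred = ensemble_predict(nb_chains, X)
lr_pred = ensemble_predict(lr_chains, X)
```

Because every chain uses a different label order, each one conditions a given label on a different set of preceding predictions, which is what lets the ensemble capture label correlations.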

4. Model evaluation

Model performance is evaluated with Recall, Precision and F1 Score.
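
These metrics can be computed on multi-label output as sketched below. The label matrices are made up for illustration, and micro-averaging is an assumption: the project description does not say which averaging scheme was used.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical indicator matrices: rows are tweets, columns are topics.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])

# Micro-averaging pools true/false positives and negatives over all labels.
precision = precision_score(y_true, y_pred, average="micro")  # 3 TP / (3 TP + 1 FP)
recall = recall_score(y_true, y_pred, average="micro")        # 3 TP / (3 TP + 2 FN)
f1 = f1_score(y_true, y_pred, average="micro")

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Passing `average="macro"` instead would weight every topic equally, which matters when some topics have far fewer tweets than others.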

minhnhat2001vt/Multi-Label-Topic-Classification-of-Tweet
