About:
- Sentiment analysis is a technique that classifies textual data into positive, neutral, and negative labels. This study develops a predictive model that can be used to perform sentiment analysis in various fields.
- This project performs sentiment analysis on COVID-19 tweets and iPhone tweets, using a model trained on general tweets from Twitter.
- Through product-review sentiment analysis, companies can understand customer sentiment towards their products or services.
- COVID-19 tweet sentiment can indicate people's emotional state and stability during the pandemic, helping the government gauge public mental health and take precautions if needed.
Dataset Description:
- Training dataset based on General Tweets, consisting of both Twitter and iPhone tweets: 27480 training tweets with sentiment labels positive, neutral, and negative (11117 neutral, 8582 positive, 7781 negative) [https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset?select=train.csv].
- Testing datasets (collected via web scraping from Twitter [https://apify.com/quacker/twitter-scraper]):
- Covid Tweets: 290 testing tweets with sentiment labels positive, neutral, and negative (102 neutral, 88 positive, 100 negative).
- iPhone Tweets: 310 testing tweets with sentiment labels positive, neutral, and negative (77 neutral, 107 positive, 126 negative).
Proposed Frameworks:
- Machine Learning
- Individual Models: eight classifiers: SVM, Logistic Regression (LR), Multinomial Naive Bayes (MNB), Decision Tree (DT), KNN, Random Forest (RF), Gradient Boosting, and Extremely Randomized Trees
- Ensemble Models: Voting and Stacking applied to (SVC, LR, MNB, and DT with LR as meta-learner) and (SVC, LR, MNB, and RF with LR as meta-learner); Bagging (DT, RF)
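The Voting and Stacking ensembles above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual pipeline: the random non-negative count features stand in for the real tweet feature vectors (MNB requires non-negative inputs), and the base-learner hyperparameters here are defaults except where the document specifies them.

```python
# Sketch of the Voting and Stacking ensembles with SVC, LR, MNB, and DT
# as base learners and LR as the Stacking meta-learner.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier, StackingClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 5, size=(300, 20))   # toy non-negative counts (stand-in for tweet features)
y = rng.integers(0, 3, size=300)         # 0 = negative, 1 = neutral, 2 = positive

base = [
    ("svc", SVC(C=0.1, kernel="linear", random_state=1)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("mnb", MultinomialNB()),
    ("dt", DecisionTreeClassifier(random_state=1)),
]

# Hard voting: majority class across the four base learners.
voting = VotingClassifier(estimators=base, voting="hard").fit(X, y)

# Stacking: base-learner outputs become features for the LR meta-learner.
stacking = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X, y)

print(voting.predict(X[:5]))
print(stacking.predict(X[:5]))
```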
- Deep Learning
- Individual Models: CNN, BiLSTM, RNN, LSTM, GRU
- Ensemble Models: Stacked RNN-LSTM-GRU with SVM, Stacked RNN-LSTM-GRU with LR
- Hybrid: CNN-BiLSTM
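The stacked RNN-LSTM-GRU ensembles above feed the base networks' outputs to an SVM or LR meta-learner. The concept can be sketched without a deep learning framework: the random softmax-style matrices below are stand-ins for the real RNN, LSTM, and GRU class-probability predictions, which would be concatenated into meta-features for the meta-learner.

```python
# Conceptual sketch of the stacked RNN-LSTM-GRU ensemble: concatenate
# each base network's class probabilities, then train an SVM (or LR)
# meta-learner on the combined meta-features.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, n_classes = 200, 3
y = rng.integers(0, n_classes, size=n)

def fake_base_predictions():
    """Random softmax-style outputs standing in for one base network."""
    logits = rng.normal(size=(n, n_classes))
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# Three base models (RNN, LSTM, GRU) -> meta-features of shape (n, 9).
meta_X = np.hstack([fake_base_predictions() for _ in range(3)])

svm_meta = SVC(kernel="linear").fit(meta_X, y)   # Stacked RNN-LSTM-GRU with SVM
lr_meta = LogisticRegression(max_iter=1000).fit(meta_X, y)  # ... with LR
```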
Parameters:
- SVC: C=0.1, kernel='linear', random_state=1
- Logistic Regression: multi_class='multinomial', random_state=1, solver='saga'
- Multinomial Naive Bayes: alpha=20
- KNN: n_neighbors=13
- Decision Tree: random_state=1, max_leaf_nodes=65, max_depth=100
- Random Forest: random_state=1, max_leaf_nodes=65, max_depth=100
- Gradient Boosting: random_state=1, max_depth=3, max_leaf_nodes=40
- Extremely Randomized Trees: random_state=1, max_depth=80, max_leaf_nodes=100
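The tuned hyperparameters above map directly onto scikit-learn constructor calls (assuming the project uses scikit-learn estimators, which the parameter names suggest). One assumption: `multi_class='multinomial'` is omitted below because recent scikit-learn versions removed that argument, and multinomial is already the default behavior for `solver='saga'`.

```python
# The tuned classifiers from the parameter list, as scikit-learn estimators.
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              ExtraTreesClassifier)

models = {
    "SVC": SVC(C=0.1, kernel="linear", random_state=1),
    # multinomial is the default multiclass mode for solver='saga'
    "LR": LogisticRegression(random_state=1, solver="saga"),
    "MNB": MultinomialNB(alpha=20),
    "KNN": KNeighborsClassifier(n_neighbors=13),
    "DT": DecisionTreeClassifier(random_state=1, max_leaf_nodes=65,
                                 max_depth=100),
    "RF": RandomForestClassifier(random_state=1, max_leaf_nodes=65,
                                 max_depth=100),
    "GB": GradientBoostingClassifier(random_state=1, max_depth=3,
                                     max_leaf_nodes=40),
    "ERT": ExtraTreesClassifier(random_state=1, max_depth=80,
                                max_leaf_nodes=100),
}
```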
Results:
Covid Individual Model Testing Result
- Extremely Randomized Trees is selected as the best individual sentiment classifier (highest accuracy with the least training time).
- The Ensemble Voting model with SVC, Logistic Regression, Multinomial Naive Bayes, and Decision Tree as the base learners and Logistic Regression as the meta-classifier achieves a high training accuracy of 71.94%.
- Although Stacking has a higher training accuracy, it does not perform well in terms of overall testing accuracy and training time. Voting achieves a higher testing accuracy of 62.00% compared to Stacking's 60.00%, a 2% improvement in overall performance.
iPhone Individual Model Testing Result
- The testing accuracy of all models differs greatly from the training accuracy. This is mostly because the data is unstructured and contains a lot of noise and weak sentiment constraints (e.g. emoji ambiguity, uncertain word meanings, negation words), making it difficult for the machine to analyse and extract valuable patterns.
Covid Dataset Using Individual and Ensemble Model
- The Voting and Bagging models have the same testing accuracy but differ slightly in training accuracy and training time. Since Bagging trains faster than Voting, its training-time performance is better.
- The proposed ensemble classifier, Bagging with Random Forest, is compared with the best individual traditional classifier, Extremely Randomized Trees. The results show that the Bagging ensemble of Random Forest improves sentiment classification performance.
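The Bagging-with-Random-Forest setup compared above can be sketched as follows. This is an illustration under assumptions: toy random features replace the real tweet vectors, the number of bagging rounds (`n_estimators=10`) is a guess not stated in the document, and the base-estimator hyperparameters reuse the tuned values from the Parameters section.

```python
# Sketch: Bagging with Random Forest as the base estimator, versus the
# best individual classifier, Extremely Randomized Trees.
import numpy as np
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))   # toy stand-in for tweet feature vectors
y = rng.integers(0, 3, size=300)

# Base estimator passed positionally for compatibility across
# scikit-learn versions (the keyword was renamed base_estimator -> estimator).
bag_rf = BaggingClassifier(
    RandomForestClassifier(random_state=1, max_leaf_nodes=65, max_depth=100),
    n_estimators=10, random_state=1,
).fit(X, y)

ert = ExtraTreesClassifier(random_state=1, max_depth=80,
                           max_leaf_nodes=100).fit(X, y)

print(bag_rf.score(X, y), ert.score(X, y))
```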
Deep Learning Model
- LSTM and BiLSTM have the best performance on the training and validation data, with accuracies of 76% and 72% respectively, followed by GRU and RNN, whereas CNN does not perform as well as the other individual models.
- BiLSTM is considered the best because it performs well on data of different patterns and can capture bidirectional semantic dependencies.
Ensemble Model
- Different meta-learners are evaluated on the training and validation data in terms of accuracy only.
- The Stacked RNN-LSTM-GRU with SVM model outperforms the Stacked RNN-LSTM-GRU with LR in terms of training-set accuracy.
- On the COVID-19 tweet test set, both models achieve the same accuracy of 59%, but the model with the LR meta-learner has a slightly higher F1-score for two of the classes, making it slightly better than the SVM meta-learner there. On the iPhone tweet test set, the stacked model with the SVM meta-learner achieves 49% accuracy, 1% higher than the LR meta-learner.
- The stacked ensemble model performs better when SVM is used as its meta-learner. This is because SVM performs better on datasets with different patterns, as shown by the models' predictions on the two test sets.
- SVM is efficient in word processing and for dealing with high dimensional contexts.
Conclusion: On average, the Stacked RNN-LSTM-GRU with SVM has the best performance compared to the other two models.