Sentiment Analysis for Fine-Grained Classification

Abstract

Sentiment analysis on movie reviews is crucial for understanding audience preferences. This study compares traditional machine learning (ML) and modern deep learning (DL) models for fine-grained sentiment classification on the SST-5 dataset of movie reviews, which spans five sentiment classes. Logistic Regression, Random Forest, Support Vector Machine, a simple neural network, and an LSTM achieved 36-40% accuracy. In contrast, the pretrained transformer models DistilBERT and BERT outperformed them, reaching 51% and 53% accuracy, respectively. Oversampling decreased performance due to overfitting and noise amplification. The results highlight the effectiveness of pretrained DL models, as well as the challenges posed by frequent neutral texts and annotator bias in SST-5.

Introduction

Fine-grained sentiment analysis allows for more precision and nuance in understanding human language than binary classification, but it is also more complex due to factors such as sentence structure and dataset balance. The Stanford Sentiment Treebank (SST-5) dataset was developed to address shortcomings of traditional datasets by imposing a parse-tree structure on sentences and labeling every phrase on a five-point sentiment scale.
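For concreteness, the sketch below shows how the treebank's continuous phrase scores are binned into the five classes. The cutoff values follow the binning conventionally used with SST-5, not anything specific to this project.

```python
# A minimal sketch of how SST phrase scores map onto the five SST-5
# classes. The raw treebank annotations are continuous scores in [0, 1];
# the cutoffs below follow the binning conventionally used with SST-5.

def sst5_label(score: float) -> int:
    """Map a sentiment score in [0, 1] to a class in {0..4}:
    0 = very negative, 1 = negative, 2 = neutral,
    3 = positive, 4 = very positive."""
    for label, cutoff in enumerate((0.2, 0.4, 0.6, 0.8)):
        if score <= cutoff:
            return label
    return 4

assert sst5_label(0.05) == 0  # very negative
assert sst5_label(0.50) == 2  # neutral
assert sst5_label(0.95) == 4  # very positive
```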

Methodology

We evaluated a range of machine learning and deep learning models on the SST-5 dataset: Logistic Regression, Random Forest, Support Vector Machine, a simple neural network, an LSTM, DistilBERT, and BERT. Preprocessing steps included data splitting, tokenization, lemmatization, stop-word removal, and TF-IDF vectorization.
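A minimal sketch of the classical pipeline follows, assuming NLTK for tokenization, stop words, and lemmatization and scikit-learn for TF-IDF and the classifier; the toy texts at the end are placeholders for the SST-5 splits, not the project's actual data handling.

```python
# Sketch of the preprocessing + TF-IDF + classical-classifier pipeline.
# NLTK handles tokenization, stop-word removal, and lemmatization;
# scikit-learn handles vectorization and classification.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

for pkg in ("stopwords", "wordnet", "punkt", "punkt_tab"):
    nltk.download(pkg, quiet=True)  # punkt_tab is only needed on newer NLTK

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    # Lowercase, tokenize, drop stop words and punctuation, lemmatize.
    tokens = nltk.word_tokenize(text.lower())
    return " ".join(
        lemmatizer.lemmatize(tok)
        for tok in tokens
        if tok.isalpha() and tok not in stop_words
    )

# Random Forest or SVM drop in by swapping the final estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=preprocess)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy data for illustration; the project fits on the SST-5 train split
# and reports accuracy on the held-out test split.
toy_texts = ["an absolutely stunning film", "a dull , lifeless mess"]
toy_labels = [4, 0]  # very positive, very negative
pipeline.fit(toy_texts, toy_labels)
print(pipeline.predict(["a stunning , lively film"]))  # expected: [4]
```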

Results

On the SST-5 test set, the traditional ML models and simple neural baselines (Logistic Regression, Random Forest, Support Vector Machine, a simple neural network, and LSTM) reached 36-40% accuracy. The pretrained transformers performed markedly better: DistilBERT reached 51% and BERT 53% accuracy. Oversampling the minority classes reduced performance across models.

Discussion

Pretrained transformer models like BERT and DistilBERT performed significantly better than the traditional models, which we attribute to their ability to capture contextual and semantic information. However, the SST-5 dataset posed challenges, such as a high proportion of short neutral texts and potential annotator bias, that limited model performance. Oversampling likewise led to overfitting and amplified the noise in the minority classes.
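For reference, the transformer numbers came from standard fine-tuning. The sketch below shows one way to set that up with Hugging Face transformers; the SetFit/sst5 dataset ID and the hyperparameters are assumptions for illustration, not the project's exact configuration.

```python
# Sketch of fine-tuning DistilBERT for 5-class SST-5 with Hugging Face
# transformers. "SetFit/sst5" is an assumed community mirror of the
# dataset (columns: "text", "label"); hyperparameters are illustrative.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("SetFit/sst5")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5)

args = TrainingArguments(
    output_dir="sst5-distilbert",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
print(trainer.evaluate(dataset["test"]))  # evaluates on the test split
```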

Conclusion

Modern deep learning models, especially pretrained transformer architectures like BERT, outperformed traditional approaches for fine-grained sentiment classification on SST-5. However, dataset-specific challenges highlight the need for further optimization and research to address complexities in fine-grained sentiment analysis.

Teammate and Project Information

Teammate: Ali KhosraviPour

The project was carried out as part of the Neuromatch Academy Deep Learning course, supervised by our project TA, Joseph AKINYEMI. The goal was to compare the performance of traditional and modern sentiment-analysis models on a fine-grained dataset and to analyze the impact of the challenges inherent in such tasks.

References

  1. Cheang, B., Wei, B., Kogan, D., Qiu, H., & Ahmed, M. (2020). "Language Representation Models for Fine-Grained Sentiment Classification." Cornell Tech, New York, NY.

  2. Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank." Stanford University, Stanford, CA.