This repository describes a framework for sentiment analysis of COVID-19 tweets posted in Hindi on Twitter. The framework uses open-source machine translation tools to translate each Hindi tweet into English, then passes the preprocessed translation as input to a BERT-based model for multilingual sentiment polarity detection.
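The translate, preprocess, classify flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `preprocess` and `detect_polarity` are hypothetical, the cleaning rules are assumed rather than taken from the notebooks, and the translator and BERT classifier are passed in as plain callables so that any machine translation tool or model can be plugged in.

```python
import re
from typing import Callable

# Assumed cleaning rules; the notebooks may preprocess differently.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def preprocess(text: str) -> str:
    """Clean a translated (English) tweet before classification."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop '#'
    text = NON_ALNUM_RE.sub(" ", text)  # drop punctuation and emoji
    return " ".join(text.split())       # collapse repeated whitespace


def detect_polarity(
    hindi_tweet: str,
    translate: Callable[[str], str],
    classify: Callable[[str], str],
) -> str:
    """Translate a Hindi tweet, clean it, and return the predicted label.

    `translate` stands in for a machine translation tool and `classify`
    for a fine-tuned BERT model; both are hypothetical placeholders.
    """
    english = translate(hindi_tweet)
    return classify(preprocess(english))
```

Keeping the translator and classifier as injected callables makes the sketch independent of any particular translation library or model checkpoint, so either stage can be swapped without touching the rest of the pipeline.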
- Tweet dataset : Kaggle dataset
- GloVe embeddings : GloVe
- FastText embeddings : FastText
- GloVe Twitter : GloveTwitter
- Crisis Embeddings : Crisis Embeddings
Model Description | Training Accuracy | Validation Accuracy | Notebook Link |
---|---|---|---|
Basic LSTM | 85.1% | 86.7% | Notebook |
LSTM + GloVe Embeddings | 88.9% | 90.9% | Notebook |
LSTM + FastText Embeddings | 92.5% | 88.9% | Notebook |
LSTM + Crisis Embeddings | 83.4% | 84.7% | Notebook |
Basic Bi-directional LSTM | 87.3% | 86.0% | Notebook |
Bi-directional LSTM + GloVe Embeddings | 91.2% | 90.6% | Notebook |
Bi-directional LSTM + FastText Embeddings | 88.3% | 88.6% | Notebook |
Bi-directional LSTM + Crisis Embeddings | 86.0% | 85.1% | Notebook |
BERT | 99.7% | 93.8% | Notebook |
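
The embedding-based rows in the table rely on pretrained word vectors. The README does not state how those vectors were loaded, so the following is only a sketch under assumptions: vectors are stored in the plain GloVe text format (one token followed by its values per line), the vocabulary is a word-to-index mapping with index 0 reserved for padding, and the helper names `parse_glove_lines` and `build_embedding_matrix` are hypothetical. The resulting matrix is the kind of table that is typically used to initialize an LSTM's embedding layer.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str], dim: int) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines of the form 'token v1 v2 ... v_dim'."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        # Split from the right so a token containing spaces stays intact.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    word_index: Dict[str, int],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[List[float]]:
    """Build a (max_index + 1) x dim matrix; unknown words stay all-zero."""
    size = max(word_index.values(), default=0) + 1  # row 0 is padding
    matrix = [[0.0] * dim for _ in range(size)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = list(vec)
    return matrix
```

In practice the lines would come from an opened vector file, for example `parse_glove_lines(open(path, encoding="utf-8"), dim)`, where `path` and `dim` depend on which pretrained file is downloaded.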
Below is the system architecture for sentiment polarity detection of COVID-19 tweets in Hindi using machine translation and BERT.
I have published my findings as a research paper, 'Covhindia: Deep Learning Framework for Sentiment Polarity Detection of Covid-19 Tweets in Hindi', in the 'International Journal on Natural Language Computing'.