Emoji-centric NLP resources based on Twitter data
EmoTag is a collection of resources for analyzing the emotion and sentiment of emojis and of tweets written in English. The name EmoTag reflects its use of emojis for emotion tagging.
- Baseline Emoji Emotion Scores: 1200 emoji-emotion pairs annotated by humans. The set contains emotion scores ranging from 0 to 1 for the 150 most popular Twitter emojis across 8 emotion classes (i.e. anger, anticipation, disgust, fear, joy, sadness, surprise, and trust). [Download Scores] [Download Details]
- Interpretable Word Vectors: A 620-dimensional vector representation of words and emojis trained on ~20.8 million emoji-centric tweets. [Download]
- Raw Tweets: The Tweet IDs of the ~20.8 million tweets used in our experiments. Please contact us if you need additional samples. [Download All Tweet IDs]
- Word-Emoji Co-occurrence Frequencies: This lexicon provides word-emoji co-occurrence frequencies observed in our dataset. [Download]
- Emoji-Emoji Co-occurrence Frequencies: A subset of the previous lexicon (i.e. Word-Emoji co-occurrences) that contains only the emoji-emoji co-occurrence counts observed in our dataset. It is useful for quickly finding emojis that co-occur. [Download]
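As a rough illustration of how the emotion scores and the emoji-emoji co-occurrence counts might be used once loaded, here is a minimal Python sketch. The file formats of the downloads are not described on this page, so the sketch works on small in-memory dictionaries. The values, the dictionary layouts, and the function names `dominant_emotion` and `top_cooccurring` are all made up for illustration and are not part of EmoTag.

```python
from collections import Counter

# The 8 emotion classes listed above.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical scores in [0, 1]; NOT values from the EmoTag files.
example_scores = {
    "😄": {"anger": 0.0, "anticipation": 0.4, "disgust": 0.0, "fear": 0.0,
           "joy": 0.9, "sadness": 0.0, "surprise": 0.3, "trust": 0.5},
    "😡": {"anger": 0.95, "anticipation": 0.1, "disgust": 0.6, "fear": 0.2,
           "joy": 0.0, "sadness": 0.3, "surprise": 0.1, "trust": 0.0},
}

# Hypothetical co-occurrence counts keyed by emoji pair; NOT real data.
example_cooccurrence = {
    ("😄", "👍"): 12,
    ("😄", "😻"): 7,
    ("😡", "👍"): 1,
}


def dominant_emotion(scores):
    """Return the emotion class with the highest score for one emoji."""
    return max(EMOTIONS, key=lambda emotion: scores[emotion])


def top_cooccurring(counts, emoji, k=3):
    """Return up to k emojis that co-occur most often with `emoji`."""
    partners = Counter()
    for (first, second), n in counts.items():
        if first == emoji:
            partners[second] += n
        elif second == emoji:
            partners[first] += n
    return [partner for partner, _ in partners.most_common(k)]


if __name__ == "__main__":
    print(dominant_emotion(example_scores["😄"]))
    print(top_cooccurring(example_cooccurrence, "😄", k=2))
```

Adapting this to the actual downloads would mainly mean replacing the two example dictionaries with a loader for whatever format the files use.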
Please cite the following paper if using any of our resources in an academic publication:
- EmoTag1200 👍 : Understanding the Association between Emojis 😄 and Emotions 😻. Abu Awal Md Shoeb and Gerard de Melo. EMNLP 2020, November 2020. [BibTeX][Presentation][Video]
- EmoTag – Towards an Emotion-Based Analysis of Emojis. Abu Awal Md Shoeb, Shahab Raji, and Gerard de Melo. RANLP 2019, September 2019. [BibTeX][Presentation]
- Email: abu.shoeb@rutgers.edu