Sentiment analysis project from internship at TU Graz

bblazeka/sentiment-analysis

sentiment-analysis

In this repository, we examine which dictionaries or algorithms are best suited to sentiment analysis of Reddit posts. The comparison results are also used to draw conclusions about individual subreddits.

The code produces log files listing every scored entry together with its score from each dictionary. It also produces graphs showing the presence of important words, the recognized words, the distribution of verdicts (positive, negative, neutral, and unknown) in several visualization styles, and various correlations. The test.py script determines which outputs are generated.
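As an illustration of how a word-level dictionary can yield one of the four verdicts, here is a minimal sketch. It is not the code from this repository: the function name `score_entry`, the toy lexicon, the tokenizer, and the `neutral_band` threshold are all assumptions made for the example.

```python
import re

def score_entry(text, lexicon, neutral_band=0.1):
    """Score one entry against a word -> score dictionary (hypothetical helper).

    Returns (verdict, mean_score, recognized_words). The verdict is "unknown"
    when no word of the entry appears in the dictionary.
    """
    words = re.findall(r"[a-z']+", text.lower())
    recognized = [w for w in words if w in lexicon]
    if not recognized:
        return "unknown", None, recognized
    mean = sum(lexicon[w] for w in recognized) / len(recognized)
    if mean > neutral_band:
        verdict = "positive"
    elif mean < -neutral_band:
        verdict = "negative"
    else:
        verdict = "neutral"
    return verdict, mean, recognized

# Toy dictionary with made-up scores in [-1, 1]; real lexicons use other scales.
toy_lexicon = {"great": 0.8, "love": 0.9, "terrible": -0.9, "ok": 0.0}
verdict, mean, recognized = score_entry("I love this great movie", toy_lexicon)
```

Averaging only over recognized words keeps long entries with few dictionary hits from being pulled toward neutral, and the list of recognized words is what a "recognized words" graph would be built from.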

A folder is created for each table that is analysed. The code was tested on Reddit content split by subreddit (technology, StarWars, sports, politics, facepalm, philosophy) and by whether it was controversial or non-controversial.

Temporary data is also written in JSON format so that the first process (entry sentiment analysis) and the second process (graphing for each subreddit) do not necessarily have to run at the same time. This matters because tables are processed in parallel and do not all finish at the same time.
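The decoupling of the two processes through JSON files could look roughly like the sketch below. The helper names `save_scores` and `load_scores`, the `temp` directory, and the one-file-per-table layout are assumptions for illustration; the repository may organise its temporary data differently.

```python
import json
from pathlib import Path

def save_scores(table, scores, temp_dir="temp"):
    """Write the scores of one table to <temp_dir>/<table>.json (hypothetical layout)."""
    path = Path(temp_dir) / f"{table}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(scores), encoding="utf-8")
    return path

def load_scores(table, temp_dir="temp"):
    """Return the saved scores of one table, or None if it has not finished yet."""
    path = Path(temp_dir) / f"{table}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

With this arrangement the graphing step can be run later, and it can skip any table for which `load_scores` returns `None` because its analysis has not finished.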

Dictionaries by origin

LabMT (MANUAL) - words rated through a Mechanical Turk language assessment, 50 ratings per word

Sent140Lex (AUTO) - created from the “sentiment140” corpus of tweets, using Pointwise Mutual Information (a measure of association used in information theory and statistics) with emoticons as positive and negative labels

VADER (MANUAL) - "method developed specifically for Twitter and social media analysis"; each word was rated by 10 raters in a Mechanical Turk survey, and the average of those ratings is the word's score

HashtagSent (AUTO) - NRC Hashtag Sentiment Lexicon: created from tweets using Pointwise Mutual Information with sentiment hashtags as positive and negative labels

SentiWordNet (AUTO) - WordNet synsets each assigned three sentiment scores: positivity, negativity, and objectivity.

SenticNet (AUTO) - Sentiment dataset labeled with semantics and 5 dimensions of emotions by Cambria et al. (label propagation)

SOCAL (MANUAL) - Manually constructed general-purpose sentiment dictionary

WDAL (MANUAL) - Whissell’s Dictionary of Affect in Language: words rated in terms of their Pleasantness, Activation, and Imagery (concreteness) (Survey: Columbia students)
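For the automatically built lexicons (Sent140Lex and HashtagSent), the PMI-based construction can be sketched as follows. A word's score is its association with the positive label minus its association with the negative label, which simplifies to a single log ratio of counts. This is an illustrative sketch, not the lexicon authors' code: the function name, the input format, and the add-one smoothing are assumptions.

```python
import math
from collections import Counter

def pmi_sentiment_scores(labeled_docs, smoothing=1.0):
    """Compute score(w) = PMI(w, positive) - PMI(w, negative) for every word.

    labeled_docs: iterable of (tokens, label) with label "positive" or "negative".
    The smoothing term (an assumption of this sketch) avoids log(0) for words
    seen under only one label.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in labeled_docs:
        target = pos_counts if label == "positive" else neg_counts
        target.update(tokens)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    if pos_total == 0 or neg_total == 0:
        raise ValueError("need at least one token under each label")
    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        p = pos_counts[word] + smoothing
        n = neg_counts[word] + smoothing
        # The PMI difference reduces to log2((p * neg_total) / (n * pos_total)).
        scores[word] = math.log2((p * neg_total) / (n * pos_total))
    return scores

# Toy corpus: labels stand in for emoticons or sentiment hashtags.
toy_docs = [(["good", "day"], "positive"), (["bad", "day"], "negative")]
toy_scores = pmi_sentiment_scores(toy_docs)
```

A positive score means the word co-occurs more with the positive label, a negative score the opposite, and a score near zero means the word carries little sentiment signal in the corpus.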

Running the repository

To run the code, you need to acquire the resources (dictionary files with sentiment scores for words), put them in the "data" folder, and name each subfolder after its dictionary. At least, that is the setup currently expected.
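Before starting a run, a quick check of the folder layout can catch missing resources. The sketch below assumes the subfolders are named exactly like the dictionaries listed above; the actual folder names expected by the code may differ, and `missing_dictionaries` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Assumed subfolder names, taken from the dictionary list in this README.
EXPECTED_DICTIONARIES = [
    "LabMT", "Sent140Lex", "VADER", "HashtagSent",
    "SentiWordNet", "SenticNet", "SOCAL", "WDAL",
]

def missing_dictionaries(data_dir="data", names=EXPECTED_DICTIONARIES):
    """Return the dictionary names that have no subfolder under data_dir."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_dictionaries()
    if missing:
        print("Missing dictionary folders:", ", ".join(missing))
```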

To run the sentiment analysis, make sure you have all the required datasets, then run: python3 test.py

Resources:

https://github.com/andyreagan/labMT-simple

https://github.com/cjhutto/vaderSentiment
