A public dashboard backed by an elaborate sentiment analyzer that parses a Twitter stream to gauge public opinion about the police in the United States.
Our #purpose is to help the public understand, benefit from, and constructively contribute to the public conversation about the police force in the United States.
A significant portion of today's public conversation concerns questions, opinions, and the implications of the police's role in society. CopSense aims to illustrate the big picture of the polarized sentiment about the police force across the USA. We explored and applied Natural Language Processing techniques such as Sentiment Analysis and Opinion Mining, along with novel ways to detect and capture the most relevant expressions on the subject of law enforcement.
A Twitter stream is periodically opened for each state, and tweets are monitored for police-related keywords. We then try to determine whether the expressed sentiment actually pertains to the police force. To do this, we use stanza, the Stanford NLP Group's Python library, to access their Java Stanford CoreNLP software. We used three CoreNLP annotators: Part-of-Speech tagging (pos), Named Entity Recognition (ner), and Dependency Parsing (depparse). From depparse we mainly relied on three dependency relations, 'amod', 'dobj', and 'nsubj', together with the keyword (key phrase) extraction function of Microsoft Azure Cognitive Services. This combination helped filter out tweets that mention the police but do not express an opinion, e.g., reports by news channels. The 'pos' tags were used to further weed out irrelevant tweets. Once we are satisfied that a tweet is an opinion about cops, we perform sentiment analysis on it with Microsoft's Text Analytics Cognitive Service. The results are fed into a MongoDB database, which maintains a window of recent tweets and the latest example tweets for each category.
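One way a dependency-relation filter like the one described above could look is sketched below. This is a hypothetical, simplified rule set, not CopSense's actual logic, which also drew on Azure key phrases, NER, and POS filtering. The sketch assumes each tweet has already been parsed into (head, head POS tag, relation, dependent) edges, for example from CoreNLP's depparse and pos annotators accessed through stanza. The keyword list `POLICE_TERMS`, the `Dep` type, and the three rules are illustrative assumptions. Recent CoreNLP versions use Universal Dependencies v2, where the direct-object relation is usually labelled `obj` rather than `dobj`, so both labels are accepted.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the project's real list is not given.
POLICE_TERMS = {"police", "cop", "cops", "officer", "officers", "sheriff"}
OBJECT_RELATIONS = {"dobj", "obj"}  # UD v1 and UD v2 names for a direct object


@dataclass(frozen=True)
class Dep:
    head: str       # governing word
    head_pos: str   # Penn Treebank tag of the head, e.g. "JJ" or "VBP"
    relation: str   # dependency label, e.g. "nsubj"
    dependent: str  # word attached to the head


def is_police(word: str) -> bool:
    return word.lower() in POLICE_TERMS


def opinion_edges(deps: list[Dep]) -> list[Dep]:
    """Return edges that attach an evaluative word to a police term."""
    hits = []
    for d in deps:
        if d.relation == "amod" and is_police(d.head):
            # "corrupt cops": an adjective directly modifies a police noun
            hits.append(d)
        elif d.relation == "nsubj" and is_police(d.dependent) and d.head_pos.startswith("JJ"):
            # "the police were helpful": a police noun is the subject of an adjective
            hits.append(d)
        elif d.relation in OBJECT_RELATIONS and is_police(d.dependent) and d.head_pos.startswith("VB"):
            # "we thank the police": a police noun is the object of a verb
            hits.append(d)
    return hits


def is_opinion_on_police(deps: list[Dep]) -> bool:
    return bool(opinion_edges(deps))


# "Police arrest suspect" (news style): the police noun is the subject of a verb,
# not of an adjective, so no rule fires.
news = [Dep("arrest", "VBP", "nsubj", "Police"), Dep("arrest", "VBP", "obj", "suspect")]
# "The cops were helpful": in a copular sentence the adjective heads the subject.
opinion = [
    Dep("helpful", "JJ", "nsubj", "cops"),
    Dep("helpful", "JJ", "cop", "were"),
    Dep("cops", "NNS", "det", "The"),
]

print(is_opinion_on_police(news))     # False
print(is_opinion_on_police(opinion))  # True
```

Rules this coarse still overfire. For example, the object rule also matches "protesters attacked police". That is why, in practice, they would need to be combined with other signals such as the key phrase and POS checks mentioned above.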
The dashboard is a minimalist interface where you can view a choropleth map of the USA along with running averages of positive, negative, and neutral sentiment for every state.
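The per-state running averages could be kept with a fixed-size window, as in the in-memory sketch below. It stands in for the MongoDB collection mentioned earlier. It assumes the dashboard shows the share of each sentiment label among a state's most recent opinion tweets; averaging Azure's per-class confidence scores would be an equally plausible reading. The class name, window size, and method names are hypothetical.

```python
from collections import defaultdict, deque

LABELS = ("positive", "negative", "neutral")


class StateSentimentWindow:
    """Keeps the last `size` sentiment labels per state and the most recent
    example tweet for each label (hypothetical stand-in for a MongoDB store)."""

    def __init__(self, size: int = 100) -> None:
        self.size = size
        # deque(maxlen=...) silently drops the oldest label once the window is full
        self.windows: dict[str, deque] = defaultdict(lambda: deque(maxlen=self.size))
        self.latest: dict[str, dict[str, str]] = defaultdict(dict)

    def add(self, state: str, label: str, text: str) -> None:
        # Azure can also return "mixed" for a whole document; this sketch rejects it.
        if label not in LABELS:
            raise ValueError(f"unsupported label: {label!r}")
        self.windows[state].append(label)
        self.latest[state][label] = text

    def averages(self, state: str) -> dict[str, float]:
        """Share of each label in the state's current window (all 0.0 if empty)."""
        window = self.windows.get(state, ())
        if not window:
            return {label: 0.0 for label in LABELS}
        return {label: window.count(label) / len(window) for label in LABELS}


w = StateSentimentWindow(size=3)
for label, text in [("positive", "a"), ("negative", "b"), ("negative", "c"), ("neutral", "d")]:
    w.add("CO", label, text)
# The first "positive" has already been pushed out of the 3-item window.
print(w.averages("CO"))
```

In this design, the latest example tweet for a label persists even after that label has rotated out of the window. This keeps a sample tweet available for every category on the dashboard.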
Our website provides a one-stop view of these insights across the broader geographical region of the USA. For each category (positive, negative, and neutral), sample tweets about the police force are displayed, so people can easily spot the trends.
Our application hopefully provides a #fast and #easy way to disseminate crucial information in a transparent, intelligent, and #straightforward manner, to pop the bubble we may be surrounded by and build a more informed and responsible community.
This was the first time any of us had worked with NLP. Understanding linguistic concepts was new and eye-opening for us.
Microsoft's Azure Sentiment Analysis can produce a sentiment score for any sentence. However, many tweets that mention the police are neutral toward the police themselves, yet read as positive or negative at the plain-language level. We therefore focused on capturing sentiment in which the police are the subject. Our novel contribution combines Dependency Parsing (depparse), Part-of-Speech (pos) analysis, and keyword dependencies. We also tried Azure's Opinion Mining service to improve accuracy, but our experiments showed that most tweets are too short and unstructured for this technology to work well. We referred to the methods used in the study at https://www.ncjrs.gov/pdffiles1/nij/grants/205619.pdf to guide us in building a more intelligent solution.
Another major challenge was obtaining location-specific sentiment. Most tweets do not contain a geotag, so we searched each state using bounding boxes or circular search regions to analyze location-specific tweets. A follow-on challenge was that we could not filter tweepy streams by both location and keyword at the same time. To work around this, we implemented a loop that spends a period of time on tweets from each state in turn, checks whether each tweet is cop-related, and analyzes only those that are.
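The per-state loop can be sketched as a round-robin sweep. Twitter's v1.1 streaming filter combines `locations` and `track` with OR rather than AND, so the sketch streams by location and applies the keyword check locally. `open_stream` is a hypothetical callable standing in for a location-filtered tweepy stream; tweepy's interface has changed across versions, so it is deliberately left abstract. To keep the sketch testable, the per-state budget is a tweet count rather than a time slice. The bounding boxes are approximate rectangles for two roughly rectangular states, written in Twitter's southwest-longitude, southwest-latitude, northeast-longitude, northeast-latitude order.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Box = tuple[float, float, float, float]  # (sw_lon, sw_lat, ne_lon, ne_lat)

# Approximate, hypothetical boxes; most real state borders need finer regions.
STATE_BOXES: dict[str, Box] = {
    "CO": (-109.06, 36.99, -102.04, 41.00),
    "WY": (-111.06, 40.99, -104.05, 45.01),
}

# Hypothetical keyword list for the local filter.
POLICE_KEYWORDS = {"police", "cop", "cops", "officer", "officers", "sheriff"}


def is_police_related(text: str) -> bool:
    """Cheap keyword check applied after the location-filtered stream."""
    words = {w.strip(".,!?:;\"'#@").lower() for w in text.split()}
    return not words.isdisjoint(POLICE_KEYWORDS)


def sweep_states(
    boxes: dict[str, Box],
    open_stream: Callable[[Box], Iterable[str]],
    per_state: int,
) -> Iterator[tuple[str, str]]:
    """One round-robin pass: read up to `per_state` tweets from each state's
    location stream and yield (state, text) for the police-related ones."""
    for state, box in boxes.items():
        for text in islice(open_stream(box), per_state):
            if is_police_related(text):
                yield state, text


# Fake streams standing in for live location-filtered tweets.
fake_tweets = {
    STATE_BOXES["CO"]: ["Thank you police officers!", "Nice weather today"],
    STATE_BOXES["WY"]: ["The cops were rude", "Go team"],
}
for state, text in sweep_states(STATE_BOXES, lambda box: fake_tweets[box], per_state=2):
    print(state, text)
```

In a live loop, this pass would be repeated indefinitely, and each yielded tweet would be handed on to the relevance and sentiment steps.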
We considered training a model to improve classification, but we did not have enough data, and the Twitter API rate limits were too restrictive for the purpose. We also did not want to violate Twitter's policies by using a Twitter scraping library.
Improving the NLP architecture, studying ongoing research in this area, and implementing additional techniques.