News Headline Sentiment Analysis

This repository contains the code and resources for the final project titled "Sentiment Distribution Analysis of News Headlines using Natural Language Processing and ANOVA Techniques."

The paper can be viewed here.

Abstract

Mean World Syndrome is a cognitive bias in which people tend to perceive the world as cruel, a feeling that is amplified by repeated negativity in the media. However, it is unknown whether the media actually trends negative in its reporting. This project considers different variables that may affect the sentiment of news headlines in order to understand whether there is a tendency toward negativity and how it varies across news categories and publishers.

Notable Findings

The graphic below displays the polarities found in the headlines of specific publishers, with the density and dot plots shown side by side. Polarity ranges from -1 (most negative sentiment) to 1 (most positive sentiment). All distributions appear to be relatively symmetric with similar spreads, except for the New York Times.
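To make the polarity scale concrete, here is a minimal sketch of a lexicon-based scorer. It is not the method used in sentiment.ipynb; the lexicon, its weights, and the function name `toy_polarity` are all hypothetical and exist only to show how a headline can be mapped onto the -1 to 1 range.

```python
import re

# Hypothetical toy lexicon (assumption): word -> polarity weight in [-1, 1].
# A real analysis would rely on a full sentiment library instead.
TOY_LEXICON = {
    "great": 0.8,
    "good": 0.7,
    "win": 0.5,
    "bad": -0.7,
    "crisis": -0.6,
    "dead": -0.8,
}


def toy_polarity(headline: str) -> float:
    """Average the weights of known words; return 0.0 if none are known."""
    words = re.findall(r"[a-z']+", headline.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return 0.0  # no scored words: treated as neutral in this sketch
    # The mean of values in [-1, 1] always stays inside [-1, 1].
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for text in ["Great win for local team", "Crisis leaves many dead"]:
        print(text, "->", round(toy_polarity(text), 2))
```

Because the score is an average of bounded weights, it can never leave the -1 to 1 range, and in this sketch a headline with no scored words lands exactly on 0.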

The New York Times has a slightly greater mean polarity, as its density curve is shifted to the right of all the others. As the box plots show (all outliers have already been removed), the variance differs across publishers. For example, Talking Points Memo, Reuters, National Review, Fox News, CNN, Breitbart, and Atlantic are all centered on a polarity of 0, with almost no variance at all. New York Times, New York Post, Guardian, and Buzzfeed News have similar ranges of roughly 0.25. The remaining publishers all have a relatively high variance, with Business Insider having a range nearing 0.75.

Since our dataset meets the independence condition (there are likely more than one million news articles from 2015-2017), is generated from observation, and appears to be approximately normal (judging by the distributions and the large size of the dataset), we continue with a one-way ANOVA test of whether the mean polarities are equal. The results of the ANOVA can be found below.

Source	DF	Sum Sq	Mean Sq	F-Value	P
Category	14	78.4	5.5992	97.592	2.2E-16
Residuals	142551	8178.6	0.0574		

As we can see, the F-value is relatively high, which suggests that the means are not all equal. This is supported by the p-value of 2.2E-16, which is less than 0.05; thus, we reject the null hypothesis that all mean polarities are equal. There is statistical evidence that at least one of the news headline categories does not have the same mean polarity as the others.
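The relationship between the table columns can be sketched in a few lines of Python. This is an illustration, not the code in analysis.R: the function name `one_way_anova` and the toy groups are assumptions, and only the Sum Sq and DF values in the final check are taken from the table above.

```python
from statistics import mean


def one_way_anova(groups):
    """Compute one-way ANOVA table entries for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    df_between = len(groups) - 1               # "Category" DF
    df_within = len(all_values) - len(groups)  # "Residuals" DF

    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "F": ms_between / ms_within,
    }


# Toy data (assumption), only to exercise the function.
toy_result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Consistency check against the rounded values reported in the table:
# F = (Sum Sq / DF for Category) / (Sum Sq / DF for Residuals).
f_from_table = (78.4 / 14) / (8178.6 / 142551)

if __name__ == "__main__":
    print("toy F:", toy_result["F"])
    print("F recomputed from table:", round(f_from_table, 2))
```

Recomputing F from the rounded sums of squares lands close to the reported 97.592; the small gap comes from rounding in the Sum Sq column.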

Conducting Fisher's LSD procedure helps us identify which publishers have similar means. The graphic below shows this relationship.
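As a rough sketch of how an LSD comparison works: two group means are declared different when their absolute difference exceeds a threshold built from the residual mean square. The publisher labels, means, and group sizes below are hypothetical, and the normal quantile is used as an approximation to the t critical value, which is an assumption that is only reasonable because the residual DF is very large. Only the residual mean square (0.0574) comes from the ANOVA table.

```python
from math import sqrt
from statistics import NormalDist


def fisher_lsd(mse, n_i, n_j, alpha=0.05):
    """Least significant difference between two group means.

    Uses the standard normal quantile in place of the t quantile
    (assumption: residual degrees of freedom are large).
    """
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return critical * sqrt(mse * (1 / n_i + 1 / n_j))


MSE = 0.0574  # residual mean square from the ANOVA table

# Hypothetical group summaries (assumption), not values from the dataset.
group_means = {"Publisher A": 0.02, "Publisher B": 0.05}
group_sizes = {"Publisher A": 5000, "Publisher B": 5000}

lsd = fisher_lsd(MSE, group_sizes["Publisher A"], group_sizes["Publisher B"])
difference = abs(group_means["Publisher A"] - group_means["Publisher B"])
means_differ = difference > lsd

if __name__ == "__main__":
    print("LSD threshold:", round(lsd, 4))
    print("means differ:", means_differ)
```

Publishers whose pairwise differences all fall below the threshold would be grouped as having similar means.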

License

MIT
