meanr

Version: 0.1-5
URL: https://github.com/wrathematics/meanr
License: BSD 2-Clause
Author: Drew Schmidt

meanr is an R package for sentiment analysis. Its main function, score(), computes sentiment as a simple sum of the counts of positive (+1) and negative (-1) sentiment words in a piece of text. More sophisticated techniques are available in R, for example the polarity() function in the qdap package. This package uses the Hu and Liu sentiment dictionary, same as everybody else.

meanr is significantly faster than everything else I tried (which was actually the motivation for its creation), but I don't claim to have tried everything. However, the method is merely a dictionary lookup, so unlike more sophisticated methods it ignores word context. On the other hand, the more sophisticated tools are very slow. If you have a large volume of text, I believe there is value in getting a "first glance" at the data, and meanr allows you to do this very quickly.

Installation

The stable version is available on CRAN:

install.packages("meanr")

The development version is maintained on GitHub:

remotes::install_github("wrathematics/meanr")

Example Usage

I have a dataset that, for legal reasons, I cannot describe, much less provide. You can think of it as a collection of tweets (they are not tweets). But take my word for it that it's real, English-language text. The data is in the form of a vector of strings, which we'll call x.

x = readRDS("x.rds")

length(x)
## [1] 655760

sum(nchar(x))
## [1] 162663972

library(meanr)
system.time(s <- score(x))
##  user  system elapsed 
## 1.072   0.000   0.285 

head(s)
##   positive negative score  wc
## 1        2        0     2  32
## 2        5        0     5  29
## 3        4        2     2  67
## 4       12        3     9 203
## 5        8        2     6 101
## 6        4        3     1  99
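Once the scores are in a data frame, ordinary R tools apply. As a small illustration, the sketch below rebuilds the six rows printed above by hand (so it runs without the undisclosed data) and derives two hypothetical summary columns: a coarse label taken from the sign of the score, and the share of words that matched the dictionary. Neither column is produced by meanr itself; the names s_head, label, and matched_share are made up for this example.

```r
# The six rows shown by head(s), typed in by hand as a stand-in for real output.
s_head <- data.frame(
  positive = c(2, 5, 4, 12, 8, 4),
  negative = c(0, 0, 2, 3, 2, 3),
  score    = c(2, 5, 2, 9, 6, 1),
  wc       = c(32, 29, 67, 203, 101, 99)
)

# Sanity check: the score is the positive count minus the negative count.
stopifnot(all(s_head$score == s_head$positive - s_head$negative))

# Hypothetical coarse label based on the sign of the score.
s_head$label <- ifelse(s_head$score > 0, "positive",
                       ifelse(s_head$score < 0, "negative", "neutral"))

# Hypothetical "coverage" measure: fraction of words found in the dictionary.
s_head$matched_share <- (s_head$positive + s_head$negative) / s_head$wc

s_head
```

A low matched_share is a reminder that most words are neutral to a dictionary method, so the score for a long text can rest on only a handful of matches.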

How It Works

The score() function receives a vector of strings, and operates on each one as follows:

1. The maximum string length is found, and a buffer of that size is allocated.
2. The string is copied to the buffer.
3. All punctuation is removed. All characters are converted to lowercase.
4. Score sentiment:
   - Tokenize words as collections of chars separated by a space.
   - Check if the word is positive; if not, check if it is negative; if not, then it's assumed to be neutral. Each check is a lookup in one of two hash tables built from Hu and Liu's dictionaries.
   - If the word is in the table, get its value from the hash table (positive words have value 1, negative words -1) and update the various counts. Otherwise, the word is "neutral" (score of 0).

This is all done in four passes over each string; each pass corresponds to one of the enumerated items above. The hash tables use perfect hash functions generated by gperf.
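The steps above can be sketched in plain R. This is only an illustration of the idea, not the package's implementation: the real work is described above as buffer operations with gperf-generated hash tables, whereas the sketch uses vectorized string functions and %in%. The function name score_sketch and the two tiny word lists are made up for the example and stand in for the Hu and Liu dictionaries; treating wc as the number of space-separated tokens is also an assumption.

```r
# Tiny stand-in word lists (hypothetical); the package uses the Hu and Liu dictionaries.
positive_words <- c("good", "great", "love", "happy")
negative_words <- c("bad", "awful", "hate", "sad")

score_sketch <- function(x) {
  # Remove punctuation, then convert everything to lowercase.
  clean <- tolower(gsub("[[:punct:]]", "", x))

  # Tokenize on single spaces, dropping empty tokens left by repeated spaces.
  tokens <- lapply(strsplit(clean, " ", fixed = TRUE), function(w) w[nzchar(w)])

  # Count dictionary hits per string; words in neither list are neutral.
  positive <- vapply(tokens, function(w) sum(w %in% positive_words), integer(1))
  negative <- vapply(tokens, function(w) sum(w %in% negative_words), integer(1))

  data.frame(
    positive = positive,
    negative = negative,
    score    = positive - negative,  # +1 per positive word, -1 per negative word
    wc       = lengths(tokens)       # assumed meaning of wc: token count
  )
}

score_sketch(c("I love this, it is great!", "Awful. Just awful and sad."))
```

With these toy lists, the first string gets two positive hits out of six tokens and the second gets three negative hits out of five, which shows how repeated words each count toward the total.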
