How do Twitter users feel about Python and R?

Catarina Silva

September 20, 2020

Python and R are popular programming languages for statistics. There is a vast amount of discussion around what's the difference between them, which language is the best for data science and how to choose which one to learn first.

I wanted to have an overview of the Twitter community opinion behind each language. So I used "python AND language" and "rstats" for a Twitter search query for Python and R, respectively.

An interesting point I can conclude from this little exercise is that people think that both R and Python are easy (although the word "difficult" was twitted twice as much as "easy" for Python tweets and it didn't pop up in the top words for R!).

Top sentiments for Python

Top sentiments for R

Word cloud of top Tweets for Python	Word cloud of top Tweets for R

Here is how I conducted the analysis:

Load the packages


library("rmarkdown")
library("rtweet")
library("dplyr")
library("tidyr")
library("tidytext")
library("textdata")
library("ggplot2")
library("wordcloud2") 
library("webshot")
library("htmlwidgets")

Load data and process each set of tweets into tidy text


python_2020_09_19_clean = read.csv("./python_2020_09_19_clean.csv", header = T)
r_2020_09_20_clean = read.csv("./r_2020_09_20_clean.csv", header = T)

tweets_python_2020_09_19_clean = python_2020_09_19_clean %>% select(screen_name, text)
tweets_r_2020_09_20_clean = r_2020_09_20_clean %>% select(screen_name, text)

Use pre-processing text transformations to clean up the tweets

Remove http elements manually


tweets_python_2020_09_19_clean$stripped_text1 = gsub("http\\S+","",tweets_python_2020_09_19_clean$text)

tweets_r_2020_09_20_clean$stripped_text1 = gsub("http\\S+","",tweets_r_2020_09_20_clean$text)

Remove punctuation and add id to each tweet (note: unnest_tokens() converts to lower case)


tweets_python_2020_09_19_clean_stem = tweets_python_2020_09_19_clean %>% 
  select(stripped_text1) %>%
  unnest_tokens(word, stripped_text1)

tweets_r_2020_09_20_clean_stem = tweets_r_2020_09_20_clean %>% 
  select(stripped_text1) %>%
  unnest_tokens(word, stripped_text1)

Remove stop words from your list of words (e.g. is, on...)


cleaned_tweets_python_2020_09_19_stem = tweets_python_2020_09_19_clean_stem %>%
  anti_join(stop_words)

cleaned_tweets_r_2020_09_20_stem = tweets_r_2020_09_20_clean_stem %>%
  anti_join(stop_words)

Have a look at the top 40 words


top_words_tweets_python_2020_09_19 = cleaned_tweets_python_2020_09_19_stem %>% 
  count(word, sort=TRUE) %>%
  top_n(40) %>%
  mutate(word = reorder(word,n))

cloud_phyton = wordcloud2(top_words_tweets_python_2020_09_19, size = 2, color = "grey", shape = "circle")
cloud_phyton

top_words_tweets_r_2020_09_20 = cleaned_tweets_r_2020_09_20_stem %>% 
  count(word, sort=TRUE) %>%
  top_n(40) %>%
  mutate(word = reorder(word,n))

cloud_r = wordcloud2(top_words_tweets_r_2020_09_20, size = 1, color = "grey", shape = "circle")
cloud_r

Get sentiment lexicons


get_sentiments(lexicon = c("bing", "afinn", "loughran", "nrc")) %>% filter(sentiment=="positive")
get_sentiments(lexicon = c("bing", "afinn", "loughran", "nrc")) %>% filter(sentiment=="negative")

bing_python = cleaned_tweets_python_2020_09_19_stem %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort=TRUE) %>%
  ungroup()

bing_r = cleaned_tweets_r_2020_09_20_stem %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort=TRUE) %>%
  ungroup()

Plot sentiment analysis for top 5 words


bing_python %>%
  group_by(sentiment) %>%
  top_n(5) %>%
  ungroup() %>%
  mutate(word = reorder(word,n)) %>%
  ggplot(aes(word, n, fill=sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(title="Python tweets",
       y="",
       x=NULL) +
  coord_flip() + theme_bw()

bing_r %>%
  group_by(sentiment) %>%
  filter(word != "plot") %>% 
  filter(word != "cloud") %>% 
  filter(word != "shiny") %>% 
  top_n(5) %>%
  ungroup() %>%
  mutate(word = reorder(word,n)) %>%
  ggplot(aes(word, n, fill=sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(title="R tweets",
       y="",
       x=NULL) +
  coord_flip() + theme_bw()

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
figures		figures
.gitignore		.gitignore
README.md		README.md
python_2020_09_19_clean.csv		python_2020_09_19_clean.csv
r_2020_09_20_clean.csv		r_2020_09_20_clean.csv
sentiment_analysis.Rmd		sentiment_analysis.Rmd
sentiment_analysis.html		sentiment_analysis.html
twitter_sentiment_python_r.Rproj		twitter_sentiment_python_r.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

How do Twitter users feel about Python and R?

About

Releases

Packages

Languages

CatarinaNSSilva/twitter_sentiment_python_r

Folders and files

Latest commit

History

Repository files navigation

How do Twitter users feel about Python and R?

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages