WeRateDogs Twitter Handle Analysis

Overview

WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about each dog. The account was started in 2015 by college student Matt Nelson and has received international media attention, both for its popularity and for the attention it drew to social media copyright law when Twitter suspended it for breaking those laws. Read more
The main objective of this project is data wrangling. In this project, I gathered data from the web using the Requests library and Tweepy. I also performed a brief exploratory and explanatory analysis, identified insights, and suggested ways to increase retweets.

Exploratory Analysis ¹

Data Gathering:

This project required gathering three data sets, each with a different method, as follows.

Twitter archive file: This can be downloaded manually or programmatically with the Requests library.
The tweet image predictions: This can only be downloaded programmatically using the Requests library, because the file image_predictions.tsv is hosted on Udacity's servers and cannot be accessed manually.
Tweets: At minimum, each tweet's retweet count and favorite ("like") count are gathered, along with any additional data found to be interesting. This is done by:
- Extracting the tweet IDs from the WeRateDogs Twitter archive and storing them in another file (tweet_ids.txt)
- Querying the Twitter API for each tweet's JSON data using Python's Tweepy library and storing the data in another file (tweet_json.txt)
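
The two steps for the tweet data can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it uses only the standard library, assumes the archive CSV has a `tweet_id` column, and takes the API call as a parameter (`fetch_status`, a hypothetical name) so the loop can be shown without credentials. With Tweepy, that callable would presumably wrap a status lookup such as `API.get_status` and return the tweet's JSON as a dict, or `None` when the tweet is unavailable.

```python
import csv
import json
from typing import Callable, Iterable, List, Optional


def extract_tweet_ids(archive_csv: str, ids_path: str) -> List[str]:
    """Copy the tweet_id column of the archive into a text file, one ID per line."""
    with open(archive_csv, newline="", encoding="utf-8") as src:
        ids = [row["tweet_id"] for row in csv.DictReader(src)]
    with open(ids_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(ids) + "\n")
    return ids


def save_tweet_json(
    ids: Iterable[str],
    fetch_status: Callable[[str], Optional[dict]],
    json_path: str,
) -> List[str]:
    """Write one JSON object per line; return the IDs that could not be fetched."""
    failed = []
    with open(json_path, "w", encoding="utf-8") as out:
        for tweet_id in ids:
            status = fetch_status(tweet_id)  # hypothetical wrapper around the API call
            if status is None:  # e.g. the tweet has been deleted
                failed.append(tweet_id)
                continue
            out.write(json.dumps(status) + "\n")
    return failed
```

Storing one JSON object per line keeps the output easy to read back later, one tweet at a time, and the returned list of failed IDs makes it possible to retry or report missing tweets.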

Data Quality Issues

In the archive table

Wrong datatype for some of the columns, e.g. timestamp
A lot of missing data in the features
Missing values represented as the string None
Expanded_url containing more than one URL

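As an illustration of how the archive issues above might be handled, here is a small sketch that cleans one row at a time. The key names (`timestamp`, `expanded_urls`), the timestamp layout, and the comma separator between URLs are assumptions made for the example, not details confirmed by this README.

```python
from datetime import datetime
from typing import Dict

# Assumed layout, e.g. "2017-08-01 16:23:56 +0000"; adjust to the real file.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def clean_archive_row(row: Dict[str, str]) -> Dict[str, object]:
    """Return a cleaned copy of one archive row (all values arrive as strings)."""
    cleaned: Dict[str, object] = {}
    for key, value in row.items():
        # Treat the literal string "None" and empty strings as missing.
        cleaned[key] = None if value in ("None", "") else value

    if cleaned.get("timestamp") is not None:
        # Convert the timestamp string into a timezone-aware datetime.
        cleaned["timestamp"] = datetime.strptime(
            str(cleaned["timestamp"]), TIMESTAMP_FORMAT
        )

    urls = cleaned.get("expanded_urls")
    if urls is not None:
        # Assumed separator: multiple URLs in one cell are comma-separated.
        cleaned["expanded_urls"] = [u.strip() for u in str(urls).split(",")]
    return cleaned
```
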
In the image table

Inconsistent capitalization in P1, P2, and P3 (sometimes lowercase)
Text column not properly formatted

In the tweet table

The date needs to be extracted from the Created_at column
The Created_at column should be renamed Timestamp for uniformity
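
The two tweet-table fixes could look something like the sketch below. It assumes each tweet is a dict whose `created_at` value uses the layout returned by the Twitter v1.1 API; the function name and the output keys `timestamp` and `date` are illustrative choices.

```python
from datetime import datetime
from typing import Dict

# Layout of created_at in Twitter v1.1 API responses,
# e.g. "Tue Aug 01 16:23:56 +0000 2017".
# %a and %b are locale-dependent; this assumes English day and month names.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def normalize_tweet_record(record: Dict[str, object]) -> Dict[str, object]:
    """Rename created_at to timestamp and add a separate date field."""
    cleaned = {k: v for k, v in record.items() if k != "created_at"}
    parsed = datetime.strptime(str(record["created_at"]), CREATED_AT_FORMAT)
    cleaned["timestamp"] = parsed      # full timezone-aware datetime
    cleaned["date"] = parsed.date()    # calendar date only
    return cleaned
```
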

Data Tidiness

P1, P2, and P3 should be formatted properly in the image table
Remove HTML tags from the source column in the archive table
Tweet_id in archive table duplicated in image and tweet tables
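
A minimal sketch of the first two fixes follows. The regular expression is only suitable for simple markup such as a single anchor tag, not for arbitrary HTML, and the chosen label convention (underscores to spaces, all lowercase) is one possible reading of "formatted properly", not necessarily the one used in the notebooks.

```python
import re

# Matches a single HTML tag such as <a href="..."> or </a>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html_tags(value: str) -> str:
    """Remove simple HTML tags, keeping only the text between them."""
    return TAG_PATTERN.sub("", value).strip()


def format_prediction(label: str) -> str:
    """Normalize a predicted label: underscores to spaces, all lowercase."""
    return label.replace("_", " ").strip().lower()
```

For example, a source value consisting of an anchor tag around the text "Twitter for iPhone" would be reduced to just that text.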

A new data set named 'twitter_archive_master' was produced by merging the three data sets named above, on tweet_id. Read more
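
The merge can be pictured as a join on tweet_id. The sketch below uses plain dictionaries and assumes an inner join (only tweets present in all three tables are kept), which this README does not confirm. In pandas, the equivalent would be chaining `DataFrame.merge(..., on="tweet_id")`.

```python
from typing import Dict, Iterable, List


def merge_on_tweet_id(*tables: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Inner-join any number of tables (iterables of dict rows) on tweet_id."""
    # Index each table by tweet_id, compared as strings so 1 and "1" match.
    indexed = [{str(row["tweet_id"]): row for row in table} for table in tables]
    if not indexed:
        return []
    # Keep only IDs present in every table, in the order of the first table.
    shared = [tid for tid in indexed[0] if all(tid in other for other in indexed[1:])]
    merged = []
    for tid in shared:
        combined: Dict[str, object] = {}
        for table in indexed:
            combined.update(table[tid])  # later tables win on column clashes
        combined["tweet_id"] = tid
        merged.append(combined)
    return merged
```
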

Explanatory Analysis ²

Insights

Favorite count and retweet count both peak in June. This may be because dog festivals usually take place around this period. The second-highest month is January for favorite count and December for retweet count; the third-highest is December for favorite count and January for retweet count. This may be due to increased festive activity during these periods.
Saturday usually has the highest favorite count, followed by Friday. This is probably due to less busy schedules toward the weekend.
Also, as expected, the correlation between favorite count and retweet count is strongly positive (0.86). Hence, tweets that are favorited more are more likely to be retweeted.
On the other hand, the correlation between each feature (favorite count and retweet count) and the rating is very weak: positive for the numerator rating and negative for the denominator rating.
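
For readers who want to reproduce this kind of summary, the sketch below shows one way to compute a Pearson correlation and a per-group average with the standard library. Whether the notebooks aggregate by mean or by sum is not stated here, so the use of the mean is an assumption, and the function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def mean_by(
    rows: Iterable[Dict[str, object]],
    key: Callable[[Dict[str, object]], Hashable],
    value: str,
) -> Dict[Hashable, float]:
    """Average rows[value] within each group returned by key(row)."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in rows:
        bucket = totals[key(row)]
        bucket[0] += float(row[value])
        bucket[1] += 1
    return {group: total / n for group, (total, n) in totals.items()}
```

With rows that carry a datetime under a `timestamp` key, grouping by month would use `key=lambda r: r["timestamp"].month`, and grouping by day of the week would use `key=lambda r: r["timestamp"].weekday()` (Monday is 0).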

Recommendations

It is preferable to post on Fridays and Saturdays.
Dog events should be hosted around June, December, or January.
Another factor should be used to predict the probability of retweeting, as the numerator and denominator ratings are not effective predictors.

NdAbdulsalaam/WeRateDogs_twitter_analysis