The code in this repository converts data files in an R package devoted to the Survivor television series from .rda to .csv and .json formats for users who prefer Python or other data science tools.
The data comes from the canonical survivoR package created by David Ohm et al., which contains detailed datasets about the history of the show, including an episode summary, castaway listing, challenge results and vote history, among many others.
scripts/convert_data.py: This script converts the survivoR data by fetching the latest .rda files from the source, storing copies locally in data/raw/rda, and then converting them to comma-delimited text files in data/processed/csv. A GitHub Actions workflow also runs the script once daily at 8 pm Pacific Time to keep the files fresh during a season, storing data in the repo and also on S3.
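The write-out half of that conversion might look roughly like the sketch below. It is not the code in scripts/convert_data.py: it assumes a table has already been read from its .rda file into a list of row dictionaries (parsing .rda needs a third-party reader and is left out here), and the function name `write_table` and the `csv`/`json` subdirectory layout are hypothetical.

```python
import csv
import json
from pathlib import Path


def write_table(rows, name, out_dir):
    """Write one table (a list of row dicts) as <out_dir>/csv/<name>.csv
    and <out_dir>/json/<name>.json. The directory layout is an assumption."""
    out_dir = Path(out_dir)
    csv_path = out_dir / "csv" / f"{name}.csv"
    json_path = out_dir / "json" / f"{name}.json"
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    json_path.parent.mkdir(parents=True, exist_ok=True)

    # Use the union of all keys, in first-seen order, as the CSV header.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))

    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells

    with json_path.open("w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

    return csv_path, json_path
```

Writing both formats from the same in-memory rows keeps the CSV and JSON copies of a table in step with each other.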
The latest version of each table can be downloaded here:
- advantage_details: json, csv
- advantage_movement: json, csv
- auction_details: json, csv
- boot_mapping: json, csv
- castaway_details: json, csv
- castaways: json, csv
- challenge_description: json, csv
- challenge_results: json, csv
- challenge_summary: json, csv
- confessionals: json, csv
- episodes: json, csv
- jury_votes: json, csv
- screen_time: json, csv
- season_palettes: json, csv
- season_summary: json, csv
- survivor_auction: json, csv
- tribe_colours: json, csv
- tribe_mapping: json, csv
- vote_history: json, csv
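Once one of these files has been downloaded, it can be read in Python with the standard library alone. The helper below is a minimal sketch; `load_table` is a hypothetical name, and it makes no assumption about the layout of the JSON files beyond their being valid JSON.

```python
import csv
import json
from pathlib import Path


def load_table(path):
    """Load a downloaded table, choosing the parser from the file extension.

    CSV files come back as a list of row dicts with string values;
    JSON files come back as whatever structure the file contains.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    raise ValueError(f"Unsupported file type: {path.suffix}")
```

For example, `load_table("castaways.csv")` would return the rows of that table, assuming the file was saved locally under that name. Values read from CSV are strings, so numeric columns need converting before any arithmetic.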
Notes: The converted .rda data files from the original project are stored in this repo's raw/csv directory. The content of those files won't change, only the file formats. Any value errors can be flagged as issues there. They are typically resolved quickly. Also: Please see the original repo for metadata about the individual files.
- survivor-voteoffs: How did each castaway react to his or her torch getting snuffed? There's data for that.
- survivor-transcripts: Fetching and storing complete transcripts for each episode of the American television show and analyzing the text for keyword/phrase frequency.