Download and unzip the play-by-play files from Retrosheet for any and all seasons you're interesting in and unzip them.
Clone this repo:
git clone https://github.com/TimeMagazine/perfect-games.git && cd perfect-games
npm install
Crunch through those files calculating consecutive outs from the first batter for each pitcher in each game:
node crunch.js /path/to/retrosheet/data
The script will recursively walk through whatever directory you feed it, looking for files with an extension of .EVA
or .EVN
, so don't worry if different seasons are in different files
The previous step outputs a large file called all.json
with info for each game like so:
{
"id": "FLO201005290",
"info": {
"visteam": "PHI",
"hometeam": "FLO",
"date": "2010/05/29"
},
"starters": {
"away": [
"hallr001",
"\"Roy Halladay\"",
27
],
"home": [
"johnj009",
"\"Josh Johnson\"",
1
]
}
}
Do condense this down to a distribution of consecutive outs, just run:
node index.js distribute