Skip to content

Latest commit

 

History

History

data

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

Data collection

This folder contains all NBA game logs (since 1950) and play-by-play data (since 1997).

Thanks to Swar Patel and all those who have contributed to the nba_api python package. It makes grabbing stats.nba.com data incredibly easy.

There is one hurdle to the data collection: The API for stats.nba.com is rate-limited, meaning you can only make a certain number of requests in a given time period. (I don't know what the number or the length of time is.) To get around this, I made sure to not to make the same call twice in my python code by checking if the output file existed already. A library like requests_cache for python can also help with this process.

Still, it took a couple of days to get all 27,000-plus data files. So I posted them here for others to use.

The code that gathered the data is in two python scripts.

Game logs

01-get-game-logs.ipynb writes one game log for each season.

For every game, there is a line for each team. Here's what the top of the 2018-19 log looks like:

SEASON_ID TEAM_ID TEAM_ABBREVIATION TEAM_NAME GAME_ID GAME_DATE MATCHUP WL MIN FGM FGA FG_PCT FG3M FG3A FG3_PCT FTM FTA FT_PCT OREB DREB REB AST STL BLK TOV PF PTS PLUS_MINUS VIDEO_AVAILABLE
22018 1610612744 GSW Golden State Warriors 21800002 2018-10-16 GSW vs. OKC W 240 42 95 0.442 7 26 0.269 17 18 0.944 17 41 58 28 7 7 21 29 108 8 1
22018 1610612760 OKC Oklahoma City Thunder 21800002 2018-10-16 OKC @ GSW L 240 33 91 0.363 10 37 0.27 24 37 0.649 16 29 45 21 12 6 15 21 100 -8 1
22018 1610612755 PHI Philadelphia 76ers 21800001 2018-10-16 PHI @ BOS L 240 34 87 0.391 5 26 0.192 14 23 0.609 6 41 47 18 8 5 16 20 87 -18 1
22018 1610612738 BOS Boston Celtics 21800001 2018-10-16 BOS vs. PHI W 240 42 97 0.433 11 37 0.297 10 14 0.714 12 43 55 21 7 5 15 20 105 18 1

Play-by-play

02-get-play-by-play.ipynb grabs the play-by-play data for every game in every season.

The stats.nba.com API is rate-limited, though it remains unclear to me how many requests it takes to hit the limit. This code doesn't hit the API if a file containing that game ID's play-by-play data already exists, to avoid repeated requests.

Here's what the top of play-by-play looks like for the opening game of the 2018-19 season between the Boston Celtics and Philadelphia 76ers:

GAME_ID EVENTNUM EVENTMSGTYPE EVENTMSGACTIONTYPE PERIOD WCTIMESTRING PCTIMESTRING HOMEDESCRIPTION NEUTRALDESCRIPTION VISITORDESCRIPTION SCORE SCOREMARGIN
21800001 2 12 0 1 8:03 PM 12:00
21800001 4 10 0 1 8:03 PM 12:00 Jump Ball Horford vs. Embiid: Tip to Simmons
21800001 7 2 1 1 8:03 PM 11:40 MISS Covington 27' 3PT Jump Shot
21800001 8 4 0 1 8:04 PM 11:40 CELTICS Rebound
21800001 10 2 1 1 8:04 PM 11:15 MISS Tatum 25' 3PT Jump Shot