EECS731Project4

Major League

NFL, MLB, NBA and Soccer scores

Set up a data science project structure in a new git repository in your GitHub account
Pick one of the game data sets depending on your sports preference: NFL (https://github.com/fivethirtyeight/nfl-elo-game), MLB (https://github.com/fivethirtyeight/data/tree/master/mlb-elo), NBA (https://github.com/fivethirtyeight/data/tree/master/nba-carmelo), or soccer (https://github.com/fivethirtyeight/data/tree/master/soccer-spi)
Load the data set into pandas data frames
Formulate one or two ideas on how feature engineering would help the data set to establish additional value using exploratory data analysis
Build one or more regression models to determine the scores for each team using the other columns as features
Document your process and results
Commit your notebook, source code, visualizations and other supporting files to the git repository in GitHub

Data Source and description

Source: https://projects.fivethirtyeight.com/nba-model/nba_elo.csv

nba_elo.csv contains game-by-game Elo ratings and forecasts going back to 1946.
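As a rough illustration of the loading step, the sketch below reads game rows into a pandas data frame. The helper name `load_games`, the column names, and the two sample rows are assumptions made up for this example (the rows are not real games); check the header of the actual nba_elo.csv before relying on them.

```python
import io

import pandas as pd

# Made-up sample rows; column names are an assumption about the CSV header.
SAMPLE_CSV = """date,season,team1,team2,elo1_pre,elo2_pre,elo_prob1,elo_prob2,elo1_post,elo2_post,score1,score2
2000-01-01,2000,AAA,BBB,1500.0,1500.0,0.60,0.40,1510.0,1490.0,101,98
2000-01-02,2000,BBB,CCC,1490.0,1505.0,0.55,0.45,1480.0,1515.0,95,102
"""


def load_games(source):
    """Read game rows from a path, URL, or file-like object (hypothetical helper)."""
    return pd.read_csv(source, parse_dates=["date"])


games = load_games(io.StringIO(SAMPLE_CSV))
print(games.dtypes)
```

With the real file, the same helper could be called as `load_games("nba_elo.csv")`, assuming the file sits in the working directory.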

Two main parts of this project

Data exploration and Regression

Part 1: Data exploration

We first clean the data set by removing all null values. We then compute the pairwise correlation between factors, count the total number of games played by each team, and determine the winner of each game.

The winner of each game is later used in the regression.
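The exploration steps described above can be sketched as follows. This is a minimal example on made-up rows, not the notebook's actual code: the column names are assumptions, and dropping whole rows with `dropna()` is only one possible reading of "removing null data".

```python
import numpy as np
import pandas as pd

# Made-up games; the last row has a missing rating and will be dropped.
games = pd.DataFrame({
    "team1": ["AAA", "BBB", "AAA", "CCC"],
    "team2": ["BBB", "CCC", "CCC", "AAA"],
    "elo1_pre": [1500.0, 1490.0, 1510.0, np.nan],
    "elo2_pre": [1500.0, 1505.0, 1495.0, 1520.0],
    "score1": [101, 95, 110, 99],
    "score2": [98, 102, 104, 97],
})

# 1. Remove rows that contain any null value (assumed cleaning rule).
clean = games.dropna().reset_index(drop=True)

# 2. Pairwise correlation between the numeric columns.
corr = clean.select_dtypes("number").corr()

# 3. Games played per team, counting both home and away appearances.
games_played = pd.concat([clean["team1"], clean["team2"]]).value_counts()

# 4. Winner of each game (NBA games cannot end in a tie).
clean["winner"] = np.where(
    clean["score1"] > clean["score2"], clean["team1"], clean["team2"]
)

print(corr.round(2))
print(games_played)
print(clean[["team1", "team2", "winner"]])
```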

Part 2: Regression

To better leverage the data set, we add three additional features for the regression: a winner indicator, the difference between the two teams' post-match ratings, and the product of the two teams' win probabilities.
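One way the three added factors might be computed is shown below. The function name `add_features`, the new column names, and the input column names (`elo1_post`, `elo_prob1`, and so on) are assumptions for illustration; the notebook may define them differently, for example with the opposite sign for the rating difference.

```python
import pandas as pd


def add_features(games: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `games` with three engineered columns (hypothetical helper)."""
    out = games.copy()
    # Winner indicator: 1 if team1 won, 0 otherwise.
    out["team1_won"] = (out["score1"] > out["score2"]).astype(int)
    # Difference between the post-match ratings of the two teams.
    out["elo_post_diff"] = out["elo1_post"] - out["elo2_post"]
    # Product of the two teams' win probabilities.
    out["prob_product"] = out["elo_prob1"] * out["elo_prob2"]
    return out


# Made-up rows for demonstration only.
sample = pd.DataFrame({
    "score1": [101, 95],
    "score2": [98, 102],
    "elo1_post": [1510.0, 1480.0],
    "elo2_post": [1490.0, 1520.0],
    "elo_prob1": [0.5, 0.25],
    "elo_prob2": [0.5, 0.75],
})
features = add_features(sample)
print(features[["team1_won", "elo_post_diff", "prob_product"]])
```

The winner indicator and post-match ratings are only known once a game has finished, so features built this way suit retrospective analysis rather than forecasting upcoming games.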

We build two different models: a random forest and a linear regression. We found that the linear regression model gives substantially better performance.
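A comparison of the two model types could be set up roughly as below, using scikit-learn. Everything here is an assumption for illustration: the data is synthetic, only one team's score is predicted, and the feature set, hyperparameters, and mean absolute error metric are not taken from the notebook. The numbers it prints say nothing about the project's real results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and target.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "elo1_pre": rng.normal(1500, 100, n),
    "elo2_pre": rng.normal(1500, 100, n),
    "team1_won": rng.integers(0, 2, n),
})
y = (
    100
    + 0.02 * (X["elo1_pre"] - X["elo2_pre"])
    + 5 * X["team1_won"]
    + rng.normal(0, 3, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "linear_regression": LinearRegression(),
}

# Fit each model on the training split and score it on the held-out split.
mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))

print(mae)
```

To predict both teams' scores, the same loop could be run once per target column, or `y` could be a two-column frame, since both estimators accept multi-output targets.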
