This project uses TensorFlow's Keras API to build, train, and run a neural network that predicts betting lines for a baseball game. Training data is gathered from ESPN (the game state, used as inputs) and DraftKings (the betting lines, used as outputs).
Simply put: for any point in a hypothetical baseball game, this model predicts what the standard betting lines should be, and can therefore predict a winner at any point before or during the game.
- Download and unzip one of the pre-trained models in the download section below.
- Replace the value of the `nn_model_path` variable in ui.py with the path of the folder within the unzipped folder.
- Install required dependencies:
  - tensorflow: `python -m pip install tensorflow`
- Run ui.py to run the model and GUI!
The model considers the following 16 inputs when predicting betting lines, in this order:
- Away team record, as a win percentage expressed as a decimal (e.g. 0.8 if the team is 8-2, meaning it won 8 of its 10 games)
- Home team record
- Number of runs the away team has
- Number of runs the home team has
- Number of hits the away team has
- Number of hits the home team has
- Number of errors the away team has
- Number of errors the home team has
- The current inning (e.g. 1.0 for the first inning, 2.0 for the second). A value of 0.0 means the game has not started yet and the betting lines are pre-game lines.
- Top or bottom of inning? Top = 0.0, Bottom = 1.0
- Number of outs
- Number of balls in the batter's count
- Number of strikes in the batter's count
- Is there a runner on first base? No = 0.0, Yes = 1.0
- Is there a runner on second base? No = 0.0, Yes = 1.0
- Is there a runner on third base? No = 0.0, Yes = 1.0
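
The input ordering above can be sketched as a small helper that packs a game state into a 16-value list. This is only an illustration: the function name `build_state` and its parameter names are hypothetical and are not taken from ui.py. Only the order of the values follows the list above.

```python
def build_state(away_record, home_record, away_runs, home_runs,
                away_hits, home_hits, away_errors, home_errors,
                inning, is_bottom, outs, balls, strikes,
                on_first, on_second, on_third):
    """Return the 16 model inputs as floats, in the documented order.

    Hypothetical helper: the names are illustrative, only the ordering
    comes from the list above.
    """
    return [
        float(away_record),   # away team record, e.g. 0.8 for 8-2
        float(home_record),   # home team record
        float(away_runs),
        float(home_runs),
        float(away_hits),
        float(home_hits),
        float(away_errors),
        float(home_errors),
        float(inning),        # 0.0 means the game has not started
        float(is_bottom),     # top = 0.0, bottom = 1.0
        float(outs),
        float(balls),
        float(strikes),
        float(on_first),      # no = 0.0, yes = 1.0
        float(on_second),
        float(on_third),
    ]


# Bottom of the 7th, home team leading 3-2, two outs, 3-2 count, runner on first.
state = build_state(0.8, 0.55, 2, 3, 5, 6, 0, 1,
                    7, True, 2, 3, 2, True, False, False)
print(len(state))
```

A Keras model generally expects a batch dimension, so a list like this would typically be wrapped in an outer list or array before being passed to the model.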
The model predicts four distinct betting lines, in the following order. You can read more about what each of these lines means here.
- The Run Line - the "point spread" between the two teams.
- The Total Line - the prediction for what the under/over would be for the combined number of runs in the game.
- The Away Team's Money Line
- The Home Team's Money Line

Using the odds above, particularly the money lines, we can calculate the implied win probability for either team.
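
As a rough illustration of the implied win probability calculation, the sketch below converts a money line to a probability. It assumes the money lines are in American odds format (e.g. -150 or +130), which this README does not state explicitly. The function name `implied_probability` is hypothetical.

```python
def implied_probability(moneyline):
    """Convert an American-style money line to an implied win probability.

    Assumption: the value is in American odds format (negative for the
    favorite, positive for the underdog).
    """
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100.0)
    # Underdog: risk 100 to win moneyline.
    return 100.0 / (moneyline + 100.0)


away = implied_probability(130)    # underdog, roughly 0.435
home = implied_probability(-150)   # favorite, roughly 0.6

# The two raw probabilities usually sum to more than 1 because of the
# bookmaker's margin; dividing by the sum rescales them to add up to 1.
total = away + home
away_fair = away / total
home_fair = home / total
print(round(away_fair, 3), round(home_fair, 3))
```

Whichever team has the higher normalized probability is the predicted winner under this reading of the lines.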
Name | Parameters | Description |
---|---|---|
model4 | 7,469 | Trained on ~5,300 examples. Warning, trained on data that likely contained errors. |
model5 | 378,274 | Trained on 5,473 examples. Warning, trained on data that likely contained errors. |
model8 | 378,274 | Trained on 14,594 examples |
These are `.jsonl` files. Each line is a self-contained JSON object with both the state (game scenario) and the real-world observed betting line information.
Number of Examples | Size | Description |
---|---|---|
5,473 | | Warning, likely contains errors. |
16,515 | 2 MB | Warning, likely contains errors. |
14,594 | 1.7 MB | |
41,037 | 4.8 MB | |
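
Because each line of a `.jsonl` file is its own JSON object, the training data can be read one line at a time. The following is a minimal sketch using only the standard library. The keys `state` and `lines` in the sample are made up for illustration and may not match the real field names in these files.

```python
import io
import json


def read_examples(line_source):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blanks."""
    examples = []
    for line in line_source:
        line = line.strip()
        if not line:
            continue  # tolerate empty lines between records
        examples.append(json.loads(line))
    return examples


# In-memory stand-in for an opened .jsonl file; the keys are hypothetical.
sample = io.StringIO(
    '{"state": {"inning": 1}, "lines": {"total": 8.5}}\n'
    '\n'
    '{"state": {"inning": 2}, "lines": {"total": 9.0}}\n'
)
examples = read_examples(sample)
print(len(examples))
```

For a real file, the same function can be passed an open file handle, for example `with open(path) as f: read_examples(f)`.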
This repo contains the following programs:
- A program for capturing training data from ESPN and DraftKings, written in .NET 7
- A Python script to assemble, compile, train, and save a TensorFlow Keras neural network
- A Python program that uses a pre-trained model to predict betting lines for various scenarios
- When a batter walks, ESPN will mark it with 4 balls in the count AND a man on second temporarily. If there are 4 balls and a man is on, count it as 0 balls.
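
The rule above could be applied as a small cleanup step when preparing a game state. This sketch interprets "a man is on" as any base being occupied, which is an assumption. The function name `normalize_balls` is hypothetical and not taken from the repo.

```python
def normalize_balls(balls, on_first, on_second, on_third):
    """Reset a transient 4-ball count to 0 when a runner is on base.

    Assumption: "a man is on" means at least one base is occupied.
    """
    runner_on = bool(on_first or on_second or on_third)
    if balls == 4 and runner_on:
        return 0
    return balls


print(normalize_balls(4, False, True, False))  # walk artifact, treated as 0 balls
print(normalize_balls(3, True, False, False))  # ordinary count, unchanged
```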