Sequence+

The main goal of Sequence+ is to create a model, in the spirit of Stuff+, Location+, and Pitcher+, that measures the run value of a pitch sequence. Sequence+ is built from features related to tunneling, plus a mix of features that are typically included in Location+ and Stuff+.

General Approach

I have a run expectancy matrix for 2024 that describes roughly how many runs each event is worth in each possible situation. These values are averaged for each count, which helps isolate the run expectancy value from baserunning situations. Using this matrix and pitch-by-pitch advanced data from 2020-2023, I am aiming to build a model that can predict the cumulative run expectancy for a given sequence of two pitches. This target variable is referred to as seq_delta_run_exp throughout the documentation and code.
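As a rough illustration of how a two-pitch target could be assembled, the sketch below pairs consecutive pitches within the same plate appearance and sums their per-pitch run values. The field names (game_pk, at_bat_number, pitch_number, pitch_type, delta_run_exp) follow Statcast naming but are assumptions here, as is the choice to use overlapping pairs; the notebook may build sequences differently.

```python
from collections import defaultdict


def build_sequences(pitches):
    """Pair consecutive pitches in each plate appearance and sum their run values.

    `pitches` is a list of dicts. The keys used here are assumed, not taken
    from the project code; `delta_run_exp` stands in for the count-averaged
    per-pitch run value looked up from the run expectancy matrix.
    """
    # Group pitches by plate appearance so pairs never span two batters.
    by_pa = defaultdict(list)
    for pitch in pitches:
        by_pa[(pitch["game_pk"], pitch["at_bat_number"])].append(pitch)

    sequences = []
    for pa_key, pa_pitches in by_pa.items():
        ordered = sorted(pa_pitches, key=lambda p: p["pitch_number"])
        # Slide a window of two over the ordered pitches (assumed pairing rule).
        for first, second in zip(ordered, ordered[1:]):
            sequences.append(
                {
                    "pa": pa_key,
                    "pitch_types": (first["pitch_type"], second["pitch_type"]),
                    "seq_delta_run_exp": first["delta_run_exp"]
                    + second["delta_run_exp"],
                }
            )
    return sequences
```

A plate appearance with a single pitch produces no sequence under this pairing rule, so those pitches would drop out of the training data.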

Features

To capture pitch tunneling in this model, I use the kinematic equations to estimate the location of the baseball in 3D space at the commit point, the decision point, release, and over the plate.

See 3D_pitch_location_estimation.md or the feature engineering section of sequence+.ipynb for details on how I am doing this.
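For readers who want the general idea without opening the notebook, here is a minimal sketch of constant-acceleration kinematics applied to a pitch. It assumes each pitch is described by an initial position (x0, y0, z0), velocity (vx0, vy0, vz0), and acceleration (ax, ay, az) in feet and seconds, with y measured as distance from home plate so that vy0 is negative. The function names and the dict layout are hypothetical, and the y distances for the commit and decision points are not stated here, so y_target is left as a parameter.

```python
import math


def time_to_reach_y(y0, vy0, ay, y_target):
    """Solve y0 + vy0*t + 0.5*ay*t^2 = y_target for the earliest t >= 0."""
    a, b, c = 0.5 * ay, vy0, y0 - y_target
    if abs(a) < 1e-12:
        # No acceleration along y: the equation is linear in t.
        if b == 0:
            raise ValueError("ball is not moving along y")
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("ball never reaches y_target")
        root = math.sqrt(disc)
        roots = [(-b - root) / (2 * a), (-b + root) / (2 * a)]
    valid = [t for t in roots if t >= 0]
    if not valid:
        raise ValueError("y_target is behind the starting point")
    return min(valid)


def location_at_y(pitch, y_target):
    """Return the (x, y, z) location of the ball when it reaches y_target."""
    t = time_to_reach_y(pitch["y0"], pitch["vy0"], pitch["ay"], y_target)
    x = pitch["x0"] + pitch["vx0"] * t + 0.5 * pitch["ax"] * t * t
    z = pitch["z0"] + pitch["vz0"] * t + 0.5 * pitch["az"] * t * t
    return (x, y_target, z)


def tunnel_distance(pitch_a, pitch_b, y_target):
    """Euclidean distance between two pitches at the same distance from the plate."""
    return math.dist(location_at_y(pitch_a, y_target), location_at_y(pitch_b, y_target))
```

Evaluating tunnel_distance at several y_target values for the two pitches in a sequence gives one distance feature per checkpoint, which is one plausible way to turn these locations into tunneling features.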

Several more features related to pitch sequences and pitch quality were added in the hope that they would help explain variance in seq_delta_run_exp. Ultimately, the features below were the ones chosen.

Nine features were chosen specifically because of this plot:

Recursive feature selection was done with a step of 1, and as you can see there is a clear 'elbow' in the plot at 9 features. Those 9 features were ultimately selected for the final model.

Model

The model I ended up using is a LightGBM regressor with a RobustScaler. The RobustScaler() improved performance over StandardScaler(), which I attribute to its better handling of outliers.
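To show why the choice of scaler matters when outliers are present, the sketch below reimplements the two ideas with the Python standard library: robust scaling centers on the median and divides by the interquartile range (the default behavior of scikit-learn's RobustScaler), while standard scaling uses the mean and standard deviation. This is an illustration of the concept only, not the scikit-learn implementation or the project's pipeline, and the function names are hypothetical.

```python
import statistics


def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR)."""
    median = statistics.median(values)
    # 'inclusive' interpolates linearly between data points for the quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    return [(v - median) / iqr for v in values]


def standard_scale(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("standard deviation is zero; cannot scale")
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]  # one extreme outlier
    print("robust:  ", robust_scale(data))
    print("standard:", standard_scale(data))
```

With a single extreme value in the data, the mean and standard deviation are both pulled toward the outlier, so the typical values end up squeezed together; the median and IQR are barely affected, so the typical values keep their spread.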

I used Optuna to tune the hyperparameters, which produced a slightly better model than I would have gotten by guessing them on my own.

Evaluation

Road Map

Feature Engineering
Model Building
Evaluation
Deploy in HuggingFace Dashboard

References

statcast-era-pitches: I used this Hugging Face dataset to efficiently load pitches thrown from 2017 to the present.
Carter Kessinger & Johnny Davis: They sparked the idea of using kinematic equations to compute 3D distances at the commit and decision points for a better TunnelScore.
TJ Nestico: He posts lots of great content on X, and I am using a pitch-by-pitch run expectancy matrix found in his project tjStuff+.

Contact

Feel free to reach out to me with any questions or feedback.

Email: jensenh87@gmail.com
X: @_holmj_
