
Demo/Hack - Baseball HOF Prediction using R mlr & DALEX Packages

Overview

This demo shows how MLB historical data from 1876 to 2017 can be used to build a machine learning model that predicts Hall of Fame induction.

The mlr package lets you build predictive models in R rapidly using sophisticated techniques such as Bayesian optimization and hyperparameter tuning with resampling. The DALEX package then explains how the resulting models arrive at their predictions.
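Hyperparameter tuning with resampling scores each candidate setting by training on some folds and evaluating on the held-out fold, then keeps the best-scoring setting. The sketch below illustrates only that idea in plain Python and is not mlr's API (mlr wraps the pattern in `tuneParams()`). The k-nearest-neighbours learner, the `tune_k` helper, and the grid are hypothetical stand-ins, and a plain grid search replaces Bayesian optimization.

```python
import random


def knn_predict(train, point, k):
    """Majority vote among the k training rows nearest to `point` (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point))
    )[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def cv_accuracy(rows, k, folds=5, seed=0):
    """Mean held-out accuracy of k-NN across `folds`-fold cross-validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    scores = []
    for f in range(folds):
        # Every `folds`-th row (offset f) is held out; the rest form the training set.
        test = shuffled[f::folds]
        train = [row for i, row in enumerate(shuffled) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def tune_k(rows, grid, folds=5, seed=0):
    """Grid search (hypothetical helper): best k by cross-validated accuracy, plus every score."""
    scores = {k: cv_accuracy(rows, k, folds, seed) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores
```

Each row is `((feature_1, feature_2, ...), label)` with label 1 for an inductee, and odd grid values avoid tied votes. With real data, features on very different scales (such as total bases and All-Star appearances) should be standardized before computing distances.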

mlr Package Information

https://mlr-org.github.io/mlr/

DALEX Package Information

https://github.com/pbiecek/DALEX

Results

Predictive Model Variable Response for All-Star Appearances

Note: As the number of All-Star appearances increases, the model's predicted probability of Hall of Fame induction increases.
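A variable response plot of this kind is typically a partial-dependence profile: set one feature to a grid value for every player, average the model's predictions, and repeat along the grid. The sketch below assumes that interpretation. `toy_hof_probability`, its coefficients, and the feature names `all_star` and `tb` are hypothetical, and the code does not reproduce the repository's XGBoost model or DALEX's implementation.

```python
import math
from statistics import mean


def toy_hof_probability(player):
    """Hypothetical logistic model (made-up coefficients), used only for illustration."""
    z = -8.0 + 0.5 * player["all_star"] + 0.001 * player["tb"]
    return 1.0 / (1.0 + math.exp(-z))


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` overwritten by each grid value."""
    profile = []
    for value in grid:
        # Copy each row so the caller's data is left unchanged.
        modified = [{**row, feature: value} for row in rows]
        profile.append((value, mean(predict(row) for row in modified)))
    return profile
```

Called on a list of player dicts with `feature="all_star"`, the toy profile rises steadily because the toy model's `all_star` coefficient is positive. A tree ensemble such as XGBoost would instead produce a step-shaped curve.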


XGBoost Learner Prediction Plot: All-Star Appearances vs Total Bases (TB)

Note: The trained XGBoost learner can be visualized in ways that aid model performance analysis. This plot suggests an implicit boundary of about 5,000 total bases and 9 All-Star appearances as the threshold for MLB Hall of Fame induction. Plot explanation:

  • Dark triangle - true positive - actual HOFer, predicted by the model to be in the HOF
  • Dark circle - true negative - NOT a HOFer, predicted by the model NOT to be in the HOF
  • White triangle - false negative - actual HOFer, predicted by the model NOT to be in the HOF
  • White circle - false positive - NOT a HOFer, predicted by the model to be in the HOF
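The legend above follows a simple encoding: marker shape gives the player's actual class (triangle for an inductee, circle otherwise), and a white fill marks a misclassification. The sketch below applies that encoding. `rule_predict` is a hypothetical hand-written rule built from the roughly 5,000 TB and 9 All-Star thresholds read off the plot, standing in for the XGBoost learner, and all names here are assumptions.

```python
from collections import Counter

# Marker for each outcome: shape = actual class, fill = whether the prediction was correct.
MARKERS = {
    "true positive": "dark triangle",
    "true negative": "dark circle",
    "false negative": "white triangle",
    "false positive": "white circle",
}


def outcome(actual_hof, predicted_hof):
    """Name the confusion-matrix cell for one player."""
    if actual_hof:
        return "true positive" if predicted_hof else "false negative"
    return "false positive" if predicted_hof else "true negative"


def rule_predict(total_bases, all_star):
    """Hypothetical rule from the plot's apparent boundary (not the trained XGBoost model)."""
    return total_bases >= 5000 and all_star >= 9


def tally(players):
    """Count outcomes for (total_bases, all_star, actual_hof) tuples."""
    return Counter(outcome(hof, rule_predict(tb, asg)) for tb, asg, hof in players)
```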


Prediction Breakdown - Derek Jeter

Note: The chart shows the contribution of each key feature to the model's final predicted probability of 0.972.


Prediction Breakdown - Willie Mays

Note: The chart shows the contribution of each key feature to the model's final predicted probability of 0.992.
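A prediction breakdown decomposes one player's predicted probability into additive steps. It starts from the average prediction over all players, then fixes features to that player's values one at a time and records how much the average prediction moves. The sketch below implements this sequential break-down idea under stated assumptions: the toy model, its coefficients, and the feature order are hypothetical. DALEX explains the real XGBoost learner and may choose the feature order itself.

```python
import math
from statistics import mean


def toy_hof_probability(player):
    """Hypothetical logistic model with made-up coefficients, for illustration only."""
    z = -8.0 + 0.5 * player["all_star"] + 0.001 * player["tb"]
    return 1.0 / (1.0 + math.exp(-z))


def break_down(predict, rows, observation, order):
    """Return the baseline mean prediction and each feature's sequential contribution."""
    current = [dict(row) for row in rows]  # copies, so the caller's rows stay untouched
    baseline = mean(predict(row) for row in current)
    previous = baseline
    contributions = []
    for feature in order:
        # Fix this feature to the observation's value in every row.
        for row in current:
            row[feature] = observation[feature]
        level = mean(predict(row) for row in current)
        contributions.append((feature, level - previous))
        previous = level
    return baseline, contributions
```

When `order` covers every feature the model uses, the baseline plus all contributions equals the model's prediction for the observation. That property is what lets a breakdown chart end exactly at the final probability.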
