# IBM Attrition

For this notebook, I use the IBM Attrition dataset (available on Kaggle) to do the following:

- Compare the predictive performance of Logistic Regression, Random Forest, and XGBoost models using confusion matrices and ROC curves (a minimal sketch follows this list)
- Explore relative feature importance with respect to employee attrition for the XGBoost model using Gini importances and permutation importance
- Further explore feature importance with visualization tools: partial dependence plots (PDPs), PDP interaction plots, and SHAP waterfall plots
- Use insights from the feature importances to perform dimensionality reduction
- Rerun the models on the reduced feature matrix
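A minimal sketch of the comparison and importance steps, in Python with scikit-learn, xgboost, and shap. The CSV file name, the train/test split, and all hyperparameters are illustrative assumptions, not necessarily the notebook's actual settings:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.inspection import permutation_importance
from xgboost import XGBClassifier

df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")  # assumed file name
y = (df["Attrition"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns=["Attrition"]), drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

# Fit each model and report a confusion matrix and ROC AUC on the test set
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print(name)
    print(confusion_matrix(y_test, model.predict(X_test)))
    print(f"ROC AUC: {roc_auc_score(y_test, proba):.3f}\n")

# Permutation importance for the fitted XGBoost model, as a model-agnostic
# cross-check of the built-in (Gini/gain) importances
perm = permutation_importance(
    models["XGBoost"], X_test, y_test,
    scoring="roc_auc", n_repeats=10, random_state=42,
)
top = pd.Series(perm.importances_mean, index=X.columns).sort_values(ascending=False)
print(top.head(10))

# SHAP waterfall plot for a single employee's prediction (requires the shap package)
import shap
explainer = shap.Explainer(models["XGBoost"])
shap.plots.waterfall(explainer(X_test)[0])
```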

## Insights

- The three models performed similarly on the original feature matrix, with XGBoost slightly edging out the others
- Some of the most important features included 'OverTime', 'NumberofCompaniesWorked', and 'BusinessTravel'
- After reducing dimensionality, the simpler Logistic Regression outperformed the other models, which overfit the training data (see the sketch below)
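Continuing the sketch above (this reuses `top`, `models`, and the train/test split from there), the rerun on a reduced feature matrix might look like the following; keeping the top ten features by permutation importance is an illustrative cutoff, not necessarily the notebook's:

```python
# Keep only the top-k features ranked by permutation importance and refit
# the same three models on the reduced matrix
keep = top.head(10).index  # k = 10 is an illustrative cutoff
X_train_small, X_test_small = X_train[keep], X_test[keep]

for name, model in models.items():
    model.fit(X_train_small, y_train)
    proba = model.predict_proba(X_test_small)[:, 1]
    print(f"{name} (reduced): ROC AUC = {roc_auc_score(y_test, proba):.3f}")
```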