This folder contains procedures, classes, and functions that I use frequently in my Data Science projects. The notebooks contain example applications. This is a permanent work in progress by definition.
- Reference Guide and Notes: notes on useful strategies and techniques for Data Science. The techniques described in this file are demonstrated in the individual notebooks, so it serves as a quick reference guide to the contents of the other files.
- Quick Baseline Analysis: quick data cleaning and local validation with a Random Forest model for a first iteration.
- Exploratory Data Analysis: mostly templates for the different techniques and plots used in EDA.
- Model Selection and Hyperparameter Tuning: model selection techniques and hyperparameter tuning strategies for Gradient Boosting Models (XGBoost, LightGBM).
- Building Pipelines: custom classes based on BaseEstimator (Scikit-Learn) and some pipeline templates for Machine Learning.
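
As a rough illustration of the kind of template the Quick Baseline Analysis and Building Pipelines notebooks cover, here is a minimal sketch of a custom transformer based on `BaseEstimator`, used inside a Scikit-Learn pipeline and validated locally with a Random Forest. The `MedianImputer` class, the column names, and the synthetic data are hypothetical and made up for this example; they are not taken from the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline


class MedianImputer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: fill missing values with medians learned in fit."""

    def __init__(self, columns=None):
        # Store constructor arguments unchanged so get_params/clone work.
        self.columns = columns

    def fit(self, X, y=None):
        cols = self.columns if self.columns is not None else list(X.columns)
        # Medians are computed on the training fold only.
        self.medians_ = X[cols].median()
        return self

    def transform(self, X):
        # fillna with a Series fills each column with its matching median.
        return X.copy().fillna(self.medians_)


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y = (X["a"] + X["b"] > 0).astype(int)
X.loc[rng.choice(200, size=20, replace=False), "c"] = np.nan

# Quick baseline: cleaning step + Random Forest, scored with 5-fold CV.
pipeline = Pipeline(
    [
        ("impute", MedianImputer()),
        ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
)
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

Keeping the cleaning step inside the pipeline means the medians are re-learned on each training fold, so the validation score is not contaminated by statistics from the held-out fold.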