End-of-course project for Coursera's Google Advanced Data Analytics course.
The main goal of the project is to analyze the data and build a model that predicts whether an employee will leave the company. The notebook is divided into four steps:
- Package import and dataset load
- Data exploration and visualization
  - Understand the variables
  - Clean the dataset (missing data, redundant data, outliers)
  - Boxplots, scatterplots, histograms and heatmaps
- Model building with two methods
  - Model approach A: Logistic Regression
  - Model approach B: Tree-based Machine Learning
- Results and evaluation
  - Summary of model results
  - Conclusion and next steps
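
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the dataset is synthetic, the column names (`satisfaction`, `monthly_hours`, `tenure_years`, `left`) and the helper functions are hypothetical, and a random forest is assumed as one possible tree-based model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_hr_data(n_rows=1000, seed=0):
    """Build a made-up dataset; all column names are hypothetical."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "satisfaction": rng.uniform(0, 1, n_rows),
        "monthly_hours": rng.normal(200, 40, n_rows),
        "tenure_years": rng.integers(1, 11, n_rows),
    })
    # Synthetic rule: low satisfaction and long hours raise the odds of leaving.
    logit = 3 * (0.5 - df["satisfaction"]) + 0.02 * (df["monthly_hours"] - 200)
    prob_leave = 1 / (1 + np.exp(-logit))
    df["left"] = (rng.uniform(0, 1, n_rows) < prob_leave).astype(int)
    return df


def clean(df, outlier_col="tenure_years"):
    """Drop missing and duplicated rows, then remove IQR outliers in one column."""
    df = df.dropna().drop_duplicates()
    q1, q3 = df[outlier_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df[outlier_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[keep]


def fit_and_score(df, target="left", seed=0):
    """Fit both model approaches and return F1 and ROC AUC on a held-out split."""
    X = df.drop(columns=target)
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        # Approach A: logistic regression on standardized features.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        # Approach B: a random forest, assumed here as the tree-based model.
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = {
            "f1": f1_score(y_test, model.predict(X_test)),
            "roc_auc": roc_auc_score(y_test, proba),
        }
    return scores


if __name__ == "__main__":
    data = clean(make_synthetic_hr_data())
    for name, metrics in fit_and_score(data).items():
        print(name, metrics)
```

Scoring both models on the same held-out split keeps the comparison between the two approaches fair. On real data, the choice of metric would depend on how costly a missed departure is compared with a false alarm.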