This project implements a complete machine learning pipeline for predicting customer churn in the telecom industry. It covers data preprocessing, exploratory data analysis (EDA), feature engineering, model training, and model evaluation, using the CatBoost classifier as the predictive model.
The project uses the "Telco Customer Churn" dataset, which describes each customer's demographics, account information, and subscribed services.
- Data Preprocessing: Handling missing values, encoding categorical variables, and standardizing features.
- Exploratory Data Analysis (EDA): Visual and statistical analysis to understand the data distribution and relationships between variables.
- Feature Engineering: Creating new features to improve model performance and provide insight into customer behavior.
- Model Building: Training a predictive model with CatBoost, a gradient boosting library.
- Model Evaluation: Assessing model performance with metrics like accuracy, precision, recall, F1 score, and AUC.
The project requires Python and the following Python libraries:
- pandas
- numpy
- matplotlib
- seaborn
- catboost
- scikit-learn
To set up the environment, install the required libraries with pip:
pip install pandas numpy matplotlib seaborn catboost scikit-learn
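After the install command above finishes, a short check like the following can confirm that each library is importable. It is a sketch that uses only the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list holds import names, so scikit-learn appears as `sklearn`.

```python
import importlib.util

# Import names of the required libraries (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "catboost", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any name reported as missing can be installed individually with pip, keeping in mind that `sklearn` corresponds to the `scikit-learn` package.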