Customer Churn Analysis

Understanding customer churn

Companies usually focus more on customer acquisition and treat retention as a secondary priority. However, attracting a new customer can cost five times more than retaining an existing one. According to research by Bain & Company, increasing customer retention rates by 5% can increase profits by 25% to 95%.

Churn, also known as customer attrition, is a metric that counts the customers who stop doing business with a company or stop using a particular service. By tracking this metric, most businesses could only try to understand the reasons behind their churn numbers and tackle those factors with reactive action plans.

If a company cannot identify the warning signs of churn and act before losing the customer, there is no turning back. Customer churn data is a valuable asset: it yields meaningful insights and can be used to train churn prediction models. With machine learning, a company can learn from the past and have strategic information at hand to improve future customer experiences.
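As a minimal sketch of the churn metric, assuming the common period-based definition (customers lost during a period divided by the customers present at the start of that period), the churn rate and its complement, the retention rate, can be computed as below. The function names are hypothetical and not part of this repository.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of the customers present at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


def retention_rate(customers_at_start: int, customers_lost: int) -> float:
    """Complement of the churn rate for the same period."""
    return 1.0 - churn_rate(customers_at_start, customers_lost)


# Example: 1,000 customers at the start of a quarter, 50 of whom cancelled during it.
print(churn_rate(1000, 50))      # 0.05
print(retention_rate(1000, 50))  # 0.95
```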

Quick start

Folder Structure

Customer_Churn_Analysis/
├── Model
│   ├── images   
│   ├── Model_building_with_clean_data.ipynb
│   └── README.md      
├── data
│   ├── Customer_churn_raw.csv
│   └── churn_final.csv
├── data_preprocessing
│   ├── CustomerChurnPrediction.ipynb
│   └── README.md
├── data_visualization
│   ├── images
│   ├── Data visualization after cleaning.ipynb
│   ├── DataVisualization_BeforeDataCleaning.ipynb
│   ├── High_Level_Overview_of_dataset.ipynb
│   ├── sweet_report.html
│   └── README.md
├── images
│   ├── ...
│   └── customer_churn.jpeg
└── report
    ├── Images
    ├── Customer_Churn_Analysis.pdf
    ├── ieeeconf.cls
    └── main.tex

Installation

Install the packages listed in requirements.txt before running the project:

pip install -r requirements.txt

Steps to run the project

Create a local folder for the repository, open it in a terminal, and run the command below:

$ git clone https://github.com/rohit-chandra/Customer_Churn_Analysis.git
Navigate to the data_visualization folder and run DataVisualization_BeforeDataCleaning.ipynb in Jupyter Notebook.
Navigate to the data_preprocessing folder and run CustomerChurnPrediction.ipynb in Jupyter Notebook.
Navigate to the data_visualization folder and run Data_visualization_after_cleaning.ipynb in Jupyter Notebook.
Navigate to the Model folder and run Model_building_with_clean_data.ipynb in Jupyter Notebook.
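To run the notebooks non-interactively instead, a sketch like the one below executes them in the order listed, calling nbconvert's `jupyter nbconvert --to notebook --execute --inplace` command (nbconvert is assumed to be installed alongside Jupyter). The helper names are hypothetical. The folder listing shows the post-cleaning notebook as `Data visualization after cleaning.ipynb`, so check the exact filename in your checkout before running.

```python
import subprocess
from pathlib import Path
from typing import List

# Run order from the steps above; paths are relative to the repository root.
# Adjust the third entry if the file in your checkout is named
# "Data visualization after cleaning.ipynb" (as in the folder listing).
NOTEBOOK_ORDER = [
    Path("data_visualization/DataVisualization_BeforeDataCleaning.ipynb"),
    Path("data_preprocessing/CustomerChurnPrediction.ipynb"),
    Path("data_visualization/Data_visualization_after_cleaning.ipynb"),
    Path("Model/Model_building_with_clean_data.ipynb"),
]


def build_command(notebook: Path) -> List[str]:
    """nbconvert command that executes a notebook and saves its outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook.name]


def run_all(repo_root: str = "Customer_Churn_Analysis") -> None:
    """Execute every notebook in order, stopping at the first failure."""
    for relative in NOTEBOOK_ORDER:
        notebook = Path(repo_root) / relative
        if not notebook.exists():
            raise FileNotFoundError(notebook)
        # Run from the notebook's own folder, as Jupyter does, so relative data paths resolve.
        subprocess.run(build_command(notebook), cwd=notebook.parent, check=True)
```

Usage: `run_all("/path/to/Customer_Churn_Analysis")`. Because `check=True` is set, a failing notebook raises `subprocess.CalledProcessError` and the later notebooks are not run.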

Dataset

https://archive.ics.uci.edu/ml/datasets/Iranian+Churn+Dataset

The data describes telecom customers and whether they churned, using the following features:

Feature Name	Type	Description
Call Failures	Numerical	Number of call failures
Complains	Categorical	Binary (0: no complaint, 1: complaint)
Subscription Length	Numerical	Total months of subscription
Charge Amount	Categorical	0: lowest amount, 9: highest amount
Seconds of Use	Numerical	Total seconds of calls
Frequency of use	Numerical	Total number of calls
Frequency of SMS	Numerical	Total number of text messages
Distinct Called Numbers	Numerical	Total number of distinct phone numbers called
Tariff Plan	Categorical	Binary (1: pay as you go, 2: contractual)
AgeGroup	Categorical	1: younger age, 5: older age
Status	Categorical	Binary (1: active, 2: non-active)
Customer Value	Numerical	Calculated value of the customer
Churn	Categorical	Binary (1: churn, 0: non-churn); class label
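A quick first look at the data can be sketched with pandas. This assumes a CSV whose columns carry the names in the table above (for example `data/churn_final.csv`, whose exact column names are not verified here) and a label column called `Churn`; the helper name is hypothetical.

```python
import pandas as pd


def overview(df: pd.DataFrame, target: str = "Churn"):
    """Return missing values per column and the class balance of the target."""
    missing = df.isna().sum()
    balance = pd.DataFrame({
        "count": df[target].value_counts().sort_index(),
        "share": df[target].value_counts(normalize=True).sort_index(),
    })
    return missing, balance


# Usage (path and column names are assumptions):
# df = pd.read_csv("data/churn_final.csv")
# missing, balance = overview(df)
# print(balance)  # a small share for class 1 signals an imbalanced dataset
```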

Problem being investigated

Use several classification algorithms to classify the customers in the dataset accurately, and use the patterns learned from the data to make the best decision about each existing customer.

Questions to investigate

Which factors lead customers to the cancellation decision, based on the following metrics:

Poor service quality
Delays in customer support
Tariff plan
Frequency of the complaints
Age group of the customers
Usage frequency

Usually, there is no single reason, but a combination of events that somehow culminate in customer discontinuation.

Data Science Life cycle

Data preprocessing

Handle null values
Handle outliers
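One common way to implement these two steps is median imputation for missing values followed by capping values that fall outside Tukey's fences (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR). This is a sketch of that approach, not necessarily what the preprocessing notebook does; the function name and the `k=1.5` default are assumptions.

```python
import pandas as pd


def impute_and_cap(df: pd.DataFrame, numeric_cols, k: float = 1.5) -> pd.DataFrame:
    """Median-impute missing values, then clip outliers to the IQR fences."""
    out = df.copy()
    for col in numeric_cols:
        # Fill nulls with the column median, which is robust to the outliers handled next.
        out[col] = out[col].fillna(out[col].median())
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Cap values beyond the fences instead of dropping rows, so every customer is kept.
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out
```

Capping keeps the row count unchanged, which matters when the churn class is already small; dropping outlier rows is the other common choice.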

Univariate and Bivariate analysis

Infer relations between the input features and the target variable (Churn)
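For the bivariate step, a simple check is the churn rate within each value of a categorical feature such as `Complains`, `Tariff Plan` or `AgeGroup`. Because the label is coded 1 for churn and 0 otherwise, the mean of the label within a group is that group's churn rate. The helper below is a hypothetical sketch that assumes the column names from the dataset table.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, feature: str, target: str = "Churn") -> pd.DataFrame:
    """Number of customers and churn rate for each value of `feature`."""
    return (
        df.groupby(feature)[target]
        .agg(customers="size", churn_rate="mean")  # mean of a 0/1 label = churn rate
        .sort_index()
    )
```

Usage: `churn_rate_by(df, "Complains")` gives one row per complaint status, which directly addresses the "frequency of the complaints" question above.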

Feature Engineering

Derive new columns from the existing columns
Scale column values (using scikit-learn's MinMaxScaler)
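A sketch of both steps, assuming the column names from the dataset table. The derived column `avg_call_seconds` (seconds of use per call) is a hypothetical example, not necessarily one of the columns the notebooks create. The scaler is fitted on the training split only, so test-set values can fall outside [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def add_avg_call_seconds(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical derived feature: average call length in seconds."""
    out = df.copy()
    calls = out["Frequency of use"].replace(0, np.nan)  # avoid division by zero
    out["avg_call_seconds"] = (out["Seconds of Use"] / calls).fillna(0.0)
    return out


def minmax_scale(train: pd.DataFrame, test: pd.DataFrame, cols):
    """Fit MinMaxScaler on the training rows only, then apply it to both splits."""
    scaler = MinMaxScaler()
    train, test = train.copy(), test.copy()
    train[cols] = scaler.fit_transform(train[cols])
    test[cols] = scaler.transform(test[cols])
    return train, test, scaler
```

Fitting on the training rows only keeps information from the test set out of the preprocessing.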

Handle Imbalanced Dataset

Use sampling techniques such as:

Undersampling the majority class
Synthetic Minority Over-sampling Technique (SMOTE)
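Both techniques can be sketched from scratch with NumPy to show what they do. Random undersampling discards majority-class rows until the classes are the same size. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours. In practice a library such as imbalanced-learn (`RandomUnderSampler`, `SMOTE`) is the usual choice. The functions below are hypothetical illustrations that assume a binary 0/1 label, and they should be applied to the training split only.

```python
import numpy as np


def random_undersample(X, y, majority=0, seed=None):
    """Randomly drop majority-class rows until both classes have the same size."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y != majority)
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = rng.permutation(np.concatenate([keep, min_idx]))
    return X[idx], y[idx]


def smote(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic rows from minority-class samples X_min (SMOTE)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a sample must not count as its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # For each new row, pick a random base sample and one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place the new point at a random position on the segment between the two.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])
```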

Feature Importance

Based on correlation analysis, we choose the most important features in the entire dataset.
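A minimal sketch of correlation-based ranking. It assumes the importance criterion is the absolute Pearson correlation between each numeric feature and the `Churn` label; the notebooks may use a different criterion. The function name is hypothetical.

```python
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, target: str = "Churn") -> pd.Series:
    """Pearson correlation of every numeric feature with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    # Sort by magnitude so strong negative relationships rank as high as positive ones.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```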

Feature Selection

We select the best features with scikit-learn's SelectKBest feature selection technique.
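With scikit-learn, the selection step looks roughly like the sketch below. The ANOVA F-test (`f_classif`, which is also `SelectKBest`'s default score function) and `k=5` are assumptions; the notebook may use a different score function or value of k.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif


def select_k_best(X: pd.DataFrame, y, k: int = 5):
    """Keep the k features with the highest ANOVA F-score against the label."""
    selector = SelectKBest(score_func=f_classif, k=k)
    X_selected = selector.fit_transform(X, y)
    # get_support() is a boolean mask over the input columns.
    chosen = list(X.columns[selector.get_support()])
    return X_selected, chosen, selector
```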

Train different classification models, with and without resampled data, and tune their hyperparameters using grid search

XGBoost
Naive Bayes: GaussianNB, MultinomialNB, ComplementNB
SVM
Decision Trees
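As an example of the tuning step, the sketch below grid-searches a decision tree with scikit-learn's `GridSearchCV` and scores each candidate by F1 on the churn class. The parameter grid, the 5-fold stratified split and the F1 scoring are assumptions rather than the notebook's settings. If you train on resampled data, resample inside each cross-validation fold (for example with imbalanced-learn's `Pipeline`) so that synthetic rows do not leak into the validation folds.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def tune_decision_tree(X_train, y_train, seed: int = 42) -> GridSearchCV:
    """Grid-search depth and leaf size of a decision tree, optimising F1 for class 1."""
    param_grid = {
        "max_depth": [3, 5, 10, None],
        "min_samples_leaf": [1, 5, 20],
    }
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=seed),
        param_grid=param_grid,
        scoring="f1",  # F1 of the positive (churn) class
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
    )
    search.fit(X_train, y_train)
    return search
```

After fitting, `search.best_params_` holds the chosen settings and `search.best_estimator_` is the tree refitted on the full training set.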

Performance Metrics

Compute different performance metrics, such as the confusion matrix and the classification report (precision, recall, F1 score)
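For a binary churn label, precision, recall and F1 follow directly from the confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall). The sketch below computes them by hand from scikit-learn's `confusion_matrix` and also returns the `classification_report` text. It treats 1 as the churn class, as in the dataset table, and the function name is hypothetical.

```python
from sklearn.metrics import classification_report, confusion_matrix


def churn_report(y_true, y_pred):
    """Confusion matrix, churn-class precision/recall/F1, and the full text report."""
    # labels=[0, 1] fixes the order: rows are true classes, columns are predictions.
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    text = classification_report(
        y_true, y_pred, labels=[0, 1], target_names=["non-churn", "churn"], zero_division=0
    )
    return cm, {"precision": precision, "recall": recall, "f1": f1}, text
```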

Multicollinearity Problem
Check for highly correlated features and discard one feature from each highly correlated pair

Compare the performance metrics of the different models and draw conclusions on how to reduce customer churn
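The multicollinearity check can be sketched as follows: for every pair of numeric features whose absolute Pearson correlation exceeds a threshold, drop the feature that is less correlated with the `Churn` label. The 0.9 threshold and this rule for choosing which feature to drop are assumptions, not the notebook's settings, and the function name is hypothetical.

```python
import pandas as pd


def columns_to_drop(df: pd.DataFrame, target: str = "Churn", threshold: float = 0.9):
    """Names of features to discard so that no kept pair exceeds |r| > threshold."""
    corr = df.select_dtypes("number").corr()
    features = [c for c in corr.columns if c != target]
    drop = set()
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if a in drop or b in drop:
                continue  # this pair is already resolved
            if abs(corr.loc[a, b]) > threshold:
                # Keep whichever feature is more strongly related to the label.
                weaker = a if abs(corr.loc[a, target]) < abs(corr.loc[b, target]) else b
                drop.add(weaker)
    return sorted(drop)
```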

Conclusion

Since this is a classification problem, we use the following performance metrics:

Accuracy
Confusion Matrix
Precision
Recall
F1 score
Receiver operating characteristic (ROC) curve and area under the curve (AUC)
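Accuracy can be misleading on an imbalanced churn dataset, since a model that never predicts churn still scores well. The threshold-free ROC curve and its AUC are therefore useful alongside it. The sketch below assumes a fitted scikit-learn-style classifier that exposes `predict_proba` and `predict`, and 0/1 labels; the function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve


def evaluate_with_roc(model, X_test, y_test):
    """Accuracy at the default threshold plus ROC curve and AUC for a fitted classifier."""
    # Column 1 of predict_proba is the probability of model.classes_[1], i.e. churn for 0/1 labels.
    proba = np.asarray(model.predict_proba(X_test))[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, proba)
    return {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
        "roc_curve": (fpr, tpr, thresholds),
    }
```

Plotting `fpr` against `tpr` from the returned `roc_curve` entry gives the ROC curve; an AUC of 0.5 corresponds to random guessing.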
