Module 3 - Project 2 (ML) - Churn Analysis

Victor, Susanna, Vero & Anna

Data Analytics-Part Time Course | Barcelona June 2020

Content

Project Description and Workflow
Organization
Links

Project Description and Workflow

Description

This project is to practice Machine Learning.

Dataset The dataset chosen is data of 10k clients from Bank - Direct Link to kaggle. Note: the dataset contained in this repository (the one you must use) differs from the original kaggle dataset.

Metadata

- RowNumber: corresponds to the record (row) number and has no effect on the output. This column will be removed.
- CustomerId: contains random values and has no effect on customer leaving the bank. This column will be removed.
- Surname: the surname of a customer has no impact on their decision to leave the bank. This column will be removed.
- CreditScore: can have an effect on customer churn, since a customer with a higher credit score is less likely to leave the bank.
- Geography: a customer’s location can affect their decision to leave the bank. We’ll keep this column.
- Gender: it’s interesting to explore whether gender plays a role in a customer leaving the bank. We’ll include this column, too.
- Age: this is certainly relevant, since older customers are less likely to leave their bank than younger ones.
- Tenure: refers to the number of years that the customer has been a client of the bank. Normally, older clients are more loyal and less likely to leave a bank.
- Balance: also a very good indicator of customer churn, as people with a higher balance in their accounts are less likely to leave the bank compared to those with lower balances.
- NumOfProducts: refers to the number of products that a customer has purchased through the bank.
- HasCrCard: denotes whether or not a customer has a credit card. This column is also relevant, since people with a credit card are less likely to leave the bank. (0=No,1=Yes)
- IsActiveMember: active customers are less likely to leave the bank, so we’ll keep this. (0=No,1=Yes)
- EstimatedSalary: as with balance, people with lower salaries are more likely to leave the bank compared to those with higher salaries.
- Exited: whether or not the customer left the bank. This is what we have to predict. (0=No,1=Yes)

Goal

Classify if a customer from a Bank is going to churn or not. What if would be possible to predict if a customer will leave our company?

Workflow

Environment

Data Preparation

2.1 Data Wrangling

 - Explore Data
 - Clean Data
 - Transform Data

2.2 Feature Selection

 - Delete not useful variables
 - Correlation

2.3 Feature Engineering

 - Log transformation
 - MinMaxScaler
 - Creat new columns

Select From Model (Classification Models)

 - LogisticRegression
 - SVM
 - KNeighborsClassifier
 - DecisionTreeClassifier
 - LinearDiscriminantAnalysis
 - GaussianNB
 - RandomForestClassifier
 - MLPClassifier

Train Models (Machine Learning Classification Models)

 - LogisticRegression
 - SVM
 - RFC
 - NeuralNetworks

Ensemble Methods
```
 - XGBoost
 - Stacking
```

Organization

Main Organization of the project thought GitHub Project board:

Organize a Project GitHub board with main tasks and timings
Create a repository including a .gitignore file and a Readme.md
Data Preparation
Modeling
Tuning
Write conclusions
Complete Readme.md

Inside this repository you can find:

CSVs raw data:

- Churn_Modelling.csv

CSVs clean data:

- Churn_Clean.csv
- Churn_Norm.csv
- Churn_norm_LDA.csv
- Churn_LDA.csv

Jupyter notebook of the data wrangling:

- DataPreparation.ipynb
- SelectFromModel.ipynb
- PCA-LDA.ipynb
- ML_logistic_regression_regularized_models.ipynb
- ML_RFC.ipynb
- MODEL_SVM.ipynb
- ML_MLPC.ipynb
- EnsembleLearning.ipynb

Other:

- Gitignore.txt
- README.md

Links

Repository

PowerPoint Presentation

GitHub Project

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Module 3 - Project 2 (ML) - Churn Analysis

Content

Project Description and Workflow

Description

Workflow

Organization

Links

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
.gitignore		.gitignore
Churn_Clean.csv		Churn_Clean.csv
Churn_LDA.csv		Churn_LDA.csv
Churn_Modelling.csv		Churn_Modelling.csv
Churn_Norm.csv		Churn_Norm.csv
Churn_norm_LDA.csv		Churn_norm_LDA.csv
DataPreparation.ipynb		DataPreparation.ipynb
EnsembleLearning.ipynb		EnsembleLearning.ipynb
ML_RFC.ipynb		ML_RFC.ipynb
ML_logistic_regression_regularized_models.ipynb		ML_logistic_regression_regularized_models.ipynb
MODEL_SVM.ipynb		MODEL_SVM.ipynb
PCA-LDA.ipynb		PCA-LDA.ipynb
README.md		README.md

susanalista/project-ml-bank-churn

Folders and files

Latest commit

History

Repository files navigation

Module 3 - Project 2 (ML) - Churn Analysis

Content

Project Description and Workflow

Description

Workflow

Organization

Links

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages