[Objectives] [Project Description] [Project Planning] [Data Dictionary] [Data Acquire and Prep] [Data Exploration] [Modeling] [Conclusion] [Steps to Reproduce]
- Document the data science pipeline, presenting the findings clearly and writing documentation thorough enough for independent reproduction.
- Create modules that can be downloaded for the sake of reproducibility.
- The goal is to use data to find and explore predictive factors of churn.
- Ultimately we hope to use these factors to drive actions which help to maintain a strong customer base and drive profits.
Generally, we ask: what relationships might affect churn?
Is there a distinguishable relationship between household size and churn?
This required some feature engineering (see the sketch after this list).
Is there a relationship between churn and paperless billing?
Is there a correlation between customer duration and monthly charges?
- If so, how strong is it?
Do total charges have a relationship with churn?
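As a minimal sketch of that feature engineering, household size might be approximated from the encoded partner and dependents flags; the helper name and the 1 + partner + dependents proxy are assumptions, not the exact code used:

```python
import pandas as pd

def add_household_size(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical proxy: the customer, plus a partner, plus dependents."""
    df = df.copy()
    df["household_size"] = 1 + df["partner_encoded"] + df["dependents_encoded"]
    return df
```

Since dependents_encoded is a 0/1 flag, this gives a coarse three-level proxy rather than a true household count.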
The target variable is churn_encoded. All variables are defined below.
| Variable Name | Data Type | Categorical/Numerical |
| --- | --- | --- |
| tenure | int64 | Numerical |
| monthly_charges | float64 | Numerical |
| total_charges | float64 | Numerical |
| gender_encoded | int64 | Categorical |
| partner_encoded | int64 | Categorical |
| dependents_encoded | int64 | Categorical |
| phone_service_encoded | int64 | Categorical |
| paperless_billing_encoded | int64 | Categorical |
| churn_encoded | int64 | Categorical |
| multiple_lines_No_phone_service | uint8 | Categorical |
| multiple_lines_Yes | uint8 | Categorical |
| online_security_No_internet_service | uint8 | Categorical |
| online_security_Yes | uint8 | Categorical |
| online_backup_No_internet_service | uint8 | Categorical |
| online_backup_Yes | uint8 | Categorical |
| device_protection_No_internet_service | uint8 | Categorical |
| device_protection_Yes | uint8 | Categorical |
| tech_support_No_internet_service | uint8 | Categorical |
| tech_support_Yes | uint8 | Categorical |
| streaming_tv_No_internet_service | uint8 | Categorical |
| streaming_tv_Yes | uint8 | Categorical |
| streaming_movies_No_internet_service | uint8 | Categorical |
| streaming_movies_Yes | uint8 | Categorical |
| contract_type_One_year | uint8 | Categorical |
| contract_type_Two_year | uint8 | Categorical |
| internet_service_type_Fiber_optic | uint8 | Categorical |
| internet_service_type_None | uint8 | Categorical |
| payment_type_Credit_card_(automatic) | uint8 | Categorical |
| payment_type_Electronic_check | uint8 | Categorical |
| payment_type_Mailed_check | uint8 | Categorical |
Our plan is to follow data science pipeline best practices. The steps are included below. Ultimately, we are building models to predict customer churn.
An acquire.py file is created and used. It acquires the data from the database, then saves it locally as a .csv file (telco.csv). It also outputs simple plots of the counts of unique values per variable, giving a quick visual check of whether each variable is categorical.
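Below is a minimal sketch of what acquire.py does, assuming a MySQL source; the database name (telco_churn), table name (customers), and function name are assumptions, not the exact code:

```python
import os
import pandas as pd
from env import host, username, password  # your local credentials file

def get_telco_data() -> pd.DataFrame:
    """Return the telco data, reading the cached csv if it already exists."""
    if os.path.exists("telco.csv"):
        return pd.read_csv("telco.csv")
    url = f"mysql+pymysql://{username}:{password}@{host}/telco_churn"
    df = pd.read_sql("SELECT * FROM customers", url)
    df.to_csv("telco.csv", index=False)  # cache locally for later runs
    return df
```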
A prepare.py file is created and used. Here the data is cleaned: categorical columns are encoded, and numerical columns are cast to floats when they contain no nulls. Columns with nulls are noted so they can be treated in the next step. The results of this step are saved into a csv file (telco_clean.csv).
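A minimal sketch of the cleaning step, assuming the raw column names match the source table; the gender encoding direction and function name are assumptions, though the drop_first dummies do line up with the column names in the data dictionary above:

```python
import pandas as pd

def prep_telco(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # total_charges arrives as text; blank strings become the nulls
    # that are handled in the preprocessing step
    df["total_charges"] = pd.to_numeric(df["total_charges"], errors="coerce")
    # binary columns -> 0/1 encodings (Female = 1 is an assumption)
    df["gender_encoded"] = (df["gender"] == "Female").astype(int)
    for col in ["partner", "dependents", "phone_service",
                "paperless_billing", "churn"]:
        df[f"{col}_encoded"] = (df[col] == "Yes").astype(int)
    # multi-valued categoricals -> one-hot dummy columns
    multi = ["multiple_lines", "online_security", "online_backup",
             "device_protection", "tech_support", "streaming_tv",
             "streaming_movies", "contract_type", "internet_service_type",
             "payment_type"]
    dummies = pd.get_dummies(df[multi], drop_first=True)
    dummies.columns = dummies.columns.str.replace(" ", "_")  # match the dictionary names
    return pd.concat([df, dummies], axis=1)
```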
A preprocess.py file is created and used. Here we split our data into train, validate, and test subsets. We then address the columns with null values: for each such column, we take the mean of the non-null values and impute it in place of the nulls. This is done independently for train, validate, and test in order to avoid data leakage. From here we do a deep dive into exploration on the train dataset, asking questions of the data and creating graphs to understand it better and ask better questions. We then formulate those questions into hypotheses and run statistical tests to answer them.
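A minimal sketch of the split-and-impute logic, plus one of the hypothesis tests; the 60/20/20 proportions, the seed, the function names, and the choice of a Pearson test are illustrative assumptions:

```python
from scipy import stats
from sklearn.model_selection import train_test_split

def split_data(df, target="churn_encoded", seed=123):
    """60/20/20 train/validate/test split, stratified on the target."""
    train, rest = train_test_split(df, test_size=0.4,
                                   random_state=seed, stratify=df[target])
    validate, test = train_test_split(rest, test_size=0.5,
                                      random_state=seed, stratify=rest[target])
    return train, validate, test

def impute_means(train, validate, test, cols=("total_charges",)):
    """Fill nulls in each split with that split's own column mean."""
    for split in (train, validate, test):
        for col in cols:
            split[col] = split[col].fillna(split[col].mean())
    return train, validate, test

def tenure_vs_monthly_charges(train):
    """Pearson test for the duration vs. monthly charges question above."""
    r, p = stats.pearsonr(train["tenure"], train["monthly_charges"])
    return r, p
```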
Here we select various machine learning algorithms from the sklearn library to create models. Once we have our models, we vary the hyperparameters of each. From here we evaluate each model on the validate set and carry the best performer forward to the test set.
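A minimal sketch of the modeling loop, assuming the splits from the previous step; the candidate algorithms, hyperparameters, and feature list shown are illustrative assumptions, not the final models:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def compare_models(train, validate, features, target="churn_encoded"):
    X_train, y_train = train[features], train[target]
    X_val, y_val = validate[features], validate[target]
    # majority-class baseline that each model must beat
    baseline = (y_val == y_train.mode()[0]).mean()
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=123),
        "random_forest": RandomForestClassifier(max_depth=5, random_state=123),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(f"{name}: {model.score(X_val, y_val):.3f} "
              f"(baseline {baseline:.3f})")
```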
A final report is created which gives a high-level overview of the process.
In order to reproduce these results you will need an env.py file containing the host, username, and password credentials to access the SQL server. The remaining files are available within my GitHub repo. If you clone the repo and add an env.py file in the format shown below, you will be able to reproduce the outcome. As an aside, the random state is set in the file; if you change it, your results may differ slightly.
```python
host='xxxxx'
username='xxxxxx'
password='xxxxxx'
# where the strings are your respective credentials
```
Our models beat the baseline; hence, they provide useful predictive power.
More data might highlight some interesting relationships.
It would be nice to obtain more quantitative data related to the projected disposable income of each household.
Offer a Telco credit card. In doing so, we can collect more quantitative information on credit ratings and household income, which could give us insight into each household's projected disposable income. The goal is to use the data gained to maximize profits, perhaps by offering incentives that reduce churn.
“Errors using inadequate data are much less than those using no data at all.” (Charles Babbage, English Mathematician)