Classification is a supervised learning technique used to predict discrete target variables using a set of features/attributes.
The aim of the project is to use client data to predict whether the client will subscribe to a term deposit.
- Data Visualization
- Data Transformation - Encoding
- Heatmaps for correlation
- Feature Engineering
- Model Building
- Predictive Modelling
- Logistic Regression
- Decision Tree Classifier
- Python
- Pandas
- matplotlib and seaborn
- scikit-learn
Term deposits are a major source of income for a bank. A term deposit is a cash investment held at a financial institution. The bank has various outreach plans to sell term deposits to its customers, such as email marketing, advertisements, telephonic marketing, and digital marketing.
Telephonic marketing campaigns remain one of the most effective ways to reach people. However, they require a large investment, because large call centers are hired to execute these campaigns. Hence, it is crucial to identify the customers most likely to convert beforehand, so that they can be specifically targeted via calls.
Client personal data, such as the client's age, job type, and marital status, along with call information, such as the duration of the call and the day and month of the call, is used to predict whether the client will subscribe to a term deposit.
We use classification to make this prediction.
- Import the required modules for Python.
- Import the training data as a Data Frame.
- Print the head of the data.
- The basic `info` is printed.
- The column names of the attributes are also printed.
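A minimal sketch of these import steps with pandas. The file name `train.csv` and the columns below are assumptions based on the attributes described above; a small in-memory CSV stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the training file; real columns and values may differ.
csv_text = """ID,age,job,marital,duration,day,month,subscribed
1,34,admin.,married,210,5,may,no
2,45,technician,single,560,12,jun,yes
3,29,services,single,95,20,may,no
4,52,management,married,720,3,aug,yes
"""

# Load the data as a DataFrame (with the real file: pd.read_csv("train.csv")).
train = pd.read_csv(io.StringIO(csv_text))

print(train.head())             # first rows of the data
train.info()                    # dtypes and non-null counts per column
print(train.columns.tolist())   # attribute (column) names
```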
- The 'ID' column is dropped.
- 'subscribed' is identified as the target variable.
- A countplot of 'subscribed' is plotted.
- A stacked bar plot of 'Job' vs. frequency is plotted, showing how many clients in each job category subscribed or not.
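The exploration steps above can be sketched as follows on hypothetical toy data (the `job` and `subscribed` values are assumptions). pandas' built-in plotting stands in for seaborn here: `value_counts().plot(kind="bar")` plays the role of a countplot, and `pd.crosstab` supplies the per-job counts for the stacked bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; column names mirror the ones described above.
train = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "job": ["admin.", "admin.", "technician", "technician", "services", "admin."],
    "subscribed": ["no", "yes", "no", "no", "yes", "no"],
})

# The ID only labels rows, so it carries no predictive signal and is dropped.
train = train.drop(columns="ID")
target = "subscribed"

# Countplot of the target: number of clients in each class.
counts = train[target].value_counts()
ax = counts.plot(kind="bar")
ax.set_xlabel(target)
ax.set_ylabel("count")
plt.close(ax.figure)

# Stacked bar plot: one bar per job, split by subscription outcome.
job_vs_sub = pd.crosstab(train["job"], train[target])
ax = job_vs_sub.plot(kind="bar", stacked=True)
ax.set_xlabel("job")
ax.set_ylabel("frequency")
plt.close(ax.figure)

print(counts)
print(job_vs_sub)
```

With seaborn installed, `sns.countplot(x="subscribed", data=train)` draws the same class-count chart.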
- `LabelEncoder` from `sklearn.preprocessing` is used to convert all categorical variables to numeric variables.
- A heatmap is plotted to check the correlation among the variables.
- A correlation table is also created.
- The dependent and independent variables are separated.
- `train_test_split` from `sklearn.model_selection` is used to split the dependent and independent variables into training and validation sets.
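A sketch of the transformation and splitting steps, assuming toy data whose `subscribed` column holds "yes"/"no". One `LabelEncoder` is kept per column so the same mapping could be reused on the test data; the heatmap is drawn with plain matplotlib (seaborn's `sns.heatmap(corr, annot=True)` is the usual shortcut), and `test_size=0.2` and `random_state=12` are illustrative choices, not the project's settings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the real columns and values are not reproduced here.
train = pd.DataFrame({
    "age": [34, 45, 29, 52, 38, 41, 27, 60, 33, 48],
    "job": ["admin.", "technician", "services", "admin.", "management",
            "technician", "services", "management", "admin.", "technician"],
    "duration": [210, 560, 95, 130, 720, 300, 80, 650, 150, 400],
    "subscribed": ["no", "yes", "no", "no", "yes", "no", "no", "yes", "no", "yes"],
})

# 1. Label-encode every non-numeric column, keeping one fitted encoder per column.
encoders = {}
for col in list(train.columns):
    if not pd.api.types.is_numeric_dtype(train[col]):
        encoders[col] = LabelEncoder()
        train[col] = encoders[col].fit_transform(train[col])

# 2. Pearson correlation table and a heatmap of it.
corr = train.corr()
print(corr.round(2))

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
plt.close(fig)

# 3. Separate independent (X) and dependent (y) variables, then hold out a validation set.
X = train.drop(columns="subscribed")
y = train["subscribed"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=12)
print(X_train.shape, X_val.shape)
```

Label encoding gives nominal categories an arbitrary integer order. Tree-based models usually cope with this, whereas for logistic regression one-hot encoding (for example `pd.get_dummies`) is a common alternative.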
- `LogisticRegression` from `sklearn.linear_model` is initialized as `logi`.
- `logi` is fit on `X_train` and `y_train`.
- `y_val` is predicted by applying `predict` on `X_val`.
- The model scores are calculated.
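A hedged sketch of the logistic regression steps. Synthetic data from `make_classification` replaces the encoded bank data, and `max_iter=1000` and the split settings are assumptions; accuracy is used as the score, matching the metric reported in the results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded bank data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# A higher max_iter helps the solver converge on unscaled features.
logi = LogisticRegression(max_iter=1000)
logi.fit(X_train, y_train)

# Predict the validation labels and score them.
pred = logi.predict(X_val)
acc = accuracy_score(y_val, pred)
print(f"validation accuracy: {acc:.3f}")
```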
- `DecisionTreeClassifier` from `sklearn.tree` is initialized as `dtc`.
- `dtc` is fit on `X_train` and `y_train`.
- `y_val` is predicted by applying `predict` on `X_val`.
- The model scores are calculated.
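The decision tree steps follow the same pattern. This sketch again uses synthetic data, and `max_depth=5` is a hypothetical setting to keep the tree from memorizing the training set, not necessarily what the project used.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded bank data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# max_depth caps tree growth to limit overfitting (assumed value).
dtc = DecisionTreeClassifier(max_depth=5, random_state=0)
dtc.fit(X_train, y_train)

# Predict the validation labels and score them.
pred = dtc.predict(X_val)
acc = accuracy_score(y_val, pred)
print(f"validation accuracy: {acc:.3f}, tree depth: {dtc.get_depth()}")
```

Reusing the same split for both models keeps their validation accuracies directly comparable.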
- Test data is imported as a Data Frame.
- Feature engineering and data transformation are applied to the test data.
- `predict` is used to obtain predictions.
- A CSV file of the predicted values is created as 'submission.csv'.
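One way to carry out the test-set steps, sketched on hypothetical data. The key assumption is that the encoder fitted on the training data is reused on the test data, so each category maps to the same integer in both; a submission with `ID` and `subscribed` columns is likewise an assumption about the expected format.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical train/test frames; the real files and columns are assumptions.
train = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "job": ["admin.", "technician", "services", "admin.", "technician", "services"],
    "duration": [210, 560, 95, 130, 720, 300],
    "subscribed": ["no", "yes", "no", "no", "yes", "no"],
})
test = pd.DataFrame({
    "ID": [7, 8, 9],
    "job": ["services", "admin.", "technician"],
    "duration": [650, 90, 400],
})

# Fit the encoder on training categories, then reuse it so test codes match.
job_enc = LabelEncoder().fit(train["job"])
train["job"] = job_enc.transform(train["job"])
test["job"] = job_enc.transform(test["job"])  # raises ValueError on unseen categories

features = ["job", "duration"]
model = DecisionTreeClassifier(random_state=0)
model.fit(train[features], train["subscribed"])

# Keep the ID so each prediction can be matched back to its client.
submission = pd.DataFrame({"ID": test["ID"], "subscribed": model.predict(test[features])})
submission.to_csv("submission.csv", index=False)
print(submission)
```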
- Logistic Regression
  - accuracy = 0.8829383886255924
- Decision Tree Classifier
  - accuracy = 0.8924170616113745
- Some optional data exploration is done on the test data to understand it better.
https://www.linkedin.com/in/naveen-a-902a671b3/
Internshala Data Science Course.