Welcome to the Customer-conversion-prediction- wiki!
The goal of the Customer Conversion Prediction project is to build a machine learning model that predicts whether a client will subscribe to the insurance, based on their demographic and marketing data. The project aims to help the insurance company identify the customers most likely to convert, so that they can be targeted by phone and the cost of telephone marketing campaigns can be reduced. The historical sales data provided is used to train and evaluate the machine learning models. The model is analyzed to identify the important factors that contribute to conversion, and the AUROC metric is used to evaluate its performance. The main objective is to develop an accurate and efficient model that helps the insurance company improve its sales conversion rate and reduce marketing costs.
For this project, I utilized Google Colab as my integrated development environment (IDE) for programming in Python. Google Colab is a robust tool provided by Google that is well-suited for implementing machine learning algorithms, performing data analytics and cleaning operations, and developing data science models.
We imported the required libraries and loaded the dataset.
We then cleaned the data: removed duplicate rows, dropped null and missing values, checked data types, and removed outliers.
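The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names `age` and `dur` are hypothetical, and outliers are handled here by clipping to the 1.5 × IQR fences, which is an assumption (the project may have dropped outlier rows instead).

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Drop duplicate rows and rows with nulls, then clip numeric outliers."""
    out = df.drop_duplicates().dropna().copy()
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Assumption: clip to the IQR fences rather than deleting rows.
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out.reset_index(drop=True)


# Toy data with one duplicate row, one missing value, and two extreme values.
raw = pd.DataFrame({
    "age": [25, 25, 40, 38, np.nan, 33, 95],
    "dur": [100, 100, 120, 90, 110, 3000, 105],
})
cleaned = clean(raw, ["age", "dur"])
print(cleaned.dtypes)  # data type check
print(cleaned)
```

Clipping keeps every row, which can matter when the positive class is already rare; dropping rows is the simpler alternative if the outliers are clearly data errors.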
To get a good model fit and accurate predictions, we explored the data through univariate analysis, bivariate analysis, and a correlation check, then verified data types and created dummy variables.
We examined which occupations are more likely to subscribe to the term insurance policy.
We checked the correlation between the different variables. Call duration is highly correlated with the target variable.
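A correlation check of this kind might look like the sketch below. The data is synthetic and built so that call duration drives the target, purely to illustrate the mechanics; the column names `age`, `dur`, and `y` are assumptions, not taken from the project's dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data: longer calls convert more often by construction.
dur = rng.exponential(scale=250, size=n)
age = rng.integers(18, 70, size=n)
y = (dur + rng.normal(0, 100, size=n) > 350).astype(int)

df = pd.DataFrame({"age": age, "dur": dur, "y": y})

# Pearson correlation of every feature with the target, strongest first.
corr_with_target = (
    df.corr()["y"]
    .drop("y")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```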
We applied label encoding and one-hot encoding to these features: ['job', 'marital', 'education_qual', 'call_type', 'mon', 'prev_outcome']
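As a rough sketch of how the two encodings can be combined: the split below (an ordinal mapping for `education_qual`, one-hot encoding for the rest) is an assumption, since the source does not say which feature received which encoding, and the category values in the toy frame are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "job": ["admin", "technician", "admin"],
    "marital": ["married", "single", "divorced"],
    "education_qual": ["primary", "tertiary", "secondary"],
    "call_type": ["cellular", "unknown", "telephone"],
    "mon": ["may", "jun", "may"],
    "prev_outcome": ["unknown", "success", "failure"],
})

# Label (ordinal) encoding: education has a natural order (assumed mapping).
education_order = {"primary": 0, "secondary": 1, "tertiary": 2}
df["education_qual"] = df["education_qual"].map(education_order)

# One-hot encoding for features with no inherent order.
nominal = ["job", "marital", "call_type", "mon", "prev_outcome"]
encoded = pd.get_dummies(df, columns=nominal, dtype=int)
print(encoded.columns.tolist())
```

One-hot encoding avoids implying an order between categories such as job types, at the cost of more columns.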
We apply SMOTE oversampling because the data is imbalanced. Class counts (target value, number of rows):
0 39477
1 29496
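To show what SMOTE does, here is a from-scratch sketch of the core idea: each synthetic minority row is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is an illustration in plain NumPy, not the imbalanced-learn implementation that a project like this would typically call (`imblearn.over_sampling.SMOTE` with `fit_resample`); the function name `smote_sketch` and the toy data are hypothetical.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority
    samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise squared distances between minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    # Pick a base sample and one of its neighbours for each new row.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


rng = np.random.default_rng(1)
X_major = rng.normal(0, 1, size=(90, 2))  # majority class (0)
X_minor = rng.normal(3, 1, size=(10, 2))  # minority class (1)

synthetic = smote_sketch(X_minor, n_new=80)
X_balanced = np.vstack([X_major, X_minor, synthetic])
y_balanced = np.array([0] * 90 + [1] * 90)
print(np.bincount(y_balanced))
```

Oversampling is usually applied to the training split only, so that the test set keeps the real class distribution.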
We standardized the features with StandardScaler.
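A minimal sketch of the scaling step with scikit-learn, on synthetic data. Fitting the scaler on the training split only and reusing it on the test split is an assumption about good practice, not something the source states.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales.
X = rng.normal(loc=[40, 250], scale=[10, 200], size=(200, 2))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```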
Based on the results obtained from the evaluation of the three classification models (Logistic Regression, XGBoost, and Decision Tree Classifier) on the given historical data, XGBoost outperformed the other models with the highest accuracy score of 93.54% and the highest AUROC score of 0.986.
This implies that XGBoost is a suitable model for predicting whether a client will subscribe to the insurance or not. It is recommended to deploy this model in the production environment to accurately target potential customers and optimize marketing costs.
However, further analysis is recommended to identify the important features contributing to the model's performance and to fine-tune the model for better results.
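The comparison described above can be sketched as a loop that fits each model and records accuracy and AUROC on a held-out split. This example uses synthetic data and only the two scikit-learn models; XGBoost's `XGBClassifier` exposes the same `fit` / `predict_proba` interface and could be added to the dictionary. The scores it prints are unrelated to the figures reported for the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and target.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUROC is computed from predicted probabilities, not hard labels.
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auroc": roc_auc_score(y_test, proba),
    }

for name, s in scores.items():
    print(f"{name}: accuracy={s['accuracy']:.3f}, auroc={s['auroc']:.3f}")
```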
The current project has successfully built and evaluated a machine learning model to predict whether a customer will subscribe to an insurance policy. However, there is still room for improvement and further scope in this project, which includes:
Feature Engineering: The current dataset includes limited features, and feature engineering could be performed to extract more relevant features from the given data or other external sources, which could enhance the model's performance.
Hyperparameter Tuning: Hyperparameter tuning of the model could be performed to optimize its performance further. By varying the hyperparameters of the model and evaluating its performance, we could achieve better results.
Deployment: The current model could be deployed in a production environment, integrated with the company's systems, and used to target potential customers effectively.
Exploratory Data Analysis (EDA): Further exploratory data analysis could be performed to gain deeper insight into customer behavior, which could help in understanding the preferences, demographics, and other characteristics that influence customers' decisions.
Model Comparison: In addition to the models evaluated in this project, other classification models could also be implemented and compared to identify the best performing model for this problem.
Regular Maintenance: As the company's customer base grows and changes, the model's performance might degrade. Regular monitoring and maintenance of the model are necessary to ensure it continues to perform effectively.
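For the hyperparameter tuning item above, one possible starting point is a cross-validated grid search scored by AUROC. The sketch below tunes a decision tree on synthetic data; the parameter grid is an arbitrary example, and the same pattern would apply to the other models.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# Hypothetical grid: values chosen only for illustration.
param_grid = {"max_depth": [3, 5, 8], "min_samples_leaf": [1, 10, 30]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # same metric the project uses for evaluation
    cv=5,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```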
Overall, these enhancements and improvements could help the insurance company to optimize its outreach efforts and improve the success rate of selling insurance policies to potential customers.