This project aims to predict the Customer Lifetime Value (CLTV) using K-Means Clustering on the Online Retail Dataset available on Kaggle.
CLTV is a metric that helps companies determine the total amount of money a customer is expected to spend on their products or services throughout their lifetime. This project will use K-Means clustering algorithm to segment customers based on their buying behaviour and predict the CLTV of each segment.
The Online Retail Dataset contains 541,909 transactions made by customers of an online retail store based in the UK from 2009 to 2011. The dataset has the following attributes:
- InvoiceNo: Unique identifier of each transaction
- StockCode: Unique identifier of each product
- Description: Description of each product
- Quantity: Number of products purchased in each transaction
- InvoiceDate: Date and time of each transaction
- UnitPrice: Price of each product
- CustomerID: Unique identifier of each customer
- Country: Country of each customer
The dataset can be downloaded from Kaggle.
The project is implemented in Python 3.x using the following libraries:
- pandas
- numpy
- scikit-learn
- seaborn
- matplotlib
To install the required libraries, run the following command:
pip install pandas numpy scikit-learn seaborn matplotlib
The project has the following structure:
- data: Contains the Online Retail dataset in CSV and ZIP format.
- notebooks: Contains the Jupyter notebooks used in the project.
- data_exploration.ipynb explores the dataset and visualises the data distribution.
- data_preprocessing.ipynb preprocesses the dataset for modeling.
- cltv_prediction.ipynb implements K-Means clustering and predicts the CLTV of each customer segment.
- src: Contains the Python script for CLTV prediction.
- README.md: This file.
- requirements.txt: Contains the list of required libraries.
To run the project, follow these steps:
- Install the required libraries
- Download the Online Retail dataset from Kaggle and place it in the data directory.
- Run the cltv_prediction.ipynb notebook to predict the CLTV of each customer segment.
This project is licensed under the MIT License - see the LICENCE file for details.