Stores regularly launch offers, but how do they know which offers we might need?
Fortunately, they are not spying on us while we talk or think about our needs (or maybe they are). One of the many ways to uncover deeper relationships between customer needs and store sales is to find patterns in data, and that is something Machine Learning is well suited to.
Unsupervised learning is the Machine Learning subarea that uses Artificial Intelligence algorithms to detect patterns in data sets whose data points are neither classified nor labeled.
To find groups of customers with common characteristics, such as a similar salary range or spending score, I will use a clustering algorithm known as k-means. This partitional algorithm is good at dividing intermixed points into groups, which is exactly the key problem here: separating data points into groups.
After that, I'll use a decision tree that implements the CART algorithm to find specific boundaries for classifying customers according to the labels generated by the k-means algorithm.
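As a rough illustration of what k-means does, here is a minimal pure-Python sketch of the usual assign-and-update loop (Lloyd's algorithm). It is not the code in kmeans.py: the function names, the parameters and the toy points (income in k$, spending score) are assumptions made up for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k rows of the data
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Toy data (hypothetical): two obvious groups of customers.
points = [(15, 10), (16, 12), (17, 11), (80, 90), (82, 88), (81, 91)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which points end up sharing a label.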
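To give an idea of how a CART-style tree finds such boundaries, the sketch below searches for the single threshold that minimizes the weighted Gini impurity of the two resulting groups. It is a simplified illustration under my own assumptions, not the contents of dt2.py or decision_tree.py: the names `gini` and `best_split` and the toy rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Toy rows (hypothetical): (annual income in k$, spending score).
rows = [(30, 20), (35, 80), (38, 40), (39, 60), (70, 10), (90, 95)]
labels = [0, 0, 0, 1, 1, 1]
feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)  # -> 0 38.5 0.0
```

A full tree repeats this search recursively on each side of the split until the groups are pure or some stopping rule is reached.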
Image 1. Original dataset scatter graph[1]
Image 2. Elbow method graph
Image 3. Clusters graph
label | Annual Income (k$) | Spending score (1-100) |
---|---|---|
0🟢 | low | low |
1🟡 | high | high |
2🔴 | medium | medium |
3🟣 | low | high |
4🔵 | high | low |
Image 4. Decision tree diagram
In each node of the diagram, `value` is an array with one entry per label (0 to 4). Following the colors in Image 3, it can be understood in the following way:
value = [ label0🟢, label1🟡, label2🔴, label3🟣, label4🔵 ]
label | Annual Income (k$) | Spending score (1-100) |
---|---|---|
0🟢 | <= 38.5 | <= 50.0 |
1🟡 | > 68.5 | > 51.5 |
2🔴 | 38.5 - 68.5 | - |
3🟣 | <= 38.5 | > 50.0 |
4🔵 | > 68.5 | <= 51.5 |
Image 5. Final segmentation
- Python 3. https://www.python.org/downloads/
- Preferred IDE (optional). https://www.jetbrains.com/pycharm/
- Clone this repository
- Move into the cloned repository folder
Run the .py files in the following order:
- `python kmeans.py`: plots the original data, the elbow diagram and the k-means clusters, and generates customer_segmentation.csv
- `python dt2.py`: plots the decision tree generated from customer_segmentation.csv (using scikit-learn)
- `python decision_tree.py`: displays in the console the decision tree generated from customer_segmentation.csv (coded from scratch)
[1] "Mall Customer Segmentation Data", Kaggle.com, 2021. [Online]. Available: https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python?select=Mall_Customers.csv. [Accessed: 04- Nov- 2021].
[2] "1.10. Decision Trees", scikit-learn, 2021. [Online]. Available: https://scikit-learn.org/stable/modules/tree.html. [Accessed: 04- Nov- 2021].
[3] J. Gordon, "Let's Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8", Youtube.com, 2021. [Online]. Available: https://www.youtube.com/watch?v=LDRbO9a6XPU&list=PLOU2XLYxmsIIuiBfYad6rFYQU_jL2ryal&index=9. [Accessed: 04- Nov- 2021].