
We introduced features based on each customer's account balance, the amounts of their transactions, and how long their account has been active. Then, we clustered customers using KMeans and DBSCAN clustering.


Sedighe-Raeisi/Analysis-of-value-of-bank-customers


Analysis of Value of Bank Customers

I had two datasets:
First dataset:

  • Date
  • Account ID Number
  • Opening Date of Account
  • Account Status: Active or Closed
  • Account Type
  • Account Balance at the end of the day

Second dataset:

  • Account ID Number
  • Transaction Amount
  • Transaction Category
  • Transaction Time

Preliminary Data Analysis

I found one null value and dropped it, since it carried no information.

Feature Introduction: I introduced 6 features based on my knowledge of bank transactions.

  • Feature 1: The total number of transactions for each account over the three months covered by the data.
  • Feature 2: The average transaction amount for each account.
  • Feature 3: The variance of the transaction amounts for each customer.
  • Feature 4: The average end-of-day account balance.
  • Feature 5: The variance of the account balance.
  • Feature 6: The duration of account activity, which has two parts: the time from the account opening date to the start of the dataset, plus the number of days the account was active within the dataset. I had to convert the system dates from Gregorian to Jalali so that the dataset dates and the opening dates were in the same calendar.
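A minimal pandas sketch of how the six features could be computed. The column names (`account_id`, `amount`, `balance`, `date`, `opening_date`, `status`) and the helper names are hypothetical, the variances use pandas' default sample variance, and Feature 6 assumes the Gregorian/Jalali conversion has already been done so that all dates share one calendar.

```python
import pandas as pd


def transaction_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Features 1-3, one row per account, from the transactions table."""
    # Hypothetical column names: "account_id" and "amount".
    amounts = tx.groupby("account_id")["amount"]
    return pd.DataFrame({
        "n_transactions": amounts.count(),  # Feature 1: number of transactions
        "mean_amount": amounts.mean(),      # Feature 2: average transaction amount
        "var_amount": amounts.var(),        # Feature 3: sample variance (NaN for one transaction)
    })


def balance_features(bal: pd.DataFrame) -> pd.DataFrame:
    """Features 4-5, one row per account, from the daily balance table."""
    # Hypothetical column names: "account_id" and "balance".
    balances = bal.groupby("account_id")["balance"]
    return pd.DataFrame({
        "mean_balance": balances.mean(),  # Feature 4: average end-of-day balance
        "var_balance": balances.var(),    # Feature 5: balance variance
    })


def activity_duration(bal: pd.DataFrame, dataset_start: pd.Timestamp) -> pd.Series:
    """Feature 6: days from opening to the dataset start, plus active days in the dataset.

    Assumes "date" and "opening_date" are datetime columns already in the same
    calendar and "status" holds "Active"/"Close" (hypothetical column names).
    """
    opening = bal.groupby("account_id")["opening_date"].first()
    # Accounts opened after the dataset starts contribute 0 days here.
    days_before = (dataset_start - opening).dt.days.clip(lower=0)
    active_days = bal[bal["status"] == "Active"].groupby("account_id")["date"].nunique()
    total = days_before + active_days.reindex(opening.index, fill_value=0)
    return total.rename("active_duration")
```

Under the sample-variance convention, an account with a single transaction gets a NaN variance, which has to be handled before clustering.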

Scaling

I scaled the financial features using a log function followed by min-max scaling. (I also tested the model without scaling and found that scaling results in better clustering.) I scaled the active-time duration using min-max scaling.
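A sketch of the scaling step in NumPy. The README only says "log function", so using `np.log1p` (log(1 + x), which accepts zeros) is an assumption, as are the helper names. The sketch rejects negative inputs rather than guessing how negative balances should be shifted.

```python
import numpy as np


def min_max(x) -> np.ndarray:
    """Rescale a 1-D array linearly to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    # A constant column has span 0; map it to all zeros instead of dividing by 0.
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def scale_financial(x) -> np.ndarray:
    """Log transform followed by min-max scaling (assumes non-negative values)."""
    x = np.asarray(x, dtype=float)
    if (x < 0).any():
        raise ValueError("log scaling here assumes non-negative values")
    # Assumption: log1p(x) = log(1 + x), so zero amounts stay finite.
    return min_max(np.log1p(x))
```

The log step compresses the long right tail typical of amounts and balances before min-max scaling; the active-time duration would go through `min_max` alone.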

Concatenating All Features

I combined all features into a single DataFrame.
Then, I used scatter_matrix to visualize the features.

[Figure: scatter matrix of all features]
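A sketch of the concatenation step, assuming each feature table is indexed by account ID (the helper name is hypothetical). The inner join, which keeps only accounts present in every table, is also an assumption.

```python
import pandas as pd


def combine_features(*frames: pd.DataFrame) -> pd.DataFrame:
    """Join per-account feature tables side by side on their shared index."""
    # join="inner" keeps only account IDs that appear in every table (assumption).
    return pd.concat(frames, axis=1, join="inner")
```

The combined frame can then be passed to `pandas.plotting.scatter_matrix(combined)`, which requires matplotlib, to draw the pairwise plots.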

Drop Outliers

I used box plots to show the presence of outliers:

[Figure: box plots of the features before removing outliers]

I computed the standard score (z-score) and dropped the data points whose standard scores were greater than 3. After that, the box plot became:

[Figure: box plots of the features after removing outliers]
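A sketch of the standard-score filter. The README says points with a standard score greater than 3 were dropped. Comparing the absolute z-score column by column, and dropping a row if any feature exceeds the threshold, are assumptions here. The helper name is hypothetical.

```python
import pandas as pd


def drop_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Keep only rows whose standard score is at most `threshold` in every column."""
    # Population standard deviation (ddof=0), as in scipy.stats.zscore's default.
    z = (df - df.mean()) / df.std(ddof=0)
    # A constant column gives 0/0 = NaN; treat it as "not an outlier".
    z = z.fillna(0.0)
    # Assumption: use the absolute z-score, so unusually small values are dropped too.
    return df[(z.abs() <= threshold).all(axis=1)]
```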

KMeans

I used the elbow curve of the inertia to choose the number of clusters:

[Figure: elbow curve of KMeans inertia]
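A sketch of the elbow curve with scikit-learn's `KMeans`, whose `inertia_` attribute is the within-cluster sum of squared distances to the nearest centroid. The range of k, `n_init=10`, and the fixed `random_state` are assumptions chosen for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X: np.ndarray, k_values=range(1, 11), random_state: int = 0) -> list:
    """Return the KMeans inertia for each number of clusters in k_values."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        inertias.append(model.fit(X).inertia_)
    return inertias
```

Plotting `k_values` against the returned list and picking the point where the curve bends gives the number of clusters.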

I clustered the data with KMeans using n=6 and visualized the result with a scatter matrix:

[Figure: scatter matrix of the KMeans clusters (n=6)]
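A sketch of the n=6 clustering with scikit-learn. The helper name, `n_init`, and `random_state` are assumptions. Note that scikit-learn numbers clusters from 0 to `n_clusters - 1`.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_kmeans(X: np.ndarray, n_clusters: int = 6, random_state: int = 0):
    """Fit KMeans and return the fitted model and one cluster label per row of X."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(X)  # integer labels in 0 .. n_clusters - 1
    return model, labels
```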

The clusters with label=5 and label=6 represent customers with:

  • longer active time
  • larger account balances
  • larger transaction amounts
  • more transactions

[Figure: KMeans clusters with label=5 and label=6]
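One way to check which clusters hold the high-value customers is to compare per-cluster feature means. The helper below is a hypothetical sketch, not the code used in this repository.

```python
import numpy as np
import pandas as pd


def cluster_profile(features: pd.DataFrame, labels) -> pd.DataFrame:
    """Per-cluster mean of every feature, plus the number of accounts per cluster."""
    grouped = features.groupby(np.asarray(labels))
    profile = grouped.mean()
    profile["size"] = grouped.size()  # accounts in each cluster
    profile.index.name = "cluster"
    return profile
```

Clusters whose rows show the highest means for active time, balance, and transaction amount and count correspond to the high-value groups described above.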

Second KMeans

I ran a second KMeans on the clusters with label=5 and label=6. Based on the elbow curve, I chose n=3.

[Figure: elbow curve for the second KMeans]

As this clustering shows, the cluster with label=2 corresponds to customers with:

  • larger account balances
  • larger transaction amounts

[Figure: second KMeans clustering (n=3)]
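A sketch of the second KMeans round: keep only the rows whose first-round label is among the selected clusters, then refit with n=3. The helper name, its parameters, and the fixed `random_state` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def recluster_subset(X, labels, keep_labels, n_clusters: int = 3, random_state: int = 0):
    """Run a second KMeans only on rows whose first-round label is in keep_labels."""
    mask = np.isin(np.asarray(labels), list(keep_labels))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    sub_labels = model.fit_predict(np.asarray(X)[mask])
    # mask maps the new labels back to the original rows.
    return mask, sub_labels
```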

DBSCAN

As a second method, I used DBSCAN. I used k-nearest-neighbor (KNN) distances to choose the value of epsilon:

[Figure: k-nearest-neighbor distance plot used to choose epsilon]
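A sketch of the k-distance heuristic for choosing epsilon: compute each point's distance to its k-th nearest neighbor, sort the distances, and read epsilon off the knee of the curve. The value of k (commonly set close to DBSCAN's `min_samples`) and the helper name are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distances(X: np.ndarray, k: int = 4) -> np.ndarray:
    """Sorted distance from each point to its k-th nearest neighbor (excluding itself)."""
    # Query k + 1 neighbors because each point is returned as its own neighbor at distance 0.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    return np.sort(dist[:, k])
```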

Running DBSCAN on the data resulted in the following clustering:

[Figure: DBSCAN clustering]

Clustering was not successful with this method.
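A sketch of the DBSCAN step with scikit-learn. It reports how many clusters were found and how many points were labeled as noise (label -1), which gives one way to judge whether a DBSCAN run produced useful clusters. `eps` would come from the k-distance plot. `min_samples=5` matches scikit-learn's default, and the helper name is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def run_dbscan(X: np.ndarray, eps: float, min_samples: int = 5):
    """Cluster X with DBSCAN and return (labels, number of clusters, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise points
    n_noise = int(np.sum(labels == -1))
    return labels, n_clusters, n_noise
```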
