Analysis of Value of Bank Customers

I had two datasets:
First dataset:

Date
Account ID Number
Opening Date of Account
Account Status: Active or Closed
Account Type
Account Balance at the end of the day

Second dataset:

Account ID Number
Transaction Amount
Transaction Category
Transaction Time

Data Primary Analysis

I found one null value and dropped it, since it carried no information.
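A minimal sketch of this check, assuming the data is loaded into a pandas DataFrame. The table and its column names (`account_id`, `amount`) are hypothetical and only illustrate the idea:

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; the column names are assumptions for illustration.
transactions = pd.DataFrame({
    "account_id": [1, 1, 2, np.nan],
    "amount": [100.0, 250.0, 75.0, np.nan],
})

# Count missing values per column before deciding what to do with them.
null_counts = transactions.isna().sum()

# Drop rows in which every field is missing, since they carry no information.
cleaned = transactions.dropna(how="all").reset_index(drop=True)
```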

Feature Introduction: I introduced 6 features based on my knowledge of bank transactions.

Feature 1: The total number of transactions for each account over the three months.
Feature 2: The average transaction amount for each account.
Feature 3: The variance of the transaction amount for each customer.
Feature 4: The average account balance at the end of each day.
Feature 5: The variance of the account balance.
Feature 6: The duration of account activity, which has two parts: the time from the account opening date to the beginning of the dataset, plus the active dates of the account within the dataset. I had to convert the system date from Gregorian to Jalali so that the dataset dates and the opening dates use the same calendar.
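Features 1 to 5 can be sketched with pandas group-by aggregations. The tables and column names below are hypothetical, and pandas' default sample variance (`ddof=1`) is an assumption, since the text does not say which variance was used. Feature 6 is left out because it depends on the calendar conversion.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration.
transactions = pd.DataFrame({
    "account_id": [1, 1, 1, 2, 2],
    "amount": [100.0, 200.0, 300.0, 50.0, 150.0],
})
balances = pd.DataFrame({
    "account_id": [1, 1, 2, 2],
    "balance": [1000.0, 1200.0, 400.0, 600.0],
})

# Features 1-3: count, mean and variance of the transaction amount per account.
tx_features = transactions.groupby("account_id")["amount"].agg(
    n_transactions="count",
    amount_mean="mean",
    amount_var="var",  # sample variance (ddof=1), an assumption
)

# Features 4-5: mean and variance of the end-of-day balance per account.
bal_features = balances.groupby("account_id")["balance"].agg(
    balance_mean="mean",
    balance_var="var",
)
```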

Scaling

I scaled the financial features using a log function and min-max scaling. (I also tested the model without scaling and found that scaling gives a better clustering.) I scaled the active time duration using min-max scaling.
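A minimal sketch of this scaling step, assuming scikit-learn's `MinMaxScaler`. The use of `np.log1p` (which tolerates zeros) is an assumption, since the text only says "log function", and the column names are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature table; column names are assumptions for illustration.
features = pd.DataFrame({
    "amount_mean": [10.0, 1_000.0, 100_000.0],
    "active_days": [30.0, 365.0, 3650.0],
})
financial_cols = ["amount_mean"]

scaled = features.copy()

# Compress the heavy right tail of the financial features.
# log1p is an assumption; a plain log would fail on zero amounts.
scaled[financial_cols] = np.log1p(scaled[financial_cols])

# Map every column to the [0, 1] range.
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(scaled),
    columns=scaled.columns,
    index=scaled.index,
)
```

The log step matters mainly for skewed quantities such as amounts and balances: without it, min-max scaling would squeeze most accounts into a narrow band near zero.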

Concatenating All Features

I combined all features into a single dataframe.
Then, I used scatter_matrix to visualize the features.
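One way to assemble the feature table is `pd.concat` along the column axis, aligned on the account ID. This is a sketch with hypothetical frames and column names, not necessarily how the notebook does it:

```python
import pandas as pd

# Hypothetical per-account feature tables sharing the same index.
index = pd.Index([1, 2], name="account_id")
tx_features = pd.DataFrame(
    {"n_transactions": [3, 2], "amount_mean": [200.0, 100.0]}, index=index
)
bal_features = pd.DataFrame({"balance_mean": [1100.0, 500.0]}, index=index)

# Align on the account_id index and place the columns side by side.
all_features = pd.concat([tx_features, bal_features], axis=1)

# pd.plotting.scatter_matrix(all_features) would then draw the pairwise
# scatter plots; it needs matplotlib, so it is left as a comment here.
```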

Drop Outliers

I used a box plot to check for outliers:

I computed the standard score of each value and dropped the rows whose standard score is greater than 3. After that, the box plot becomes:
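The standard-score filter described above can be sketched as follows. The data is synthetic, the column names are hypothetical, and filtering on the absolute value of the score (so that very low values are also removed) is an assumption:

```python
import numpy as np
import pandas as pd

# Synthetic features with one planted outlier in row 0.
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "balance_mean": rng.normal(size=200),
    "amount_mean": rng.normal(size=200),
})
features.loc[0, "balance_mean"] = 25.0

# Standard score per column: (value - column mean) / column std.
z = (features - features.mean()) / features.std(ddof=0)

# Keep a row only if every feature is within 3 standard deviations.
# Using the absolute value is an assumption about the original filter.
keep = (z.abs() <= 3).all(axis=1)
filtered = features[keep]
```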

Kmeans

I used the elbow curve of the inertia to choose the number of clusters:
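A minimal sketch of how the inertia values behind an elbow curve can be computed with scikit-learn. The data is synthetic (`make_blobs`), and the range of `k` is an assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled feature matrix.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Fit k-means for each candidate k and record the inertia
# (sum of squared distances of samples to their closest centroid).
ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Plotting ks against inertias gives the elbow curve; the "elbow" is the
# k after which the inertia stops dropping sharply.
```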

I clustered the data using k-means with n=6 and used the scatter matrix to visualize the result:

The clusters with label=5,6 represent customers with:

longer active time
bigger account balance
bigger amount of transaction
greater number of transactions
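A cluster description like the one above can be read off the per-cluster feature means. The sketch below uses synthetic data and hypothetical column names. Note that scikit-learn numbers its labels from 0, so the label values it produces will not necessarily match the ones quoted in this README.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled features; column names are assumptions.
X, _ = make_blobs(n_samples=600, centers=6, n_features=4, random_state=0)
features = pd.DataFrame(
    X, columns=["active_time", "balance_mean", "amount_mean", "n_transactions"]
)

# Cluster into 6 groups, as in the text.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
features["label"] = kmeans.fit_predict(features)

# Mean of every feature within each cluster: clusters with high values in
# all columns correspond to the most active, highest-balance customers.
profile = features.groupby("label").mean()
```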

Second Kmeans

I ran a second k-means on the clusters with label=5,6. Based on the elbow curve, I chose n=3.

In this clustering, the cluster with label=2 corresponds to customers with:

bigger account balance
bigger amount of transaction
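The two-stage approach can be sketched by filtering on the first-stage labels and re-clustering only that subset. The data is synthetic, the column names are hypothetical, and `selected_labels` is an illustrative choice rather than the labels from the notebook:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled features.
X, _ = make_blobs(n_samples=600, centers=6, n_features=2, random_state=0)
cols = ["balance_mean", "amount_mean"]
features = pd.DataFrame(X, columns=cols)

# First stage: 6 clusters over all accounts.
features["label"] = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# Second stage: keep only the clusters of interest and split them again.
selected_labels = [4, 5]  # hypothetical choice for illustration
subset = features[features["label"].isin(selected_labels)].copy()
subset["sub_label"] = KMeans(
    n_clusters=3, n_init=10, random_state=0
).fit_predict(subset[cols])
```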

DBSCAN

As a second method, I used DBSCAN. I used KNN to choose the value of epsilon:

Running DBSCAN on the data resulted in the following clustering:
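One common way to pick epsilon with nearest neighbors is the sorted k-distance curve, sketched below with scikit-learn's `NearestNeighbors`. Whether the notebook does exactly this is an assumption, as are the synthetic data and the value of `k`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the scaled feature matrix.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

k = 5  # assumed; often set equal to DBSCAN's min_samples

# Query k + 1 neighbors because each point is returned as its own
# nearest neighbor (distance 0) when querying the training data.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = nn.kneighbors(X)

# Distance of every point to its k-th neighbor, sorted ascending.
# The "knee" of this curve is a candidate value for epsilon.
k_distances = np.sort(distances[:, -1])
```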

Clustering was not successful with this method.
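For reference, a minimal DBSCAN run with scikit-learn looks like the sketch below. The data is synthetic and the `eps` and `min_samples` values are placeholders, not the ones used in the notebook. Counting clusters and noise points is one quick way to judge whether the result is usable.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled feature matrix.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# eps and min_samples are placeholder values for illustration.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with the label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
```

A result with a single large cluster, or with most points labelled as noise, usually means that epsilon does not suit the density of the data.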


Folders and files

Latest commit

History

Repository files navigation

Analysis of Value of Bank Customers

Data Primary Analysis

Scaling

Concatenating All Features

Drop Outliers

Kmean

Second Kmeans

DBSCAN

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages