
BorisKamen/Credit-Risk-Modeling


A Complete Cycle of Credit Risk Modelling

As a machine learning project, we aim to use data science techniques to model three important notions in credit risk management: Probability of Default (PD), Loss Given Default (LGD, via the recovery rate) and Exposure at Default (EAD, via the credit conversion factor, CCF).

The data set used for this purpose is an open-source portfolio of 466,285 loans (available on Kaggle), and our goal is to build models that are compliant with the Basel II and Basel III banking regulations.

The data set is described in detail in the DataPreparation file, which also contains a link to download it. The data cover 2007 to 2014, and we use the year 2015 to monitor the resulting PD model.

At the end of this work, we compute the Expected Loss (EL) and the Regulatory Capital under the Foundation Internal Ratings-Based (F-IRB) approach that the regulations require for this loan portfolio.
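The three modelled quantities are usually combined per loan as EL = PD × LGD × EAD and then summed over the portfolio. The snippet below is a minimal sketch of that arithmetic only; the two loans and their values are made up for illustration and are not taken from the data set, and the Regulatory Capital formula is not covered here.

```python
def expected_loss(pd_, lgd, ead):
    """Expected loss of a single loan: PD x LGD x EAD."""
    return pd_ * lgd * ead


# Hypothetical loans (illustrative values, not from the Kaggle data).
loans = [
    {"pd": 0.02, "lgd": 0.45, "ead": 10_000.0},
    {"pd": 0.10, "lgd": 0.60, "ead": 5_000.0},
]

# Portfolio EL is the sum of the per-loan expected losses.
portfolio_el = sum(expected_loss(l["pd"], l["lgd"], l["ead"]) for l in loans)

# Here the ratio is taken against total exposure, as an assumption.
total_exposure = sum(l["ead"] for l in loans)
el_ratio = portfolio_el / total_exposure

print(f"EL = {portfolio_el:.2f} ({el_ratio:.2%} of exposure)")
```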

RESULTS: For this loan portfolio, with a total amount of 6.67 billion (see the file LGD-EAD EL-RegCap), we obtain:

  • The Expected Loss is approximately 7.6 % of the portfolio total amount, that is 511.66 million.

  • The Regulatory Capital is approximately 5.3 % of the portfolio total amount, that is 357.79 million.

The main Data Science techniques used in this project are :

  • Weight of evidence - Information value - Fine classing - Coarse classing - Linear regression - Logistic regression

  • Area Under the Curve - Receiver Operating Characteristic Curve - Gini Coefficient - Kolmogorov-Smirnov - Assessing Population Stability
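As an illustration of the first group of techniques, here is a minimal sketch of Weight of Evidence (WoE) and Information Value (IV) for one categorical variable. It assumes the convention WoE = ln(% of goods / % of bads); the function name `woe_iv` and the toy grades are hypothetical, and the notebooks in this repository may compute these differently.

```python
import math
from collections import Counter


def woe_iv(categories, is_good):
    """Return ({category: WoE}, IV) using WoE = ln(%good / %bad)."""
    good, bad = Counter(), Counter()
    for cat, g in zip(categories, is_good):
        (good if g else bad)[cat] += 1
    n_good, n_bad = sum(good.values()), sum(bad.values())

    woe, iv = {}, 0.0
    for cat in sorted(set(categories)):
        pct_good = good[cat] / n_good
        pct_bad = bad[cat] / n_bad
        # WoE is undefined when a category has no goods or no bads;
        # such categories are skipped here (coarse classing would
        # normally merge them with a neighbour).
        if pct_good == 0 or pct_bad == 0:
            continue
        woe[cat] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[cat]
    return woe, iv


# Toy example: grade A has 8 goods / 2 bads, grade B has 2 goods / 8 bads.
grades = ["A"] * 10 + ["B"] * 10
goods = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
woe, iv = woe_iv(grades, goods)
print(woe, iv)
```

Categories with similar WoE are natural candidates to be merged during coarse classing.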

The work is subdivided as follows:

I - DATA PREPARATION

        Import, Explore, Preprocess (formatting, dealing with missing values..)

II - PD Model

     - Data Preparation : format Independent / Dependent variables, identification of Good (non-defaulter)/ Bad (defaulter)
     - We will use the Logistic Regression model to estimate the PD
     - A section will cover PD monitoring
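The evaluation and monitoring measures listed earlier can be sketched in plain Python as follows. This is an illustrative sketch, not the code used in the notebooks: it assumes the score is the predicted PD (higher means riskier), that the label is 1 for a defaulter, and that monitoring compares bin counts of the development sample with those of the newer sample. The function names are hypothetical.

```python
import math


def auc_gini_ks(pd_scores, defaulted):
    """AUC, Gini (= 2*AUC - 1) and Kolmogorov-Smirnov statistic."""
    bad = [s for s, d in zip(pd_scores, defaulted) if d]
    good = [s for s, d in zip(pd_scores, defaulted) if not d]
    # AUC: share of (bad, good) pairs where the bad loan has the higher PD;
    # ties count as half.
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    auc = wins / (len(bad) * len(good))
    # KS: largest gap between the cumulative score distributions.
    ks = max(
        abs(sum(b <= t for b in bad) / len(bad)
            - sum(g <= t for g in good) / len(good))
        for t in sorted(set(pd_scores))
    )
    return auc, 2 * auc - 1, ks


def psi(expected_counts, actual_counts):
    """Population Stability Index over matching bins."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe, pa = e / e_total, a / a_total
        if pe == 0 or pa == 0:
            continue  # empty bins are skipped in this sketch
        total += (pa - pe) * math.log(pa / pe)
    return total


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 0, 1, 1]
auc, gini, ks = auc_gini_ks(scores, labels)
drift = psi([50, 50], [60, 40])
print(auc, gini, ks, drift)
```

A PSI near zero indicates that the newer population is distributed like the one the model was built on; larger values suggest the model may need to be revisited.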

III - LGD Model

     - We will use a two-stage model: Logistic Regression in the first stage and Linear Regression in the second stage
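One common way to combine the two stages is sketched below, under the assumption that stage one predicts whether the recovery rate is greater than zero and stage two predicts its size. The stage outputs are passed in as plain numbers, the function name is hypothetical, and the clipping to [0, 1] is an assumption rather than something stated in this README.

```python
def lgd_from_two_stages(stage1_nonzero, stage2_rate):
    """Combine both stages into an LGD estimate.

    stage1_nonzero: logistic regression output (0/1 class or probability)
                    that the recovery rate is greater than zero.
    stage2_rate:    linear regression estimate of the recovery rate.
    """
    recovery_rate = stage1_nonzero * stage2_rate
    # A linear regression can leave the valid range, so clip to [0, 1].
    recovery_rate = min(max(recovery_rate, 0.0), 1.0)
    return 1.0 - recovery_rate


print(lgd_from_two_stages(0, 0.30))   # stage one predicts no recovery
print(lgd_from_two_stages(1, 0.25))   # a quarter of the exposure recovered
```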

IV - EAD Model

     - We will use a Linear Regression model
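The idea can be sketched with a one-feature least-squares fit, assuming the regression target is the CCF and that EAD is then obtained as CCF × loan amount. The single feature, the toy numbers and the function names are hypothetical; the actual model in this repository uses its own set of variables.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_ead(feature, amount, slope, intercept):
    """Predict the CCF, clip it to [0, 1], then scale by the loan amount."""
    ccf = min(max(intercept + slope * feature, 0.0), 1.0)
    return ccf * amount


# Toy training data: CCF falling with a hypothetical feature
# (for example, years since origination).
slope, intercept = fit_simple_ols([1, 2, 3, 4], [0.9, 0.8, 0.7, 0.6])
ead = predict_ead(5, 10_000.0, slope, intercept)
print(slope, intercept, ead)
```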