Credit risk modelling refers to the use of financial models to estimate the losses a firm might suffer if a borrower defaults. Financial institutions use credit risk analysis models to determine the probability of default of a potential borrower. The models provide information on the level of a borrower’s credit risk at any particular time. A lender that fails to detect credit risk in advance is exposed to default and loss of funds. Lenders rely on the assessment provided by credit risk analysis models to make key lending decisions: whether to extend credit to the borrower and how much to charge for it. The probability of default, commonly abbreviated as PD (sometimes POD), is the likelihood that a borrower will default on their loan obligations.
The dataset for this credit risk modelling project was taken from Kaggle.
The dataset classifies the credit risk of each borrower using features such as age, income, employment length, and home ownership, along with loan attributes such as interest rate and purpose.
This project handles the class imbalance in the data and imputes missing values. It then compares the following learning algorithms to find the best one for credit risk prediction:
- K-Nearest Neighbours
- Logistic Regression
- Random Forest
- Decision Trees
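
The preprocessing mentioned above (imputing missing values and handling imbalanced data) can be sketched as follows. This is a minimal illustration, not the project's actual code: the toy data, the column names, the choice of median imputation, and the choice of random oversampling are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.utils import resample

# Hypothetical toy data; the column names are illustrative, not the Kaggle schema.
df = pd.DataFrame({
    "age": [25, 40, 31, 52, 23, 36, 45, 29],
    "income": [30000, 82000, np.nan, 61000, 24000, 55000, np.nan, 47000],
    "interest_rate": [11.5, 7.2, 13.1, np.nan, 15.8, 9.9, 8.4, 12.0],
    "default": [0, 0, 0, 0, 1, 0, 0, 1],
})
feature_cols = ["age", "income", "interest_rate"]

# Assumption: fill each missing numeric value with its column median.
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(imputer.fit_transform(df[feature_cols]), columns=feature_cols)
imputed["default"] = df["default"].to_numpy()

# Assumption: balance the classes by randomly oversampling the minority class.
# In a real workflow this would be applied to the training split only.
majority = imputed[imputed["default"] == 0]
minority = imputed[imputed["default"] == 1]
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
balanced = pd.concat([majority, minority_upsampled]).reset_index(drop=True)

print(balanced["default"].value_counts())
```

Other options, such as class weighting or synthetic oversampling, would fit the same place in the workflow; which one the project used is not stated here.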
After fitting and testing the different models on the data, Random Forest performed better than the other algorithms we tried and predicted the credit risk on our test set with an accuracy of 93%.
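
A comparison of this kind can be sketched with scikit-learn as below. The sketch uses synthetic data from `make_classification` in place of the Kaggle dataset, and the split ratio, scaling step, and hyperparameters are assumptions, so the scores it prints will not match the 93% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four algorithms listed above; hyperparameters are illustrative defaults.
models = {
    "K-Nearest Neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from best to worst test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Because the classes are imbalanced, it can also be worth checking precision and recall on the default class alongside accuracy, for example with `sklearn.metrics.classification_report`.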