Boston-House-Price-Prediction-Using-Regularized-Regression

Comparing Ridge and LASSO regression models to find which one predicts house prices most accurately.

Background

House prices have been skyrocketing lately, so I thought it would be interesting to predict them using regularized regression. This repo predicts house prices with regularized regression and compares the accuracy of Ridge and LASSO. The target of the model is 'medv' (the house price); the input is a dataframe and the output is an accuracy value, reported as prediction error.

Requirements: numpy, pandas, matplotlib, seaborn, sklearn (scikit-learn), statsmodels

About the Data

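The data has 14 columns, 13 features plus the target:

Per capita crime rate (crim)
Proportion of residential land zoned for large lots (zn)
Proportion of non-retail business acres (indus)
Whether the tract bounds the Charles River (chas)
Nitrogen oxides concentration (nox)
Average number of rooms per dwelling (rm)
Proportion of owner-occupied units built before 1940 (age)
Weighted distance to employment centres (dis)
Index of accessibility to radial highways (rad)
Property tax rate (tax)
Pupil-teacher ratio (ptratio)
Measure derived from the proportion of Black residents (black)
Percentage of lower-status population (lstat)
Median house value (medv), the target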

Overview

I use 'boston.csv' as the main data. After importing it, I define the target and the features: the target is 'medv' and the features are all the other columns of 'boston.csv'. Since we want to fit a linear regression and find the best lambda, I split the data into train, validation, and test sets using from sklearn.model_selection import train_test_split.
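The three-way split described above can be sketched as two calls to train_test_split. This is only an illustration: the split fractions, the random seed, and the helper names split_features_target and train_val_test_split are assumptions, not taken from the notebook, and a small synthetic dataframe stands in for 'boston.csv'.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target="medv"):
    """Separate the target column from the feature columns."""
    return df.drop(columns=[target]), df[target]


def train_val_test_split(X, y, test_size=0.2, val_size=0.2, seed=42):
    """Split into train/validation/test (fractions and seed are assumed)."""
    # First hold out the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # val_size is a fraction of the full data, so rescale it
    # relative to what remains after removing the test set.
    rel_val = val_size / (1.0 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=rel_val, random_state=seed
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Synthetic stand-in for pd.read_csv("boston.csv").
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["rm", "lstat", "medv"])

X, y = split_features_target(demo)
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(X, y)
print(len(X_train), len(X_val), len(X_test))
```

The validation set is kept separate from the test set so that lambda can be tuned without touching the data used for the final error estimate.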

After that, I check the features for multicollinearity using VIF scores and correlations. For the VIF score I use from statsmodels.stats.outliers_influence import variance_inflation_factor as vif. Based on the VIF scores and correlations, I decided to drop the 'tax' column to reduce multicollinearity.

The next step is to fit Ridge (from sklearn.linear_model import Ridge) and Lasso (from sklearn.linear_model import Lasso) models on the training data, and then find the best lambda for each model using the validation data, based on RMSE.
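The lambda search described above could look like the sketch below. In scikit-learn the regularization strength lambda is passed as the alpha parameter. The lambda grid, the helper validation_rmse, and the synthetic data are assumptions for illustration, not the values used in the notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error


def validation_rmse(model_cls, lambdas, X_train, y_train, X_val, y_val):
    """Fit one model per lambda on the training data and
    return {lambda: RMSE on the validation data}."""
    scores = {}
    for lam in lambdas:
        model = model_cls(alpha=lam)  # sklearn calls lambda 'alpha'
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_val, model.predict(X_val))
        scores[lam] = float(np.sqrt(mse))
    return scores


# Synthetic stand-in for the train/validation splits.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)
X_train, y_train = X[:140], y[:140]
X_val, y_val = X[140:], y[140:]

lambdas = [0.01, 0.1, 1.0, 10.0]  # assumed grid
ridge_scores = validation_rmse(Ridge, lambdas, X_train, y_train, X_val, y_val)
lasso_scores = validation_rmse(Lasso, lambdas, X_train, y_train, X_val, y_val)

best_ridge = min(ridge_scores, key=ridge_scores.get)
best_lasso = min(lasso_scores, key=lasso_scores.get)
print("Ridge:", ridge_scores, "best lambda:", best_ridge)
print("LASSO:", lasso_scores, "best lambda:", best_lasso)
```

The lambda with the lowest validation RMSE is kept for each model, and the two winners are then compared against each other.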

Based on the validation RMSE (see 'RMSE Ridge.JPG' and 'RMSE LASSO.JPG'), the best model is Ridge with lambda = 1.

After that, I compute the coefficients of the Ridge model with lambda = 1. The last step is to calculate the test error using from sklearn.metrics import mean_absolute_error (MAE), from sklearn.metrics import mean_absolute_percentage_error (MAPE), and from sklearn.metrics import mean_squared_error (MSE).
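As a rough sketch of this final step, the code below fits Ridge with alpha=1 (lambda = 1), lists its coefficients, and computes the three test metrics. The helper evaluate_on_test and the synthetic data are assumptions for illustration; the reported numbers in this repo come from 'boston.csv'.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)


def evaluate_on_test(model, X_test, y_test):
    """Return MAE, MAPE and MSE of a fitted model on the test data."""
    pred = model.predict(X_test)
    return {
        "MAE": float(mean_absolute_error(y_test, pred)),
        "MAPE": float(mean_absolute_percentage_error(y_test, pred)),
        "MSE": float(mean_squared_error(y_test, pred)),
    }


# Synthetic stand-in with a strictly positive target, so MAPE is well behaved.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(200, 2)), columns=["rm", "lstat"])
target = 20 + 3 * features["rm"] - 2 * features["lstat"] + rng.normal(scale=0.5, size=200)
X_train, y_train = features.iloc[:150], target.iloc[:150]
X_test, y_test = features.iloc[150:], target.iloc[150:]

ridge = Ridge(alpha=1.0).fit(X_train, y_train)  # lambda = 1

# Coefficients labelled by feature name, plus the intercept.
coefs = pd.Series(ridge.coef_, index=X_train.columns)
print(coefs)
print("intercept:", ridge.intercept_)

errors = evaluate_on_test(ridge, X_test, y_test)
print(errors)
```

MAE is in the same units as 'medv', MAPE is a fraction of the true value (sklearn does not multiply it by 100), and MSE penalizes large errors more heavily.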

Based on the test errors (see 'Testing Error.JPG'), the best model for this dataset is Ridge with lambda = 1, judged by MAE (mean absolute error). For further information and the code, see the notebook HW_Regression_SamuelAdi.ipynb.
