Added Training Text for Housing Problem #183

79 changes: 69 additions & 10 deletions other/housing/housing.jl
@@ -1,12 +1,32 @@
# # Machine Learning Problem: Housing Dataset
#
# The housing problem is a classic starting point in machine learning.
# We'll demonstrate how to solve it using Julia's [Flux Package](https://fluxml.ai/).
#
# The data replicates the housing data example from the Knet.jl readme. Although we
# could have reused more of Flux (see the mnist example), the library's
# abstractions are very lightweight and don't force you into any particular
# strategy.
#
# [These lecture notes](http://www.mit.edu/~6.s085/notes/lecture3.pdf) cover the fundamentals
# of what we're about to do. Anything there that isn't mentioned in this file can
# safely be skipped (or looked up later to satisfy your curiosity).

using Flux.Tracker
using Flux.Tracker: Params, gradient, update!
using Flux: gpu
using DelimitedFiles, Statistics

# ## Getting the data and other pre-processing
# We'll start by downloading `housing.data` and splitting it into
# training and test sets.
# The training set is the sample of data used to **fit** the model, while
# the test set is the sample used to provide an unbiased evaluation
# of the final model fit on the training data.

# Our aim is to predict the price of a house. In this dataset, the last
# feature is the price, so it is our target.

cd(@__DIR__)

@@ -16,30 +36,60 @@ isfile("housing.data") ||

rawdata = readdlm("housing.data")'

#-

# Specify the train/test split ratio, and extract the feature matrix **x** and
# the target **y** (the 14th row, the price).
split_ratio = 0.1

x = rawdata[1:13,:] |> gpu
y = rawdata[14:14,:] |> gpu
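
# A quick shape check: the Boston housing set has 506 rows of 14 columns, so
# after the transpose above we expect a 13×506 feature matrix and a 1×506 target:
@show size(x)
@show size(y)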

# ### Normalising
# Why do we need it?
# Normalisation is a technique often applied as part of data preparation for
# machine learning. Its goal is to bring the values of numeric columns in the
# dataset onto a common scale, without distorting differences in their ranges.
# Not every dataset requires normalisation; it is needed only when features
# have very different ranges, as they do here.

x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2)
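
# As a quick sanity check, every feature row should now have mean ≈ 0 and
# standard deviation ≈ 1 (up to floating-point error):
@assert all(abs.(mean(x, dims = 2)) .< 1e-8)
@assert all(isapprox.(std(x, dims = 2), 1))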

# ### Splitting into training and test sets

split_index = floor(Int,size(x,2)*split_ratio)
x_train = x[:,1:split_index]
y_train = y[:,1:split_index]
x_test = x[:,split_index+1:size(x,2)]
y_test = y[:,split_index+1:size(x,2)]
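
# With a `split_ratio` of 0.1, the first 10% of the columns form the training
# set and the remaining 90% the test set; a quick look at the shapes confirms it:
@show size(x_train)
@show size(x_test)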

# ## The Model
# Here comes everyone's favourite part: implementing a machine learning model.
#
# We'll now define the weight matrix `W` and the bias `b`. These are our model's
# parameters, which gradient descent tunes to improve the predictions.
# For an intuition about how gradient descent actually works, check out Andrew Ng's
# excellent explanations:
# [Video 1: Intuition](https://www.youtube.com/watch?v=rIVLE3condE) |
# [Video 2: The Algorithm](https://www.youtube.com/watch?v=yFPLyDwVifc)
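#
# In symbols, each gradient-descent step nudges every parameter against the
# gradient of the loss `L`: `W ← W - η ∂L/∂W` and `b ← b - η ∂L/∂b`, where `η`
# is the learning rate we choose below.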

W = param(randn(1,13)/10) |> gpu
b = param([0.]) |> gpu

# Here are our prediction and loss functions.
# - The prediction function returns our estimate of the price of a house, as
# determined by our two parameters `W` and `b`.
# - The mean squared error (MSE) is the loss function used for least-squares
# regression. It is the sum, over all data points, of the squared difference
# between the predicted and actual target values, divided by the number of
# data points.
#
# A loss function evaluates how well your algorithm models your dataset:
# if the predictions are off, the loss is high; if they're good, it is low.

predict(x) = W*x .+ b
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/size(y, 2)
loss(x, y) = meansquarederror(predict(x), y)
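
# Before any training, the loss is just the error of a randomly initialised
# linear model; a useful sanity check is that this number drops as we train:
@show loss(x_train, y_train)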

# ### Gradient Descent
# We now optimise our parameters to make the predictions more accurate; the
# videos linked above explain the algorithm in detail.

η = 0.1
θ = Params([W, b])

@@ -51,6 +101,15 @@ for i = 1:10
  # Take one descent step: compute the gradient of the loss with respect to
  # W and b, then move each parameter a small distance against its gradient.
  g = gradient(() -> loss(x_train, y_train), θ)
  for p in θ
    update!(p, -η .* g[p])
  end
  @show loss(x_train, y_train)
end

# ## Predictions
# Now we're in a position to see how well our model does on data it has never
# seen, by computing the mean squared error on the test set.

err = meansquarederror(predict(x_test),y_test)
println(err)
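
# The squared error is in squared-price units; taking its square root gives the
# root mean squared error (RMSE), which is in the same units as the price itself:
println(sqrt(err))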

# The model trained here is very simple and may still predict housing prices
# with a high error. The results can be improved with many other machine
# learning algorithms and techniques.
# If this was your first ML project in Flux, congrats!
#
# You should now have a feel for basic machine learning in Julia with the Flux package.