Batch Gradient Descent


Partial Derivative

  • To implement Gradient Descent, you need to compute the gradient of the cost function with regard to each model parameter θj.
  • In other words, you need to calculate how much the cost function will change if you change θj just a little bit.
  • This is called a partial derivative.
  • Let's compute the partial derivative of the cost function with regard to parameter θj, noted ∂MSE(θ)/∂θj (shown below).

Partial derivatives of the cost function
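
Assuming the standard MSE cost for Linear Regression over m training instances x⁽ⁱ⁾ with targets y⁽ⁱ⁾, this partial derivative takes the familiar form:

$$
\frac{\partial}{\partial \theta_j}\,\mathrm{MSE}(\theta) = \frac{2}{m}\sum_{i=1}^{m}\left(\theta^{\mathsf T}\mathbf{x}^{(i)} - y^{(i)}\right)x_j^{(i)}
$$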

  • Instead of computing these partial derivatives individually, you can compute them all in one go.
  • The gradient vector, noted ∇θMSE(θ) (nabla of the MSE with respect to θ), contains all the partial derivatives of the cost function, one for each model parameter (see the formula below).

Gradient vector of the cost function
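
Under the same assumptions, with X the matrix containing all training instances and y the vector of targets, the whole gradient vector reduces to a single matrix expression:

$$
\nabla_{\theta}\,\mathrm{MSE}(\theta) =
\begin{pmatrix}
\dfrac{\partial}{\partial \theta_0}\mathrm{MSE}(\theta)\\
\dfrac{\partial}{\partial \theta_1}\mathrm{MSE}(\theta)\\
\vdots\\
\dfrac{\partial}{\partial \theta_n}\mathrm{MSE}(\theta)
\end{pmatrix}
= \frac{2}{m}\,\mathbf{X}^{\mathsf T}\left(\mathbf{X}\theta - \mathbf{y}\right)
$$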

  • Once you have the gradient vector, which points uphill, just go in the opposite direction to go downhill.
  • This means subtracting ∇θMSE(θ) from θ.
  • This is where the learning rate η (eta) comes into play: multiply the gradient vector by η to determine the size of the downhill step, as shown below.

Gradient Descent step
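
The step itself is simply:

$$
\theta^{(\text{next step})} = \theta - \eta\,\nabla_{\theta}\,\mathrm{MSE}(\theta)
$$

As a minimal sketch of the full loop (assuming NumPy and a small synthetic linear-regression dataset; the names X_b, y, eta, n_epochs, and theta are illustrative, not taken from these notes):

```python
import numpy as np

# Illustrative data: 100 instances, one feature, generated around y = 4 + 3x.
np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)

X_b = np.c_[np.ones((m, 1)), X]   # add x0 = 1 (bias term) to every instance

eta = 0.1                          # learning rate
n_epochs = 1000                    # number of Gradient Descent steps
theta = np.random.randn(2, 1)      # random initialization of the parameters

for epoch in range(n_epochs):
    # Gradient vector over the whole batch: (2/m) * X^T (X theta - y)
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    # Gradient Descent step: move against the gradient, scaled by eta
    theta = theta - eta * gradients

print(theta)  # should end up close to [[4.], [3.]]
```

Each step uses the entire training set to compute the gradient, which is what makes this the Batch variant of Gradient Descent.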