- To implement Gradient Descent, you need to compute the gradient of the cost function with regard to each model parameter θ_j.
- In other words, you need to calculate how much the cost function will change if you change θ just a little bit.
- This is called a partial derivative.
- Let's compute the partial derivative of the cost function with regard to parameter θ_j, noted ∂ MSE(θ)/∂θ_j (a closed form is given after this list).
- Instead of computing these partial derivatives individually, you can compute them all in one go.
- The gradient vector, noted ∇_θ MSE(θ) (nabla of the Mean Squared Error with regard to θ), contains all the partial derivatives of the cost function (one for each model parameter); a closed-form matrix expression appears after this list.
- Once you have the gradient vector, which points uphill, just go in the opposite direction to go downhill.
- This means subtracting ∇_θ MSE(θ) from θ.
- This is where the learning rate η (eta) comes into play: multiply the gradient vector by η to determine the size of the downhill step. The resulting update rule, and a short code sketch of it, are shown after this list.
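For reference, each partial derivative has a simple closed form under the usual linear-regression setup, where MSE(θ) is the mean squared error over m training instances with feature vectors x⁽ⁱ⁾ and targets y⁽ⁱ⁾ (this setup is an assumption; the section does not state the model explicitly):

```latex
\frac{\partial}{\partial \theta_j}\,\mathrm{MSE}(\boldsymbol{\theta})
  = \frac{2}{m} \sum_{i=1}^{m} \left( \boldsymbol{\theta}^{\mathsf{T}} \mathbf{x}^{(i)} - y^{(i)} \right) x_j^{(i)}
```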
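Stacking all of these partial derivatives yields the gradient vector in one go. Under the same assumption, with X the matrix of training instances and y the vector of targets, the gradient collapses to a single matrix expression:

```latex
\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta})
  = \frac{2}{m}\,\mathbf{X}^{\mathsf{T}} \left( \mathbf{X} \boldsymbol{\theta} - \mathbf{y} \right)
```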
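Putting the pieces together, one Gradient Descent step subtracts the gradient vector, scaled by the learning rate η, from the current parameter vector:

```latex
\boldsymbol{\theta}^{(\text{next step})}
  = \boldsymbol{\theta} - \eta\, \nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta})
```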
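As a concrete illustration, here is a minimal sketch of this update loop in NumPy, assuming the linear-regression MSE above; the toy data, learning rate, and iteration count are illustrative choices, not values from the text:

```python
import numpy as np

# Toy linear data (assumed for illustration): y ≈ 4 + 3x plus Gaussian noise.
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

X_b = np.c_[np.ones((100, 1)), X]  # prepend x0 = 1 so theta_0 acts as the bias term
m = len(X_b)

eta = 0.1                            # learning rate (eta)
n_iterations = 1000
theta = rng.standard_normal((2, 1))  # random initialization

for _ in range(n_iterations):
    # Gradient vector of the MSE cost: (2/m) * X^T (X theta - y)
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients  # step in the opposite (downhill) direction

print(theta)  # should end up close to [[4.], [3.]]
```

Because every iteration uses the whole training set to compute the gradient, this variant is known as batch Gradient Descent.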