- To implement Gradient Descent, you need to compute the gradient of the cost function with regard to each model parameter θ_j.
- In other words, you need to calculate how much the cost function will change if you change θ just a little bit.
- This is called a partial derivative.
- Let's compute the partial derivative of the cost function with regard to parameter θ_j, noted ∂ MSE(θ)/∂θ_j (a closed form is given after this list).
- Instead of computing these partial derivatives individually, you can compute them all in one go.
- The gradient vector, noted ∇_θ MSE(θ) (nabla of the Mean Squared Error with regard to θ), contains all the partial derivatives of the cost function (one for each model parameter); a closed-form matrix expression appears after this list.
- Once you have the gradient vector, which points uphill, just go in the opposite direction to go downhill.
- This means subtracting ∇_θ MSE(θ) from θ.
- This is where the learning rate η (eta) comes into play: multiply the gradient vector by η to determine the size of the downhill step. The resulting update rule, and a short code sketch of it, are shown after this list.
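For reference, each partial derivative has a simple closed form under the usual linear-regression setup, where MSE(θ) is the mean squared error over m training instances with feature vectors x⁽ⁱ⁾ and targets y⁽ⁱ⁾ (this setup is an assumption; the section does not state the model explicitly):

```latex
\frac{\partial}{\partial \theta_j}\,\mathrm{MSE}(\boldsymbol{\theta})
  = \frac{2}{m} \sum_{i=1}^{m} \left( \boldsymbol{\theta}^{\mathsf{T}} \mathbf{x}^{(i)} - y^{(i)} \right) x_j^{(i)}
```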
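Stacking all of these partial derivatives yields the gradient vector in one go. Under the same assumption, with X the matrix of training instances and y the vector of targets, the gradient collapses to a single matrix expression:

```latex
\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta})
  = \frac{2}{m}\,\mathbf{X}^{\mathsf{T}} \left( \mathbf{X} \boldsymbol{\theta} - \mathbf{y} \right)
```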
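Putting the pieces together, one Gradient Descent step subtracts the gradient vector, scaled by the learning rate η, from the current parameter vector:

```latex
\boldsymbol{\theta}^{(\text{next step})}
  = \boldsymbol{\theta} - \eta\, \nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta})
```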
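As a concrete illustration, here is a minimal sketch of this update loop in NumPy, assuming the linear-regression MSE above; the toy data, learning rate, and iteration count are illustrative choices, not values from the text:

```python
import numpy as np

# Toy linear data (assumed for illustration): y ≈ 4 + 3x plus Gaussian noise.
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

X_b = np.c_[np.ones((100, 1)), X]  # prepend x0 = 1 so theta_0 acts as the bias term
m = len(X_b)

eta = 0.1                            # learning rate (eta)
n_iterations = 1000
theta = rng.standard_normal((2, 1))  # random initialization

for _ in range(n_iterations):
    # Gradient vector of the MSE cost: (2/m) * X^T (X theta - y)
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients  # step in the opposite (downhill) direction

print(theta)  # should end up close to [[4.], [3.]]
```

Because every iteration uses the whole training set to compute the gradient, this variant is known as batch Gradient Descent.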