The classic Newton method approximates the function to be optimised as a quadratic using the Taylor series expansion:

$$f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x_k)\, p$$
By minimising this quadratic model with respect to $p$, the optimal search direction can be found as:

$$p_k = -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)$$
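As a minimal sketch of this step (written here in Python with NumPy, which is an assumption on my part; the function name is illustrative), the Newton direction can be obtained by solving the linear system $\nabla^2 f(x_k)\, p = -\nabla f(x_k)$ rather than forming the inverse explicitly:

```python
import numpy as np

def newton_direction(grad, hess):
    """Solve the Newton system  H p = -g  for the search direction p.

    Solving the linear system is cheaper and more stable than explicitly
    inverting the Hessian, and gives the same direction.
    """
    return np.linalg.solve(hess, -grad)
```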
The step length is then computed via a backtracking line search using the Wolfe conditions, which ensure sufficient decrease.
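Below is a minimal sketch of such a line search (again assuming Python with NumPy), enforcing the sufficient-decrease (Armijo) condition, the first of the Wolfe conditions; the parameter values are illustrative defaults rather than those of the original code:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, p, alpha0=1.0, rho=0.5, c1=1e-4):
    """Backtracking line search enforcing the sufficient-decrease condition.

    f      : objective function
    grad_f : gradient function
    x      : current iterate
    p      : search direction (assumed to be a descent direction)
    alpha0 : initial trial step (1.0 is the natural choice for BFGS)
    rho    : backtracking factor
    c1     : sufficient-decrease parameter
    """
    alpha = alpha0
    fx = f(x)
    slope = np.dot(grad_f(x), p)  # directional derivative along p
    # Shrink the step until f(x + alpha*p) lies below the Armijo line.
    while f(x + alpha * p) > fx + c1 * alpha * slope:
        alpha *= rho
    return alpha
```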
The inverse of the Hessian is computationally expensive to obtain, due both to the limitations of finite-difference approximations and to the cost of inverting a particularly large matrix. For this reason an approximation $H_k$ to the inverse of the Hessian is used. This approximation is updated at each iteration based on the change in the iterate, $s_k = x_{k+1} - x_k$, and the change in the gradient, $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, as follows:

$$H_{k+1} = \left(I - \rho_k s_k y_k^T\right) H_k \left(I - \rho_k y_k s_k^T\right) + \rho_k s_k s_k^T, \qquad \rho_k = \frac{1}{y_k^T s_k}$$
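A minimal sketch of this update, assuming Python with NumPy and an illustrative function name:

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS update of the inverse Hessian approximation H.

    s = x_{k+1} - x_k                      (change in the iterate)
    y = grad f(x_{k+1}) - grad f(x_k)      (change in the gradient)
    """
    rho = 1.0 / np.dot(y, s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    # H_{k+1} = V H_k V^T + rho * s s^T
    return V @ H @ V.T + rho * np.outer(s, s)
```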
For more details on the implementation I highly recommend Nocedal and Wright's book, Numerical Optimization: http://www.apmath.spbu.ru/cnsa/pdf/monograf/Numerical_Optimization2006.pdf
When tested on the two-dimensional Rosenbrock function, the BFGS algorithm finds an optimal solution in 34 iterations.
The code initialises the Hessian approximation as the identity matrix, and if the problem is two-dimensional it can produce a trajectory plot of the optimisation path. Gradients are calculated with the central difference method.
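As a sketch of the gradient calculation (assuming Python with NumPy; the step size is an illustrative default), a central difference gradient can be written as:

```python
import numpy as np

def central_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences.

    Each component is perturbed by +/- h and the symmetric difference
    quotient is taken, giving second-order accuracy in h.
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad
```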