Levenberg-Marquardt algorithm for nonlinear regression

We present a Python implementation of a regularized version of the Levenberg-Marquardt algorithm for nonlinear regression. Regularization is obtained by placing prior distributions (Gaussian or lognormal) on the model parameters. This is suitable for cases where few data points are available and we want to incorporate prior knowledge into the model estimation process.

Nonlinear regression: generic problem statement

Let us consider a regression problem where a scalar variable $y$ (target variable) must be predicted based on a vector of observables $x \in \mathbb{R}^n$.

We assume that the dynamics are nonlinear and, specifically, that

$$y = f(x, \theta) + \varepsilon$$

where $\theta \in \mathbb{R}^p$ is a vector of unknown real parameters, $f$ is a known deterministic function, nonlinear in $\theta$, and $\varepsilon$ is a random noise with distribution $\mathcal{N}(0, \sigma^2)$ for some positive and unknown value of $\sigma$.

If we have $N$ independent observations $(x_1, y_1), \dots, (x_N, y_N)$, we can estimate the value of $\theta$ by maximizing the log-likelihood. We can optionally choose to weight some observations more or less than others by choosing weights $w_1, \dots, w_N > 0$ and assuming that $\varepsilon_i \sim \mathcal{N}(0, \sigma^2 / w_i)$ for each $i$.

Under these assumptions and setting for simplicity of notation

$$r_i(\theta) = y_i - f(x_i, \theta), \qquad i = 1, \dots, N,$$

we have that the log-likelihood is given by

$$\ell(\theta, \sigma) = -\frac{1}{2\sigma^2} \sum_{i=1}^N w_i \, r_i(\theta)^2 - N \log \sigma + \text{const}.$$

Maximizing the log-likelihood is therefore equivalent to minimizing the following objective function (weighted sum of squared residuals):

$$\mathrm{Obj}(\theta) = \sum_{i=1}^N w_i \, r_i(\theta)^2.$$

Moreover, the maximum likelihood estimate for $\sigma$ is

$$\hat{\sigma}^2 = \frac{\mathrm{Obj}(\hat{\theta})}{N} = \frac{1}{N} \sum_{i=1}^N w_i \, r_i(\hat{\theta})^2.$$
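As a concrete illustration, the objective function and the maximum likelihood estimate of $\sigma$ can be evaluated with a few lines of NumPy. The sketch below uses a made-up exponential model and toy data; it only mirrors the formulas above and is not the actual code in this repository:

```python
import numpy as np

def f(x, theta):
    # toy nonlinear model: f(x, theta) = theta_0 * exp(-theta_1 * x)
    return theta[0] * np.exp(-theta[1] * x)

def objective(theta, x, y, w):
    # weighted sum of squared residuals: Obj(theta) = sum_i w_i * r_i(theta)^2
    r = y - f(x, theta)
    return np.sum(w * r ** 2)

def sigma_mle(theta, x, y, w):
    # maximum likelihood estimate of sigma: sigma^2 = Obj(theta) / N
    return np.sqrt(objective(theta, x, y, w) / y.size)

# toy data with uniform weights
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = 2.0 * np.exp(-0.7 * x) + 0.05 * rng.standard_normal(x.size)
w = np.ones_like(y)

theta_guess = np.array([2.0, 0.7])
print(objective(theta_guess, x, y, w), sigma_mle(theta_guess, x, y, w))
```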

The Levenberg-Marquardt algorithm

The Levenberg-Marquardt algorithm finds the minimum of $\mathrm{Obj}$ iteratively, by computing a series of local quadratic approximations.

The algorithm requires that an initial guess $\theta_0$ is provided for the unknown vector of parameters. The objective function $\mathrm{Obj}$ is then approximated locally in a neighbourhood of $\theta_0$ with the following quadratic function (the approximation is only valid for small values of $\|\delta\|$):

$$\mathrm{Obj}(\theta_0 + \delta) \approx \varphi(\delta) = \sum_{i=1}^N w_i \left( y_i - f(x_i, \theta_0) - \nabla_\theta f(x_i, \theta_0)^\top \delta \right)^2.$$

The peculiarity here is that, thanks to the objective function's special form, we can obtain a local quadratic approximation by taking the first-order expansion of $f$ instead of the second-order expansion of the objective function itself (as we would be forced to do in the general case).

Defining for simplicity of notation

$$J_{ij} = \frac{\partial f}{\partial \theta_j}(x_i, \theta_0), \qquad r_i = y_i - f(x_i, \theta_0), \qquad W = \mathrm{diag}(w_1, \dots, w_N),$$

we have that this quadratic function has a unique global minimum satisfying the equation

$$\nabla_\delta \, \varphi(\delta) = 0,$$

which is equivalent to requiring that the displacement $\delta$ solves the following linear system:

$$\left( J^\top W J \right) \delta = J^\top W r.$$
This picture illustrates the calculation of δ in a simple one-dimensional case:

*(figure: the local quadratic approximation of Obj and the resulting displacement δ in a one-dimensional example)*

In the picture, the displacement has been applied as it is. In practice, since the quadratic approximation is generally only valid locally, $\delta$ just provides the displacement direction, while its magnitude is re-scaled by a small positive number $h$ (step) when updating $\theta$:

$$\theta_1 = \theta_0 + h \, \delta.$$
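The sketch below puts these pieces together into one update step in NumPy, with a finite-difference Jacobian and a fixed step size $h$. It mirrors the equations above but is only an illustrative sketch under assumed names (`jacobian`, `lm_step`, the toy model `f`), not the implementation in this repository:

```python
import numpy as np

def f(x, theta):
    # same toy model as above: theta_0 * exp(-theta_1 * x)
    return theta[0] * np.exp(-theta[1] * x)

def jacobian(f, x, theta, eps=1e-6):
    # finite-difference Jacobian J_ij = d f(x_i, theta) / d theta_j
    J = np.empty((x.size, theta.size))
    for j in range(theta.size):
        d = np.zeros_like(theta)
        d[j] = eps
        J[:, j] = (f(x, theta + d) - f(x, theta - d)) / (2.0 * eps)
    return J

def lm_step(x, y, w, theta0, h=0.1):
    # linearize f around theta0, solve (J^T W J) delta = J^T W r,
    # then move by a small step h along the direction delta
    r = y - f(x, theta0)
    J = jacobian(f, x, theta0)
    WJ = w[:, None] * J                      # same as diag(w) @ J
    delta = np.linalg.solve(J.T @ WJ, J.T @ (w * r))
    return theta0 + h * delta

# toy data and a few iterations starting from a rough initial guess
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = 2.0 * np.exp(-0.7 * x) + 0.05 * rng.standard_normal(x.size)
w = np.ones_like(y)

theta = np.array([1.0, 1.0])
for _ in range(200):
    theta = lm_step(x, y, w, theta)
print(theta)   # should end up close to the true values (2.0, 0.7)
```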

Regularization

The Levenberg-Marquardt algorithm can be extended to incorporate regularization terms. Looking at the problem from a Bayesian perspective, we can decide to provide a prior distribution on $\theta$. For simplicity, we assume that the prior distributions on the different parameter components are independent, so that the global prior distribution can be factorized:

$$\pi(\theta) = \prod_{j=1}^p \pi_j(\theta_j).$$

We assume moreover that the one-dimensional priors on the single components are either Gaussian or lognormal:

$$\theta_j \sim \mathcal{N}(\mu_j, \tau_j^2) \qquad \text{or} \qquad \log \theta_j \sim \mathcal{N}(\mu_j, \tau_j^2).$$

Reasoning in Bayesian terms, this time we estimate $\theta$ via maximum a posteriori instead of maximum likelihood. The posterior distribution of $\theta$ is

$$p(\theta \mid y_1, \dots, y_N) \propto p(y_1, \dots, y_N \mid \theta, \sigma) \, \pi(\theta)$$

(using the fact that the evidence term in Bayes' theorem does not depend on $\theta$).

Keeping the same notation as before, the objective function to minimize is now the negative log-posterior

$$\mathrm{Obj}_{\mathrm{reg}}(\theta, \sigma) = \frac{1}{2\sigma^2} \sum_{i=1}^N w_i \, r_i(\theta)^2 + N \log \sigma - \sum_{j=1}^p \log \pi_j(\theta_j) + \text{const}$$

(the term const incorporates terms that are independent of both $\theta$ and $\sigma$). This corresponds to minimizing a weighted sum of squared residuals plus a series of regularization terms.
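As an illustration, the two negative log-prior terms and the resulting regularized objective could be evaluated as in the following sketch (the parameter names `mu`, `tau`, `kind` are assumptions made here for readability, not the repository's API):

```python
import numpy as np

def neg_log_prior(theta, kind, mu, tau):
    # -log pi_j(theta_j) for each component j, up to additive constants;
    # kind[j] is either "gaussian" or "lognormal"
    out = np.zeros_like(theta)
    for j, k in enumerate(kind):
        if k == "gaussian":
            out[j] = (theta[j] - mu[j]) ** 2 / (2.0 * tau[j] ** 2)
        else:  # lognormal
            out[j] = np.log(theta[j]) + (np.log(theta[j]) - mu[j]) ** 2 / (2.0 * tau[j] ** 2)
    return out

def map_objective(theta, sigma, x, y, w, f, kind, mu, tau):
    # negative log-posterior, up to constants:
    # sum_i w_i r_i^2 / (2 sigma^2) + N log(sigma) - sum_j log pi_j(theta_j)
    r = y - f(x, theta)
    data_term = np.sum(w * r ** 2) / (2.0 * sigma ** 2) + y.size * np.log(sigma)
    return data_term + np.sum(neg_log_prior(theta, kind, mu, tau))
```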

The following contributions are added to $\varphi$ and to the $j$-th component of its gradient due to the presence of prior distributions (writing the updated $j$-th parameter as $\theta_{0,j} + \delta_j$):

Gaussian prior:

$$\Delta\varphi_j = \sigma^2 \, \frac{\left( \theta_{0,j} + \delta_j - \mu_j \right)^2}{\tau_j^2}, \qquad \frac{\partial \, \Delta\varphi_j}{\partial \delta_j} = 2\sigma^2 \, \frac{\theta_{0,j} + \delta_j - \mu_j}{\tau_j^2}$$

Lognormal prior:

$$\Delta\varphi_j = \sigma^2 \, \frac{\left( \log\theta_{0,j} + \delta_j / \theta_{0,j} - \mu_j \right)^2}{\tau_j^2} + 2\sigma^2 \left( \log\theta_{0,j} + \frac{\delta_j}{\theta_{0,j}} \right), \qquad \frac{\partial \, \Delta\varphi_j}{\partial \delta_j} = \frac{2\sigma^2}{\theta_{0,j}} \left( 1 + \frac{\log\theta_{0,j} + \delta_j / \theta_{0,j} - \mu_j}{\tau_j^2} \right)$$

(We have used the first-order expansion $\log(\theta_{0,j} + \delta_j) \approx \log\theta_{0,j} + \delta_j / \theta_{0,j}$ in the lognormal case.)

The linear system is now

$$\left( J^\top W J + \sigma^2 D \right) \delta = J^\top W r + \sigma^2 b,$$

where the diagonal matrix $D$ and the vector $b$ collect the prior contributions,

$$D_{jj} = \frac{1}{\tau_j^2}, \qquad b_j = -\frac{\theta_{0,j} - \mu_j}{\tau_j^2} \qquad \text{(Gaussian prior)},$$

$$D_{jj} = \frac{1}{\tau_j^2 \, \theta_{0,j}^2}, \qquad b_j = -\frac{1}{\theta_{0,j}} \left( 1 + \frac{\log\theta_{0,j} - \mu_j}{\tau_j^2} \right) \qquad \text{(lognormal prior)},$$

and $\sigma$ is replaced by its maximum likelihood estimate (see above).
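Putting everything together, one regularized update step might look like the following sketch (it reuses the finite-difference `jacobian` helper and the assumed prior parametrization from the sketches above; the whole block illustrates the equations and is not the repository's implementation):

```python
import numpy as np

def regularized_step(f, jacobian, x, y, w, theta0, kind, mu, tau, h=0.1):
    # one regularized update: solve (J^T W J + sigma^2 D) delta = J^T W r + sigma^2 b,
    # with sigma^2 replaced by its maximum likelihood estimate, then take theta0 + h * delta
    r = y - f(x, theta0)
    J = jacobian(f, x, theta0)
    sigma2 = np.sum(w * r ** 2) / y.size          # ML estimate of sigma^2
    A = J.T @ (w[:, None] * J)                    # J^T W J
    rhs = J.T @ (w * r)                           # J^T W r
    D = np.zeros(theta0.size)
    b = np.zeros(theta0.size)
    for j, k in enumerate(kind):
        if k == "gaussian":
            D[j] = 1.0 / tau[j] ** 2
            b[j] = -(theta0[j] - mu[j]) / tau[j] ** 2
        else:  # lognormal
            D[j] = 1.0 / (tau[j] ** 2 * theta0[j] ** 2)
            b[j] = -(1.0 + (np.log(theta0[j]) - mu[j]) / tau[j] ** 2) / theta0[j]
    delta = np.linalg.solve(A + sigma2 * np.diag(D), rhs + sigma2 * b)
    return theta0 + h * delta
```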