Wednesday, September 22, 2010

Scipy curve_fit for easy nonlinear least squares fitting.

Assume we are given input or independent vector x and the repsonse or dependent vector y.
It is desired to fit a non-linear multivariate function of the form y' = f(x, c0, c1,..., c_k) through the points (x, y)
by least squares with the unknown parameters $$c_0, ... c_{k-1}$$ such that

$$T = \sum_{i=0}^{n-1}([y_i- f(x, c_0, c_1, ..., c_{k-1})]^2$$

is minimized. Scipy has an easy to use function from the optimize module to find optimal values of the
fitting parameters by the Levenberg-Marquardt algorithm. It has the following calling format:

curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)

where the parameters are:

f - The model function, f(x, ...).  
    xdata - an array of predetermined values of length n.
    ydata : an array of dependentdent data.
    p0 : None, scalar, or M-length sequence
        Initial guess for the parameters with a default value of 1 if not given.
    sigma : standard deviation of dependent data which is used as weighting f.
        If not specified,  uniform weights will be applied.

The function returns a pair, (popt, pcov) where

popt : output array of optimal fitted parameters  
    pcov : variance-covariance matrix of fitted parameters. the variances of the parameters are
           along the diagonal.

Here is the built-in example (with known parameters! but with added normal errors) for an exponential function $$y =a e^{-b x} +c$$
with three fitting parameters. It is interesting to see how the curve_fit() function can recover the parameters (not exactly the original
values because of random deviations added to the data).

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*np.exp(-b*x) + c
  
if __name__ == "__main__":  
   x = np.linspace(0,4,50)
   y = func(x, 2.5, 1.3, 0.5)
   yn = y + 0.2*np.random.normal(size=len(x))
   popt, pcov = curve_fit(func, x, yn)

   print "Fitted parameters approximations:", popt
   print "Variance-covariance matrix:"
   print pcov

When the above example is run,it prints the following:

$ python curvefit.py
Fitted parameters approximations: [ 2.29258582 1.34085286 0.58836951]
Variance-covariance matrix:
[[ 0.01242794 0.00585958 -0.00061056]
[ 0.00585958 0.01945042 0.00477259]
[-0.00061056 0.00477259 0.00225576]]
~/Blogs/scipy/fitting$




But there is a catch for Ubuntu 10.04 users. The version of scipy installed is only at 0.7.0-2 while the curve_fit function
is available only with a later version 0.8.0. One has to temporarily remove the scipy-numpy and associated modules
and reinstall them again. I have difficulty installing matplotlib, so let's hope that Ubuntu package maintainers will be able to incorporate the latest statble versions of scipy, numpy and matplotlib in the next Maverick Meerkat version which is available on Monday!