Sunday, March 25, 2012

AlgoPy-automatic differentiation for Python programs



Note: The example below failed to run on my system, which has Python 2.7 and version 0.3.1 of AlgoPy.


Introduction


I had a problem determining the gradient of the ordinal logistic model, and I could not find online references giving a formula for it. You see, it takes around 6 seconds to compute the parameters of the model using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) optimizer.
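In the meantime, one stopgap when an analytic gradient is unavailable (a generic sketch, not the ordinal logistic gradient itself) is to hand the optimizer a central-difference approximation. The `numeric_grad` helper below is hypothetical:

```python
import numpy as np

def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        # one pair of function evaluations per coordinate
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g

# Sanity check on f(x) = x0**2 + 3*x1, whose gradient at (1, 2) is (2, 3):
g = numeric_grad(lambda x: x[0]**2 + 3*x[1], [1.0, 2.0])
print(g)  # approximately [ 2.  3.]
```

A helper like this can be passed as the fprime argument of scipy.optimize.fmin_bfgs, at the cost of two function evaluations per parameter per gradient call.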

I had heard of ADOL-C before for automatic differentiation; in fact, while browsing a second-hand book sale, I found a reference book with Java-Forte programs on a CD. That turned me off, since Forte is a relatively old IDE, and I decided NOT to buy it. So let's try an automatic differentiation package for Python.


The home page is at http://packages.python.org/algopy/.

The documentation at the homepage states the purpose of AlgoPy nicely:


The purpose of AlgoPy is the evaluation of higher-order derivatives in the forward and reverse mode of Algorithmic Differentiation (AD) of functions that are implemented as Python programs. Particular focus are functions that contain numerical linear algebra functions as they often appear in statistically motivated functions. The intended use of AlgoPy is for easy prototyping at reasonable execution speeds. More precisely, for a typical program a directional derivative takes on the order of 10 times as much time as the function evaluation. This is approximately also true for the gradient.


Capabilities of AlgoPy

AlgoPy promises to compute the following, where possible:

  • evaluation of derivatives useful for nonlinear continuous optimization
  • gradient
  • Jacobian
  • Hessian
  • Jacobian vector product
  • vector Jacobian product
  • Hessian vector product
  • vector Hessian vector product
  • higher-order tensors
  • Taylor series evaluation
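To give a feel for what forward-mode AD actually computes, here is a toy dual-number sketch (my own illustration, not AlgoPy's UTPM machinery): each value carries its derivative along, and arithmetic rules propagate both.

```python
class Dual(object):
    """Minimal dual number: a value plus its first derivative (forward-mode AD)."""
    def __init__(self, val, der):
        self.val, self.der = val, der
    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)
    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

# d/dx of f(x, y) = x*y at (3, 5): seed dx = 1, dy = 0
x = Dual(3.0, 1.0)
y = Dual(5.0, 0.0)
f = x * y
print(f.val, f.der)  # value 15.0, and df/dx = 5.0
```

One such "seeded" pass gives one directional derivative; a full Jacobian takes one pass per input direction, which is why the reverse mode exists for gradients of scalar functions.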

Installation

I am using the latest Ubuntu OS (Oneiric Ocelot, as of March 24, 2012) on my laptop, which serves as my main computer. Check the version of your installed default Python by issuing python --version; I see that I have 2.7.2+. Now try this:
sudo easy_install-2.7 algopy
That's all! Here are the terminal output messages:
sudo easy_install-2.7 algopy
Searching for algopy
Reading http://pypi.python.org/simple/algopy/
Reading http://packages.python.org/algopy
Reading http://www.github.com/b45ch1/algopy
Best match: algopy 0.3.1
Downloading http://pypi.python.org/packages/source/a/algopy/algopy-0.3.1.zip#md5=84edf4722e0f67ae254d663906893ff5
Processing algopy-0.3.1.zip
Running algopy-0.3.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-SvuPun/algopy-0.3.1/egg-dist-tmp-8pSEhA
.dev warning: manifest_maker: MANIFEST.in, line 1: 'recursive-include' expects ...
zip_safe flag not set; analyzing archive contents...
algopy.__init__: module references __file__
algopy.tracer.tests.environment: module references __file__
Adding algopy 0.3.1 to easy-install.pth file
Installed /usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg
Processing dependencies for algopy
Finished processing dependencies for algopy
Actually, easy_install has three versions on my machine, namely easy_install, easy_install-2.6, and easy_install-2.7. I got the following error, for instance, by typing easy_install only:
sudo easy_install algopy
[sudo] password for toto:
Traceback (most recent call last):
  File "/usr/local/bin/easy_install", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2676, in <module>
    parse_requirements(__requires__), Environment()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 552, in resolve
    raise DistributionNotFound(req)
It is much easier to install with easy_install than to download the tarred package from the homepage, unpack it, and install it via python setup.py install.
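To verify the installation, a quick smoke test can be run from the Python prompt (a hypothetical check of my own, wrapped so it also reports a missing install):

```python
# Hypothetical post-install smoke test: report the algopy version if importable.
try:
    import algopy
    msg = "algopy version: " + algopy.__version__
except ImportError:
    msg = "algopy is not installed"
print(msg)
```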

Trying out an example

import numpy, algopy
from algopy import UTPM, exp

def eval_f(x):
    """ some function """
    return x[0]*x[1]*x[2] + exp(x[0])*x[1]

# forward mode without building the computational graph
# -----------------------------------------------------
x = UTPM.init_jacobian([3,5,7])
y = eval_f(x)
algopy_jacobian = UTPM.extract_jacobian(y)
print 'jacobian = ',algopy_jacobian

# reverse mode using a computational graph
# ----------------------------------------
# STEP 1: trace the function evaluation
cg = algopy.CGraph()
x = algopy.Function([1,2,3])
y = eval_f(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]

# STEP 2: use the computational graph to evaluate derivatives
print 'gradient =', cg.gradient([[3.,5,7]])
print 'Jacobian =', cg.jacobian([[3.,5,7]])
print 'Hessian =', cg.hessian([[3.,5.,7.]])
print 'Hessian vector product =', cg.hess_vec([[3.,5.,7.]],[[4,5,6]])
When the above is executed on my laptop, it prints the following to the console or terminal:
>>> import numpy, algopy
>>> from algopy import UTPM, exp
>>>
>>> def eval_f(x):
...     """ some function """
...     return x[0]*x[1]*x[2] + exp(x[0])*x[1]
...
>>> # forward mode without building the computational graph
... # -----------------------------------------------------
... x = UTPM.init_jacobian([3,5,7])
>>> y = eval_f(x)
>>> algopy_jacobian = UTPM.extract_jacobian(y)
>>> print 'jacobian = ',algopy_jacobian
jacobian =  [ 135.42768462   41.08553692   15.        ]
>>>
>>> # reverse mode using a computational graph
... # ----------------------------------------
...
>>> # STEP 1: trace the function evaluation
... cg = algopy.CGraph()
>>> x = algopy.Function([1,2,3])
>>> y = eval_f(x)
>>> cg.trace_off()
>>> cg.independentFunctionList = [x]
>>> cg.dependentFunctionList = [y]
>>>
>>> # STEP 2: use the computational graph to evaluate derivatives
... print 'gradient =', cg.gradient([[3.,5,7]])
gradient = [array([ 135.42768462,   41.08553692,   15.        ])]
>>> print 'Jacobian =', cg.jacobian([[3.,5,7]])
Jacobian =
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 262, in jacobian
    self.pushforward(utpm_x_list)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 103, in pushforward
    f.__class__.pushforward(f.func, f.args, Fout = f)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 637, in pushforward
    out = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/utpm/utpm.py", line 73, in __getitem__
    tmp = self.data.__getitem__((slice(None),slice(None)) + tuple(sl))
IndexError: invalid index
>>> print 'Hessian =', cg.hessian([[3.,5.,7.]])
Hessian =
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 349, in hessian
    self.pushforward(utpm_x_list)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 103, in pushforward
    f.__class__.pushforward(f.func, f.args, Fout = f)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 637, in pushforward
    out = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/utpm/utpm.py", line 73, in __getitem__
    tmp = self.data.__getitem__((slice(None),slice(None)) + tuple(sl))
IndexError: invalid index
>>> print 'Hessian vector product =', cg.hess_vec([[3.,5.,7.]],[

Ouch, we have fatal errors! And we do have the latest version of AlgoPy (0.3.1). But you may be luckier than I was.
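At least the forward-mode Jacobian above can be trusted: the gradient of eval_f is easy to derive by hand, and at (3, 5, 7) it matches the printed values. Here is a pure-numpy cross-check, no AlgoPy required:

```python
import numpy as np

def eval_f(x):
    # same test function as above
    return x[0]*x[1]*x[2] + np.exp(x[0])*x[1]

def grad_f(x):
    """Hand-derived gradient of eval_f."""
    return np.array([x[1]*x[2] + np.exp(x[0])*x[1],
                     x[0]*x[2] + np.exp(x[0]),
                     x[0]*x[1]])

g = grad_f(np.array([3.0, 5.0, 7.0]))
print(g)  # approximately [ 135.42768462   41.08553692   15.        ]
```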

We are a little sorry for this post, but we need to visit the mailing list or the AlgoPy group (if there is one) first. We will come back soon!

Wednesday, March 7, 2012

It's about time Python got serious with statistics! Introducing Scikits.statsmodels (Linux Ubuntu), Part 1

You can of course perform powerful statistical routines using Python, the Rpy module, and R. But greater joy is now coming to pure Python users with the Scikits.statsmodels package.


  • Install the scikits.statsmodels module
  • Be sure that your installed versions of Python, NumPy, and SciPy are at least the following:
    Python >= 2.5
    NumPy >= 1.4.0 
    SciPy >= 0.7
    
    You can install the latest versions by typing sudo apt-get install python python-setuptools python-numpy python-scipy
  • Download the software scikits.statsmodels module
  • In a terminal type wget -ct 0 http://pypi.python.org/packages/source/s/scikits.statsmodels/scikits.statsmodels-0.3.1.tar.gz#md5=1f55b53d161544b95ca2709c9731c00c Untar it, then descend into the directory and install:
    tar -xvvf scikits.statsmodels-0.3.1.tar.gz
    cd scikits.statsmodels-0.3.1
    sudo python setup.py install
    
  • Test the installed module
  • Let's try this from ipython (again, do a sudo apt-get install ipython if it is not yet installed):
    import scikits.statsmodels.api as sm
    help(sm)
    
    The following will be printed:
    Help on module scikits.statsmodels.api in scikits.statsmodels:

    NAME
        scikits.statsmodels.api - Statistical models

    FILE
        /home/toto/testsoftware/scikits.statsmodels-0.3.1/scikits/statsmodels/api.py

    DESCRIPTION
        - standard `regression` models
            - `GLS` (generalized least squares regression)
            - `OLS` (ordinary least square regression)
            - `WLS` (weighted least square regression)
            - `GLASAR` (GLS with autoregressive errors model)
        - `GLM` (generalized linear models)
        - robust statistical models
            - `RLM` (robust linear models using M estimators)
            - `robust.norms` estimates
            - `robust.scale` estimates (MAD, Huber's proposal 2).
        - sandbox models
            - `mixed` effects models
            - `gam` (generalized additive model
    What is available by importing sm? Type dir(sm) and the following functions or attributes will be listed:
    In [17]: dir(sm)
    Out[17]: 
    ['GLM',
     'GLS',
     'GLSAR',
     'Logit',
     'MNLogit',
     'OLS',
     'Poisson',
     'Probit',
     'RLM',
     'WLS',
     '__builtins__',
     '__doc__',
     '__file__',
     '__name__',
     '__package__',
     'add_constant',
     'categorical',
     'datasets',
     'families',
     'iolib',
     'nonparametric',
     'regression',
     'robust',
     'test',
     'tools',
     'tsa',
     'version']
    
  • Additional resources
  • Home page of pystatsmodels: http://statsmodels.sourceforge.net/ Subscribe to the Google group: http://groups.google.com/group/pystatsmodels
  • The regression example from the homepage
  • It is time to show what statsmodels can do. Here is a regression example from the home page: http://statsmodels.sourceforge.net/
    import numpy as np
    import scikits.statsmodels.api as sm
    
    
    # get data
    nsample = 100
    x = np.linspace(0,10, 100)
    X = sm.add_constant(np.column_stack((x, x**2)))
    beta = np.array([1, 0.1, 10])
    y = np.dot(X, beta) + np.random.normal(size=nsample)
    
    
    # run the regression
    results = sm.OLS(y, X).fit()
    
    # look at the results
    print results.summary()
    
    Here are the results from running the above published example:
         Summary of Regression Results     
    =======================================
    | Dependent Variable:            ['y']|
    | Model:                           OLS|
    | Method:                Least Squares|
    | Date:               Wed, 07 Mar 2012|
    | Time:                       17:58:02|
    | # obs:                         100.0|
    | Df residuals:                   97.0|
    | Df model:                        2.0|
    ==============================================================================
    |                   coefficient     std. error    t-statistic          prob. |
    ------------------------------------------------------------------------------
    | x1                      1.060         0.1353         7.8344         0.0000 |
    | x2                    0.09398        0.01309         7.1802         0.0000 |
    | const                   9.848         0.2927        33.6490         0.0000 |
    ==============================================================================
    |                          Models stats                      Residual stats  |
    ------------------------------------------------------------------------------
    | R-squared:                     0.9729   Durbin-Watson:              1.803  |
    | Adjusted R-squared:            0.9724   Omnibus:                    3.108  |
    | F-statistic:                    1742.   Prob(Omnibus):             0.2114  |
    | Prob (F-statistic):         9.776e-77   JB:                         2.791  |
    | Log likelihood:                -139.9   Prob(JB):                  0.2477  |
    | AIC criterion:                  285.8   Skew:                     -0.4176  |
    | BIC criterion:                  293.6   Kurtosis:                   2.927  |
    ------------------------------------------------------------------------------
    
    
    So far so good. We hope that the statsmodels module becomes more robust, easier to use, and inspiring enough for Python users to contribute code!
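As a closing cross-check, the OLS coefficients in the summary can be reproduced with plain numpy. This sketch uses a fixed random seed (my own choice, so its noise draw and hence its exact numbers differ from the table above); note the constant column goes last, matching the x1, x2, const ordering sm.add_constant produced in the example:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible
nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x, x**2, np.ones(nsample)))  # constant appended last
beta = np.array([1, 0.1, 10])
y = np.dot(X, beta) + np.random.normal(size=nsample)

# ordinary least-squares estimate of beta
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1, 0.1, 10], within sampling noise
```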