Sunday, March 25, 2012

AlgoPy-automatic differentiation for Python programs



Note: The example below failed to run on my system, which has Python 2.7 and version 0.3.1 of AlgoPy.


Introduction


I had a problem determining the gradient of the ordinal logistic model, and I could not find online references giving a formula for it. You see, it takes around 6 seconds to compute the parameters of the model using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) optimizer.
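In the meantime, one stopgap when an analytic gradient is unavailable (a generic sketch, not the ordinal logistic gradient itself) is to hand the optimizer a central-difference approximation. The `numeric_grad` helper below is hypothetical:

```python
import numpy as np

def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        # one pair of function evaluations per coordinate
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g

# Sanity check on f(x) = x0**2 + 3*x1, whose gradient at (1, 2) is (2, 3):
g = numeric_grad(lambda x: x[0]**2 + 3*x[1], [1.0, 2.0])
print(g)  # approximately [ 2.  3.]
```

A helper like this can be passed as the fprime argument of scipy.optimize.fmin_bfgs, at the cost of two function evaluations per parameter per gradient call.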

I had heard of ADOL-C before for automatic differentiation; in fact, while browsing a second-hand book sale, I found a reference book with Java-Forte programs on a CD. That turned me off, since Forte is a relatively old IDE, and I decided NOT to buy it. So let's try an automatic differentiation package for Python.


The home page is at http://packages.python.org/algopy/.

The documentation at the homepage states the purpose of AlgoPy nicely:


The purpose of AlgoPy is the evaluation of higher-order derivatives in the forward and reverse mode of Algorithmic Differentiation (AD) of functions that are implemented as Python programs. Particular focus are functions that contain numerical linear algebra functions as they often appear in statistically motivated functions. The intended use of AlgoPy is for easy prototyping at reasonable execution speeds. More precisely, for a typical program a directional derivative takes on the order of 10 times as much time as the function evaluation. This is approximately also true for the gradient.


Capabilities of AlgoPy

AlgoPy promises to compute the following, where possible:

  • evaluation of derivatives useful for nonlinear continuous optimization
  • gradient
  • Jacobian
  • Hessian
  • Jacobian vector product
  • vector Jacobian product
  • Hessian vector product
  • vector Hessian vector product
  • higher-order tensors
  • Taylor series evaluation
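To give a feel for what forward-mode AD actually computes, here is a toy dual-number sketch (my own illustration, not AlgoPy's UTPM machinery): each value carries its derivative along, and arithmetic rules propagate both.

```python
class Dual(object):
    """Minimal dual number: a value plus its first derivative (forward-mode AD)."""
    def __init__(self, val, der):
        self.val, self.der = val, der
    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)
    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

# d/dx of f(x, y) = x*y at (3, 5): seed dx = 1, dy = 0
x = Dual(3.0, 1.0)
y = Dual(5.0, 0.0)
f = x * y
print(f.val, f.der)  # value 15.0, and df/dx = 5.0
```

One such "seeded" pass gives one directional derivative; a full Jacobian takes one pass per input direction, which is why the reverse mode exists for gradients of scalar functions.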

Installation

I am using the latest Ubuntu OS (Oneiric Ocelot, as of March 24, 2012) on my laptop, which serves as my main computer. Check the version of your installed default Python by issuing python --version; I see that I have 2.7.2+. Now try this:
sudo easy_install-2.7 algopy
That's all! Here are the terminal output messages:
sudo easy_install-2.7 algopy
Searching for algopy
Reading http://pypi.python.org/simple/algopy/
Reading http://packages.python.org/algopy
Reading http://www.github.com/b45ch1/algopy
Best match: algopy 0.3.1
Downloading http://pypi.python.org/packages/source/a/algopy/algopy-0.3.1.zip#md5=84edf4722e0f67ae254d663906893ff5
Processing algopy-0.3.1.zip
Running algopy-0.3.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-SvuPun/algopy-0.3.1/egg-dist-tmp-8pSEhA
.dev warning: manifest_maker: MANIFEST.in, line 1: 'recursive-include' expects ...
zip_safe flag not set; analyzing archive contents...
algopy.__init__: module references __file__
algopy.tracer.tests.environment: module references __file__
Adding algopy 0.3.1 to easy-install.pth file
Installed /usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg
Processing dependencies for algopy
Finished processing dependencies for algopy
Actually, easy_install has three versions on my machine, namely easy_install, easy_install-2.6, and easy_install-2.7. I got the following error, for instance, by typing easy_install only:
sudo easy_install algopy
[sudo] password for toto:
Traceback (most recent call last):
  File "/usr/local/bin/easy_install", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2676, in <module>
    parse_requirements(__requires__), Environment()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 552, in resolve
    raise DistributionNotFound(req)
It is much easier to install with easy_install than to download the tarred package from the homepage, unpack it, and install it via python setup.py install.
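To verify the installation, a quick smoke test can be run from the Python prompt (a hypothetical check of my own, wrapped so it also reports a missing install):

```python
# Hypothetical post-install smoke test: report the algopy version if importable.
try:
    import algopy
    msg = "algopy version: " + algopy.__version__
except ImportError:
    msg = "algopy is not installed"
print(msg)
```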

Trying out an example

import numpy, algopy
from algopy import UTPM, exp

def eval_f(x):
    """ some function """
    return x[0]*x[1]*x[2] + exp(x[0])*x[1]

# forward mode without building the computational graph
# -----------------------------------------------------
x = UTPM.init_jacobian([3,5,7])
y = eval_f(x)
algopy_jacobian = UTPM.extract_jacobian(y)
print 'jacobian = ',algopy_jacobian

# reverse mode using a computational graph
# ----------------------------------------
# STEP 1: trace the function evaluation
cg = algopy.CGraph()
x = algopy.Function([1,2,3])
y = eval_f(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]

# STEP 2: use the computational graph to evaluate derivatives
print 'gradient =', cg.gradient([[3.,5,7]])
print 'Jacobian =', cg.jacobian([[3.,5,7]])
print 'Hessian =', cg.hessian([[3.,5.,7.]])
print 'Hessian vector product =', cg.hess_vec([[3.,5.,7.]],[[4,5,6]])
When the above is executed on my laptop, it prints the following to the console or terminal:
>>> import numpy, algopy
>>> from algopy import UTPM, exp
>>>
>>> def eval_f(x):
...     """ some function """
...     return x[0]*x[1]*x[2] + exp(x[0])*x[1]
...
>>> # forward mode without building the computational graph
... # -----------------------------------------------------
... x = UTPM.init_jacobian([3,5,7])
>>> y = eval_f(x)
>>> algopy_jacobian = UTPM.extract_jacobian(y)
>>> print 'jacobian = ',algopy_jacobian
jacobian =  [ 135.42768462   41.08553692   15.        ]
>>>
>>> # reverse mode using a computational graph
... # ----------------------------------------
...
>>> # STEP 1: trace the function evaluation
... cg = algopy.CGraph()
>>> x = algopy.Function([1,2,3])
>>> y = eval_f(x)
>>> cg.trace_off()
>>> cg.independentFunctionList = [x]
>>> cg.dependentFunctionList = [y]
>>>
>>> # STEP 2: use the computational graph to evaluate derivatives
... print 'gradient =', cg.gradient([[3.,5,7]])
gradient = [array([ 135.42768462,   41.08553692,   15.        ])]
>>> print 'Jacobian =', cg.jacobian([[3.,5,7]])
Jacobian =
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 262, in jacobian
    self.pushforward(utpm_x_list)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 103, in pushforward
    f.__class__.pushforward(f.func, f.args, Fout = f)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 637, in pushforward
    out = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/utpm/utpm.py", line 73, in __getitem__
    tmp = self.data.__getitem__((slice(None),slice(None)) + tuple(sl))
IndexError: invalid index
>>> print 'Hessian =', cg.hessian([[3.,5.,7.]])
Hessian =
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 349, in hessian
    self.pushforward(utpm_x_list)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 103, in pushforward
    f.__class__.pushforward(f.func, f.args, Fout = f)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/tracer/tracer.py", line 637, in pushforward
    out = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/algopy-0.3.1-py2.7.egg/algopy/utpm/utpm.py", line 73, in __getitem__
    tmp = self.data.__getitem__((slice(None),slice(None)) + tuple(sl))
IndexError: invalid index
>>> print 'Hessian vector product =', cg.hess_vec([[3.,5.,7.]],[

Ouch, we have fatal errors! And we do have the latest version of AlgoPy (0.3.1). But you may be luckier than I was.
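At least the forward-mode Jacobian above can be trusted: the gradient of eval_f is easy to derive by hand, and at (3, 5, 7) it matches the printed values. Here is a pure-numpy cross-check, no AlgoPy required:

```python
import numpy as np

def eval_f(x):
    # same test function as above
    return x[0]*x[1]*x[2] + np.exp(x[0])*x[1]

def grad_f(x):
    """Hand-derived gradient of eval_f."""
    return np.array([x[1]*x[2] + np.exp(x[0])*x[1],
                     x[0]*x[2] + np.exp(x[0]),
                     x[0]*x[1]])

g = grad_f(np.array([3.0, 5.0, 7.0]))
print(g)  # approximately [ 135.42768462   41.08553692   15.        ]
```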

We are a little sorry for this post, but we need to visit the mailing list or the AlgoPy group (if there is one) first. We will come back soon!

Wednesday, March 7, 2012

It's about time Python got serious with statistics! Introducing Scikits.statsmodels (Linux Ubuntu), Part 1

You can of course perform powerful statistical routines using Python, the Rpy module, and R. But greater joy is now coming to pure Python users with the Scikits.statsmodels package.


  • Install the scikits.statsmodels module
  • Be sure that your installed versions of Python, NumPy, and SciPy are at least the following:
    Python >= 2.5
    NumPy >= 1.4.0 
    SciPy >= 0.7
    
    You can install the latest versions by typing sudo apt-get install python python-setuptools python-numpy python-scipy
  • Download the software scikits.statsmodels module
  • In a terminal type wget -ct 0 http://pypi.python.org/packages/source/s/scikits.statsmodels/scikits.statsmodels-0.3.1.tar.gz#md5=1f55b53d161544b95ca2709c9731c00c Untar it, then descend into the directory and install:
    tar -xvvf scikits.statsmodels-0.3.1.tar.gz
    cd scikits.statsmodels-0.3.1
    sudo python setup.py install
    
  • Test the installed module
  • Let's try this from ipython (again, do a sudo apt-get install ipython if it is not yet installed):
    import scikits.statsmodels.api as sm
    help(sm)
    
    The following will be printed:
    Help on module scikits.statsmodels.api in scikits.statsmodels:

    NAME
        scikits.statsmodels.api - Statistical models

    FILE
        /home/toto/testsoftware/scikits.statsmodels-0.3.1/scikits/statsmodels/api.py

    DESCRIPTION
        - standard `regression` models
            - `GLS` (generalized least squares regression)
            - `OLS` (ordinary least square regression)
            - `WLS` (weighted least square regression)
            - `GLASAR` (GLS with autoregressive errors model)
        - `GLM` (generalized linear models)
        - robust statistical models
            - `RLM` (robust linear models using M estimators)
            - `robust.norms` estimates
            - `robust.scale` estimates (MAD, Huber's proposal 2).
        - sandbox models
            - `mixed` effects models
            - `gam` (generalized additive model
    What is available by importing sm? Type dir(sm) and the following functions or attributes will be listed:
    In [17]: dir(sm)
    Out[17]: 
    ['GLM',
     'GLS',
     'GLSAR',
     'Logit',
     'MNLogit',
     'OLS',
     'Poisson',
     'Probit',
     'RLM',
     'WLS',
     '__builtins__',
     '__doc__',
     '__file__',
     '__name__',
     '__package__',
     'add_constant',
     'categorical',
     'datasets',
     'families',
     'iolib',
     'nonparametric',
     'regression',
     'robust',
     'test',
     'tools',
     'tsa',
     'version']
    
  • Additional resources
  • Home page of pystatsmodels: http://statsmodels.sourceforge.net/ Subscribe to the Google group: http://groups.google.com/group/pystatsmodels
  • The regression example from the homepage
  • It is time to show what statsmodels can do. Here is a regression example from the home page: http://statsmodels.sourceforge.net/
    import numpy as np
    import scikits.statsmodels.api as sm
    
    
    # get data
    nsample = 100
    x = np.linspace(0,10, 100)
    X = sm.add_constant(np.column_stack((x, x**2)))
    beta = np.array([1, 0.1, 10])
    y = np.dot(X, beta) + np.random.normal(size=nsample)
    
    
    # run the regression
    results = sm.OLS(y, X).fit()
    
    # look at the results
    print results.summary()
    
    Here are the results from running the above published example:
         Summary of Regression Results     
    =======================================
    | Dependent Variable:            ['y']|
    | Model:                           OLS|
    | Method:                Least Squares|
    | Date:               Wed, 07 Mar 2012|
    | Time:                       17:58:02|
    | # obs:                         100.0|
    | Df residuals:                   97.0|
    | Df model:                        2.0|
    ==============================================================================
    |                   coefficient     std. error    t-statistic          prob. |
    ------------------------------------------------------------------------------
    | x1                      1.060         0.1353         7.8344         0.0000 |
    | x2                    0.09398        0.01309         7.1802         0.0000 |
    | const                   9.848         0.2927        33.6490         0.0000 |
    ==============================================================================
    |                          Models stats                      Residual stats  |
    ------------------------------------------------------------------------------
    | R-squared:                     0.9729   Durbin-Watson:              1.803  |
    | Adjusted R-squared:            0.9724   Omnibus:                    3.108  |
    | F-statistic:                    1742.   Prob(Omnibus):             0.2114  |
    | Prob (F-statistic):         9.776e-77   JB:                         2.791  |
    | Log likelihood:                -139.9   Prob(JB):                  0.2477  |
    | AIC criterion:                  285.8   Skew:                     -0.4176  |
    | BIC criterion:                  293.6   Kurtosis:                   2.927  |
    ------------------------------------------------------------------------------
    
    
    So far so good. We hope that the statsmodels module becomes more robust, easier to use, and inspiring enough for Python users to contribute code!
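As a closing cross-check, the OLS coefficients in the summary can be reproduced with plain numpy. This sketch uses a fixed random seed (my own choice, so its noise draw and hence its exact numbers differ from the table above); note the constant column goes last, matching the x1, x2, const ordering sm.add_constant produced in the example:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible
nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x, x**2, np.ones(nsample)))  # constant appended last
beta = np.array([1, 0.1, 10])
y = np.dot(X, beta) + np.random.normal(size=nsample)

# ordinary least-squares estimate of beta
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1, 0.1, 10], within sampling noise
```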