SciPy
SciPy provides functions and other objects that work with NumPy arrays: optimization, numerical integration, statistics, etc.
Currently being actively developed
SciPy versus NumPy
When we import SciPy, we also get NumPy. Here are the relevant lines of the SciPy initialization file:
# Import numpy symbols to scipy name space
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
However, subpackages such as scipy.optimize, scipy.integrate, etc. are not automatically added to the scipy namespace. This means that you have to import them separately:
import scipy.optimize
from scipy.integrate import quad
Which to use, NumPy or SciPy?
SciPy includes NumPy, so you can just import SciPy. But personally I like NumPy because it is small and stable.
I usually start my programs with
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
# etc
Statistics
The scipy.stats package supplies
- numerous random variable objects
- densities, cumulative distributions, random sampling, etc.
- some estimation procedures
- some statistical tests
Random Variables and Distributions
Recall that numpy.random provides functions for generating random variables
>>> a = 5
>>> b = 5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443, 0.90941746, 0.49067643])
However, sometimes we need access to the density itself, or the cdf, the quantiles, etc. For this we can use scipy.stats.
Here's an example
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
q = beta(5, 5) # Beta(a, b), with a = b = 5
obs = q.rvs(2000) # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()
In this code we created an rv_frozen object by the call q = beta(5, 5). The general syntax for such calls is
identifier = scipy.stats.distribution_name(shape_parameters)
where distribution_name is one of the distribution names in dir(scipy.stats).
The shape parameters depend on the specific distribution; use the online help
- You do know how to do this by now, don't you?
- In a Python shell, try help(scipy.stats.beta)
- In IPython, try scipy.stats.beta?
The methods used in the code above, rvs and pdf, give random variates and the density respectively. Other methods include cdf, moments, etc.; see the online help. In fact there are also two keyword arguments, loc and scale:
identifier = scipy.stats.distribution_name(shape_parameters, loc=c, scale=d)
The methods rvs, pdf, cdf, etc. are transformed accordingly. Note that there is an alternative way of calling the methods described above.
For example, the previous code can be replaced by
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
obs = beta.rvs(5, 5, size=2000) # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
show()
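To illustrate the loc and scale keywords discussed above, here's a small sketch (not from the original notes): scipy.stats treats q = beta(5, 5, loc=1, scale=2) as the distribution of 1 + 2 * X with X distributed Beta(5, 5), so the pdf is shifted and rescaled accordingly
import numpy as np
from scipy.stats import beta

# With loc=1 and scale=2, q represents 1 + 2 * X where X is Beta(5, 5),
# so its support is [1, 3]
q = beta(5, 5, loc=1, scale=2)

grid = np.linspace(1.01, 2.99, 50)
# The transformed pdf satisfies q.pdf(y) == beta.pdf((y - 1) / 2, 5, 5) / 2
print(np.allclose(q.pdf(grid), beta.pdf((grid - 1) / 2, 5, 5) / 2))  # should print True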
Other Goodies in scipy.stats
There are lots of other functions and routines in scipy.stats. A useful one is scipy.stats.linregress:
from scipy import stats
x = # some array like object (list, tuple, NumPy array, etc)
y = # some array like object (list, tuple, NumPy array, etc)
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
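To make the template above concrete, here's a minimal runnable sketch with made-up data (the slope and intercept values are purely illustrative)
import numpy as np
from scipy import stats

# Made-up data: y is roughly linear in x, with slope 2 and intercept 1 plus noise
np.random.seed(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + np.random.randn(50)

gradient, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print(gradient)   # should be close to 2.0
print(intercept)  # should be close to 1.0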
Also worth a mention: there is an interface to R from Python, which allows you to call R functions from a Python script.
Roots and Fixed Points
Some background on rates of convergence: a sequence (x_n) converging to x is said to converge linearly, at rate λ < 1, if |x_{n+1} - x| ≤ λ |x_n - x| for all n.
For example, if λ = 1/2 then the distance between the current value and the limit is at least halved at each step.
The sequence converges to x with order q if |x_{n+1} - x| ≤ M |x_n - x|^q for some constant M > 0.
If q = 2, then this is called quadratic convergence.
The higher is q, the better the rate.
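As a small numerical illustration (not part of the original notes), the following sketch compares a linearly convergent error sequence with λ = 1/2 against a quadratically convergent one, both starting from an error of 0.5
# Linear convergence: the error is halved at each step
# Quadratic convergence: the error is squared at each step
e_linear, e_quad = 0.5, 0.5
for n in range(1, 6):
    e_linear, e_quad = 0.5 * e_linear, e_quad ** 2
    print('step %d: linear error = %g, quadratic error = %g' % (n, e_linear, e_quad))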
Case Study: Root finding
A root of f on [a, b] is an x such that f(x) = 0. How to find one?
To simplify matters, suppose that we have a function f on [a, b] which
- is continuous
- has one and only one root, and, moreover,
- satisfies f(a) < 0, while f(b) > 0
Bisection
A simple algorithm is bisection. You know the game where
- I think of a number between 1 and 100
- You say, is it less than 50?
- If yes, is it less than 25?
- If no, then is it less than 75?
- And so on
Here's an implementation of bisection for the function described above
## Filename: bisection.py
## Author: John Stachurski
def bisect(f, a, b, tol=10e-5):
lower, upper = a, b
while upper - lower > tol:
middle = 0.5 * (upper + lower)
if f(middle) > 0: # root is between lower and middle
lower, upper = lower, middle
else: # Root is between middle and upper
lower, upper = middle, upper
print 'Root:', lower
Recall the concept of recursive function calls from this lecture
Write a recursive implementation of the bisection function
Solution:
## Filename: bisection2.py
## Author: John Stachurski
def bisect(f, a, b, tol=10e-5):
lower, upper = a, b
if upper - lower < tol:
print 'Root:', lower
else:
middle = 0.5 * (upper + lower)
print 'lower = %.2f, middle = %.2f, upper = %.2f' \
% (lower, middle, upper)
if f(middle) > 0: # root is between lower and middle
bisect(f, lower, middle)
else: # Root is between middle and upper
bisect(f, middle, upper)
import bisection2
from scipy.stats import beta
from random import uniform
c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
lower = 0.00, middle = 0.50, upper = 1.00
lower = 0.00, middle = 0.25, upper = 0.50
lower = 0.25, middle = 0.38, upper = 0.50
lower = 0.38, middle = 0.44, upper = 0.50
lower = 0.44, middle = 0.47, upper = 0.50
## some lines omitted
Root: 0.448425292969
Analysis
This bisection algorithm will always converge to the root in our setting. However, the rate of convergence is only linear.
An alternative would be to use the Newton-Raphson method
Under certain conditions on the second derivative, convergence is quadratic
But the algorithm is less robust
For some functions it does not converge (cycles, etc.)
This illustrates a general principle
- If you have specific knowledge about your function, you can use a more efficient technique
- If not, then algorithm choice involves a trade-off between speed of convergence and robustness
A common solution is a hybrid method, which
- Combines a fast method with a robust method
- Attempts to use a fast method
- Uses some diagnostics
- If diagnostics are bad, then switches to a robust algorithm
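For reference, here's a minimal sketch of the Newton-Raphson update x_{n+1} = x_n - f(x_n) / f'(x_n) mentioned above; it's not the implementation SciPy uses, and it assumes we can supply the derivative fprime
def newton_raphson(f, fprime, x0, tol=1e-8, max_iter=50):
    # Repeatedly apply the update x <- x - f(x) / f'(x)
    x = x0
    for i in range(max_iter):
        step = f(x) / fprime(x)
        x = x - step
        if abs(step) < tol:
            return x
    raise RuntimeError('Newton-Raphson did not converge')

# Example: the positive root of f(x) = x**2 - 2 is sqrt(2)
print(newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))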
Roots and Fixed Points in Scipy
Regarding the scalar (i.e., univariate) case: in scipy.optimize we find bisect, newton, brentq and some others. Here brentq is a hybrid method, and a good default. Let's test them:
from scipy.stats import beta
from random import uniform
import bisection2
from scipy.optimize import bisect, newton, brentq
print 'Our implementation:'
c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
print 'Scipy bisect:', bisect(f, 0, 1)
print 'Scipy newton:', newton(f, 0.5) # 0.5 = initial condition
print 'Scipy brentq:', brentq(f, 0, 1)
Our implementation:
lower = 0.00, middle = 0.50, upper = 1.00
## Some lines omitted
Root: 0.177001953125
Scipy bisect: 0.177014592494
Scipy newton: 0.177014592494
Scipy brentq: 0.177014592494
bisect, newton and brentq all give the same value. Using IPython's timeit we can get an idea of how long each one takes:
In [11]: timeit bisect(f, 0, 1)
100 loops, best of 3: 8.91 ms per loop
In [12]: timeit newton(f, 0.5)
1000 loops, best of 3: 1.33 ms per loop
In [13]: timeit brentq(f, 0, 1)
100 loops, best of 3: 2.11 ms per loop
newton (using Newton-Raphson) is best because the function is well behaved. But brentq is not far off, and would work well for a less well behaved function.
Fixed Points
SciPy has a function for finding (scalar) fixed points too:
>>> from scipy.optimize import fixed_point
>>> fixed_point(lambda x: x**2, 10.0) # 10.0 is an initial guess
1.0
It might be better to use a root finder such as brentq to find the fixed point of f
- Search for the root of g(x) := x - f(x)
>>> from scipy.optimize import brentq
>>> brentq(lambda x: x - x**2, 0.5, 10)
1.0
Multivariate Root Finding
Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK. See the docstring for details.
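For example, here's a small sketch (with a made-up system of equations, not taken from the notes)
import numpy as np
from scipy.optimize import fsolve

def system(z):
    # Solve x + y = 3 and x * y = 2; the solutions are (1, 2) and (2, 1)
    x, y = z
    return [x + y - 3, x * y - 2]

print(fsolve(system, [0.0, 1.0]))  # should converge to one of the two solutions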
Optimization
Most numerical packages provide only functions for minimization. To find the maximizer of f on D, find the minimizer of -f on D.
Minimization is closely related to root finding
- Newton's method searches for a root of the first derivative
- Many variations on this technique
- Gradient descent is another approach using derivatives
But often we don't know much about the function
In this case it's best to use hybrid methods
- Robust, and fast when diagnostics are good
Univariate Optimization
For constrained, univariate (i.e., scalar) minimization, a good option is fminbound:
>>> from scipy.optimize import fminbound
>>> fminbound(lambda x: x**2, -1, 2) # Search in [-1, 2]
0.0
Finds local optima, not global
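Recalling the earlier point that maximization amounts to minimizing the negative, here's a small sketch (not from the notes) using fminbound
from scipy.optimize import fminbound

# Maximize f(x) = x * (1 - x) on [0, 1] by minimizing -f; the maximizer is 0.5
maximizer = fminbound(lambda x: -x * (1 - x), 0, 1)
print(maximizer)   # should print a value close to 0.5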
Multivariate Optimization
Multivariate local optimizers: fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg
Constrained multivariate local optimizers: fmin_l_bfgs_b, fmin_tnc, fmin_cobyla
See the docstrings for details
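As a quick illustration (a sketch, not from the notes), fmin, which implements the Nelder-Mead simplex algorithm, can be used like this
import numpy as np
from scipy.optimize import fmin

# Minimize f(x, y) = (x - 1)**2 + (y - 2)**2, whose minimizer is (1, 2),
# starting from a rough initial guess
result = fmin(lambda z: (z[0] - 1)**2 + (z[1] - 2)**2, np.array([0.0, 0.0]))
print(result)   # should be close to [1, 2]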
Integration
Most numerical methods compute the integral of an approximating polynomial. The error depends on the quality of the fit (of the polynomial to the integrand).
The relevant module is scipy.integrate. A good default for univariate integration is quad:
>>> from scipy.integrate import quad
>>> quad(lambda x: x**2, 0, 1) # Integrate f(x) = x**2 over [0, 1]
(0.33333333333333331, 3.7007434154171879e-15) # Integral, estimate of error
Uses Clenshaw-Curtis quadrature
- Based on expansion in terms of Chebychev polynomials
- Accuracy similar to Gaussian quadrature
- But often runs a bit faster
Other routines include quadrature and romberg (Gaussian quadrature and Romberg integration respectively). For multiple (double and triple) integrals, see dblquad and tplquad.
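For instance, a double integral over the unit square can be computed with dblquad as in this small sketch (not from the notes)
from scipy.integrate import dblquad

# Integrate f(x, y) = x * y over [0, 1] x [0, 1]; the exact answer is 1/4.
# Note that the integrand takes the inner variable (y) as its first argument.
value, error = dblquad(lambda y, x: x * y, 0, 1, lambda x: 0, lambda x: 1)
print(value)   # should be close to 0.25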
Linear Algebra
We saw that NumPy provides a module for linear algebra called linalg. SciPy also provides a module for linear algebra with the same name. Personally I just use numpy.linalg.
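For example, solving a linear system with numpy.linalg looks like this (a minimal sketch, not from the notes)
import numpy as np

# Solve the linear system A x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)
print(x)   # should satisfy np.allclose(np.dot(A, x), b)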