Saturday, December 24, 2011

Computational economics lecture 22


SciPy

SciPy provides functions and other objects that work with NumPy arrays:
  • optimization, numerical integration, statistics, etc.
Very impressive package
Currently being actively developed

SciPy versus NumPy

When we import SciPy, we also get NumPy
Here are the relevant lines of the SciPy initialization file:
# Import numpy symbols to scipy name space
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
Most of SciPy resides in submodules
  • scipy.optimize, scipy.integrate, etc.
In the current setup these are not imported into the scipy namespace
This means that you have to import them separately
import scipy.optimize
from scipy.integrate import quad

Which to use, NumPy or SciPy?

SciPy includes NumPy, so you can just import SciPy
But personally I like NumPy because it is small and stable
I usually start my programs with
import numpy as np
Then I import bits and pieces from SciPy as needed
from scipy.integrate import quad
from scipy.optimize import brentq
# etc

Statistics

The scipy.stats package supplies
  • numerous random variable objects
    • densities, cumulative distributions, random sampling, etc.
  • some estimation procedures
  • some statistical tests
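For example, one of the statistical tests is the one-sample t-test, scipy.stats.ttest_1samp; here's a minimal sketch with made-up data (the sample below is purely illustrative):
import numpy as np
from scipy.stats import ttest_1samp

data = np.random.randn(100) + 0.1          # illustrative sample with true mean 0.1
t_stat, p_value = ttest_1samp(data, 0.0)   # test the null hypothesis that the mean is 0
print 't statistic:', t_stat, 'p-value:', p_value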

Random Variables and Distributions

Recall that numpy.random provides functions for generating random variables
>>> a = 5
>>> b = 5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443,  0.90941746,  0.49067643])
This generates draws from the distribution with density

f(x; a, b) = x^(a-1) (1 - x)^(b-1) / B(a, b)   for 0 ≤ x ≤ 1

where B(a, b) is the Beta function
However, sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this we can use scipy.stats
Here's an example
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
q = beta(5, 5)      # Beta(a, b), with a = b = 5
obs = q.rvs(2000)   # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()
The following plot is produced: a histogram of the observations with the Beta(5, 5) density overlaid
In this code we created an rv_frozen object by the call
q = beta(5, 5)
The general syntax is
identifier = scipy.stats.distribution_name(shape_parameters)
where distribution_name is one of the distribution names in dir(scipy.stats)
The shape parameters depend on the specific distribution---use the online help
  • You do know how to do this by now, don't you?
    • In a Python shell, try help(scipy.stats.beta)
    • In IPython, try scipy.stats.beta?
After that we called the methods rvs and pdf, which give random variates and the density respectively
Other methods include cdf, moment, etc. See the online help
In fact there are also two keyword arguments, loc and scale:
identifier = scipy.stats.distribution_name(shape_parameters, loc=c, scale=d)
These transform the original random variable X into

Y = c + d X
The methods rvs, pdf, cdf, etc. are transformed accordingly
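For example, here's a small sketch of a shifted and scaled Beta(5, 5) (the values of loc and scale are chosen just for illustration):
from scipy.stats import beta

q = beta(5, 5, loc=2.0, scale=3.0)   # Y = 2 + 3 X where X is Beta(5, 5)
print q.mean()                       # 2 + 3 * 0.5 = 3.5
print q.cdf(3.5)                     # same as beta(5, 5).cdf(0.5), i.e. 0.5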
Note that there is an alternative way of calling the methods described above
For example, the previous code can be replaced by
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
obs = beta.rvs(5, 5, size=2000)   # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
show()

Other Goodies in scipy.stats

There are lots of other functions and routines in scipy.stats
A useful one is scipy.stats.linregress
from scipy import stats
x = # some array like object (list, tuple, NumPy array, etc)
y = # some array like object (list, tuple, NumPy array, etc)
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
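Here's a runnable version with made-up data (the slope and intercept below are just for illustration):
import numpy as np
from scipy import stats

x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + np.random.randn(50)   # noisy line: slope 2, intercept 1
gradient, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print 'slope:', gradient, 'intercept:', intercept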
However, probably the best way to do statistics in Python is to use rpy2
  • An interface to R from Python
  • Allows you to call R functions from a Python script

Roots and Fixed Points

Some background on rates of convergence
A sequence (x_n) converges to x linearly if there is a λ ∈ (0, 1) such that

|x_{n+1} - x| ≤ λ |x_n - x|   for all n

For example, if λ = 1/2 then the distance between the current value and the limit is at least halved at each step
The sequence converges to x with order q if there is a λ > 0 such that

|x_{n+1} - x| ≤ λ |x_n - x|^q   for all n
If q = 2, this is called quadratic convergence
The higher q is, the better the rate of convergence
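To get a feel for the difference, here's a toy sketch (not tied to any particular algorithm) that iterates the two error bounds with λ = 1/2 and q = 2:
# Illustrative only: error bounds under linear (lambda = 0.5) and quadratic decay
e_lin, e_quad = 0.5, 0.5
for n in range(5):
    e_lin = 0.5 * e_lin      # |x_{n+1} - x| <= 0.5 |x_n - x|
    e_quad = e_quad ** 2     # |x_{n+1} - x| <= |x_n - x| ** 2
    print n, e_lin, e_quad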

Case Study: Root finding

A root of f on [a, b] is an x such that f(x) = 0.
How to find one?
To simplify matters, suppose that we have a function f on [a, b] which
  • is continuous
  • has one and only one root, and, moreover,
  • satisfies f(a) < 0, while f(b) > 0

Bisection

A simple algorithm is bisection
You know the game where
  • I think of a number between 1 and 100
  • You say, is it less than 50?
    • If yes, is it less than 25?
    • If no, then is it less than 75?
  • And so on
This is bisection. I leave further explanation to Wikipedia
Here's an implementation of bisection for the function described above
## Filename: bisection.py
## Author: John Stachurski

def bisect(f, a, b, tol=10e-5):
    lower, upper = a, b
    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        if f(middle) > 0: # root is between lower and middle
            lower, upper = lower, middle
        else:  # Root is between middle and upper
            lower, upper = middle, upper
    print 'Root:', lower
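As a quick sanity check (not part of the original file), we can call it on a function with a known root:
from bisection import bisect

bisect(lambda x: x**2 - 2, 1, 2)   # the root is sqrt(2), approximately 1.4142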
Problem:
Recall the concept of recursive function calls from an earlier lecture
Write a recursive implementation of the bisection function
Solution:
## Filename: bisection2.py
## Author: John Stachurski

def bisect(f, a, b, tol=10e-5):
    lower, upper = a, b
    if upper - lower < tol:
        print 'Root:', lower
    else:
        middle = 0.5 * (upper + lower)
        print 'lower = %.2f, middle = %.2f, upper = %.2f' \
                % (lower, middle, upper)
        if f(middle) > 0: # root is between lower and middle
            bisect(f, lower, middle, tol)
        else:  # Root is between middle and upper
            bisect(f, middle, upper, tol)
Let's test this implementation
import bisection2
from scipy.stats import beta
from random import uniform

c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
Here's the output
lower = 0.00, middle = 0.50, upper = 1.00
lower = 0.00, middle = 0.25, upper = 0.50
lower = 0.25, middle = 0.38, upper = 0.50
lower = 0.38, middle = 0.44, upper = 0.50
lower = 0.44, middle = 0.47, upper = 0.50
## some lines omitted
Root: 0.448425292969

Analysis

This bisection algorithm will always converge to the root in our setting
However, the rate of convergence is only linear
An alternative would be to use the Newton-Raphson method
Under certain conditions on the second derivative, convergence is quadratic
But the algorithm is less robust
For some functions it does not converge (cycles, etc.)
This illustrates a general principle
  • If you have specific knowledge about your function, you can use a more efficient technique
  • If not, then algorithm choice involves a trade-off between speed of convergence and robustness
In practice, most default algorithms for root finding, optimization and fixed points use hybrid methods
  • Combines a fast method with a robust method
    • Attempts to use a fast method
    • Uses some diagnostics
    • If diagnostics are bad, then switches to robust algorithm
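As a rough illustration of this idea (a toy sketch only, not how brentq actually works), here's a scheme that tries a Newton-Raphson step and falls back on bisection whenever the step leaves the current bracket:
def hybrid_root(f, fprime, a, b, tol=10e-5):
    # Toy hybrid solver: requires f(a) < 0 < f(b).  Takes a Newton step
    # when it lands inside the bracket, otherwise takes a bisection step.
    x = 0.5 * (a + b)
    while abs(f(x)) > tol:
        fp = fprime(x)
        x_new = x - f(x) / fp if fp != 0 else None   # Newton-Raphson step
        if x_new is None or not (a < x_new < b):
            x_new = 0.5 * (a + b)                    # diagnostics bad: bisect
        # Shrink the bracket while keeping the sign change inside it
        if f(a) * f(x_new) < 0:
            b = x_new
        else:
            a = x_new
        x = x_new
    return x

print hybrid_root(lambda x: x**2 - 2, lambda x: 2 * x, 0, 2)   # about 1.4142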

Roots and Fixed Points in Scipy

Regarding the scalar (i.e., univariate) case:
In scipy.optimize we find bisect, newton, brentq and some others
Here brentq is a hybrid method, and a good default
Let's test them
from scipy.stats import beta
from random import uniform
import bisection2
from scipy.optimize import bisect, newton, brentq

print 'Our implementation:'
c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
print 'Scipy bisect:', bisect(f, 0, 1)
print 'Scipy newton:', newton(f, 0.5)  # 0.5 = initial condition
print 'Scipy brentq:', brentq(f, 0, 1)
Here's the output
Our implementation:
lower = 0.00, middle = 0.50, upper = 1.00
## Some lines omitted
Root: 0.177001953125
Scipy bisect: 0.177014592494
Scipy newton: 0.177014592494
Scipy brentq: 0.177014592494
Note that bisect, newton and brentq all give the same value
Using IPython's timeit we can get an idea of how long each one takes
In [11]: timeit bisect(f, 0, 1)
100 loops, best of 3: 8.91 ms per loop
In [12]: timeit newton(f, 0.5)
1000 loops, best of 3: 1.33 ms per loop
In [13]: timeit brentq(f, 0, 1)
100 loops, best of 3: 2.11 ms per loop
Here newton (using Newton-Raphson) is best because the function is well behaved
But brentq is not far off, and would work well for a less well behaved function

Fixed Points

SciPy has a function for finding (scalar) fixed points too:
>>> from scipy.optimize import fixed_point
>>> fixed_point(lambda x: x**2, 10.0)  # 10.0 is an initial guess
1.0
However, the current algorithm is not very robust
It might be better to use a root finder such as brentq to find the fixed point of f
  • Search for the root of g(x) := x - f(x)
>>> from scipy.optimize import brentq
>>> brentq(lambda x: x - x**2, 0.5, 10)
1.0

Multivariate Root Finding

Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK
See the docstring for details
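For example, here's a small sketch solving a two-equation system (the system itself is just illustrative):
from scipy.optimize import fsolve

def g(z):
    x, y = z
    return [x + 2*y - 2, x**2 + y**2 - 1]   # want both components equal to zero

print fsolve(g, [1.0, 1.0])   # [1.0, 1.0] is the initial guess
# The system has two solutions, (0, 1) and (0.8, 0.6); fsolve returns the one it finds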

Optimization

Most numerical packages provide only functions for minimization
To find the maximizer of f on D, find the minimizer of -f on D
Minimization is closely related to root finding
  • Newton's method searches for a root of the first derivative
  • Many variations on this technique
  • Gradient descent is another approach using derivatives
As in root finding, nice properties of the function permit use of faster methods
But often we don't know much about the function
In this case it's best to use hybrid methods
  • Robust, and fast when diagnostics are good

Univariate Optimization

For constrained, univariate (i.e., scalar) minimization, a good option is fminbound
>>> from scipy.optimize import fminbound
>>> fminbound(lambda x: x**2, -1, 2)  # Search in [-1, 2]
0.0
Uses Brent's hybrid method
Finds local optima, not global
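Since most routines only minimize, maximization works by minimizing the negative, for example (the function below is just illustrative):
from scipy.optimize import fminbound

f = lambda x: -(x - 1)**2 + 3                    # maximum at x = 1
maximizer = fminbound(lambda x: -f(x), -2, 2)    # minimize -f on [-2, 2]
print maximizer                                  # approximately 1.0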

Multivariate Optimization

Multivariate local optimizers: fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg
Constrained multivariate local optimizers: fmin_l_bfgs_b, fmin_tnc, fmin_cobyla
See the docstrings for details
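For instance, fmin (the Nelder-Mead simplex algorithm) can be called like this on a simple illustrative function:
import numpy as np
from scipy.optimize import fmin

def f(z):
    x, y = z
    return (x - 1)**2 + (y - 2)**2     # minimum at (1, 2)

print fmin(f, np.array([0.0, 0.0]))    # (0, 0) is the starting point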

Integration

Most numerical methods compute the integral of an approximating polynomial
The error depends on the quality of the fit (of the polynomial to the integrand)
The relevant module is scipy.integrate
A good default for univariate integration is quad
>>> from scipy.integrate import quad
>>> quad(lambda x: x**2, 0, 1)   # Integrate f(x) = x**2 over [0, 1]
(0.33333333333333331, 3.7007434154171879e-15)  # Integral, estimate of error
An interface to routines from the Fortran library QUADPACK
The default routine uses adaptive Gauss-Kronrod quadrature
QUADPACK's weighted routines use (modified) Clenshaw-Curtis quadrature
  • Based on expansion in terms of Chebyshev polynomials
  • Accuracy similar to Gaussian quadrature
    • But often runs a bit faster
Other univariate options:
  • quadrature, romberg (Gaussian quadrature, Romberg integration)
Multivariate options
  • dblquad, tplquad
In high dimensions, use Monte Carlo if possible
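As a sketch of the Monte Carlo idea (a crude but dimension-robust approach), we can estimate an integral over the unit cube by averaging the integrand over uniform draws:
import numpy as np

def mc_integrate(f, dim, n=100000):
    # Estimate the integral of f over [0, 1]**dim by averaging
    # f over n independent uniform draws
    points = np.random.uniform(0, 1, size=(n, dim))
    return np.mean([f(p) for p in points])

print mc_integrate(lambda p: p.sum(), 3)   # true value is 1.5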

Linear Algebra

We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
Personally I just use numpy.linalg
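For example, solving a linear system Ax = b with numpy.linalg (the numbers are just illustrative):
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)   # solve A x = b
print x                     # [0.8, 1.4]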
