Saturday, December 24, 2011

Computational Economics lecture 19


More NumPy

Additional features of NumPy
  • Linear algebra
  • Random sampling
  • Vectorized functions

Linear Algebra

NumPy provides a module for linear algebra called linalg
This is a sub-module of NumPy
  • Attributes can be accessed via np.linalg.attribute_name
  • An interface to the well-known LAPACK Fortran library
The module provides functions for computing determinants, eigenvalues, etc.
>>> A = np.array([[1, 2], [3, 4]])
>>> np.linalg.det(A)           # Compute the determinant
-2.0
>>> np.linalg.inv(A)           # Compute the inverse
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> np.linalg.eigvals(A)       # Compute eigenvalues
array([-0.37228132+0.j,  5.37228132+0.j])
To solve the linear system A x = b for x we can use solve()
>>> A
array([[1, 2],
       [3, 4]])
>>> b = (5, 6)
>>> np.linalg.solve(A, b)
array([-4. ,  4.5])
Another alternative is to invert A
>>> np.dot(np.linalg.inv(A), b)
array([-4. ,  4.5])
But solve() is usually more efficient

Random Sampling

NumPy has a sub-module called random
Populates ndarrays with samples of various distributions
For example, to generate 3 draws from the beta distribution


we use random.beta
>>> a = 0.5
>>> b = 0.5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443,  0.90941746,  0.49067643])
Binomial distribution:
>>> y = np.random.binomial(10, 0.5, size=1000)    # 1,000 observations of Bin(10, 0.5)
>>> y.mean()
5.1109999999999998
>>> y = np.random.binomial(10, 0.8, size=1000)
>>> y.mean()
7.9909999999999997
The normal distribution can be accessed via np.random.normal, but there is also the convenience function randn, which gives standard normals
>>> Z = np.random.randn(10000)
>>> Z = 5 * Z + 3   # Each element is now N(3, 25)
>>> Z.mean()
2.997125110268561
>>> Z.var()
24.970266678202606
All the other standard distributions can be found in np.random
  • To get the list, type dir(np.random)
  • Then use the docstrings

Other Goodies

NumPy provides some vectorized functions, which act elementwise on arrays
>>> Z = np.linspace(2, 4, 4)
>>> np.log(Z)
array([ 0.69314718,  0.98082925,  1.2039728 ,  1.38629436])
>>> np.exp(Z)
array([  7.3890561 ,  14.3919161 ,  28.03162489,  54.59815003])
>>> np.cos(Z)
array([-0.41614684, -0.88932657, -0.981674  , -0.65364362])
>>> np.sin(Z)
array([ 0.90929743,  0.45727263, -0.19056796, -0.7568025 ])
The advantage is that we avoid looping
Note that not all Python functions will be vectorized
>>> f = lambda x: -1 if x < 0 else 1
>>> f(-2)
-1
>>> f(3)
1
>>> Z = np.array([-2, 0, 2])
>>> f(Z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
ValueError: The truth value...
We can vectorize f using np.vectorize()
>>> f = np.vectorize(f)
>>> f(Z)
array([-1,  1,  1])
In addition
  • np.interp provides linear interpolation in 1 dim
  • np.poly1d is a class for manipulating polynomials
  • np.histogram gives histogram information
See docstrings for details

A Note on Pylab and Matplotlib

A quick aside on the relationship between Pylab and Matplotlib
One way to do plots with Matplotlib is like this
import pylab
pylab.plot([1, 2, 3])
pylab.show()
The same can be achieved by
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.show()
What is the difference?
We can see the difference in the pylab initialization file:
## some stuff
from numpy import *
from numpy.fft import *
from numpy.random import *
from numpy.linalg import *
## some more stuff
from matplotlib.pyplot import *
## some more stuff
Thus, import pylab brings in
  • everything from the NumPy namespace
  • everything from various NumPy submodules (randomlinalg, etc.)
  • everything from matplotlib.pyplot
The plotting functions are in matplotlib.pyplot

Exercises

Here is a common problem in simulation
Let's say we have an array q which represents a probability mass function
  • Array of nonnegative floats with q.sum() equal to one
  • Let len(q) be equal to N
We want to generate a random variable X such that
  • X takes values in 0,...,N-1
  • The probability that X = i is equal to q[i]
The standard (inverse transform) algorithm is as follows
  • Divide unit interval [0, 1] into N subintervals I0, I1,...,IN-1, such that the length of Ii is q[i]
  • Draw a uniform r.v. U on [0, 1] and
  • Return the i such that U is in Ii
The probability of drawing i is the length of Ii, which is equal to q[i]
EDTC provides some code for this operation
from random import uniform

def sample(q):
    """
    Returns i with probability q[i], where q is an array 
    (e.g., list or tuple).
    """
    a = 0.0
    U = uniform(0,1)  
    N = len(q)
    for i in range(N):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]
Can you see how it works?
  • If not, try thinking through the flow for a simple example
    • e.g., set q = [0.25, 0.75]
    • Consider different values of U
    • It helps to sketch the intervals on paper
Problem 1
Make the following two improvements to this code:
  1. Speed it up using NumPy, avoiding for and while loops
    • Hint: Use searchsorted() and cumsum()
  2. Implement as a class, where each instance stores its own value of q
    • Give the class a draw() method, which returns one draw from {0,...,len(q)-1} according to q
      • Or even better, write the method so that draw(n) returns n draws from q
Solution:
## Filename: discreterv.py
## Author: John Stachurski

from numpy import cumsum
from numpy.random import uniform

class DiscreteRV:
    """
    Each instance is provided with an array of probabilities q. 
    The draw() method returns x with probability q[x].
    """

    def __init__(self, q):
        self.set_q(q)

    def set_q(self, q):
        self.Q = cumsum(q)   # Cumulative sum

    def draw(self, n=1):
        """
        Returns n draws from q
        """
        return self.Q.searchsorted(uniform(0, 1, size=n)) 
A bit tricky, but with some thought you will be able to understand it
Problem 2
In a previous lecture and EDTC, section 6.1.2, I discuss the empirical (cumulative) distribution function


Thus Fn(x) is the fraction of the sample that falls below x.
This function approximates the true distribution function F
  • Converges to F(x) with probability one as n goes to infinity for all x
The ECDF is implemented as follows:
## Filename: ecdf.py
## Author: John Stachurski

class ECDF:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)
Here's an example of usage:
from ecdf import ECDF
samples = np.random.uniform(0, 1, size=10)
F = ECDF(samples)
F(0.5)  # Returned 0.29
F.observations = np.random.uniform(0, 1, size=1000)
F(0.5)  # Returned 0.479
Our implementation is not very efficient because it uses a for loop
Problem:
  • Make the __call__ method more efficient using NumPy.
  • Add a method which plots the ECDF over [ab]
    • Here a and b are method parameters
Solution:
## Filename: ecdf2.py
## Author: John Stachurski

import numpy as np
import matplotlib.pyplot as plt

class ECDF:

    def __init__(self, observations):
        "Parameters: observations is a NumPy array."
        self.observations = observations

    def __call__(self, x): 
        try:
            return (self.observations <= x).mean()
        except AttributeError:
            # self.observations probably needs to be converted to
            # a NumPy array
            self.observations = np.array(self.observations)
            return (self.observations <= x).mean()

    def plot(self, a=None, b=None): 
        if a == None:
            # Set up a reasonable default
            a = self.observations.min() - self.observations.std()
        if b == None:
            # Set up a reasonable default
            b = self.observations.max() + self.observations.std()
        X = np.linspace(a, b, num=100)
        f = np.vectorize(self.__call__)
        plt.plot(X, f(X))
        plt.show()

0 comments: