Open Mind Tree: Computational Economics lecture 19

More NumPy

Additional features of NumPy

Linear algebra
Random sampling
Vectorized functions

Linear Algebra

NumPy provides a module for linear algebra called linalg

This is a sub-module of NumPy

Attributes can be accessed via np.linalg.attribute_name
An interface to the well-known LAPACK Fortran library

The module provides functions for computing determinants, eigenvalues, etc.

>>> A = np.array([[1, 2], [3, 4]])
>>> np.linalg.det(A)           # Compute the determinant
-2.0
>>> np.linalg.inv(A)           # Compute the inverse
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> np.linalg.eigvals(A)       # Compute eigenvalues
array([-0.37228132+0.j,  5.37228132+0.j])

To solve the linear system A x = b for x we can use solve()

>>> A
array([[1, 2],
       [3, 4]])
>>> b = (5, 6)
>>> np.linalg.solve(A, b)
array([-4. ,  4.5])

Another alternative is to invert A

>>> np.dot(np.linalg.inv(A), b)
array([-4. ,  4.5])

But solve() is usually more efficient

Random Sampling

NumPy has a sub-module called random

Populates ndarrays with samples of various distributions

For example, to generate 3 draws from the beta distribution

we use random.beta

>>> a = 0.5
>>> b = 0.5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443,  0.90941746,  0.49067643])

Binomial distribution:

>>> y = np.random.binomial(10, 0.5, size=1000)    # 1,000 observations of Bin(10, 0.5)
>>> y.mean()
5.1109999999999998
>>> y = np.random.binomial(10, 0.8, size=1000)
>>> y.mean()
7.9909999999999997

The normal distribution can be accessed via np.random.normal, but there is also the convenience function randn, which gives standard normals

>>> Z = np.random.randn(10000)
>>> Z = 5 * Z + 3   # Each element is now N(3, 25)
>>> Z.mean()
2.997125110268561
>>> Z.var()
24.970266678202606

All the other standard distributions can be found in np.random

To get the list, type dir(np.random)
Then use the docstrings

Other Goodies

NumPy provides some vectorized functions, which act elementwise on arrays

>>> Z = np.linspace(2, 4, 4)
>>> np.log(Z)
array([ 0.69314718,  0.98082925,  1.2039728 ,  1.38629436])
>>> np.exp(Z)
array([  7.3890561 ,  14.3919161 ,  28.03162489,  54.59815003])
>>> np.cos(Z)
array([-0.41614684, -0.88932657, -0.981674  , -0.65364362])
>>> np.sin(Z)
array([ 0.90929743,  0.45727263, -0.19056796, -0.7568025 ])

The advantage is that we avoid looping

Note that not all Python functions will be vectorized

>>> f = lambda x: -1 if x < 0 else 1
>>> f(-2)
-1
>>> f(3)
1
>>> Z = np.array([-2, 0, 2])
>>> f(Z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
ValueError: The truth value...

We can vectorize f using np.vectorize()

>>> f = np.vectorize(f)
>>> f(Z)
array([-1,  1,  1])

In addition

np.interp provides linear interpolation in 1 dim
np.poly1d is a class for manipulating polynomials
np.histogram gives histogram information

See docstrings for details

A Note on Pylab and Matplotlib

A quick aside on the relationship between Pylab and Matplotlib

One way to do plots with Matplotlib is like this

import pylab
pylab.plot([1, 2, 3])
pylab.show()

The same can be achieved by

import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.show()

What is the difference?

We can see the difference in the pylab initialization file:

## some stuff
from numpy import *
from numpy.fft import *
from numpy.random import *
from numpy.linalg import *
## some more stuff
from matplotlib.pyplot import *
## some more stuff

Thus, import pylab brings in

everything from the NumPy namespace
everything from various NumPy submodules (random, linalg, etc.)
everything from matplotlib.pyplot

The plotting functions are in matplotlib.pyplot

Exercises

Here is a common problem in simulation

Let's say we have an array q which represents a probability mass function

Array of nonnegative floats with q.sum() equal to one
Let len(q) be equal to N

We want to generate a random variable X such that

X takes values in 0,...,N-1
The probability that X = i is equal to q[i]

The standard (inverse transform) algorithm is as follows

Divide unit interval [0, 1] into N subintervals I₀, I₁,...,I_N-1, such that the length of I_i is q[i]
Draw a uniform r.v. U on [0, 1] and
Return the i such that U is in I_i

The probability of drawing i is the length of I_i, which is equal to q[i]

EDTC provides some code for this operation

from random import uniform

def sample(q):
    """
    Returns i with probability q[i], where q is an array 
    (e.g., list or tuple).
    """
    a = 0.0
    U = uniform(0,1)  
    N = len(q)
    for i in range(N):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]

Can you see how it works?

If not, try thinking through the flow for a simple example
- e.g., set q = [0.25, 0.75]
- Consider different values of U
- It helps to sketch the intervals on paper

Problem 1

Make the following two improvements to this code:

Speed it up using NumPy, avoiding for and while loops
- Hint: Use searchsorted() and cumsum()
Implement as a class, where each instance stores its own value of q
- Give the class a draw() method, which returns one draw from {0,...,len(q)-1} according to q
  - Or even better, write the method so that draw(n) returns n draws from q

Solution:

## Filename: discreterv.py
## Author: John Stachurski

from numpy import cumsum
from numpy.random import uniform

class DiscreteRV:
    """
    Each instance is provided with an array of probabilities q. 
    The draw() method returns x with probability q[x].
    """

    def __init__(self, q):
        self.set_q(q)

    def set_q(self, q):
        self.Q = cumsum(q)   # Cumulative sum

    def draw(self, n=1):
        """
        Returns n draws from q
        """
        return self.Q.searchsorted(uniform(0, 1, size=n))

A bit tricky, but with some thought you will be able to understand it

Problem 2

In a previous lecture and EDTC, section 6.1.2, I discuss the empirical (cumulative) distribution function

Thus F_n(x) is the fraction of the sample that falls below x.

This function approximates the true distribution function F

Converges to F(x) with probability one as n goes to infinity for all x

The ECDF is implemented as follows:

## Filename: ecdf.py
## Author: John Stachurski

class ECDF:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)

Here's an example of usage:

from ecdf import ECDF
samples = np.random.uniform(0, 1, size=10)
F = ECDF(samples)
F(0.5)  # Returned 0.29
F.observations = np.random.uniform(0, 1, size=1000)
F(0.5)  # Returned 0.479

Our implementation is not very efficient because it uses a for loop

Problem:

Make the __call__ method more efficient using NumPy.
Add a method which plots the ECDF over [a, b]
- Here a and b are method parameters

Solution:

## Filename: ecdf2.py
## Author: John Stachurski

import numpy as np
import matplotlib.pyplot as plt

class ECDF:

    def __init__(self, observations):
        "Parameters: observations is a NumPy array."
        self.observations = observations

    def __call__(self, x): 
        try:
            return (self.observations <= x).mean()
        except AttributeError:
            # self.observations probably needs to be converted to
            # a NumPy array
            self.observations = np.array(self.observations)
            return (self.observations <= x).mean()

    def plot(self, a=None, b=None): 
        if a == None:
            # Set up a reasonable default
            a = self.observations.min() - self.observations.std()
        if b == None:
            # Set up a reasonable default
            b = self.observations.max() + self.observations.std()
        X = np.linspace(a, b, num=100)
        f = np.vectorize(self.__call__)
        plt.plot(X, f(X))
        plt.show()

Open Mind Tree

Computational Economics lecture 19

More NumPy

Linear Algebra

Random Sampling

Other Goodies

A Note on Pylab and Matplotlib

Exercises

0 comments:

Popular Posts

Visitors

Archives

Infolinks In Text Ads

Featured Posts

Blogger Tips