Saturday, December 24, 2011

Computational economics lecture 18


NumPy

NumPy is a first class library for numerical programming
Numerical libraries such as NumPy provide code for common numerical operations
  • Creating and manipulating arrays of numbers
    • sums, means, matrix operations
  • Generating random numbers, etc.
In non-compiled languages such as Python, MATLAB, etc., there is an extra value:
These languages are inherently slower than compiled languages like C and Fortran
Using numerical libraries, individual operations on large arrays can be shifted out to fast compiled code

Introduction to NumPy

Used by scientists all around the world
  • Engineering, biology, astronomy, geosciences, artificial intelligence, etc.
Mature, fast and and stable
NumPy can be installed individually or as part of SciPy
  • I recommend the latter as we will be using SciPy soon
  • Follow the download instructions at scipy.org
NumPy provides the definition of a numerical array class, plus some utilities
  • Arrays are implemented in C
  • Array operations use highly optimized, compiled C code
The utilities are Python interfaces to fast numerical routines compiled from C and Fortran
The rest of the lecture assumes that NumPy has been imported as np
>>> import numpy as np
Although this lecture cannot cover all the details of NumPy, remember
  • The docstrings are your friends, so use them
    • In the Python shell help(np.poly1d)
    • In an IPython shell np.poly1d?

NumPy Arrays

Let's now turn to the basics of NumPy arrays
Internally, a NumPy array is an instance of numpy.ndarray (np.ndarray)
Like a native Python list, except that
  • Data must be homogeneous (all elements of the same type)
  • These types must be one of the data types (dtypes) provided by NumPy

Data Types

Some dtypes:
bool8 bit True or False
int3232 bit integer
int6464 bit integer
float3232 bit floating point number
float6464 bit floating point number
There are also dtypes to represent complex numbers, unsigned integers, etc
At present, the default dtype for arrays is float64
  • When you create arrays, they are of this type unless otherwise specified
Here's an example
>>> Z = np.zeros(10)
>>> Z
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> type(Z)
<type 'numpy.ndarray'>
>>> type(Z[0])
<type 'numpy.float64'>
zeros() is a NumPy function that returns an ndarray of zeros
If we had wanted to use integers we could have specified as follows:
>>> Z = np.zeros(10, dtype=int)  # Here 'int' is equivalent to 'np.int32'
>>> Z
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
There's no problem mixing native Python numeric types and NumPy dtypes:
>>> Z = np.zeros(1)
>>> Z
array([ 0.])
>>> Z[0] = Z[0] + 1    # Add a Python int to a NumPy float64
>>> Z
array([ 1.])           # The int is upcast to a float64 before addition
>>> type(Z[0])
<type 'numpy.float64'>

Shape and Dimension

Suppose we create an array such as
>>> Z = np.zeros(10)
At present, Z has no dimension:
  • It is neither a row vector nor a column vector
  • We say that the array is "flat"
The dimension is recorded in the shape attribute, which is a tuple
>>> Z.shape
(10,)
The shape tuple has only one element, which is the length of the array
  • Recall that tuples with one element have a comma after that element
To give it dimension, we need to change the shape attribute
>>> Z.shape = (10, 1)   # Column vector: 10 rows and 1 column
>>> Z
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])
>>> Z = np.zeros(4)
>>> Z.shape = (2, 2)
>>> Z
array([[ 0.,  0.],
       [ 0.,  0.]])
In the last case, we could also pass a tuple to the zeros() function
>>> Z = np.zeros((2, 2))
>>> Z.shape
(2, 2)
>>> Z
array([[ 0.,  0.],
       [ 0.,  0.]])

Creating Arrays

NumPy arrays can be created from Python lists, tuples, etc. using array() or asarray()
>>> Z = np.array([10, 20])                 # ndarray from Python list
>>> Z
array([10, 20])
>>> type(Z)
<type 'numpy.ndarray'>
>>> Z = np.array((10, 20), dtype=float)    # Here 'float' is equivalent to 'np.float64'
>>> Z
array([ 10.,  20.])
>>> Z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
>>> Z
array([[1, 2],
       [3, 4]])
>>> Z.shape
(2, 2)
To read in the data from a text file use loadtxt()
Here is the text file test.txt
1 2
3 4
Now
>>> Z = np.loadtxt('test.txt')
>>> Z
array([[ 1.,  2.],
       [ 3.,  4.]])
In addition there are utility functions for creating standard arrays
  • We already met zeros()
  • Another one is empty()
>>> Z = np.empty(3)
>>> Z
array([ -2.41128814e-039,   1.48539705e-313,  -8.30185262e-040])
What are those numbers, by the way?
  • Python allocates 3 contiguous pieces of memory (RAM), each with 64 bits
  • Contents of those memory slots is interpreted as a float64
To generate an array of ones, use ones():
>>> Z = np.ones((4, 4))
>>> Z
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])
To set up a grid of evenly spaced numbers use linspace()
>>> Z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> Z = np.linspace(2, 4, 6)  # From 2 to 4, with 6 elements
>>> Z
array([ 2. ,  2.4,  2.8,  3.2,  3.6,  4. ])
To create an identity matrix use identity()
>>> Z = np.identity(2)
>>> Z
array([[ 1.,  0.],
       [ 0.,  1.]])

Array Indexing

For a flat array, indexing is the same as Python sequences:
>>> Z = np.linspace(1, 2, 5)
>>> Z
array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> Z[0]
1.0
>>> Z[0:2]
array([ 1.  ,  1.25])
>>> Z[-1]
2.0
For 2D arrays the syntax is as follows:
>>> Z = np.array([[1, 2], [3, 4]])
>>> Z
array([[1, 2],
       [3, 4]])
>>> Z[0, 0]
1
>>> Z[0, 1]
2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
>>> Z[0,:]   # First row
array([1, 2])
>>> Z[:,1]   # Second column
array([2, 4])
NumPy arrays of integers can be used to extract elements:
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> indices = np.array((0, 2, 3))
>>> Z[indices]
array([ 2. ,  3. ,  3.5])
We can also use an array of dtype bool to extract elements
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> d = np.array([0, 1, 1, 0, 0], dtype=bool)
>>> d
array([False,  True,  True, False, False], dtype=bool)
>>> Z[d]
array([ 2.5,  3. ])
An aside: all elements of an array can be set equal to one number using slice notation:
>>> Z = np.empty(3)
>>> Z
array([  3.45740277e-269,  -6.35113306e-040,   5.88173752e-313])
>>> Z[:] = 42
>>> Z
array([ 42.,  42.,  42.])

Comparisons

In general, comparisons are done elementwise:
>>> Z = np.array([2, 3])
>>> Y = np.array([2, 3])
>>> Z == Y
array([ True,  True], dtype=bool)
>>> Y[0] = 5
>>> Z == Y
array([False,  True], dtype=bool)
>>> Z != Y
array([ True, False], dtype=bool)
The situation is similar for ><>= and <=
>>> Z > Y
array([False, False], dtype=bool)
>>> Z < Y
array([ True, False], dtype=bool)
>>> Z >= Y
array([False,  True], dtype=bool)
>>> Z <= Y
array([ True,  True], dtype=bool)
We can also do comparisons against scalars:
>>> Z = np.linspace(0, 10, 5)
>>> Z
array([  0. ,   2.5,   5. ,   7.5,  10. ])
>>> Z > 3
array([False, False,  True,  True,  True], dtype=bool)
This is particularly useful for conditional extraction
>>> b = Z > 3
>>> b
array([False, False,  True,  True,  True], dtype=bool)
>>> Z[b]
array([  5. ,   7.5,  10. ])   # All elements of Z that are > 3
Of course we can do this in one step
>>> Z[Z > 3]   
array([  5. ,   7.5,  10. ])

Algebraic Operations in NumPy

The algebraic operators +-*/ and ** all act elementwise on arrays
>>> A = np.array([1, 2, 3, 4])
>>> B = np.array([5, 6, 7, 8])
>>> A + B
array([ 6,  8, 10, 12])
>>> A * B
array([ 5, 12, 21, 32])
We can also add a scalar to each element as follows
>>> A + 10   
array([11, 12, 13, 14])
Scalar multiplication is similar
>>> A = np.array([1, 2, 3, 4])
>>> A * 10
array([10, 20, 30, 40])
The two dimensional arrays follow the same general rules:
>>> A = np.ones((2, 2))
>>> B = np.ones((2, 2))
>>> A + B
array([[ 2.,  2.],
       [ 2.,  2.]])
>>> A + 10
array([[ 11.,  11.],
       [ 11.,  11.]])
>>> A * B
array([[ 1.,  1.],
       [ 1.,  1.]])
Note that A * B is not the matrix product, it is an elementwise product
To do matrix multiplication we use the dot() function
>>> A
array([[ 1.,  1.],
       [ 1.,  1.]])
>>> B
array([[ 1.,  1.],
       [ 1.,  1.]])
>>> np.dot(A, B)
array([[ 2.,  2.],
       [ 2.,  2.]])
With dot() we can take the inner product of two flat arrays
>>> A = np.array([1, 2])
>>> B = np.array([10, 20])
>>> np.dot(A, B)   # Returns a scalar in this case
50
We can use dot() with a Python list or tuple
>>> A = np.empty((2, 2))
>>> A
array([[ -2.41124330e-039,   0.00000000e+000],
       [  2.12199579e-314,   9.88131292e-324]])
>>> np.dot(A, (0, 1))
array([  0.00000000e+000,   9.88131292e-324])
Here dot() knows we are postmultiplying, so (0, 1) is treated as a column vector
I could go on but you get the general idea...

Array Methods

Here are some useful methods of the ndarray class
>>> A = np.array((4, 3, 2, 1))
>>> A
array([4, 3, 2, 1])
>>> A.sort()              # Sorts A in place
>>> A
array([1, 2, 3, 4])        
>>> A.sum()               # Sum
10
>>> A.mean()              # Mean
2.5
>>> A.max()               # Max
4
>>> A.argmax()            # Returns the index of the maximal element
3
>>> A.cumsum()            # Cumulative sum of the elements of A
array([ 1,  3,  6, 10])
>>> A.cumprod()           # Cumulative product of the elements of A
array([ 1,  2,  6, 24])
>>> A.var()               # Variance
1.25
>>> A.std()               # Standard deviation
1.1180339887498949
>>> A.shape = (2, 2)
>>> A.T                   # Equivalent to A.transpose()
array([[1, 3],
       [2, 4]])
A slightly more complicated method is searchsorted()
  • Suppose that Z is a nondecreasing array, then
  • Z.searchsorted(a) returns index of first z in Z such that z >= a
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> Z.searchsorted(2.2)
1
>>> Z.searchsorted(2.5)
1
>>> Z.searchsorted(2.6)
2
Note that these methods have equivalent functions in the NumPy namespace
>>> A = np.array((4, 3, 2, 1))
>>> np.sum(A)    #  Use NumPy's sum(). The built-in sum() is slower.
10
>>> np.mean(A)
2.5
>>> np.sort(A)
array([1, 2, 3, 4])

Exercises

Problem 1
Consider evaluating the polynomial


at some x given the vector of coefficients
We can do this using np.poly1d, but for the sake of the exercise don't use this class
We also want to avoid for and while loops, which are slow
Write a function which takes the vector of coefficients and a number x, and returns the value p(x)
  • Hint: Use np.cumprod()
Solution
import numpy as np

def p(coef, x): 
    X = np.empty(len(coef))
    X[0] = 1 
    X[1:] = x 
    Y = np.cumprod(X)   # Y = [1, x, x**2,...]
    return np.dot(coef, Y)

0 comments: