Open Mind Tree: Computational economics lecture 18

NumPy

NumPy is a first class library for numerical programming

Numerical libraries such as NumPy provide code for common numerical operations

Creating and manipulating arrays of numbers
- sums, means, matrix operations
Generating random numbers, etc.

In non-compiled languages such as Python, MATLAB, etc., there is an extra value:

These languages are inherently slower than compiled languages like C and Fortran

Using numerical libraries, individual operations on large arrays can be shifted out to fast compiled code

Introduction to NumPy

Used by scientists all around the world

Engineering, biology, astronomy, geosciences, artificial intelligence, etc.

Mature, fast and and stable

NumPy can be installed individually or as part of SciPy

I recommend the latter as we will be using SciPy soon
Follow the download instructions at scipy.org

NumPy provides the definition of a numerical array class, plus some utilities

Arrays are implemented in C
Array operations use highly optimized, compiled C code

The utilities are Python interfaces to fast numerical routines compiled from C and Fortran

The rest of the lecture assumes that NumPy has been imported as np

>>> import numpy as np

Although this lecture cannot cover all the details of NumPy, remember

The docstrings are your friends, so use them
- In the Python shell help(np.poly1d)
- In an IPython shell np.poly1d?

NumPy Arrays

Let's now turn to the basics of NumPy arrays

Internally, a NumPy array is an instance of numpy.ndarray (np.ndarray)

Like a native Python list, except that

Data must be homogeneous (all elements of the same type)
These types must be one of the data types (dtypes) provided by NumPy

Data Types

Some dtypes:

bool	8 bit True or False
int32	32 bit integer
int64	64 bit integer
float32	32 bit floating point number
float64	64 bit floating point number

There are also dtypes to represent complex numbers, unsigned integers, etc

At present, the default dtype for arrays is float64

When you create arrays, they are of this type unless otherwise specified

Here's an example

>>> Z = np.zeros(10)
>>> Z
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> type(Z)
<type 'numpy.ndarray'>
>>> type(Z[0])
<type 'numpy.float64'>

zeros() is a NumPy function that returns an ndarray of zeros

If we had wanted to use integers we could have specified as follows:

>>> Z = np.zeros(10, dtype=int)  # Here 'int' is equivalent to 'np.int32'
>>> Z
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

There's no problem mixing native Python numeric types and NumPy dtypes:

>>> Z = np.zeros(1)
>>> Z
array([ 0.])
>>> Z[0] = Z[0] + 1    # Add a Python int to a NumPy float64
>>> Z
array([ 1.])           # The int is upcast to a float64 before addition
>>> type(Z[0])
<type 'numpy.float64'>

Shape and Dimension

Suppose we create an array such as

>>> Z = np.zeros(10)

At present, Z has no dimension:

It is neither a row vector nor a column vector
We say that the array is "flat"

The dimension is recorded in the shape attribute, which is a tuple

>>> Z.shape
(10,)

The shape tuple has only one element, which is the length of the array

Recall that tuples with one element have a comma after that element

To give it dimension, we need to change the shape attribute

>>> Z.shape = (10, 1)   # Column vector: 10 rows and 1 column
>>> Z
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])
>>> Z = np.zeros(4)
>>> Z.shape = (2, 2)
>>> Z
array([[ 0.,  0.],
       [ 0.,  0.]])

In the last case, we could also pass a tuple to the zeros() function

>>> Z = np.zeros((2, 2))
>>> Z.shape
(2, 2)
>>> Z
array([[ 0.,  0.],
       [ 0.,  0.]])

Creating Arrays

NumPy arrays can be created from Python lists, tuples, etc. using array() or asarray()

>>> Z = np.array([10, 20])                 # ndarray from Python list
>>> Z
array([10, 20])
>>> type(Z)
<type 'numpy.ndarray'>
>>> Z = np.array((10, 20), dtype=float)    # Here 'float' is equivalent to 'np.float64'
>>> Z
array([ 10.,  20.])
>>> Z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
>>> Z
array([[1, 2],
       [3, 4]])
>>> Z.shape
(2, 2)

To read in the data from a text file use loadtxt()

Here is the text file test.txt

1 2
3 4

Now

>>> Z = np.loadtxt('test.txt')
>>> Z
array([[ 1.,  2.],
       [ 3.,  4.]])

In addition there are utility functions for creating standard arrays

We already met zeros()
Another one is empty()

>>> Z = np.empty(3)
>>> Z
array([ -2.41128814e-039,   1.48539705e-313,  -8.30185262e-040])

What are those numbers, by the way?

Python allocates 3 contiguous pieces of memory (RAM), each with 64 bits
Contents of those memory slots is interpreted as a float64

To generate an array of ones, use ones():

>>> Z = np.ones((4, 4))
>>> Z
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

To set up a grid of evenly spaced numbers use linspace()

>>> Z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> Z = np.linspace(2, 4, 6)  # From 2 to 4, with 6 elements
>>> Z
array([ 2. ,  2.4,  2.8,  3.2,  3.6,  4. ])

To create an identity matrix use identity()

>>> Z = np.identity(2)
>>> Z
array([[ 1.,  0.],
       [ 0.,  1.]])

Array Indexing

For a flat array, indexing is the same as Python sequences:

>>> Z = np.linspace(1, 2, 5)
>>> Z
array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> Z[0]
1.0
>>> Z[0:2]
array([ 1.  ,  1.25])
>>> Z[-1]
2.0

For 2D arrays the syntax is as follows:

>>> Z = np.array([[1, 2], [3, 4]])
>>> Z
array([[1, 2],
       [3, 4]])
>>> Z[0, 0]
1
>>> Z[0, 1]
2

And so on

Note that indices are still zero-based, to maintain compatibility with Python sequences

Columns and rows can be extracted as follows

>>> Z[0,:]   # First row
array([1, 2])
>>> Z[:,1]   # Second column
array([2, 4])

NumPy arrays of integers can be used to extract elements:

>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> indices = np.array((0, 2, 3))
>>> Z[indices]
array([ 2. ,  3. ,  3.5])

We can also use an array of dtype bool to extract elements

>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> d = np.array([0, 1, 1, 0, 0], dtype=bool)
>>> d
array([False,  True,  True, False, False], dtype=bool)
>>> Z[d]
array([ 2.5,  3. ])

An aside: all elements of an array can be set equal to one number using slice notation:

>>> Z = np.empty(3)
>>> Z
array([  3.45740277e-269,  -6.35113306e-040,   5.88173752e-313])
>>> Z[:] = 42
>>> Z
array([ 42.,  42.,  42.])

Comparisons

In general, comparisons are done elementwise:

>>> Z = np.array([2, 3])
>>> Y = np.array([2, 3])
>>> Z == Y
array([ True,  True], dtype=bool)
>>> Y[0] = 5
>>> Z == Y
array([False,  True], dtype=bool)
>>> Z != Y
array([ True, False], dtype=bool)

The situation is similar for >, <, >= and <=

>>> Z > Y
array([False, False], dtype=bool)
>>> Z < Y
array([ True, False], dtype=bool)
>>> Z >= Y
array([False,  True], dtype=bool)
>>> Z <= Y
array([ True,  True], dtype=bool)

We can also do comparisons against scalars:

>>> Z = np.linspace(0, 10, 5)
>>> Z
array([  0. ,   2.5,   5. ,   7.5,  10. ])
>>> Z > 3
array([False, False,  True,  True,  True], dtype=bool)

This is particularly useful for conditional extraction

>>> b = Z > 3
>>> b
array([False, False,  True,  True,  True], dtype=bool)
>>> Z[b]
array([  5. ,   7.5,  10. ])   # All elements of Z that are > 3

Of course we can do this in one step

>>> Z[Z > 3]   
array([  5. ,   7.5,  10. ])

Algebraic Operations in NumPy

The algebraic operators +, -, *, / and ** all act elementwise on arrays

>>> A = np.array([1, 2, 3, 4])
>>> B = np.array([5, 6, 7, 8])
>>> A + B
array([ 6,  8, 10, 12])
>>> A * B
array([ 5, 12, 21, 32])

We can also add a scalar to each element as follows

>>> A + 10   
array([11, 12, 13, 14])

Scalar multiplication is similar

>>> A = np.array([1, 2, 3, 4])
>>> A * 10
array([10, 20, 30, 40])

The two dimensional arrays follow the same general rules:

>>> A = np.ones((2, 2))
>>> B = np.ones((2, 2))
>>> A + B
array([[ 2.,  2.],
       [ 2.,  2.]])
>>> A + 10
array([[ 11.,  11.],
       [ 11.,  11.]])
>>> A * B
array([[ 1.,  1.],
       [ 1.,  1.]])

Note that A * B is not the matrix product, it is an elementwise product

To do matrix multiplication we use the dot() function

>>> A
array([[ 1.,  1.],
       [ 1.,  1.]])
>>> B
array([[ 1.,  1.],
       [ 1.,  1.]])
>>> np.dot(A, B)
array([[ 2.,  2.],
       [ 2.,  2.]])

With dot() we can take the inner product of two flat arrays

>>> A = np.array([1, 2])
>>> B = np.array([10, 20])
>>> np.dot(A, B)   # Returns a scalar in this case
50

We can use dot() with a Python list or tuple

>>> A = np.empty((2, 2))
>>> A
array([[ -2.41124330e-039,   0.00000000e+000],
       [  2.12199579e-314,   9.88131292e-324]])
>>> np.dot(A, (0, 1))
array([  0.00000000e+000,   9.88131292e-324])

Here dot() knows we are postmultiplying, so (0, 1) is treated as a column vector

I could go on but you get the general idea...

Array Methods

Here are some useful methods of the ndarray class

>>> A = np.array((4, 3, 2, 1))
>>> A
array([4, 3, 2, 1])
>>> A.sort()              # Sorts A in place
>>> A
array([1, 2, 3, 4])        
>>> A.sum()               # Sum
10
>>> A.mean()              # Mean
2.5
>>> A.max()               # Max
4
>>> A.argmax()            # Returns the index of the maximal element
3
>>> A.cumsum()            # Cumulative sum of the elements of A
array([ 1,  3,  6, 10])
>>> A.cumprod()           # Cumulative product of the elements of A
array([ 1,  2,  6, 24])
>>> A.var()               # Variance
1.25
>>> A.std()               # Standard deviation
1.1180339887498949
>>> A.shape = (2, 2)
>>> A.T                   # Equivalent to A.transpose()
array([[1, 3],
       [2, 4]])

A slightly more complicated method is searchsorted()

Suppose that Z is a nondecreasing array, then
Z.searchsorted(a) returns index of first z in Z such that z >= a

>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> Z.searchsorted(2.2)
1
>>> Z.searchsorted(2.5)
1
>>> Z.searchsorted(2.6)
2

Note that these methods have equivalent functions in the NumPy namespace

>>> A = np.array((4, 3, 2, 1))
>>> np.sum(A)    #  Use NumPy's sum(). The built-in sum() is slower.
10
>>> np.mean(A)
2.5
>>> np.sort(A)
array([1, 2, 3, 4])

Exercises

Problem 1

Consider evaluating the polynomial

at some x given the vector of coefficients

We can do this using np.poly1d, but for the sake of the exercise don't use this class

We also want to avoid for and while loops, which are slow

Write a function which takes the vector of coefficients and a number x, and returns the value p(x)

Hint: Use np.cumprod()

Solution

import numpy as np

def p(coef, x): 
    X = np.empty(len(coef))
    X[0] = 1 
    X[1:] = x 
    Y = np.cumprod(X)   # Y = [1, x, x**2,...]
    return np.dot(coef, Y)

Open Mind Tree

Computational economics lecture 18

NumPy

Introduction to NumPy

NumPy Arrays

Data Types

Shape and Dimension

Creating Arrays

Array Indexing

Comparisons

Algebraic Operations in NumPy

Array Methods

Exercises

0 comments:

Popular Posts

Visitors

Archives

Infolinks In Text Ads

Featured Posts

Blogger Tips