Saturday, December 24, 2011

Computational economics lecture 29


Generators

A generator is a kind of iterator (i.e., it implements a next() method)
We will study two ways to build generators
  • Generator expressions
  • Generator functions

Generator Expressions

The easiest way to build generators is using generator expressions
Just like a list comprehension, but with round brackets
Here is the list comprehension:
>>> singular = ('dog', 'cat', 'bird')  
>>> type(singular)
<type 'tuple'>
>>> plural = [string + 's' for string in singular]  # Creates a list
>>> plural
['dogs', 'cats', 'birds']
>>> type(plural)
<type 'list'>
And here is the generator expression
>>> singular = ('dog', 'cat', 'bird')  
>>> plural = (string + 's' for string in singular)  # Creates a generator
>>> type(plural)
<type 'generator'>
>>> plural.next()
'dogs'
>>> plural.next()
'cats'
>>> plural.next()
'birds'
Since sum() can be called on iterators, we can do this
>>> sum((x * x for x in range(10)))
285
The function sum() calls next() to get the items, adds successive terms
In fact, we can omit the outer brackets in this case
>>> sum(x * x for x in range(10))
285

Generator Functions

The most flexible way to create generator objects
(Note that this section is technical, and you can probably get by without it)
Here's an example
Example 1
def f():
    yield 'start'
    yield 'middle'
    yield 'end'
Here f() is called a generator function
Looks like a function, uses new keyword yield
Let's see how it works
john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py 
>>> type(f)           # f itself is a function
<type 'function'>
>>> gen = f()         # Creates a generator object
>>> gen
<generator object at 0xb7cf31ac>
>>> gen.next()
'start'
>>> gen.next()
'middle'
>>> gen.next()
'end'
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 
The function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next() method
The first call to gen.next()
  • Executes code in the body of f() until it meets a yield statement
  • Returns that value to the caller of gen.next()
The second call to gen.next()
  • Starts executing from the next line
def f():
    yield 'start'
    yield 'middle'  # This line!
    yield 'end'
  • Continues until the next yield statement
  • Returns that value to the caller of gen.next()
  • Etc.
When the code block ends, throws a StopIteration error
Example 2
Our next example receives an argument x from the caller
def g(x):
    while x < 100:
        yield x
        x = x * x 
Let's see how it works
john@c246:~$ python -i test.py 
>>> g
<function g at 0xb7d6b25c>
>>> gen = g(2)  # Call generator function to make a generator
>>> type(gen)   # gen is an object of type generator
<type 'generator'>
>>> gen.next()  # Generators are iterators 
2
>>> gen.next()
4
>>> gen.next()
16
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 
The call gen = g(2) binds gen to a generator
Inside the generator, the name x is bound to 2
When we call gen.next()
  • The body of g() executes until the line yield x
  • The value of x is returned
Note that value of x is retained inside the generator
When we call gen.next() again, execution continues from where it left off
def g(x):
    while x < 100:
        yield x
        x = x * x  # execution continues from here
Continues until yield x, returns the value of x, repeats
When x < 100 fails, throws a StopIteration error
Here's the generator used with for
gen = g(2)
for v in gen:
    print v
Note that the loop inside the generator can be infinite
def g(x):
    while 1:
        yield x
        x = x * x 
Here's how it works
>>> gen = g(3)
>>> gen.next()
3
>>> gen.next()
9
>>> gen.next()
81
>>> gen.next()
6561
>>> gen.next()
43046721
>>> gen.next()
1853020188851841L
Don't use this in a for loop ; )

Advantages of Iterators

What's the advantage of using an iterator here?
Suppose we want to sample a binomial(n,0.5)
One way to do it is as follows
>>> n = 10000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
>>> sum(draws)
But we are creating two huge lists here
  • range(n), and
  • draws
Uses up lots of memory, very slow
If I make n even bigger then my computer refuses to allocate the memory
>>> n = 1000000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
We can avoid these problems using iterators
Here is the generator function:
import random

def f(n):
    i = 1 
    while i <= n:
        yield random.uniform(0, 1) < 0.5 
        i += 1
Now let's do the sum:
john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py 
>>> n = 10000000
>>> draws = f(n)
>>> draws
<generator object at 0xb7d8b2cc>
>>> sum(draws)
4999141
In summary
  • Iterables avoid the need to create big lists/tuples
  • Provide a uniform interface to iteration
    • Can be used transparently in for loops

Exercises

Exercise 1
Write a generator which yields a time series for the quadratic map



Inputs to the generator are x0 and n, the length of the series
Plot a series with Matplotlib
Exercise 2
Complete the following code, and test it using this file
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number 
    column_number (integer), the generator function returns a generator 
    which steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('table.csv', 1) 

for date in dates:
    print date

Solutions

Solution to Exercise 1:
## Filename: quadmap.py
## Author: John Stachurski

import pylab

def qm(x, n):
    i = 0
    while i < n:
        yield x
        x = 4 * (1 - x) * x
        i += 1

h = qm(0.1, 200)

time_series = [x for x in h]
pylab.plot(time_series)
pylab.show()
Solution to Exercise 2:
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number 
    column_number (integer), the generator function returns a generator 
    which steps through the elements of column column_number in file
    target_file.
    """
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()

dates = column_iterator('table.csv', 1) 

for date in dates:
    print date
 


Handling Exceptions

Runtime errors cause execution of the relevant program to stop
  • Frustrating if you're in the middle of a large computation
  • Sometimes the debugging information is not very informative
  • Disappointing for users of your code
A better way is to add code to your program that deals with errors as they occur

Errors

If you've got this far then you've already seen lots of errors
Here's an example:
>>> def f:
  File "<stdin>", line 1
    def f:
         ^
SyntaxError: invalid syntax
This is a syntax error
  • Illegal syntax cannot be executed
  • Always terminates execution of the program
Here's a different kind of error, unrelated to syntax:
>>> 1 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
Here's another:
>>> x = y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'y' is not defined
And another:
>>> 'foo' + 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
And another:
>>> X = []
>>> x = X[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
On each occaision, the interpreter informs us of the error type
  • IndexErrorZeroDivisionError, etc.
In Python, these errors are called exceptions
We can catch and deal with exceptions using try-except blocks
Here's a simple example
def f(x):
    try:
        return 1.0 / x 
    except ZeroDivisionError:
        print 'Error: division by zero.  Returned None'
    return None
When we call f we get the following output
>>> f(2)
0.5
>>> f(0)
Error: division by zero.  Returned None
>>> f(0.0)
Error: division by zero.  Returned None
>>> 
The error is caught and execution of the program is not terminated
Note that other error types are not caught
If we are worried the user might pass in a string, we can catch that error too
def f(x):
    try:
        return 1.0 / x 
    except ZeroDivisionError:
        print 'Error: Division by zero.  Returned None'
    except TypeError:
        print 'Error: Unsupported operation.  Returned None'
    return None
Here's what happens
>>> f(2)
0.5
>>> f(0)
Error: Division by zero.  Returned None
>>> f('foo')
Error: Unsupported operation.  Returned None
>>> 
If we feel lazy we can catch these errors together
def f(x):
    try:
        return 1.0 / x 
    except (TypeError, ZeroDivisionError):
        print 'Error: Unsupported operation.  Returned None'
    return None
Here's what happens
>>> f(2)
0.5
>>> f(0)
Error: Unsupported operation.  Returned None
>>> f('foo')
Error: Unsupported operation.  Returned None
If we feel extra lazy we can catch all error types as follows
def f(x):
    try:
        return 1.0 / x 
    except:
        print 'Error.  Returned None'
    return None
In general it's better to be specific
Problem:
Suppose we have a text file numbers.txt containing
prices
3
8

7
21
Using try-except, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers
Solution:
f = open('numbers.txt')

total = 0.0 
for line in f:
    try:
        total += float(line)
    except ValueError:
        pass

f.close()

print total


Name Resolution

Time for the details

Global and Local Namespaces

These are commonly used terms
Let's try to understand the definitions

Global Namespaces

The global namespace is the namespace of the module currently being executed
To list this namespace, use globals()
For example, suppose that we start the interpreter and begin making assignments
>>> x = 3
>>> dir()
['__builtins__', '__doc__', '__name__', 'x']
We are now working in the module __main__
Hence the namespace for __main__ is now the global namespace
Next, we import a module called amodule
>>> import amodule
The interpreter
  • Creates a namespace for the module
  • Starts executing commands in the module
The namespace amodule.__dict__ is now the global namespace
Once execution of the module finishes, interpreter returns to the module from where the import statement was made
  • In this case it's __main__
Now the namespace of __main__ is the global namespace

Local Namespaces

Suppose that inside a module we have access to a function f
def f(x):
    a = 2
    return a * x
Now we call the function
y = f(1)
The interpreter creates a local namespace for the function, and registers the variables in that namespace
Variables in the namespace are called local variables
After the function returns, the namespace is deallocated (lost)
We can view the contents of the local namespace with locals()
def f(x):
    a = 2
    print locals()
    return a * x
Now we call the function
y = f(1)
{'a': 2, 'x': 1}

The __builtins__ Namespace

We have been using some built-in objects (mainly functions)
  • max(), dir(), str(), list(), len(), range(), type(), etc.
How does access to these names work?
  • These definitions are stored in a module called __builtin__
  • They have there own namespace, called __builtins__
>>> dir()  
['__builtins__', '__doc__', '__name__']
>>> dir(__builtins__)
[... 'iter', 'len', 'license', 'list', 'locals', ...] 
We can access elements of the namespace as follows
>>> __builtins__.max
<built-in function max>
But __builtins__ is special, because we can access them directly as well:
>>> max
<built-in function max>
>>> __builtins__.max == max
True
The reason why this works is explained in the next section...

Name resolution

When we reference a name, how does the Python interpreter find the corresponding value?

The Process of Name Resolution

At any point of execution, there are two or three namespaces which can be accessed directly
  • Directly means without using a dot
    • pi rather than math.pi
If the interpreter is not executing a function call then the namespaces are
  • The global namespace (of the module being executed)
  • The builtin namespace
Suppose that we refer to a name
print x
The interpreter
  • First looks in the global namespace for x
  • If it's not there, looks in the builtin namespace
  • If it's not there, raises a NameError
If the interpreter is executing a function call
y = f(1)
Then the namespaces are
  • The local namespace of f
  • The global namespace (of the module being executed)
  • The builtin namespace
Now the interpreter
  • First looks in the local namespace
  • Then in the global namespace
  • Then in the builtin namespace
  • If it's not there, raises a NameError

Consequences

Consider the script test.py
def g(x):
    a = 1
    x = x + a
    return x

a = 0
y = g(10)
print "a = ", a, "y = ", y
What happens when I run this script?
$ python -i test.py
a = 0 , y = 11
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
  • The global namespace {} is created
  • The function object is created, g is bound to it
    • g stored in the global namespace
  • Next, the global variable a is created, bound to 0
    • a stored in the global namespace
  • Next g is called via y = g(10)
    • Local namespace {} is constructed
    • Local x and a are added to it
      • Local namespace = {'x': 10, 'a': 1}
      • x = x + a uses the local a
      • Local a does not interfere with global a
  • After computations, return value assigned to y
  • Local x and a are discarded (namespace is deallocated)
Note that global a was not affected by local a


Mutable Versus Immutable Parameters

Consider the following code segment
def f(x):
    x = x + 1
    return x
x = 1
print f(x), x 
Prints 2 as the value of f(x) and 1 as the value of x
  • f is registered as a function in the global namespace
  • x bound to 1 in the global namespace
  • Call f(x)
    • Creates a local namespace
    • Adds x to local namespace, bound to 1
    • Rebinds this local x to the new integer object 2
      • The value 1 is now garbage collected
    • Returns the value of local x
    • Local namespace deallocated, local x lost
  • Prints the return value 2 for f(x), and the value 1 for global x
Different story when we use a mutable data type such as a list:
def f(x):
    x[0] = x[0] + 1
    return x
x = [1]
print f(x), x 
Prints [2] as the value of f(x) and same for x
  • f is registered as a function in the global namespace
  • x bound to [1] in the global namespace
  • Call f(x)
    • Creates a local namespace
    • Adds x to local namespace, bound to [1]
    • The list [1] is modified to [2]
    • Returns the list [2]
    • Local namespace deallocated, local x lost
  • Global x has been modified
Conclusion: functions can modify global variables if they are mutable