Saturday, December 24, 2011

Computational economics lecture 29


Generators

A generator is a kind of iterator (i.e., it implements a next() method)
We will study two ways to build generators
  • Generator expressions
  • Generator functions

Generator Expressions

The easiest way to build generators is using generator expressions
Just like a list comprehension, but with round brackets
Here is the list comprehension:
>>> singular = ('dog', 'cat', 'bird')  
>>> type(singular)
<type 'tuple'>
>>> plural = [string + 's' for string in singular]  # Creates a list
>>> plural
['dogs', 'cats', 'birds']
>>> type(plural)
<type 'list'>
And here is the generator expression
>>> singular = ('dog', 'cat', 'bird')  
>>> plural = (string + 's' for string in singular)  # Creates a generator
>>> type(plural)
<type 'generator'>
>>> plural.next()
'dogs'
>>> plural.next()
'cats'
>>> plural.next()
'birds'
Since sum() can be called on iterators, we can do this
>>> sum((x * x for x in range(10)))
285
The function sum() calls next() repeatedly to get the items and adds them up as it goes
In fact, we can omit the outer brackets in this case
>>> sum(x * x for x in range(10))
285
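One difference from a list is worth seeing concretely: a generator expression computes its items only on demand and can be traversed only once. The following is a minimal sketch, not part of the original lecture. It uses the built-in next(gen) (available from Python 2.6 and in Python 3) in place of the gen.next() method shown above, so that it runs under either version.

```python
singular = ('dog', 'cat', 'bird')

plural_list = [s + 's' for s in singular]   # All items are built immediately
plural_gen = (s + 's' for s in singular)    # Nothing has been computed yet

first = next(plural_gen)       # Computes and returns only the first item
rest = list(plural_gen)        # Consumes the remaining items
leftover = list(plural_gen)    # The generator is now exhausted

print(first)         # dogs
print(rest)          # ['cats', 'birds']
print(leftover)      # []
print(plural_list)   # ['dogs', 'cats', 'birds'] (the list is still intact)
```

If the items are needed more than once, either keep the list or build a fresh generator for each pass.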

Generator Functions

Generator functions are the most flexible way to create generator objects
(Note that this section is technical, and you can probably get by without it)
Here's an example
Example 1
def f():
    yield 'start'
    yield 'middle'
    yield 'end'
Here f() is called a generator function
Looks like a function, uses new keyword yield
Let's see how it works
john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py 
>>> type(f)           # f itself is a function
<type 'function'>
>>> gen = f()         # Creates a generator object
>>> gen
<generator object at 0xb7cf31ac>
>>> gen.next()
'start'
>>> gen.next()
'middle'
>>> gen.next()
'end'
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 
The function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next() method
The first call to gen.next()
  • Executes code in the body of f() until it meets a yield statement
  • Returns that value to the caller of gen.next()
The second call to gen.next()
  • Starts executing from the next line
def f():
    yield 'start'
    yield 'middle'  # This line!
    yield 'end'
  • Continues until the next yield statement
  • Returns that value to the caller of gen.next()
  • Etc.
When the code block ends, the call to gen.next() raises a StopIteration exception
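The step-by-step description above can be made visible by recording when each part of the body runs. This is a minimal sketch under stated assumptions: the list log is a hypothetical helper added purely for illustration, and the built-in next(gen) is used instead of gen.next() so the code runs on Python 2.6+ and Python 3.

```python
log = []  # Hypothetical helper: records which parts of the body have run

def f():
    log.append('before start')
    yield 'start'
    log.append('between start and middle')
    yield 'middle'
    log.append('between middle and end')
    yield 'end'
    log.append('after end')

gen = f()            # Creating the generator runs none of the body
print(log)           # []

first = next(gen)    # Runs the body up to the first yield
print(first)         # start
print(log)           # ['before start']

second = next(gen)   # Resumes just after the first yield
third = next(gen)    # Resumes just after the second yield

try:
    next(gen)        # Runs the rest of the body, then raises StopIteration
    finished = False
except StopIteration:
    finished = True

print(finished)      # True
print(log)
```

The final log shows that the code after the last yield runs only on the fourth call, which is the call that raises StopIteration.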
Example 2
Our next example receives an argument x from the caller
def g(x):
    while x < 100:
        yield x
        x = x * x 
Let's see how it works
john@c246:~$ python -i test.py 
>>> g
<function g at 0xb7d6b25c>
>>> gen = g(2)  # Call generator function to make a generator
>>> type(gen)   # gen is an object of type generator
<type 'generator'>
>>> gen.next()  # Generators are iterators 
2
>>> gen.next()
4
>>> gen.next()
16
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 
The call gen = g(2) binds gen to a generator
Inside the generator, the name x is bound to 2
When we call gen.next()
  • The body of g() executes until the line yield x
  • The value of x is returned
Note that the value of x is retained inside the generator
When we call gen.next() again, execution continues from where it left off
def g(x):
    while x < 100:
        yield x
        x = x * x  # execution continues from here
Execution continues until it reaches yield x again, returns the value of x, and so on
When the test x < 100 fails, the loop ends and gen.next() raises a StopIteration exception
Here's the generator used with for
gen = g(2)
for v in gen:
    print v
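The for loop never shows a StopIteration traceback because it catches the exception and treats it as the signal to stop. The sketch below spells out roughly what the loop does behind the scenes. It is an illustration rather than the interpreter's actual implementation, and it uses the built-in next(gen) so that it runs on Python 2.6+ and Python 3.

```python
def g(x):
    while x < 100:
        yield x
        x = x * x

# Roughly what "for v in g(2): ..." does
gen = g(2)
values = []
while True:
    try:
        v = next(gen)        # Ask the generator for its next value
    except StopIteration:    # The generator has finished
        break
    values.append(v)

print(values)        # [2, 4, 16]
print(list(g(2)))    # [2, 4, 16] (same result via ordinary iteration)
```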
Note that the loop inside the generator can be infinite
def g(x):
    while 1:
        yield x
        x = x * x 
Here's how it works
>>> gen = g(3)
>>> gen.next()
3
>>> gen.next()
9
>>> gen.next()
81
>>> gen.next()
6561
>>> gen.next()
43046721
>>> gen.next()
1853020188851841L
Don't use this in a for loop ; )
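An infinite generator can still be used safely if the caller decides when to stop. Two common options are sketched here: itertools.islice from the standard library, which takes a fixed number of items, and a for loop with an explicit break. The cutoff 10000 is an arbitrary value chosen for illustration.

```python
from itertools import islice

def g(x):
    while True:
        yield x
        x = x * x

# Take exactly five items from the infinite generator
first_five = list(islice(g(3), 5))
print(first_five)    # [3, 9, 81, 6561, 43046721]

# Or loop and break once a condition is met
small = []
for v in g(3):
    if v > 10000:    # Arbitrary cutoff, for illustration only
        break
    small.append(v)
print(small)         # [3, 9, 81, 6561]
```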

Advantages of Iterators

What's the advantage of using an iterator here?
Suppose we want to draw a sample from the binomial(n, 0.5) distribution
One way to do it is as follows
>>> import random
>>> n = 10000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
>>> sum(draws)
But we are creating two huge lists here
  • range(n), and
  • draws
Uses up lots of memory, very slow
If I make n even bigger then my computer refuses to allocate the memory
>>> n = 1000000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
We can avoid these problems using iterators
Here is the generator function:
import random

def f(n):
    i = 1 
    while i <= n:
        yield random.uniform(0, 1) < 0.5 
        i += 1
Now let's do the sum:
john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py 
>>> n = 10000000
>>> draws = f(n)
>>> draws
<generator object at 0xb7d8b2cc>
>>> sum(draws)
4999141
In summary
  • Iterators avoid the need to create big lists/tuples
  • Provide a uniform interface to iteration
    • Can be used transparently in for loops
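The memory point can be checked directly with sys.getsizeof from the standard library, which reports the size of the container object itself (for a list, this excludes the elements it refers to). The sketch below is illustrative only: it uses a generator expression in place of the generator function f above, and a smaller n so that it runs quickly. Under Python 2, range(n) still builds a full list, and xrange(n) would avoid that.

```python
import random
import sys

n = 100000  # Smaller than in the text so the sketch runs quickly

draws_list = [random.uniform(0, 1) < 0.5 for i in range(n)]   # n items stored
draws_gen = (random.uniform(0, 1) < 0.5 for i in range(n))    # Nothing stored yet

list_size = sys.getsizeof(draws_list)   # Grows in proportion to n
gen_size = sys.getsizeof(draws_gen)     # Small, and does not depend on n

print(list_size)
print(gen_size)

total = sum(draws_gen)   # Draws are produced one at a time and then discarded
print(total)             # Random, but close to n / 2
```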

Exercises

Exercise 1
Write a generator which yields a time series for the quadratic map

    x_{t+1} = 4 x_t (1 - x_t)

Inputs to the generator are x0, the initial condition, and n, the length of the series
Plot a series with Matplotlib
Exercise 2
Complete the following code, and test it on a CSV file (the code below assumes it is named table.csv)
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number 
    column_number (integer), the generator function returns a generator 
    which steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('table.csv', 1) 

for date in dates:
    print date

Solutions

Solution to Exercise 1:
## Filename: quadmap.py
## Author: John Stachurski

import pylab

def qm(x, n):
    i = 0
    while i < n:
        yield x
        x = 4 * (1 - x) * x
        i += 1

h = qm(0.1, 200)

time_series = list(h)  # Collect the generated values into a list
pylab.plot(time_series)
pylab.show()
Solution to Exercise 2:
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number 
    column_number (integer), the generator function returns a generator 
    which steps through the elements of column column_number in file
    target_file.
    """
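    # The with block closes the file when the generator finishes or is closed
    with open(target_file, 'r') as f:
        for line in f:
            # Strip the newline so the last column comes back clean
            yield line.rstrip('\n').split(',')[column_number - 1]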

dates = column_iterator('table.csv', 1) 

for date in dates:
    print date
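Splitting each line on commas breaks down when a field itself contains a comma inside quotes. One possible alternative, sketched here and not part of the original solution, is to let the standard library's csv.reader do the parsing. The temporary file and its contents are hypothetical data made up so that the example is self-contained.

```python
import csv
import os
import tempfile

def column_iterator(target_file, column_number):
    """Yield the entries of column column_number (counting from 1)."""
    with open(target_file, 'r') as f:
        for row in csv.reader(f):    # csv.reader handles quoted fields
            yield row[column_number - 1]

# Build a small throwaway CSV file (hypothetical data, for illustration only)
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
tmp.write('Date,Note\n2011-12-23,"up, then down"\n2011-12-24,flat\n')
tmp.close()

dates = list(column_iterator(tmp.name, 1))
notes = list(column_iterator(tmp.name, 2))
os.remove(tmp.name)                  # Clean up the temporary file

print(dates)    # ['Date', '2011-12-23', '2011-12-24']
print(notes)    # ['Note', 'up, then down', 'flat']
```

Note that the header row is yielded like any other row. A caller that wants only the data can discard the first item.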