Friday, December 23, 2011

Computational economics Lecture 9


Iterators

Iterators are a uniform interface to stepping through elements in a collection
  • One of the (many) nice features of the Python language...
In this lecture we'll talk about using iterators
In a later lecture we'll learn how to build our own

Definitions

First we define iterators and iterables

Iterators

An iterator is an object with a next() method
For example, file objects (which we met in an earlier lecture) are iterators
Recall that we had a file test.txt with contents
Foo foo
Bar bar
Let's create a file object linked to this file
>>> f = open('test.txt', 'r')
This object has a next() method:
>>> f.next()
'Foo foo\n'
>>> f.next()
'Bar bar\n'
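Calling f.next() is essentially the same as calling f.readline(), except that at the end of the file next() raises StopIteration while readline() returns an empty string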
Other examples are
  • enumerate objects
>>> e = enumerate(['foo', 'bar'])
>>> e.next()
(0, 'foo')
>>> e.next()
(1, 'bar')
  • reader objects from the csv module (which is used to manipulate CSV files)
>>> from csv import reader
>>> nikkei_data = reader(open('table.csv'))  # The reader() function is passed a file object
>>> nikkei_data.next()
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> nikkei_data.next()
['2008-05-19', '14294.52', '14343.19', '14219.08', '14269.61', '133800', '14269.61']
  • objects returned by urllib.urlopen()
>>> import urllib
>>> webpage = urllib.urlopen("http://www.cnn.com")
>>> webpage.next()
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/...' # etc
>>> webpage.next()
'<meta http-equiv="refresh" content="1800;url=?refresh=1">\n'
>>> webpage.next()
'<meta name="Description" content="CNN.com delivers the latest breaking news and information..' # etc 
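The same stepping can be done with the built-in function next(), which is available from Python 2.6 onwards and is the spelling that carries over to Python 3 (where the method is named __next__()). The sketch below is only an illustration: the list called lines is a made-up stand-in for the lines of a file, so nothing is read from disk.

```python
import csv

# A list of strings stands in for the lines of a file (illustrative data only)
lines = ['Date,Close\n', '2008-05-19,14269.61\n']

f = iter(lines)            # steps through the lines like a file object would
first = next(f)            # same effect as f.next() in Python 2

rows = csv.reader(lines)   # reader() accepts any iterable that yields lines
header = next(rows)        # each call returns one parsed row as a list
record = next(rows)
```

Here next(x) simply calls the object's own next method, so it can replace x.next() in any of the examples above.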

Iterables

The built-in function iter() can be used for creating iterators from certain objects
An object is said to be iterable if it can be passed to iter()
A good example is a list:
>>> X = ['foo', 'bar']
>>> type(X)
<type 'list'>
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> Y.next()
'foo'
>>> Y.next()
'bar'
>>> Y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Another example is a dictionary
>>> d = {'name': 'godzilla', 'height in meters': 10}
>>> d = iter(d)
>>> type(d)
<type 'dictionary-keyiterator'>
>>> d.next()
'height in meters'
>>> d.next()
'name'
The next() method steps through the keys of the dictionary
  • The keys are not ordered, so there is no notion of "first", "second", etc.
Incidentally, we can get iterators directly
  • d.iterkeys() returns an iterator equivalent to iter(d.keys()) or iter(d)
  • d.itervalues() returns an iterator equivalent to iter(d.values())
  • d.iteritems() returns an iterator equivalent to iter(d.items())
Of course, not all objects are iterable
>>> iter(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
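The definition above suggests a simple test for iterability: pass the object to iter() and see whether a TypeError comes back. The helper name is_iterable below is hypothetical, introduced only for this sketch.

```python
def is_iterable(obj):
    """Return True if obj can be passed to iter(), False otherwise."""
    try:
        iter(obj)
    except TypeError:      # raised for objects such as integers
        return False
    return True

checks = [is_iterable(['foo', 'bar']),        # a list
          is_iterable({'name': 'godzilla'}),  # a dictionary
          is_iterable(42)]                    # an integer
```

With these inputs the list and the dictionary pass the test and the integer does not.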

Using Iterators

Let's look at some different ways we can use iterators

Iterators in For Loops

A very common use of iterators is in for loops
In fact this is how the for loop works!
for x in iterator:
    <code block>
This is what happens:
  • Interpreter calls iterator.next() and binds x to result
  • Executes code block
  • Repeats until StopIteration error
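The three steps above can be written out by hand as a while loop. This is a rough sketch of the idea rather than what the interpreter literally executes, and the function name run_for_loop is made up for the illustration.

```python
def run_for_loop(iterator, body):
    """Mimic `for x in iterator: body(x)` with explicit next() calls."""
    while True:
        try:
            x = next(iterator)     # get the next element and bind it to x
        except StopIteration:      # raised once the iterator is exhausted
            break
        body(x)                    # execute the "code block"

seen = []
run_for_loop(iter(['a', 'b']), seen.append)
```

After the call, seen holds the elements in the order the iterator produced them.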
Remember that in an earlier lecture we introduced the syntax
f = open('somefile.txt')
for line in f:
    # do something
Now you know how it works:
  • f is bound to an iterator
    • A file object, which implements a next() method
  • Interpreter
    • Calls f.next() and binds line to return value
    • Executes body of loop
    • Repeats until StopIteration error
Another example
for i, x in enumerate(X):
    # do something
Again, enumerate(X) is an iterator
What about this example?
X = ['a', 'b']
for x in X:
    print x
Here X is a list (an iterable), not an iterator
Internally, Python calls iter(X) to make an iterator
More generally,
  • for loops work on either iterators or iterables
  • In the second case, the iterable is converted into an iterator
    • iter(iterable)
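One way to see why both cases work is that calling iter() on an iterator just hands back the same object, so the conversion step is harmless when the loop is given an iterator to begin with. A small check, using only built-ins:

```python
X = ['a', 'b']

it = iter(X)
same_object = iter(it) is it     # an iterator is its own iterator

a = iter(X)
b = iter(X)
distinct = a is not b            # each call on a list builds a new iterator
```

Both same_object and distinct come out True.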
Here's another example
d = {'name': 'godzilla', 'height in meters': 10}
for key in d:
    # do something
Now you know how this works
Internally, the iterable d is passed to iter()
The resulting iterator steps through the keys of d

Iterators and built-ins

Some built-in functions that act on sequences also work with iterables
  • max(), min(), sum(), all(), any()
>>> X = [10, -10]
>>> max(X)
10
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> max(Y)
10
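The other functions in the list behave the same way. The values below are made up for illustration; each call gets its own fresh iterator.

```python
X = [3, 0, -2]

total = sum(iter(X))          # adds up the elements
smallest = min(iter(X))       # smallest element
any_true = any(iter(X))       # True if at least one element is truthy
all_true = all(iter(X))       # False here, because 0 counts as false

# Iterables such as dictionaries work too: max() steps through the keys
d = {'name': 'godzilla', 'height in meters': 10}
longest_key = max(d, key=len)
```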

Use and reuse

A major difference in usage is that iterators are depleted by use
>>> X = [10, -10]
>>> Y = iter(X)
>>> max(Y)
10
>>> max(Y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
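By contrast, the list itself is not used up, so a fresh iterator can be made from it whenever one is needed. A minimal sketch of the difference:

```python
X = [10, -10]

Y = iter(X)
first_pass = max(Y)       # consumes every element of Y
leftover = list(Y)        # nothing remains in the depleted iterator

Y = iter(X)               # build a new iterator from the same list
second_pass = max(Y)      # works again

list_twice = (max(X), max(X))   # the list can be reused directly
```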

Application: Web Data

The application involves downloading data with the module urllib
URL stands for uniform resource locator
Examples:
  • http://www.google.com
  • http://johnstachurski.net/teaching.html
Some URLs have a query string
  • http://www.google.com/search?q=godzilla
The part after (but not including) ? is the query string
Passed to the server as an argument
We can obtain stock price data from Yahoo Finance using query strings, such as
http://ichart.finance.yahoo.com/table.csv?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG
The query string is a collection of field/value pairs, separated by &
The meanings of the main fields are
  • a: start month, base zero (e.g., jan = 0, feb = 1, etc.)
  • b: start day
  • c: start year
  • d: end month, base zero
  • e: end day
  • f: end year
  • g: period (in this case, d = daily)
  • s: ticker symbol for the stock (in this case, Google)
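To make the field/value structure concrete, the query string can be pulled apart with the standard library. The sketch below assumes the functions urlsplit() and parse_qs(), which live in the urlparse module in Python 2 and in urllib.parse in Python 3; no network access is needed.

```python
try:
    from urllib.parse import urlsplit, parse_qs   # Python 3 location
except ImportError:
    from urlparse import urlsplit, parse_qs       # Python 2 location

url = ('http://ichart.finance.yahoo.com/table.csv'
       '?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG')

query = urlsplit(url).query              # everything after the ?
# parse_qs() maps each field to a list of values; keep the single value
fields = dict((k, v[0]) for k, v in parse_qs(query).items())
```

The resulting dictionary maps 's' to 'GOOG', 'a' to '00', and so on for the other fields.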
Here is an example of usage
import urllib

base_url = 'http://ichart.finance.yahoo.com/table.csv'

request_data = {'s': 'GOOG',          # Ticker symbol for Google
                'a': '00',            # Start month, base zero
                'b': '01',            # Start day
                'c': '2005',          # Start year
                'd': '05',            # End month, base zero
                'e': '03',            # End day
                'f': '2009',          # End year
                'g': 'd',             # Daily
                'ignore': '.csv'}     # Data type

encoded = urllib.urlencode(request_data)  # Formats the query string
response = urllib.urlopen(base_url + '?' + encoded)
After running this script, we can get successive lines of the data as follows
>>> response.next()
'Date,Open,High,Low,Close,Volume,Adj Close\n'
>>> response.next()
'2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n'
>>> response.next()
'2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n'
>>> response.next()
'2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
We see that Google's share price opened at 426.00 on the 3rd of June 2009, etc.
Note: If you have problems running this, your internet connection might be using a proxy server
Try googling for some help with urllib and proxy servers
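To see what urlencode() produces without touching the network, the query string can be built and inspected on its own. In this sketch the request data is given as a list of pairs rather than a dictionary, purely so that the fields come out in a predictable order; the import is written to work in both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlencode   # Python 3 location
except ImportError:
    from urllib import urlencode         # Python 2 location, as in this lecture

base_url = 'http://ichart.finance.yahoo.com/table.csv'

# A shortened request, for illustration only
request_data = [('s', 'GOOG'), ('a', '00'), ('b', '01'), ('c', '2005')]

encoded = urlencode(request_data)        # joins the pairs with = and &
full_url = base_url + '?' + encoded
```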
Exercise:
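Write a program to print out the percentage change in value since the start of the year for all of the stocks listed in a portfolio file (the solution below reads portfolio.txt, where each line holds a ticker symbol and a company name separated by a comma)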
  • Change is from Jan 1st until the most recent price available
  • Use the last column (i.e., Adj Close) as the price
  • Stock prices should be downloaded at runtime from Yahoo Finance
  • If you can, print returns in order, from largest to smallest
    • Hint: use the sorted() function
A hint: if
line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
then line.split(',') returns the elements as a list of strings
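As a small illustration of the hint, the last element of the split line can be converted straight to a number, since float() ignores the trailing newline:

```python
line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'

fields = line.split(',')        # list of strings, one per column
date_str = fields[0]            # first column is the date
adj_close = float(fields[-1])   # last column is Adj Close
```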

Solution

## Filename: yahoo_fin.py
## Author: John Stachurski

from urllib import urlopen, urlencode
from datetime import date
from operator import itemgetter

# Record current year, month and day as strings, month is base zero
today = date.today()
yyyy = str(today.year)
mm = str(today.month - 1)
dd = str(today.day)

base_url = 'http://ichart.finance.yahoo.com/table.csv'

request_data = {'a': '00',            # Start month, base zero
                'b': '01',            # Start day
                'c': yyyy,            # Start year (the current year)
                'd': mm,              # End month, base zero
                'e': dd,              # End day
                'f': yyyy,            # End year (the current year)
                'g': 'd',             # Daily
                'ignore': '.csv'}     # Data type

# Main loop

portfolio = open('portfolio.txt')  
percent_change = {}
for line in portfolio:
    ticker, company_name = [item.strip() for item in line.split(',')]
    request_data['s'] = ticker
    response = urlopen(base_url + '?' + urlencode(request_data))
    response.next()  # Skip the header line
    prices = [row.split(',')[-1] for row in response]  # Adj Close column, newest first
    old_price, new_price = float(prices[-1]), float(prices[0])
    percent_change[company_name] = 100 * (new_price - old_price) / old_price
portfolio.close()

items = percent_change.items()

for name, change in sorted(items, key=itemgetter(1), reverse=True):
    print '%-12s %10.2f' % (name, change)
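The core calculation can be tried without a network connection by feeding it lines by hand. The sketch below is not part of the solution file: the function name percent_change and the second portfolio entry are made up for illustration, and the sample lines are the ones shown earlier for Google.

```python
from operator import itemgetter

def percent_change(lines):
    """Percentage change from the oldest to the newest Adj Close price.

    lines is any iterator over CSV lines: header first, newest data first.
    """
    next(lines)                                    # skip the header line
    prices = [float(row.split(',')[-1]) for row in lines]
    old_price, new_price = prices[-1], prices[0]
    return 100 * (new_price - old_price) / old_price

sample = iter([
    'Date,Open,High,Low,Close,Volume,Adj Close\n',
    '2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n',
    '2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n',
    '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n',
])
change = percent_change(sample)    # roughly 1.19 percent

# Rank (name, change) pairs from largest to smallest, as in the solution
changes = {'Google': change, 'Hypothetical Co': -3.5}   # second entry is made up
ranked = sorted(changes.items(), key=itemgetter(1), reverse=True)
```

Because the function only relies on next() and a for-style traversal, it should accept a file object or a urlopen() response just as well as the hand-made iterator.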



    
