Open Mind Tree: Computational economics Lecture 9

Iterators

Iterators provide a uniform interface for stepping through the elements of a collection

They are one of the (many) nice features of the Python language

In this lecture we'll talk about using iterators

In a later lecture we'll learn how to build our own

Definitions

First we define iterators and iterables

Iterators

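An iterator is an object with a next() method, which returns the next element each time it is called and raises StopIteration when no elements remain

For example, file objects, which we met in an earlier lecture, are iterators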

Recall that we had a file test.txt with contents

Foo foo
Bar bar

Let's create a file object linked to this file

>>> f = open('test.txt', 'r')

This object has a next() method:

>>> f.next()
'Foo foo\n'
>>> f.next()
'Bar bar\n'

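Calling f.next() is essentially the same as calling f.readline(), except that at the end of the file f.next() raises StopIteration, while f.readline() returns an empty string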
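The session above assumes test.txt already exists. Below is a small self-contained sketch that writes the same two lines to a file in a temporary directory (the location is an assumption for illustration) and steps through it with the built-in next() function, which calls f.next() in Python 2.6+ and f.__next__() in Python 3, so the snippet runs under either.

```python
import os
import tempfile

# Recreate test.txt in a temporary directory (illustrative location only)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as out:
    out.write('Foo foo\nBar bar\n')

f = open(path, 'r')
first = next(f)      # same as f.next() in Python 2
second = next(f)
f.close()

print(repr(first))   # 'Foo foo\n'
print(repr(second))  # 'Bar bar\n'
```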

Other examples are

enumerate objects

>>> e = enumerate(['foo', 'bar'])
>>> e.next()
(0, 'foo')
>>> e.next()
(1, 'bar')

reader objects from the csv module (which is used to manipulate CSV files)

>>> from csv import reader
>>> nikkei_data = reader(open('table.csv'))  # The reader() function is passed a file object
>>> nikkei_data.next()
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> nikkei_data.next()
['2008-05-19', '14294.52', '14343.19', '14219.08', '14269.61', '133800', '14269.61']

objects returned by urllib.urlopen()

>>> import urllib
>>> webpage = urllib.urlopen("http://www.cnn.com")
>>> webpage.next()
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/...' # etc
>>> webpage.next()
'<meta http-equiv="refresh" content="1800;url=?refresh=1">\n'
>>> webpage.next()
'<meta name="Description" content="CNN.com delivers the latest breaking news and information..' # etc
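In the csv example above, reader() is passed a file object, but csv.reader() accepts any object that yields lines of text when iterated. As a minimal sketch that runs without table.csv, the lines below are the header and first data row shown above, held in a list as a stand-in for the real file; the built-in next() is used so it runs under Python 2.6+ and Python 3.

```python
import csv

# Header and first data row of table.csv, as shown above (stand-in for the file)
lines = ['Date,Open,High,Low,Close,Volume,Adj Close\n',
         '2008-05-19,14294.52,14343.19,14219.08,14269.61,133800,14269.61\n']

nikkei_data = csv.reader(lines)   # reader() calls iter() on its argument
header = next(nikkei_data)        # first row, split into fields
row = next(nikkei_data)           # second row

print(header[-1])  # Adj Close
print(row[-1])     # 14269.61
```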

Iterables

The built-in function iter() can be used to create iterators from certain objects

An object is said to be iterable if it can be passed to iter(), which then returns an iterator

A good example is a list:

>>> X = ['foo', 'bar']
>>> type(X)
<type 'list'>
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> Y.next()
'foo'
>>> Y.next()
'bar'
>>> Y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
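Catching StopIteration by hand is one option; the built-in next() (Python 2.6+ and Python 3) also accepts a default value that it returns instead of raising StopIteration once the iterator is exhausted. A minimal sketch:

```python
X = ['foo', 'bar']
Y = iter(X)

# next(iterator, default) returns default rather than raising StopIteration
items = [next(Y, None), next(Y, None), next(Y, None)]
print(items)  # ['foo', 'bar', None]
```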

Another example is a dictionary

>>> d = {'name': 'godzilla', 'height in meters': 10}
>>> keys = iter(d)
>>> type(keys)
<type 'dictionary-keyiterator'>
>>> keys.next()
'height in meters'
>>> keys.next()
'name'

The next() method steps through the keys of the dictionary

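The keys are not stored in any particular order, so there is no notion of "first", "second", etc. (This holds for the Python 2 used here; since Python 3.7, dictionaries preserve insertion order.)

Incidentally, dictionaries can also produce iterators directly

d.iterkeys() steps through the same keys as iter(d.keys()) or iter(d)
d.itervalues() steps through the same values as iter(d.values())
d.iteritems() steps through the same (key, value) pairs as iter(d.items())

Unlike d.keys(), d.values() and d.items(), these methods do not first build a list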
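Since iteration order should not be relied on, one simple way to step through a dictionary in a definite order is to sort its keys. The sketch below sticks to built-ins available in both Python 2 and 3 (Python 3 dropped iterkeys() and friends; there keys(), values() and items() return views that can be passed to iter()).

```python
d = {'name': 'godzilla', 'height in meters': 10}

# sorted() calls iter(d), which yields the keys, and returns them as a sorted list
keys_in_order = sorted(d)
for key in keys_in_order:
    print('%s: %s' % (key, d[key]))
# height in meters: 10
# name: godzilla
```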

Of course, not all objects are iterable

>>> iter(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
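A common way to test for iterability is simply to try iter() and catch the TypeError. The helper name is_iterable below is hypothetical, chosen for illustration:

```python
def is_iterable(obj):
    """Return True if iter() accepts obj (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

print(is_iterable(['foo', 'bar']))        # True
print(is_iterable({'name': 'godzilla'}))  # True
print(is_iterable('godzilla'))            # True: strings are iterable too
print(is_iterable(42))                    # False
```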

Using Iterators

Let's look at some different ways we can use iterators

Iterators in For Loops

A very common use of iterators is in for loops

In fact this is how the for loop works!

for x in iterator:
    <code block>

This is what happens:

The interpreter calls iterator.next() and binds x to the result
Executes the code block
Repeats until iterator.next() raises StopIteration
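The steps above can be sketched in ordinary Python with a while loop. This is a simplified model rather than the interpreter's actual implementation: like a for loop, it first calls iter() on the object, then calls the built-in next() until StopIteration is raised. The function name run_for_loop is hypothetical.

```python
def run_for_loop(iterable, body):
    """Mimic 'for x in iterable: body(x)' (hypothetical helper)."""
    iterator = iter(iterable)    # an iterator simply returns itself here
    while True:
        try:
            x = next(iterator)   # iterator.next() in Python 2, __next__() in Python 3
        except StopIteration:
            break                # the iterator is exhausted, so the loop ends
        body(x)                  # execute the loop body with x bound

seen = []
run_for_loop(['a', 'b'], seen.append)
print(seen)  # ['a', 'b']
```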

Recall that in an earlier lecture we introduced the syntax

f = open('somefile.txt')
for line in f:
    pass  # do something with line

Now you know how it works:

f is bound to an iterator
- A file object, which implements a next() method
The interpreter
- Calls f.next() and binds line to the return value
- Executes the body of the loop
- Repeats until f.next() raises StopIteration

Another example

for i, x in enumerate(X):
    pass  # do something with i and x

Again, enumerate(X) is an iterator

What about this example?

X = ['a', 'b']
for x in X:
    print x

Here X is a list (an iterable), not an iterator

Internally, Python calls iter(X) to make an iterator

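More generally,

for loops work on either iterators or iterables
In both cases the interpreter first calls iter() on the object
- For an iterable, iter(iterable) builds a new iterator
- For an iterator, iter() simply returns the iterator itself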
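A quick way to see why one mechanism handles both cases: calling iter() on an iterator returns that same object, while calling it on a list builds a fresh iterator each time. A minimal check:

```python
X = ['a', 'b']

Y = iter(X)            # iterable -> new iterator
print(iter(Y) is Y)    # True: iter() on an iterator returns the iterator itself
print(iter(X) is X)    # False: each call on a list builds a fresh iterator

e = enumerate(X)
print(iter(e) is e)    # True: enumerate objects are iterators
```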

Here's another example

d = {'name': 'godzilla', 'height in meters': 10}
for key in d:
    pass  # do something with key

Now you know how this works

Internally, the iterable d is passed to iter()

The resulting iterator steps through the keys of d

Iterators and built-ins

Some built-in functions that act on sequences also work with iterators (and with iterables in general)

max(), min(), sum(), all(), any()

>>> X = [10, -10]
>>> max(X)
10
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> max(Y)
10
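The other functions in the list behave the same way. Each call below gets a fresh iterator from iter(X), since an iterator can only be consumed once; sorted() is included because it also accepts any iterable and returns a new list.

```python
X = [10, -10]

total = sum(iter(X))              # 0
smallest = min(iter(X))           # -10
some_true = any(iter([0, 0, 3]))  # True, since 3 is truthy
all_true = all(iter([1, 2, 0]))   # False, since 0 is falsy
ordered = sorted(iter(X))         # [-10, 10], a new list

print(total)
print(ordered)
```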

Use and reuse

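A major difference between iterators and iterables such as lists is that iterators are depleted by use: once next() has stepped through every element, the iterator is exhausted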

>>> X = [10, -10]
>>> Y = iter(X)
>>> max(Y)
10
>>> max(Y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
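Two standard ways around this are sketched below, assuming the original list X is still available: either ask the iterable for a fresh iterator each time, or copy the iterator's elements into a list once and reuse the list.

```python
X = [10, -10]

Y = iter(X)
first = max(Y)        # consumes every element of Y
leftover = list(Y)    # Y is now exhausted, so this is empty

# Option 1: ask the iterable for a fresh iterator each time
again = max(iter(X))

# Option 2: if all you have is an iterator, save its elements in a list first
Z = iter(X)
saved = list(Z)
print(max(saved))  # 10
print(min(saved))  # -10
```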

Application: Web Data

This application involves downloading data with the urllib module

URL stands for uniform resource locator

Examples:

http://www.google.com
http://johnstachurski.net/teaching.html

Some URLs have a query string

http://www.google.com/search?q=godzilla

The part after (but not including) ? is the query string

It is passed to the server as an argument

We can obtain stock price data from Yahoo Finance using query strings, such as

http://ichart.finance.yahoo.com/table.csv?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG

The query string is a collection of field/value pairs, separated by &

The meanings of the main fields are

a: start month, base zero (e.g., jan = 0, feb = 1, etc.)
b: start day
c: start year
d: end month, base zero
e: end day
f: end year
g: period (in this case, d = daily)
s: ticker symbol for the stock (in this case, Google)
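The field/value pairs can also be pulled out of a URL programmatically. A minimal sketch using the standard library's urlsplit() and parse_qsl() on the example URL above (these live in urlparse in Python 2 and in urllib.parse in Python 3, hence the fallback import); it only parses the string and makes no network request.

```python
try:
    from urllib.parse import urlsplit, parse_qsl   # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qsl       # Python 2

url = ('http://ichart.finance.yahoo.com/table.csv'
       '?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG')

query = urlsplit(url).query       # the part after '?'
fields = dict(parse_qsl(query))   # field/value pairs separated by '&'

print(fields['s'])  # GOOG
print(fields['a'])  # 00
```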

Here is an example of usage

import urllib

base_url = 'http://ichart.finance.yahoo.com/table.csv'

request_data = {'s': 'GOOG',          # Ticker symbol for Google
                'a': '00',            # Start month, base zero
                'b': '01',            # Start day
                'c': '2005',          # Start year
                'd': '05',            # End month, base zero
                'e': '03',            # End day
                'f': '2009',          # End year
                'g': 'd',             # Daily
                'ignore': '.csv'}     # Data type

encoded = urllib.urlencode(request_data)  # Formats the query string
response = urllib.urlopen(base_url + '?' + encoded)

After running this script, we can get successive lines of the data as follows

>>> response.next()
'Date,Open,High,Low,Close,Volume,Adj Close\n'
>>> response.next()
'2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n'
>>> response.next()
'2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n'
>>> response.next()
'2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'

We see that Google's share price opened at 426.00 on the 3rd of June 2009, etc. Note that the rows run backwards in time, from the most recent date to the earliest

Note: If you have problems running this, your internet connection might be using a proxy server

Try googling for some help with urllib and proxy servers

Exercise:

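Write a program to print out the percentage change in value since the start of the year for all of the stocks listed in the file portfolio.txt, which contains one ticker symbol and company name per line, separated by a comma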

Change is from Jan 1st until the most recent price available
Use the last column (i.e., Adj Close) as the price
Stock prices should be downloaded at runtime from Yahoo Finance
If you can, print returns in order, from largest to smallest
- Hint: use the sorted() function

A hint: if

line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'

then line.split(',') returns the elements as a list of strings
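Following the hint, here is a sketch of extracting the date and the adjusted close from such a line. The newline stays attached to the last field; float() tolerates surrounding whitespace, but stripping it first makes the intent explicit.

```python
line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'

fields = line.split(',')           # a list of 7 strings
date, adj_close = fields[0], fields[-1]
print(repr(adj_close))             # '426.56\n': the newline is still attached
price = float(adj_close.strip())   # strip() removes the newline before conversion

print(date)   # 2009-06-01
print(price)  # 426.56
```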

Solution

## Filename: yahoo_fin.py
## Author: John Stachurski

from urllib import urlopen, urlencode
from datetime import date
from operator import itemgetter

# Record current year, month and day as strings; Yahoo's month is base zero
today = date.today()
yyyy = str(today.year)
mm = str(today.month - 1)
dd = str(today.day)

base_url = 'http://ichart.finance.yahoo.com/table.csv'

request_data = {'a': '00',            # Start month, base zero
                'b': '01',            # Start day
                'c': yyyy,            # Start year (the current year)
                'd': mm,              # End month, base zero
                'e': dd,              # End day
                'f': yyyy,            # End year
                'g': 'd',             # Daily
                'ignore': '.csv'}     # Data type

# Main loop: portfolio.txt has one "ticker, company name" pair per line

portfolio = open('portfolio.txt')
percent_change = {}
for entry in portfolio:
    ticker, company_name = [item.strip() for item in entry.split(',')]
    request_data['s'] = ticker
    response = urlopen(base_url + '?' + urlencode(request_data))
    response.next()  # Skip the header line
    # Rows run newest first: prices[0] is the latest, prices[-1] the earliest
    prices = [row.split(',')[-1] for row in response]
    response.close()
    new_price, old_price = float(prices[0]), float(prices[-1])
    percent_change[company_name] = 100 * (new_price - old_price) / old_price
portfolio.close()

items = percent_change.items()

# Print companies sorted by percentage change, largest first
for name, change in sorted(items, key=itemgetter(1), reverse=True):
    print '%-12s %10.2f' % (name, change)
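The solution needs a live connection to Yahoo Finance. To check the arithmetic offline, the sketch below applies the same steps (skip the header, take the last column, compare the newest and oldest rows) to the three GOOG rows printed earlier. The function name percent_change_from_lines is hypothetical, and the sample covers only 1 to 3 June 2009, not a full year.

```python
def percent_change_from_lines(csv_lines):
    """Percentage change in Adj Close from the oldest row to the newest.

    Assumes csv_lines yields a header line first, then data rows ordered
    newest first, as in the Yahoo Finance output shown above.
    """
    lines = iter(csv_lines)
    next(lines)                                    # skip the header line
    prices = [float(row.split(',')[-1]) for row in lines]
    new_price, old_price = prices[0], prices[-1]   # newest first, oldest last
    return 100 * (new_price - old_price) / old_price

# The GOOG rows printed earlier, standing in for a live download
sample = ['Date,Open,High,Low,Close,Volume,Adj Close\n',
          '2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n',
          '2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n',
          '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n']

change = percent_change_from_lines(sample)   # about 1.19 percent
print('%-12s %10.2f' % ('Google', change))
```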
