Iterators
Iterators are a uniform interface to stepping through elements in a collection
- One of the (many) nice features of the Python language...
In this lecture we'll talk about using iterators
In a later lecture we'll learn how to build our own
Definitions
First we define iterators and iterables
Iterators
An iterator is an object with a next() method
For example, file objects (which we met in an earlier lecture) are iterators
Recall that we had a file test.txt with contents
Foo foo
Bar bar
Let's create a file object linked to this file
>>> f = open('test.txt', 'r')
This object has a next() method:
>>> f.next()
'Foo foo\n'
>>> f.next()
'Bar bar\n'
Calling f.next() is essentially the same as calling f.readline()
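To see where the two calls differ, here is a minimal sketch that stands in for test.txt with an in-memory io.StringIO object (an assumption, so that no file is needed). It uses the built-in function next(f), which calls the object's next() method (named __next__() in Python 3), so it should run under Python 2.6+ and Python 3 alike.

```python
import io

# In-memory stand-in for test.txt (assumption: same two lines as above)
f = io.StringIO(u'Foo foo\nBar bar\n')

first = next(f)          # the line f.readline() would have returned
second = f.readline()    # the following line, this time via readline()

# At the end of the data the two calls behave differently
at_end = f.readline()    # readline() returns an empty string
try:
    next(f)
    raised = False
except StopIteration:    # next() signals exhaustion with StopIteration
    raised = True

print(repr(first), repr(second), repr(at_end), raised)
```

The only difference shown is at the end of the data: readline() returns an empty string, while next() raises StopIteration.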
Other examples are
- enumerate objects
>>> e = enumerate(['foo', 'bar'])
>>> e.next()
(0, 'foo')
>>> e.next()
(1, 'bar')
- reader objects from the csv module (which is used to manipulate CSV files)
>>> from csv import reader
>>> nikkei_data = reader(open('table.csv')) # The reader() function is passed a file object
>>> nikkei_data.next()
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> nikkei_data.next()
['2008-05-19', '14294.52', '14343.19', '14219.08', '14269.61', '133800', '14269.61']
- objects returned by urllib.urlopen()
>>> import urllib
>>> webpage = urllib.urlopen("http://www.cnn.com")
>>> webpage.next()
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/...' # etc
>>> webpage.next()
'<meta http-equiv="refresh" content="1800;url=?refresh=1">\n'
>>> webpage.next()
'<meta name="Description" content="CNN.com delivers the latest breaking news and information..' # etc
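The reader example can be tried without table.csv. A small sketch, assuming the raw file lines look like the comma-separated strings below (reconstructed from the parsed output above, not read from the real file). reader() accepts any iterable of strings, so a list stands in for the file object. The built-in next(rows) is used in place of rows.next() so the snippet runs on Python 2.6+ and Python 3.

```python
from csv import reader

# Assumed raw lines, rebuilt from the parsed rows shown above
lines = ['Date,Open,High,Low,Close,Volume,Adj Close',
         '2008-05-19,14294.52,14343.19,14219.08,14269.61,133800,14269.61']

rows = reader(lines)      # a reader object, which is an iterator
header = next(rows)       # first call gives the header row
first_row = next(rows)    # second call gives the first data row

print(header)
print(first_row)
```

Each call returns one row as a list of strings, so numeric fields still need converting with float().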
Iterables
The built-in function iter() can be used for creating iterators from certain objects
An object is said to be iterable if it can be passed to iter()
A good example is a list:
>>> X = ['foo', 'bar']
>>> type(X)
<type 'list'>
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> Y.next()
'foo'
>>> Y.next()
'bar'
>>> Y.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Another example is a dictionary
>>> d = {'name': 'godzilla', 'height in meters': 10}
>>> it = iter(d)
>>> type(it)
<type 'dictionary-keyiterator'>
>>> it.next()
'height in meters'
>>> it.next()
'name'
The next() method steps through the keys of the dictionary
- The keys are not ordered, so there is no notion of "first", "second", etc.
Incidentally, we can get iterators directly
- d.iterkeys() returns an iterator over the same elements as iter(d.keys()) or iter(d)
- d.itervalues() returns an iterator over the same elements as iter(d.values())
- d.iteritems() returns an iterator over the same elements as iter(d.items())
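A brief sketch of the three ways of stepping through a dictionary. It assumes the code should also run on Python 3, where iterkeys(), itervalues() and iteritems() no longer exist, so it passes d, d.values() and d.items() to iter() instead. sorted() is applied only to make the printed order predictable.

```python
d = {'name': 'godzilla', 'height in meters': 10}

keys = sorted(iter(d))             # iterating over d itself yields its keys
items = sorted(iter(d.items()))    # (key, value) pairs
values = list(iter(d.values()))    # values, in whatever order the dict gives

print(keys)
print(items)
print(values)
```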
Of course, not all objects are iterable
>>> iter(42)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
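The TypeError gives a simple way to test whether an object is iterable. A minimal sketch, where is_iterable is a hypothetical helper name and not part of Python:

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

# A list, a dictionary and a string are iterable; an integer is not
results = [is_iterable(x) for x in (['foo', 'bar'], {'a': 1}, 'text', 42)]
print(results)
```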
Using Iterators
Let's look at some different ways we can use iterators
Iterators in For Loops
A very common use of iterators is in for loops
In fact this is how the for loop works!
for x in iterator:
    <code block>
This is what happens:
- Interpreter calls iterator.next() and binds x to the result
- Executes code block
- Repeats until StopIteration error
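The steps above can be written out by hand. This is only a sketch of the idea, not the interpreter's actual implementation. It uses the built-in next(iterator), which is equivalent to iterator.next() in Python 2 and also works in Python 3.

```python
X = ['foo', 'bar']

# The ordinary for loop
looped = []
for x in X:
    looped.append(x)

# Roughly the same thing, spelled out step by step
manual = []
iterator = iter(X)
while True:
    try:
        x = next(iterator)   # get the next element and bind it to x
    except StopIteration:
        break                # the loop ends quietly once the iterator is empty
    manual.append(x)         # stands in for the code block

print(looped == manual)
```

Note that the for loop catches StopIteration itself, which is why no traceback appears when a loop finishes.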
Remember that in an earlier lecture we introduced the syntax
f = open('somefile.txt')
for line in f:
    # do something
Now you know how it works:
- f is bound to an iterator
  - A file object, which implements a next() method
- Interpreter
  - Calls f.next() and binds line to return value
  - Executes body of loop
  - Repeats until StopIteration error
Another example
for i, x in enumerate(X):
    # do something
Again, enumerate(X) is an iterator
What about this example
X = ['a', 'b']
for x in X:
    print x
Here X is a list (an iterable), not an iterator
Internally, Python calls iter(X) to make an iterator
More generally, for loops work on either iterators or iterables
- In the second case, the iterable is converted into an iterator via iter(iterable)
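One detail helps explain why both cases work: calling iter() on an iterator hands back that same iterator, whereas calling it on a list builds a new iterator each time. A short sketch:

```python
X = ['a', 'b']
Y = iter(X)

# An iterator passed to iter() comes back unchanged
same = iter(Y) is Y

# A list passed to iter() twice gives two separate iterators
first_iterator = iter(X)
second_iterator = iter(X)
distinct = first_iterator is not second_iterator

print(same, distinct)
```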
Here's another example
d = {'name': 'godzilla', 'height in meters': 10}
for key in d:
    # do something
Now you know how this works
Internally, the iterable d is passed to iter()
The resulting iterator steps through the keys of d
Iterators and built-ins
Some built-in functions that act on sequences also work with iterables
- max(), min(), sum(), all(), any()
>>> X = [10, -10]
>>> max(X)
10
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> max(Y)
10
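The other functions in the list behave the same way. A quick sketch, creating a fresh iterator for each call:

```python
X = [10, -10]

smallest = min(iter(X))            # smallest element
total = sum(iter(X))               # 10 + (-10)
any_nonzero = any(iter(X))         # True if at least one element is true
all_nonzero = all(iter([10, 0]))   # False, because 0 counts as false

print(smallest, total, any_nonzero, all_nonzero)
```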
Use and reuse
A major difference in usage is that iterators are depleted by use
>>> X = [10, -10]
>>> Y = iter(X)
>>> max(Y)
10
>>> max(Y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
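The list X itself is not depleted, only the iterator made from it. A minimal sketch of the difference, and of the usual remedy of asking for a fresh iterator:

```python
X = [10, -10]
Y = iter(X)

first_pass = list(Y)     # consumes every element of Y
second_pass = list(Y)    # nothing left, so the result is empty

# The list can be used again and again
largest_twice = (max(X), max(X))

# A fresh iterator starts from the beginning
largest_again = max(iter(X))

print(first_pass, second_pass, largest_twice, largest_again)
```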
Application: Web Data
The application involves downloading data with the module urllib
URL stands for uniform resource locator
Examples:
http://www.google.com
http://johnstachurski.net/teaching.html
Some URLs have a query string
http://www.google.com/search?q=godzilla
The part after (but not including) ? is the query string
- Passed to the server as an argument
We can obtain stock price data from Yahoo Finance using query strings, such as
http://ichart.finance.yahoo.com/table.csv?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG
The query string is a collection of field/value pairs, separated by &
The meanings of the main fields are
- a: start month, base zero (e.g., jan = 0, feb = 1, etc.)
- b: start day
- c: start year
- d: end month, base zero
- e: end day
- f: end year
- g: period (in this case, d = daily)
- s: ticker symbol for the stock (in this case, Google)
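The field/value structure can be checked by taking the example URL apart, with no network access needed. This sketch uses urlsplit and parse_qsl from the standard library. The try/except import is an assumption made so that the same code runs on Python 2 (module urlparse) and Python 3 (module urllib.parse).

```python
try:
    from urllib.parse import urlsplit, parse_qsl   # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qsl       # Python 2

url = ('http://ichart.finance.yahoo.com/table.csv'
       '?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG')

query = urlsplit(url).query        # everything after the ?
fields = dict(parse_qsl(query))    # split on & and = into a dictionary

print(fields['s'], fields['a'], fields['c'])
```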
Here is an example of usage
import urllib
base_url = 'http://ichart.finance.yahoo.com/table.csv'
request_data = {'s': 'GOOG',        # Ticker symbol for Google
                'a': '00',          # Start month, base zero
                'b': '01',          # Start day
                'c': '2005',        # Start year
                'd': '05',          # End month, base zero
                'e': '03',          # End day
                'f': '2009',        # End year
                'g': 'd',           # Daily
                'ignore': '.csv'}   # Data type
encoded = urllib.urlencode(request_data) # Formats the query string
response = urllib.urlopen(base_url + '?' + encoded)
After running this script, we can get successive lines of the data as follows
>>> response.next()
'Date,Open,High,Low,Close,Volume,Adj Close\n'
>>> response.next()
'2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n'
>>> response.next()
'2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n'
>>> response.next()
'2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
We see that Google's share price opened at 426.00 on the 3rd of June 2009, etc.
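Each line arrives as a plain string, so the numbers have to be extracted before any arithmetic. A small sketch using two of the lines shown above, copied in by hand so that nothing is downloaded. adj_close is a hypothetical helper name.

```python
# Sample lines copied from the output above (no download needed)
lines = ['2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n',
         '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n']

def adj_close(line):
    """Return the last column of a CSV line as a float (hypothetical helper)."""
    return float(line.split(',')[-1])   # float() ignores the trailing newline

new_price = adj_close(lines[0])   # most recent date comes first
old_price = adj_close(lines[1])
change = 100 * (new_price - old_price) / old_price

print('%.2f' % change)
```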
Note: If you have problems running this, your internet connection might be using a proxy server
Try googling for some help with urllib and proxy servers
Exercise:
Write a program to print out the percentage change in value since the start of the year for all of the stocks in this file (portfolio.txt in the solution, with one "ticker, company name" pair per line)
- Change is from Jan 1st until the most recent price available
- Use the last column (i.e., Adj Close) as the price
- Stock prices should be downloaded at runtime from Yahoo Finance
- If you can, print returns in order, from largest to smallest
  - Hint: use the sorted() function
A hint: if
line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
then line.split(',') returns the elements as a list of strings
Solution
## Filename: yahoo_fin.py
## Author: John Stachurski
from urllib import urlopen, urlencode
from datetime import date
from operator import itemgetter

# Record current year, month and day as strings, month is base zero
today = date.today()
yyyy = str(today.year)
mm = str(today.month - 1)
dd = str(today.day)

base_url = 'http://ichart.finance.yahoo.com/table.csv'
request_data = {'a': '00',          # Start month, base zero
                'b': '01',          # Start day
                'c': yyyy,          # Start year (current year)
                'd': mm,            # End month, base zero
                'e': dd,            # End day
                'f': yyyy,          # End year (current year)
                'g': 'd',           # Daily
                'ignore': '.csv'}   # Data type

# Main loop
portfolio = open('portfolio.txt')
percent_change = {}
for line in portfolio:
    ticker, company_name = [item.strip() for item in line.split(',')]
    request_data['s'] = ticker
    response = urlopen(base_url + '?' + urlencode(request_data))
    response.next()  # Skip the header line
    # Adj Close is the last column; rows arrive with the newest date first
    prices = [row.split(',')[-1] for row in response]
    old_price, new_price = float(prices[-1]), float(prices[0])
    percent_change[company_name] = 100 * (new_price - old_price) / old_price
portfolio.close()

# Print returns from largest to smallest
items = percent_change.items()
for name, change in sorted(items, key=itemgetter(1), reverse=True):
    print '%-12s %10.2f' % (name, change)