Iterators are a uniform interface to stepping through elements in a collection
In this lecture we'll talk about using iterators
In a later lecture we'll learn how to build our own
First we define iterators and iterables
An iterator is an object with a next() method
For example, file objects (which we met in this lecture) are iterators
Recall that we had a file test.txt with contents
Foo foo
Bar bar
Let's create a file object linked to this file
>>> f = open('test.txt', 'r')
This object has a next() method:
>>> f.next()
'Foo foo\n'
>>> f.next()
'Bar bar\n'
Calling f.next() is essentially the same as calling f.readline()
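The word "essentially" hides one difference: at the end of the file, readline() returns an empty string, whereas next() raises StopIteration. The following is a minimal sketch of that difference, assuming a temporary copy of test.txt created only for the demonstration. It uses the built-in next(f), which is equivalent to f.next() in Python 2.6+ and also runs under Python 3.

```python
import os
import tempfile

# Assumption: a throwaway copy of test.txt, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as out:
    out.write('Foo foo\nBar bar\n')

# Reading with readline(): the third call signals end of file with ''
with open(path) as f:
    via_readline = [f.readline(), f.readline(), f.readline()]

# Reading with next(): the third call raises StopIteration instead
with open(path) as f:
    via_next = [next(f), next(f)]
    try:
        next(f)
        exhausted = False
    except StopIteration:
        exhausted = True

print(via_readline)
print(via_next)
print(exhausted)
```

Two separate file objects are used so that the two reading styles are never mixed on the same object.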
Other examples are the objects returned by enumerate()
>>> e = enumerate(['foo', 'bar'])
>>> e.next()
(0, 'foo')
>>> e.next()
(1, 'bar')
csv module (which is used to manipulate CSV files)
>>> from csv import reader
>>> nikkei_data = reader(open('table.csv')) # The reader() function is passed a file object
>>> nikkei_data.next()
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> nikkei_data.next()
['2008-05-19', '14294.52', '14343.19', '14219.08', '14269.61', '133800', '14269.61']
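The reader() function does not need a real file: it accepts any iterable that yields lines. The sketch below illustrates this with a list holding the two lines shown above, so it can be tried without table.csv. The name record is introduced here purely for illustration, and the built-in next() is used so the code runs on Python 2.6+ and Python 3.

```python
from csv import reader

# The header and first data row from the session above, as raw lines
lines = [
    'Date,Open,High,Low,Close,Volume,Adj Close\n',
    '2008-05-19,14294.52,14343.19,14219.08,14269.61,133800,14269.61\n',
]

rows = reader(lines)      # a list of strings stands in for the file object
header = next(rows)       # first call returns the column names
first_row = next(rows)    # second call returns the first data row

# Pair each column name with its value (all values are still strings)
record = dict(zip(header, first_row))
print(record['Date'])
print(record['Close'])
```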
urllib.urlopen()
>>> import urllib
>>> webpage = urllib.urlopen("http://www.cnn.com")
>>> webpage.next()
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/...' # etc
>>> webpage.next()
'<meta http-equiv="refresh" content="1800;url=?refresh=1">\n'
>>> webpage.next()
'<meta name="Description" content="CNN.com delivers the latest breaking news and information..' # etc
The built-in function iter() can be used for creating iterators from certain objects
An object is said to be iterable if it can be passed to iter()
A good example is a list:
>>> X = ['foo', 'bar']
>>> type(X)
<type 'list'>
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> Y.next()
'foo'
>>> Y.next()
'bar'
>>> Y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
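If the traceback is unwanted, there are two common ways to deal with an exhausted iterator. The sketch below shows both, assuming the built-in next() function (available from Python 2.6 onwards) rather than the .next() method used in the session above: catching StopIteration, and passing a default value as the second argument to next().

```python
X = ['foo', 'bar']
Y = iter(X)

first = next(Y)
second = next(Y)

# Option 1: catch the exception raised by an exhausted iterator
try:
    next(Y)
    finished = False
except StopIteration:
    finished = True

# Option 2: supply a default, returned instead of raising StopIteration
fallback = next(Y, 'nothing left')

print(first)
print(second)
print(finished)
print(fallback)
```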
Another example is a dictionary
>>> d = {'name': 'godzilla', 'height in meters': 10}
>>> key_iter = iter(d)
>>> type(key_iter)
<type 'dictionary-keyiterator'>
>>> key_iter.next()
'height in meters'
>>> key_iter.next()
'name'
The next() method steps through the keys of the dictionary
Incidentally, we can get iterators directly
d.iterkeys() returns an iterator equivalent to iter(d.keys()) or iter(d)
d.itervalues() returns an iterator equivalent to iter(d.values())
d.iteritems() returns an iterator equivalent to iter(d.items())
Of course, not all objects are iterable
>>> iter(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
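Since an object is iterable exactly when iter() accepts it, one way to test for iterability is simply to try the call. The helper name is_iterable below is hypothetical, introduced only for this sketch.

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper for illustration)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

# A list, a dictionary and a string are iterable; an integer is not
candidates = [['foo', 'bar'], {'name': 'godzilla'}, 'text', 42]
results = [is_iterable(obj) for obj in candidates]
print(results)
```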
Let's look at some different ways we can use iterators
A very common use of iterators is in for loops
In fact this is how the for loop works!
for x in iterator:
    <code block>
This is what happens:
The interpreter calls iterator.next() and binds x to the result
It executes the code block
It repeats these two steps until a StopIteration error occurs
Recall that in this lecture we introduced the syntax
f = open('somefile.txt')
for line in f:
    # do something
Now you know how it works:
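The file object f is an iterator, so it has a next() method
The interpreter calls f.next() and binds line to the return value
This repeats until a StopIteration error occurs
Another example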
for i, x in enumerate(X):
    # do something
Again, enumerate(X) is an iterator
What about this example
X = ['a', 'b']
for x in X:
    print x
Here X is a list (an iterable), not an iterator
Internally, Python calls iter(X) to make an iterator
More generally,
for loops work on either iterators or iterables
In the second case, the iterable is first converted to an iterator via iter(iterable)
Here's another example
d = {'name': 'godzilla', 'height in meters': 10}
for key in d:
    # do something
Now you know how this works
Internally, the iterable d is passed to iter()
The resulting iterator steps through the keys of d
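The mechanics described above can be written out by hand. The sketch below is an illustration of the idea rather than the interpreter's actual implementation: it runs an ordinary for loop and then an equivalent while loop built from iter(), next() and StopIteration. The built-in next() is used so the code runs on Python 2.6+ and Python 3.

```python
X = ['a', 'b']

# Version 1: the ordinary for loop
seen_by_for = []
for x in X:
    seen_by_for.append(x)

# Version 2: the same steps spelled out
seen_by_while = []
iterator = iter(X)            # the iterable is converted to an iterator
while True:
    try:
        x = next(iterator)    # bind x to the next element
    except StopIteration:     # no elements left, so leave the loop
        break
    seen_by_while.append(x)   # the "code block"

print(seen_by_for)
print(seen_by_while)
```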
Some built-in functions that act on sequences also work with iterables
max(), min(), sum(), all(), any()
>>> X = [10, -10]
>>> max(X)
10
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> max(Y)
10
A major difference in usage is that iterators are depleted by use
>>> X = [10, -10]
>>> Y = iter(X)
>>> max(Y)
10
>>> max(Y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
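Depletion can fail silently as well as loudly: sum() on a spent iterator returns 0 rather than raising an error. The sketch below shows this, along with two possible workarounds: requesting a fresh iterator from the original list, or storing the elements in a list before using them more than once.

```python
X = [10, -10]
Y = iter(X)

largest = max(Y)        # consumes every element of Y
leftover = list(Y)      # nothing remains in Y
total_after = sum(Y)    # summing no elements gives 0, with no error

# Workaround 1: ask the list for a fresh iterator
smallest = min(iter(X))

# Workaround 2: store the elements once, then reuse the list
Z = iter(X)
values = list(Z)
largest_again, smallest_again = max(values), min(values)

print(largest)
print(leftover)
print(total_after)
```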
The following application involves downloading data with the module urllib
URL stands for uniform resource locator
Examples:
http://www.google.com
http://johnstachurski.net/teaching.html
Some URLs have a query string
http://www.google.com/search?q=godzilla
The part after (but not including) ? is the query string
It is passed to the server as an argument
We can obtain stock price data from Yahoo Finance using query strings, such as
http://ichart.finance.yahoo.com/table.csv?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG
The query string is a collection of field/value pairs, separated by &
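To see the field/value structure concretely, the query string can be split apart with the standard library, without contacting any server. The sketch below is one way to do it. The import is wrapped in try/except because these functions live in urllib.parse on Python 3 and in urlparse on Python 2.

```python
try:
    from urllib.parse import urlsplit, parse_qsl   # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qsl       # Python 2

url = ('http://ichart.finance.yahoo.com/table.csv'
       '?a=00&c=2005&b=01&e=03&d=05&g=d&f=2008&ignore=.csv&s=GOOG')

query = urlsplit(url).query       # everything after the ?
fields = dict(parse_qsl(query))   # split on & and =, giving field/value pairs

for name in sorted(fields):
    print('%-6s %s' % (name, fields[name]))
```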
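The meanings of the main fields are
s: ticker symbol
a: start month (base zero)
b: start day
c: start year
d: end month (base zero)
e: end day
f: end year
g: frequency (d for daily)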
Here is an example of usage
import urllib
base_url = 'http://ichart.finance.yahoo.com/table.csv'
request_data = {'s': 'GOOG',       # Ticker symbol for Google
                'a': '00',         # Start month, base zero
                'b': '01',         # Start day
                'c': '2005',       # Start year
                'd': '05',         # End month, base zero
                'e': '03',         # End day
                'f': '2009',       # End year
                'g': 'd',          # Daily
                'ignore': '.csv'}  # Data type
encoded = urllib.urlencode(request_data) # Formats the query string
response = urllib.urlopen(base_url + '?' + encoded)
After running this script, we can get successive lines of the data as follows
>>> response.next()
'Date,Open,High,Low,Close,Volume,Adj Close\n'
>>> response.next()
'2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n'
>>> response.next()
'2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n'
>>> response.next()
'2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
We see that Google's share price opened at 426.00 on the 3rd of June 2009, etc.
Note: If you have problems running this, your internet connection might be using a proxy server
Try googling for some help with urllib and proxy servers
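The query-string formatting done by urlencode() can be inspected offline, which helps separate URL problems from connection problems. The sketch below is a small illustration. The ticker '^N225' is an assumed example value, chosen only because it contains a character that must be escaped. The import is wrapped in try/except so the code runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlencode, parse_qsl   # Python 3
except ImportError:
    from urllib import urlencode                    # Python 2
    from urlparse import parse_qsl

base_url = 'http://ichart.finance.yahoo.com/table.csv'
request_data = {'s': '^N225',      # Assumed ticker, for illustration only
                'g': 'd',          # Daily
                'ignore': '.csv'}  # Data type

encoded = urlencode(request_data)   # joins pairs with & and escapes the ^
full_url = base_url + '?' + encoded
decoded = dict(parse_qsl(encoded))  # decoding recovers the original values

print(full_url)
```

No request is sent here; only the string that would be passed to urlopen() is built.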
Exercise:
Write a program to print out the percentage change in value since the start of the year for all of the stocks in this file
Order the results using the sorted() function
A hint: if
line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
then line.split(',') returns the elements as a list of strings
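Building on the hint, the sketch below converts the last field of a line to a number and computes a percentage change between two of the response lines shown earlier. The helper name adj_close is hypothetical, and the two dates are only an illustration of the arithmetic, not the full exercise.

```python
# Two lines copied from the response shown earlier
earlier = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
later = '2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n'

def adj_close(line):
    """Return the last field (Adj Close) as a float (hypothetical helper)."""
    fields = line.strip().split(',')   # strip() removes the trailing newline
    return float(fields[-1])

old_price = adj_close(earlier)
new_price = adj_close(later)
change = 100 * (new_price - old_price) / old_price

print('%.2f' % change)
```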
## Filename: yahoo_fin.py
## Author: John Stachurski
from urllib import urlopen, urlencode
from datetime import date
from operator import itemgetter
# Record current day, month and year as strings, month is base zero
today = date.today()
mm = str(today.month - 1)
dd = str(today.day)
yyyy = str(today.year)
base_url = 'http://ichart.finance.yahoo.com/table.csv'
request_data = {'a': '00',         # Start month, base zero
                'b': '01',         # Start day
                'c': yyyy,         # Start year (current year)
                'd': mm,           # End month, base zero
                'e': dd,           # End day
                'f': yyyy,         # End year (current year)
                'g': 'd',          # Daily
                'ignore': '.csv'}  # Data type
# Main loop
portfolio = open('portfolio.txt')
percent_change = {}
for line in portfolio:
    ticker, company_name = [item.strip() for item in line.split(',')]
    request_data['s'] = ticker
    response = urlopen(base_url + '?' + urlencode(request_data))
    response.next()  # Skip the header line
    # Adj Close is the last field of each row; rows arrive newest first
    prices = [row.split(',')[-1] for row in response]
    old_price, new_price = float(prices[-1]), float(prices[0])
    percent_change[company_name] = 100 * (new_price - old_price) / old_price
portfolio.close()
# Print the results, largest percentage change first
items = percent_change.items()
for name, change in sorted(items, key=itemgetter(1), reverse=True):
    print '%-12s %10.2f' % (name, change)
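The final sorting step can be tried on its own, without any downloads. In the sketch below the company names and percentage changes are hypothetical values made up for illustration. print is called with a single argument so the code runs on both Python 2 and Python 3.

```python
from operator import itemgetter

# Hypothetical results, standing in for the dictionary built by the main loop
percent_change = {'Alpha Corp': 4.25, 'Beta Inc': -12.5, 'Gamma Ltd': 30.0}

# itemgetter(1) picks the second element of each (name, change) pair,
# so the pairs are ordered by change, largest first
ranked = sorted(percent_change.items(), key=itemgetter(1), reverse=True)

for name, change in ranked:
    print('%-12s %10.2f' % (name, change))
```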