Friday, December 23, 2011

Computational Economics Lecture 12


Basic Plotting with Matplotlib

Matplotlib is a module for generating plots and graphs in Python
  • Probably the most popular Python plotting library
  • Written to mimic MATLAB plotting functionality
Follow the current download and installation instructions
  • You might need to install NumPy separately
  • If so, consider installing SciPy as well
    • It builds on NumPy and adds many other scientific routines
We will learn more about NumPy and SciPy later

Basic Plots

Here's a simple plot
import pylab  # import the Matplotlib module
X = [1, 2, 3]
Y = [1, 4, 9]
pylab.plot(X, Y) 
pylab.show()
This opens a window with the following plot




The buttons at bottom left allow you to adjust axes, save, etc.
Here I've saved it as a PNG file:




Adding another line to the same plot is straightforward:
import pylab     
X = [1, 2, 3]
Y = [1, 4, 9]
Z = [4, 5, 6]
pylab.plot(X, Y)   
pylab.plot(X, Z) 
pylab.show()




Let's plot the cosine function
import pylab  
X = pylab.linspace(-10, 10, 200)  # A grid on [-10, 10] with 200 points
Y = pylab.cos(X)                  # cos(x) for all x in X
pylab.plot(X, Y)
pylab.show()




We can make it a red line if we prefer
import pylab  
X = pylab.linspace(-10, 10, 200)  
Y = pylab.cos(X)                 
pylab.plot(X, Y, 'r-')
pylab.show()




For a dashed red line use pylab.plot(X, Y, 'r--')




For yellow dots use pylab.plot(X, Y, 'yo')




We can add titles, axis labels and so on
import pylab  
X = pylab.linspace(-10, 10, 200)  
Y = pylab.cos(X)                 
pylab.plot(X, Y, 'yo')
pylab.xlabel('x values')
pylab.ylabel('y values')
pylab.title('Plot of the cosine function.')
pylab.show()




There are many other ways to customize and control the plots
See the user guide at the Matplotlib homepage.
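As a sketch of saving a figure from code rather than through the save button in the plot window, the following uses the matplotlib.pyplot interface with NumPy imported explicitly. The file name cosine.png and the non-interactive Agg backend are assumptions made for illustration, chosen so the script runs without opening a window.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window opens
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 200)   # a grid on [-10, 10] with 200 points
y = np.cos(x)                   # cos(x) for all x in the grid

fig, ax = plt.subplots()
ax.plot(x, y, 'r--')            # dashed red line
ax.set_xlabel('x values')
ax.set_ylabel('y values')
ax.set_title('Plot of the cosine function')

output_path = "cosine.png"      # hypothetical file name
fig.savefig(output_path)        # file format is inferred from the extension
plt.close(fig)

print(os.path.getsize(output_path) > 0)
```

Dropping the matplotlib.use line and calling plt.show() instead of savefig should give the interactive window used elsewhere in these notes.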

Histograms

Here's a quick example of how to plot a histogram
import pylab  
data = pylab.randn(500)    # 500 draws from the standard normal distribution
pylab.hist(data, bins=40)
pylab.show()




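Note that the y-axis in the last plot gives frequencies (the number of observations in each bin)
For a density use pylab.hist(data, bins=40, density=True)
  • Older Matplotlib releases called this argument normed; it has since been removed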
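To make the difference between frequency and density concrete, here is a minimal sketch in plain Python (no Matplotlib) of how bin counts can be rescaled so that the bar areas sum to one. The helper name histogram_density and the equal-width binning are assumptions for illustration; Matplotlib's own implementation may differ in details such as edge handling.

```python
import random

def histogram_density(data, bins):
    """Return (counts, densities, width) for equal-width bins over the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Map x to a bin index; the maximum value goes in the last bin
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    n = len(data)
    # Dividing by n * width makes the total bar area equal to one
    densities = [c / (n * width) for c in counts]
    return counts, densities, width

random.seed(0)
data = [random.gauss(0, 1) for _ in range(500)]   # 500 standard normal draws
counts, densities, width = histogram_density(data, bins=40)

print(sum(counts))                          # total number of observations
print(sum(d * width for d in densities))    # total area under the bars
```

The counts correspond to the frequency plot, while the rescaled values are what makes a histogram comparable with a probability density.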





Exercises

The data file (table.csv in the solution below) contains daily quotes for the Nikkei 225 from Jan 1984 until May 2009, downloaded from Yahoo Finance
Here are the first few lines
Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
Data is comma separated (csv), with most recent date first
For our price data we will use the last column (Adj Close)
Exercise 1:
Plot the data (i.e., the Adj Close column) as a time series
  • Use the File I/O operations in this lecture to extract the data
    • You might like to use the string method split()
    • Note that there is a module called csv for working with csv files
      • But don't use it this time: I want you to practice basic file I/O
  • Make sure your time series is from earliest (i.e., Jan 84) to latest (i.e., May 2009)
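The extraction step can be sketched on the five sample rows shown above. This is only an illustration under stated assumptions: an in-memory io.StringIO object stands in for the opened data file, and the variable names are made up for the example.

```python
import io

sample = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15\n"
    "2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64\n"
    "2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29\n"
    "2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69\n"
    "2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02\n"
)

infile = io.StringIO(sample)   # stand-in for an opened data file
header = infile.readline()     # skip the line of column names

dates, prices = [], []
for line in infile:
    fields = line.strip().split(',')
    dates.append(fields[0])
    prices.append(float(fields[-1]))   # last column is Adj Close

dates.reverse()    # earliest date first
prices.reverse()

print(dates)
print(prices)
```

With a real file, replacing the StringIO object by the result of open() on the data file should leave the rest unchanged.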
Exercise 2:
Write a function that
  • takes a start year and an end year, and
  • plots daily returns (as a percentage)
Daily return = [(today - yesterday) / yesterday] * 100
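As a small worked example of the formula, the sketch below applies it to the five Adj Close values from the sample rows, ordered from earliest to latest. The function name daily_returns is an assumption for illustration, and "yesterday" here means the previous trading day.

```python
def daily_returns(prices):
    """Percentage change from each observation to the next."""
    return [100 * (today - yesterday) / yesterday
            for yesterday, today in zip(prices[:-1], prices[1:])]

# Adj Close for 2009-05-15 through 2009-05-21, earliest first
prices = [9265.02, 9038.69, 9290.29, 9344.64, 9264.15]
returns = daily_returns(prices)

for r in returns:
    print(round(r, 2))
```

Five prices give four returns, since the first observation has no predecessor.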
Exercise 3:
Plot a histogram of the daily returns data
If you can, fit a normal density to the data and plot that too
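One way to think about "fitting" a normal density is to estimate the mean and standard deviation from the data and evaluate the corresponding density on a grid. The sketch below does this with the standard library only (statistics.NormalDist, which requires Python 3.8 or later). The returns here are synthetic random draws used as a stand-in, not Nikkei data.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

random.seed(1)
# Synthetic stand-in for a list of daily returns (not real data)
returns = [random.gauss(0.0, 1.5) for _ in range(1000)]

mu = mean(returns)
sigma = pstdev(returns)          # population standard deviation
fitted = NormalDist(mu, sigma)   # normal distribution with the estimated parameters

# Evaluate the fitted density on 100 evenly spaced points over the data range
lo, hi = min(returns), max(returns)
grid = [lo + i * (hi - lo) / 99 for i in range(100)]
density_values = [fitted.pdf(x) for x in grid]

# The peak of the density is 1 / (sigma * sqrt(2 * pi))
print(abs(fitted.pdf(mu) - 1 / (sigma * math.sqrt(2 * math.pi))) < 1e-12)
```

The pairs (grid, density_values) can then be passed to a plotting call and drawn on top of a density histogram.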
Exercise 4:
Repeat Exercise 1, but using monthly data
  • Extract first quote of each month and plot as a time series
  • Note that the first observation is not necessarily on the first day of the month
    • The first day of the month might fall on a weekend
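The idea of taking the first available quote in each month can be sketched as follows. The function name first_of_month is hypothetical, the prices are made-up placeholder values, and the dates are assumed to be sorted from earliest to latest. Tracking the (year, month) pair rather than the month alone is a design choice of this sketch.

```python
def first_of_month(dates, values):
    """Keep the first observation in each (year, month); dates must be sorted."""
    picked_dates, picked_values = [], []
    previous = None
    for date, value in zip(dates, values):
        year, month, _day = date.split('-')
        if (year, month) != previous:   # a new month has started
            picked_dates.append(date)
            picked_values.append(value)
            previous = (year, month)
    return picked_dates, picked_values

# Illustrative dates; 2009-02-01 and 2009-03-01 were Sundays, so they have no quote
dates = ["2009-01-29", "2009-01-30", "2009-02-02", "2009-02-03", "2009-03-02"]
values = [100.0, 101.0, 102.0, 103.0, 104.0]   # placeholder prices, not real data

monthly_dates, monthly_values = first_of_month(dates, values)
print(monthly_dates)
print(monthly_values)
```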

Solutions

Solutions to Exercises 1-4
## Author: John Stachurski
## Filename: nikkei_plot.py

from __future__ import division
import pylab

# First let's create some functions 

def percent_change(data):
    """ 
    Calculates the percentage change from one data point to the next,
    where data is a sequence of numbers.
    """
    changes = []
    # Pair each value with its successor (avoids shadowing the builtin next)
    for current, following in zip(data[:-1], data[1:]):
        changes.append(100 * (following - current) / current)
    return changes

def seriesplot(data):
    pylab.plot(data)
    pylab.show()

def returnsplot(start_year, end_year, data, dates):
    """
    Plots daily returns from start_year to end_year.
    Parameters: start_year and end_year are integers from 1984 to 2008.  data
    is the price data as a list of floats, and dates is the corresponding list
    of dates.  Each date is a string in the format YYYY-MM-DD.
    """
    plotvals = []
    for value, date in zip(data, dates):  # use the data argument, not the global values
        year = int(date.split('-')[0])  # extract the year
        if start_year <= year <= end_year:
            plotvals.append(value)
    seriesplot(percent_change(plotvals))

def densityplot(data):
    """
    Plots a histogram of daily returns from data, plus fitted normal density.
    """
    dailyreturns = percent_change(data)
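    pylab.hist(dailyreturns, bins=200, density=True)  # normed=True in old Matplotlib
    m, M = min(dailyreturns), max(dailyreturns)
    mu = pylab.mean(dailyreturns)
    sigma = pylab.std(dailyreturns)
    grid = pylab.linspace(m, M, 100)
    # Normal density on the grid, written out because pylab.normpdf is gone from recent Matplotlib
    densityvalues = pylab.exp(-(grid - mu)**2 / (2 * sigma**2)) / (sigma * pylab.sqrt(2 * pylab.pi))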
    pylab.plot(grid, densityvalues, 'r-')
    pylab.show()

def monthly_returns(data, dates):
    """
    Plots the first quote of each month.  Despite the name, the plotted
    values are prices, not returns.
    """
    plotdata = []
    # Append the first data entry for plotting
    plotdata.append(data[0])
    # Get the month corresponding to the first data entry
    month = dates[0].split('-')[1]
    for value, date in zip(data, dates):
        current_month = date.split('-')[1]
        if current_month != month:  # a new month has started
            plotdata.append(value)
            month = current_month
    seriesplot(plotdata)

#  Now we are ready to read in the data and make the plots

with open("table.csv", 'r') as infile:  # the file is closed automatically
    lines = infile.readlines()
del lines[0]     # Remove the header line
lines.reverse()  # Reverse order to start at earliest date

dates = []
values = []
for line in lines:
    elements = line.split(',')
    dates.append(elements[0])
    values.append(float(elements[-1]))

# Solutions to the exercises

# Note: input() and print() are Python 3 syntax (raw_input and the print statement in Python 2)
exercise_number = int(input("Enter the number of the exercise: "))

if exercise_number == 1:
    seriesplot(values)
elif exercise_number == 2:
    sy = int(input("Enter the start year: "))
    ey = int(input("Enter the end year: "))
    returnsplot(sy, ey, values, dates)
elif exercise_number == 3:
    densityplot(values)
elif exercise_number == 4:
    monthly_returns(values, dates)
else:
    print("Dude, there's no exercise number " + str(exercise_number))
