
Basic Plotting with Matplotlib

Matplotlib is a module for generating plots and graphs in Python

Follow the current download and installation instructions on the Matplotlib homepage

Matplotlib is built on top of NumPy, Python's numerical array library; we will learn more about NumPy and SciPy later

Basic Plots

Here's a simple plot

import pylab  # import the Matplotlib module
X = [1, 2, 3]
Y = [1, 4, 9]
pylab.plot(X, Y) 
pylab.show()

This opens a window with the following plot



The buttons at bottom left allow you to adjust axes, save, etc.
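The save button is not the only option. `pylab.savefig` writes the current figure to a file from within a script, which is handy when no window is available. A minimal sketch; the filename `simple_plot.png` and the use of the non-interactive Agg backend are assumptions made for illustration.

```python
import os
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, draw to files only
import pylab

X = [1, 2, 3]
Y = [1, 4, 9]
pylab.plot(X, Y)
pylab.savefig("simple_plot.png")  # output format is inferred from the extension
pylab.close()                     # release the figure once it has been saved

print(os.path.getsize("simple_plot.png") > 0)  # True if the file was written
```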

Here I've saved it as a PNG file:



Adding another line to the same plot is straightforward:

import pylab     
X = [1, 2, 3]
Y = [1, 4, 9]
Z = [4, 5, 6]
pylab.plot(X, Y)   
pylab.plot(X, Z) 
pylab.show()
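When several lines share one plot, labeling them helps the reader tell them apart. The sketch below passes the `label` keyword to `pylab.plot` and then calls `pylab.legend`; the label strings are made up, and the figure is saved to a hypothetical `two_lines.png` (using the Agg backend) instead of being shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: save to a file instead of opening a window
import pylab

X = [1, 2, 3]
Y = [1, 4, 9]
Z = [4, 5, 6]
pylab.plot(X, Y, label="Y")     # label each line as it is drawn
pylab.plot(X, Z, label="Z")
pylab.legend(loc="upper left")  # collect the labels into a legend box
pylab.savefig("two_lines.png")

lines = pylab.gca().get_lines()  # the Line2D objects on the current axes
print([line.get_label() for line in lines])  # ['Y', 'Z']
pylab.close()
```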



Let's plot the cosine function

import pylab  
X = pylab.linspace(-10, 10, 200)  # A grid on [-10, 10] with 200 points
Y = pylab.cos(X)                  # cos(x) for all x in X
pylab.plot(X, Y)
pylab.show()
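`pylab.linspace` and `pylab.cos` are NumPy's `linspace` and `cos`, re-exported by pylab. The NumPy-only sketch below checks two things the comments above say: the grid includes both endpoints, and applying `cos` to the whole array gives the same values as looping over the points with `math.cos`.

```python
import math
import numpy as np

X = np.linspace(-10, 10, 200)  # 200 evenly spaced points, both endpoints included
Y = np.cos(X)                  # vectorized: one call evaluates cos at every grid point

# The same values computed one point at a time with the standard library
Y_loop = [math.cos(x) for x in X]

print(X[0], X[-1])             # -10.0 10.0
print(X[1] - X[0])             # grid spacing, 20 / 199
print(np.allclose(Y, Y_loop))  # True
```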



We can make it a red line if we prefer

import pylab  
X = pylab.linspace(-10, 10, 200)  
Y = pylab.cos(X)                 
pylab.plot(X, Y, 'r-')
pylab.show()



For a dashed red line use pylab.plot(X, Y, 'r--')



For yellow dots use pylab.plot(X, Y, 'yo')
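A format string such as `'r--'` is shorthand: a color code followed by a line or marker style. As a sketch, the code below draws the dashed red line twice, once with the format string and once with the `color` and `linestyle` keyword arguments, and compares the resulting line properties (Agg backend assumed, nothing is displayed).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no interactive window needed
import matplotlib.colors
import pylab

X = pylab.linspace(-10, 10, 200)
Y = pylab.cos(X)

# Shorthand format string: 'r' = red, '--' = dashed
short, = pylab.plot(X, Y, 'r--')
# The same style written out with keyword arguments
explicit, = pylab.plot(X, Y, color='r', linestyle='--')

# Compare colors as RGBA tuples so different spellings of "red" still match
same_color = matplotlib.colors.to_rgba(short.get_color()) == matplotlib.colors.to_rgba(explicit.get_color())
print(same_color, short.get_linestyle(), explicit.get_linestyle())
pylab.close()
```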



We can add titles, axis labels and so on

import pylab  
X = pylab.linspace(-10, 10, 200)  
Y = pylab.cos(X)                 
pylab.plot(X, Y, 'yo')
pylab.xlabel('x values')
pylab.ylabel('y values')
pylab.title('Plot of the cosine function.')
pylab.show()



There are many other ways to customize and control the plots

See the user guide at the Matplotlib homepage.
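Two common adjustments are fixing the axis ranges and adding a background grid. A sketch using `pylab.xlim`, `pylab.ylim` and `pylab.grid`; the particular limits and the output filename `cosine_zoom.png` are arbitrary choices, and the Agg backend is assumed so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: write to a file rather than open a window
import pylab

X = pylab.linspace(-10, 10, 200)
Y = pylab.cos(X)
pylab.plot(X, Y, 'b-')
pylab.xlim(-5, 5)      # show only the middle of the x range
pylab.ylim(-1.5, 1.5)  # leave some room above and below the curve
pylab.grid(True)       # draw grid lines at the major tick marks
pylab.savefig("cosine_zoom.png")

# Called with no arguments, xlim and ylim report the current limits
limits = (pylab.xlim(), pylab.ylim())
print(limits)
pylab.close()
```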

Histograms

Here's a quick example of how to plot a histogram

import pylab  
data = pylab.randn(500)    # 500 draws from the standard normal distribution
pylab.hist(data, bins=40)
pylab.show()
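`pylab.hist` also returns the numbers behind the picture: the bar heights, the bin edges and the drawn bar objects. In the sketch below (the seed is an arbitrary choice so the draws are reproducible, `np.random.randn` is the same function as `pylab.randn`, and the Agg backend is assumed), the heights add up to the number of observations.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no window; we only inspect the return values
import numpy as np
import pylab

np.random.seed(42)           # arbitrary seed, for reproducible draws
data = np.random.randn(500)  # 500 draws from the standard normal distribution
counts, edges, patches = pylab.hist(data, bins=40)

print(len(counts), len(edges))  # 40 41: one more edge than there are bins
print(counts.sum())             # 500.0: every draw falls in some bin
pylab.close()
```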



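Note that the y-axis in the last plot gives frequencies, i.e., the number of observations falling in each bin

For a density, use pylab.hist(data, bins=40, density=True). Older versions of Matplotlib called this option normed=True, which has since been removed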
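In current Matplotlib releases the density option is spelled `density=True`. With it, the bar heights are rescaled so that the total area of the bars (height times bin width, summed) is one. A quick check of that property, under the same arbitrary-seed and Agg-backend assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no window needed
import numpy as np
import pylab

np.random.seed(42)  # arbitrary seed, for reproducible draws
data = np.random.randn(500)
heights, edges, _ = pylab.hist(data, bins=40, density=True)

# Area of each bar = height * bin width; the areas should sum to one
total_area = np.sum(heights * np.diff(edges))
print(np.isclose(total_area, 1.0))  # True
pylab.close()
```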




Exercises

The file table.csv contains daily quotes for the Nikkei 225 from January 1984 until May 2009, downloaded from Yahoo Finance

Here are the first few lines

Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02

The data are comma-separated (CSV), with the most recent date first

For our price data we will use the last column (Adj Close)
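One way to load the Adj Close column is with the standard-library `csv` module. The sketch below parses the sample lines shown above from a string rather than from the file, and reverses the rows so that the earliest date comes first.

```python
import csv
import io

# The first few lines of the file, copied from above
sample = """Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
"""

reader = csv.DictReader(io.StringIO(sample))  # the header row supplies the field names
rows = list(reader)[::-1]                     # reverse: oldest date first

dates = [row["Date"] for row in rows]
prices = [float(row["Adj Close"]) for row in rows]

print(dates[0], prices[0])  # 2009-05-15 9265.02
```

With the real file, `io.StringIO(sample)` would be replaced by `open("table.csv", newline="")`.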

Exercise 1:

Plot the data (i.e., the Adj Close column) as a time series

Exercise 2:

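Write a function that takes a start year and an end year and plots the daily returns (percentage price changes) over that period, where

Daily return = [(today's price - yesterday's price) / yesterday's price] * 100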
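The formula can be applied to a whole price series at once with NumPy. A minimal sketch, assuming prices are ordered oldest first; `daily_returns` is a hypothetical helper name, not part of the exercise.

```python
import numpy as np

def daily_returns(prices):
    """Percentage change from each price to the next (prices ordered oldest first)."""
    p = np.asarray(prices, dtype=float)
    # np.diff gives today - yesterday; divide by yesterday's price p[:-1]
    return 100 * np.diff(p) / p[:-1]

print(daily_returns([100, 110, 99]))  # [ 10. -10.]
```

A series of n prices yields n - 1 returns, since the first day has no "yesterday".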

Exercise 3:

Histogram the daily returns data

If you can, fit a normal density to the data and plot that too
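Fitting a normal density here means estimating its mean mu and standard deviation sigma from the returns, then evaluating f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)) on a grid and overlaying it on the density histogram. A NumPy-only sketch; `normal_pdf` is a hypothetical helper, and the simulated returns merely stand in for the real data.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)                       # stand-in data, arbitrary seed
returns = rng.normal(loc=0.0, scale=1.5, size=1000)

mu, sigma = returns.mean(), returns.std()            # fitted parameters
grid = np.linspace(returns.min(), returns.max(), 100)
density = normal_pdf(grid, mu, sigma)                # values to plot over the histogram

# Sanity check: over a wide grid the fitted density integrates to one
wide = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 10001)
area = np.sum(normal_pdf(wide, mu, sigma)) * (wide[1] - wide[0])
print(f"{area:.4f}")  # 1.0000
```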

Exercise 4:

Repeat Exercise 1, but using monthly data
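One simple way to turn daily data into monthly data is to keep a single observation per month. The sketch below keeps the last trading day of each month; the solution below keeps the first trading day instead, and either is a reasonable convention. `monthly_prices` is a hypothetical helper, and the dates and prices in the example are made up.

```python
def monthly_prices(dates, prices):
    """Keep the last price in each month; dates are 'YYYY-MM-DD' strings, oldest first."""
    last_in_month = {}
    for date, price in zip(dates, prices):
        month = date[:7]              # 'YYYY-MM'
        last_in_month[month] = price  # later days in the month overwrite earlier ones
    # dicts preserve insertion order (Python 3.7+), so months stay in date order
    return list(last_in_month.keys()), list(last_in_month.values())

dates = ["2009-03-30", "2009-03-31", "2009-04-01", "2009-04-30", "2009-05-01"]
prices = [8000.0, 8100.0, 8200.0, 8800.0, 8900.0]
print(monthly_prices(dates, prices))
# (['2009-03', '2009-04', '2009-05'], [8100.0, 8800.0, 8900.0])
```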

Solutions

Solution to Exercises 1-4

## Author: John Stachurski
## Filename: nikkei_plot.py

from __future__ import division
import pylab

# First let's create some functions 

def percent_change(data):
    """ 
    Calculates the percentage change from each data point to the next,
    where data is a list of numbers ordered from oldest to newest.
    """
    changes = []
    for current, following in zip(data[:-1], data[1:]):
        changes.append(100 * (following - current) / current)
    return changes

def seriesplot(data):
    """Plots data as a line against its index 0, 1, 2, ..."""
    pylab.plot(data)
    pylab.show()

def returnsplot(start_year, end_year, data, dates):
    """
    Plots daily returns from start_year to end_year.
    Parameters: start_year and end_year are integers from 1984 to 2009.  data
    is the price data as a list of floats, and dates is the corresponding list
    of dates.  Each date is a string in the format YYYY-MM-DD.
    """
    plotvals = []
    for value, date in zip(data, dates):  # use the data argument, not the global values
        year = int(date.split('-')[0])  # extract the year
        if start_year <= year <= end_year:
            plotvals.append(value)
    seriesplot(percent_change(plotvals))

def densityplot(data):
    """
    Plots a histogram of daily returns from data, plus fitted normal density.
    """
    dailyreturns = percent_change(data)
    pylab.hist(dailyreturns, bins=200, density=True)
    m, M = min(dailyreturns), max(dailyreturns)
    mu = pylab.mean(dailyreturns)
    sigma = pylab.std(dailyreturns)
    grid = pylab.linspace(m, M, 100)
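    # Normal density N(mu, sigma^2) evaluated on the grid
    densityvalues = pylab.exp(-(grid - mu)**2 / (2 * sigma**2)) / (sigma * pylab.sqrt(2 * pylab.pi))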
    pylab.plot(grid, densityvalues, 'r-')
    pylab.show()

def monthly_returns(data, dates):
    """
    Plots the price series sampled monthly, keeping the first observation
    in each month.  Each date is a string in the format YYYY-MM-DD.
    """
    plotdata = [data[0]]            # start with the first data entry
    month = dates[0].split('-')[1]  # month of the first data entry
    for value, date in zip(data, dates):
        current_month = date.split('-')[1]
        if current_month != month:  # first trading day of a new month
            plotdata.append(value)
            month = current_month
    seriesplot(plotdata)

#  Now we are ready to read in the data and make the plots

with open("table.csv", 'r') as infile:  # the file is closed automatically
    lines = infile.readlines()
del lines[0]     # Remove the header line
lines.reverse()  # Reverse order to start at earliest date

dates = []
values = []
for line in lines:
    elements = line.split(',')
    dates.append(elements[0])
    values.append(float(elements[-1]))

# Solutions to the exercises

exercise_number = int(input("Enter the number of the exercise: "))

if exercise_number == 1:
    seriesplot(values)
elif exercise_number == 2:
    sy = int(input("Enter the start year: "))
    ey = int(input("Enter the end year: "))
    returnsplot(sy, ey, values, dates)
elif exercise_number == 3:
    densityplot(values)
elif exercise_number == 4:
    monthly_returns(values, dates)
else:
    print("Dude, there's no exercise number " + str(exercise_number))