SciPy provides functions and other objects that work with NumPy arrays:
Very impressive package
Currently being actively developed
When we import SciPy, we also get NumPy
Here's the relevant lines of the SciPy initialization file:
# Import numpy symbols to scipy name space
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
Most of SciPy resides in submodules
scipy.optimize, scipy.integrate, etc.In the current set up these are not imported into the scipy namespace
Means that you have to import them separately
import scipy.optimize
from scipy.integrate import quad
SciPy includes NumPy, so you can just import SciPy
But personally I like NumPy because it is small and stable
I usually start my programs with
import numpy as np
Then I import bits and pieces from SciPy as needed
from scipy.integrate import quad
from scipy.optimize import brentq
# etc
The scipy.stats package supplies
Recall that numpy.random provides functions for generating random variables
>>> a = 5
>>> b = 5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443, 0.90941746, 0.49067643])
This generates a draw from the distribution (density)
However, sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this we can use scipy.stats
Here's an example
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
q = beta(5, 5) # Beta(a, b), with a = b = 5
obs = q.rvs(2000) # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()
The following plot is produced
In this code we created an rv_frozen object by the call
q = beta(5, 5)
The general syntax is
identifier = scipy.stats.distribution_name(shape_parameters)
where distribution_name is one of the distribution names in dir(scipy.stats)
The shape parameters depend on the specific distribution---use the online help
help(scipy.stats.beta)scipy.stats.beta?After that we called the methods rvs and pdf, which give random variates
and the density respectively
Other methods include cdf, moments, etc. See online help
In fact there are also two keyword arguments, loc and scale:
identifier = scipy.stats.distribution_name(shape_parameters, `loc=c`, `scale=d`)
These transform the original random variable X into
The methods rvs, pdf, cdf, etc. are transformed accordingly
Note that there is an alternative way of calling the methods described above
For example, the previous code can be replaced by
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
obs = beta.rvs(5, 5, size=2000) # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
show()
There are lots of other functions and routines in scipy.stats
A useful one is scipy.stats.linregress
from scipy import stats
x = # some array like object (list, tuple, NumPy array, etc)
y = # some array like object (list, tuple, NumPy array, etc)
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
However, probably the best way to do statistics in Python is to use rpy2
Some background on rates of convergence
A sequence (x
For example, if λ = 1/2 then the distance between the current value and the limit is at least halved at each step
The sequence converges to x with order q
If q = 2, then called quadratic convergence
The higher is q the better the rate
A root of f on [a, b] is an x such that f(x) = 0.
How to find one?
To simplify matters, suppose that we have a function f on [a, b] which
A simple algorithm is bisection
You know the game where
This is bisection. I leave further explanation to Wikipedia
Here's an implementation of bisection for the function described above
## Filename: bisection.py
## Author: John Stachurski
def bisect(f, a, b, tol=10e-5):
lower, upper = a, b
while upper - lower > tol:
middle = 0.5 * (upper + lower)
if f(middle) > 0: # root is between lower and middle
lower, upper = lower, middle
else: # Root is between middle and upper
lower, upper = middle, upper
print 'Root:', lower
Problem:
Recall the concept of recusive function calls from this lecture
Write a recursive implementation of the bisection function
Solution:
## Filename: bisection2.py
## Author: John Stachurski
def bisect(f, a, b, tol=10e-5):
lower, upper = a, b
if upper - lower < tol:
print 'Root:', lower
else:
middle = 0.5 * (upper + lower)
print 'lower = %.2f, middle = %.2f, upper = %.2f' \
% (lower, middle, upper)
if f(middle) > 0: # root is between lower and middle
bisect(f, lower, middle)
else: # Root is between middle and upper
bisect(f, middle, upper)
Let's test this implementation
import bisection2
from scipy.stats import beta
from random import uniform
c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
Here's the output
lower = 0.00, middle = 0.50, upper = 1.00
lower = 0.00, middle = 0.25, upper = 0.50
lower = 0.25, middle = 0.38, upper = 0.50
lower = 0.38, middle = 0.44, upper = 0.50
lower = 0.44, middle = 0.47, upper = 0.50
## some lines omitted
Root: 0.448425292969
This bisection algorithm will always converge to the root in our setting
However, the rate of convergence is only linear
An alternative would be to use the Newton-Raphson method
Under certain conditions on the second derivative, convergence is quadratic
But the algorithm is less robust
For some functions it does not converge (cycles, etc.)
This illustrates a general principle
If you have specific knowledge about your function, you can use a more efficient technique
If not, then algorithm choice involves a trade-off between speed of convergence and robustness
In practice, most default algorithms for root finding, optimization and fixed points use hybrid methods
Regarding the scalar (i.e., univariate) case:
In scipy.optimize we find bisect, newton, brentq and some others
Here brentq is a hybrid method, and a good default
Let's test them
from scipy.stats import beta
from random import uniform
import bisection2
from scipy.optimize import bisect, newton, brentq
print 'Our implementation:'
c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
print 'Scipy bisect:', bisect(f, 0, 1)
print 'Scipy newton:', newton(f, 0.5) # 0.5 = initial condition
print 'Scipy brentq:', brentq(f, 0, 1)
Here's the output
Our implementation:
lower = 0.00, middle = 0.50, upper = 1.00
## Some lines omitted
Root: 0.177001953125
Scipy bisect: 0.177014592494
Scipy newton: 0.177014592494
Scipy brentq: 0.177014592494
Note that bisect, newton and brentq all give the same value
Using IPython's timeit we can get an idea of how long each one takes
In [11]: timeit bisect(f, 0, 1)
100 loops, best of 3: 8.91 ms per loop
In [12]: timeit newton(f, 0.5)
1000 loops, best of 3: 1.33 ms per loop
In [13]: timeit brentq(f, 0, 1)
100 loops, best of 3: 2.11 ms per loop
Here newton (using Newton-Raphson) is best because the function is well behaved
But brentq is not far off, and would work well for a less well behaved function
SciPy has a function for finding (scalar) fixed points too:
>>> from scipy.optimize import fixed_point
>>> fixed_point(lambda x: x**2, 10.0) # 10.0 is an initial guess
1.0
However, current algorithm is not very robust
Might be better to use a root finder such as brentq to find the fixed point
of f
>>> from scipy.optimize import brentq
>>> brentq(lambda x: x - x**2, 0.5, 10)
1.0
Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK
See the docstring for details
Most numerical packages provide only functions for minimization
To find the maximizer of f on D, find the minimizer of -f on D
Minimization is closely related to root finding
As in root finding, nice properties of the function permit use of faster methods
But often we don't know much about the function
In this case it's best to use hybrid methods
For constrained, univariate (i.e., scalar) minimization, a good option is
fminbound
>>> from scipy.optimize import fminbound
>>> fminbound(lambda x: x**2, -1, 2) # Search in [-1, 2]
0.0
Uses Brent's hybrid method
Finds local optima, not global
Multivariate local optimizers: fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg
Constrained multivariate local optimizers: fmin_l_bfgs_b, fmin_tnc, fmin_cobyla
See the docstrings for details
Most numerical methods compute integral of approximating polynomial
Error depends on quality of fit (of polynomial to integrand)
The relevant module is scipy.integrate
A good default for univariate integration is quad
>>> from scipy.integrate import quad
>>> quad(lambda x: x**2, 0, 1) # Integrate f(x) = x**2 over [0, 1]
(0.33333333333333331, 3.7007434154171879e-15) # Integral, estimate of error
Interface to function in Fortran library QUADPACK
Uses Clenshaw-Curtis quadrature
Other univariate options:
quadrature, romberg (Gaussian quadrature, Romberg integration)Multivariate options
dblquad, tplquadIn high dimensions, use Monte Carlo if possible
We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
Personally I just use numpy.linalg