SciPy
SciPy provides functions and other objects that work with NumPy arrays: optimization, numerical integration, statistics, etc.
Currently being actively developed
SciPy versus NumPy
When we import SciPy, we also get NumPy. Here are the relevant lines of the SciPy initialization file:
# Import numpy symbols to scipy name space
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
However, subpackages such as scipy.optimize, scipy.integrate, etc. are not automatically added to the scipy namespace. This means that you have to import them separately:
import scipy.optimize
from scipy.integrate import quad
Which to use, NumPy or SciPy?
SciPy includes NumPy, so you can just import SciPy. But personally I like NumPy because it is small and stable.
I usually start my programs with
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
# etc
Statistics
The scipy.stats package supplies
- numerous random variable objects
- densities, cumulative distributions, random sampling, etc.
- some estimation procedures
- some statistical tests
Random Variables and Distributions
Recall that numpy.random provides functions for generating random variables
>>> a = 5
>>> b = 5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443, 0.90941746, 0.49067643])
However, sometimes we need access to the density itself, or the cdf, the quantiles, etc. For this we can use scipy.stats.
Here's an example
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
q = beta(5, 5) # Beta(a, b), with a = b = 5
obs = q.rvs(2000) # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()
In this code we created an rv_frozen object by the call q = beta(5, 5). The general syntax for such calls is
identifier = scipy.stats.distribution_name(shape_parameters)
where distribution_name is one of the distribution names in dir(scipy.stats).
The shape parameters depend on the specific distribution; use the online help
- You do know how to do this by now, don't you?
- In a Python shell, try help(scipy.stats.beta)
- In IPython, try scipy.stats.beta?
The methods used in the code above, rvs and pdf, give random variates and the density respectively. Other methods include cdf, moments, etc.; see the online help. In fact there are also two keyword arguments, loc and scale:
identifier = scipy.stats.distribution_name(shape_parameters, loc=c, scale=d)
The methods rvs, pdf, cdf, etc. are transformed accordingly. Note that there is an alternative way of calling the methods described above.
For example, the previous code can be replaced by
import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show
obs = beta.rvs(5, 5, size=2000) # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
show()
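To illustrate the loc and scale keywords discussed above, here's a small sketch (not from the original notes): scipy.stats treats q = beta(5, 5, loc=1, scale=2) as the distribution of 1 + 2 * X with X distributed Beta(5, 5), so the pdf is shifted and rescaled accordingly
import numpy as np
from scipy.stats import beta

# With loc=1 and scale=2, q represents 1 + 2 * X where X is Beta(5, 5),
# so its support is [1, 3]
q = beta(5, 5, loc=1, scale=2)

grid = np.linspace(1.01, 2.99, 50)
# The transformed pdf satisfies q.pdf(y) == beta.pdf((y - 1) / 2, 5, 5) / 2
print(np.allclose(q.pdf(grid), beta.pdf((grid - 1) / 2, 5, 5) / 2))  # should print True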
Other Goodies in scipy.stats
There are lots of other functions and routines in scipy.stats. A useful one is scipy.stats.linregress:
from scipy import stats
x = # some array like object (list, tuple, NumPy array, etc)
y = # some array like object (list, tuple, NumPy array, etc)
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
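To make the template above concrete, here's a minimal runnable sketch with made-up data (the slope and intercept values are purely illustrative)
import numpy as np
from scipy import stats

# Made-up data: y is roughly linear in x, with slope 2 and intercept 1 plus noise
np.random.seed(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + np.random.randn(50)

gradient, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print(gradient)   # should be close to 2.0
print(intercept)  # should be close to 1.0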
Also worth a mention: there is an interface to R from Python, which allows you to call R functions from a Python script.
Roots and Fixed Points
Some background on rates of convergence: a sequence (x_n) converging to x is said to converge linearly, at rate λ < 1, if |x_{n+1} - x| ≤ λ |x_n - x| for all n.
For example, if λ = 1/2 then the distance between the current value and the limit is at least halved at each step.
The sequence converges to x with order q if |x_{n+1} - x| ≤ M |x_n - x|^q for some constant M > 0.
If q = 2, then this is called quadratic convergence.
The higher is q, the better the rate.
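As a small numerical illustration (not part of the original notes), the following sketch compares a linearly convergent error sequence with λ = 1/2 against a quadratically convergent one, both starting from an error of 0.5
# Linear convergence: the error is halved at each step
# Quadratic convergence: the error is squared at each step
e_linear, e_quad = 0.5, 0.5
for n in range(1, 6):
    e_linear, e_quad = 0.5 * e_linear, e_quad ** 2
    print('step %d: linear error = %g, quadratic error = %g' % (n, e_linear, e_quad))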
Case Study: Root finding
A root of f on [a, b] is an x such that f(x) = 0. How to find one?
To simplify matters, suppose that we have a function f on [a, b] which
- is continuous
- has one and only one root, and, moreover,
- satisfies f(a) < 0, while f(b) > 0
Bisection
A simple algorithm is bisection. You know the game where
- I think of a number between 1 and 100
- You say, is it less than 50?
- If yes, is it less than 25?
- If no, then is it less than 75?
- And so on
Here's an implementation of bisection for the function described above
## Filename: bisection.py
## Author: John Stachurski
def bisect(f, a, b, tol=10e-5):
lower, upper = a, b
while upper - lower > tol:
middle = 0.5 * (upper + lower)
if f(middle) > 0: # root is between lower and middle
lower, upper = lower, middle
else: # Root is between middle and upper
lower, upper = middle, upper
print 'Root:', lower
Recall the concept of recursive function calls from this lecture
Write a recursive implementation of the bisection function
Solution:
## Filename: bisection2.py
## Author: John Stachurski
def bisect(f, a, b, tol=10e-5):
lower, upper = a, b
if upper - lower < tol:
print 'Root:', lower
else:
middle = 0.5 * (upper + lower)
print 'lower = %.2f, middle = %.2f, upper = %.2f' \
% (lower, middle, upper)
if f(middle) > 0: # root is between lower and middle
bisect(f, lower, middle)
else: # Root is between middle and upper
bisect(f, middle, upper)
import bisection2
from scipy.stats import beta
from random import uniform
c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
lower = 0.00, middle = 0.50, upper = 1.00
lower = 0.00, middle = 0.25, upper = 0.50
lower = 0.25, middle = 0.38, upper = 0.50
lower = 0.38, middle = 0.44, upper = 0.50
lower = 0.44, middle = 0.47, upper = 0.50
## some lines omitted
Root: 0.448425292969
Analysis
This bisection algorithm will always converge to the root in our setting. However, the rate of convergence is only linear.
An alternative would be to use the Newton-Raphson method
Under certain conditions on the second derivative, convergence is quadratic
But the algorithm is less robust
For some functions it does not converge (cycles, etc.)
This illustrates a general principle
- If you have specific knowledge about your function, you can use a more efficient technique
- If not, then algorithm choice involves a trade-off between speed of convergence and robustness
A common solution is a hybrid method, which
- Combines a fast method with a robust method
- Attempts to use a fast method
- Uses some diagnostics
- If diagnostics are bad, then switches to a robust algorithm
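For reference, here's a minimal sketch of the Newton-Raphson update x_{n+1} = x_n - f(x_n) / f'(x_n) mentioned above; it's not the implementation SciPy uses, and it assumes we can supply the derivative fprime
def newton_raphson(f, fprime, x0, tol=1e-8, max_iter=50):
    # Repeatedly apply the update x <- x - f(x) / f'(x)
    x = x0
    for i in range(max_iter):
        step = f(x) / fprime(x)
        x = x - step
        if abs(step) < tol:
            return x
    raise RuntimeError('Newton-Raphson did not converge')

# Example: the positive root of f(x) = x**2 - 2 is sqrt(2)
print(newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))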
Roots and Fixed Points in Scipy
Regarding the scalar (i.e., univariate) case: in scipy.optimize we find bisect, newton, brentq and some others. Here brentq is a hybrid method, and a good default. Let's test them:
from scipy.stats import beta
from random import uniform
import bisection2
from scipy.optimize import bisect, newton, brentq
print 'Our implementation:'
c = uniform(0, 1)
f = lambda x: beta.cdf(x, 5, 5) - c
bisection2.bisect(f, 0, 1)
print 'Scipy bisect:', bisect(f, 0, 1)
print 'Scipy newton:', newton(f, 0.5) # 0.5 = initial condition
print 'Scipy brentq:', brentq(f, 0, 1)
Our implementation:
lower = 0.00, middle = 0.50, upper = 1.00
## Some lines omitted
Root: 0.177001953125
Scipy bisect: 0.177014592494
Scipy newton: 0.177014592494
Scipy brentq: 0.177014592494
bisect, newton and brentq all give the same value. Using IPython's timeit we can get an idea of how long each one takes:
In [11]: timeit bisect(f, 0, 1)
100 loops, best of 3: 8.91 ms per loop
In [12]: timeit newton(f, 0.5)
1000 loops, best of 3: 1.33 ms per loop
In [13]: timeit brentq(f, 0, 1)
100 loops, best of 3: 2.11 ms per loop
newton (using Newton-Raphson) is best because the function is well behaved. But brentq is not far off, and would work well for a less well behaved function.
Fixed Points
SciPy has a function for finding (scalar) fixed points too:
>>> from scipy.optimize import fixed_point
>>> fixed_point(lambda x: x**2, 10.0) # 10.0 is an initial guess
1.0
It might be better to use a root finder such as brentq to find the fixed point of f
- Search for the root of g(x) := x - f(x)
>>> from scipy.optimize import brentq
>>> brentq(lambda x: x - x**2, 0.5, 10)
1.0
Multivariate Root Finding
Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK. See the docstring for details.
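For example, here's a small sketch (with a made-up system of equations, not taken from the notes)
import numpy as np
from scipy.optimize import fsolve

def system(z):
    # Solve x + y = 3 and x * y = 2; the solutions are (1, 2) and (2, 1)
    x, y = z
    return [x + y - 3, x * y - 2]

print(fsolve(system, [0.0, 1.0]))  # should converge to one of the two solutions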
Optimization
Most numerical packages provide only functions for minimization. To find the maximizer of f on D, find the minimizer of -f on D.
Minimization is closely related to root finding
- Newton's method searches for a root of the first derivative
- Many variations on this technique
- Gradient descent is another approach using derivatives
But often we don't know much about the function
In this case it's best to use hybrid methods
- Robust, and fast when diagnostics are good
Univariate Optimization
For constrained, univariate (i.e., scalar) minimization, a good option is fminbound:
>>> from scipy.optimize import fminbound
>>> fminbound(lambda x: x**2, -1, 2) # Search in [-1, 2]
0.0
Finds local optima, not global
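Recalling the earlier point that maximization amounts to minimizing the negative, here's a small sketch (not from the notes) using fminbound
from scipy.optimize import fminbound

# Maximize f(x) = x * (1 - x) on [0, 1] by minimizing -f; the maximizer is 0.5
maximizer = fminbound(lambda x: -x * (1 - x), 0, 1)
print(maximizer)   # should print a value close to 0.5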
Multivariate Optimization
Multivariate local optimizers: fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg
Constrained multivariate local optimizers: fmin_l_bfgs_b, fmin_tnc, fmin_cobyla
See the docstrings for details
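As a quick illustration (a sketch, not from the notes), fmin, which implements the Nelder-Mead simplex algorithm, can be used like this
import numpy as np
from scipy.optimize import fmin

# Minimize f(x, y) = (x - 1)**2 + (y - 2)**2, whose minimizer is (1, 2),
# starting from a rough initial guess
result = fmin(lambda z: (z[0] - 1)**2 + (z[1] - 2)**2, np.array([0.0, 0.0]))
print(result)   # should be close to [1, 2]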
Integration
Most numerical methods compute the integral of an approximating polynomial. The error depends on the quality of the fit (of the polynomial to the integrand).
The relevant module is scipy.integrate. A good default for univariate integration is quad:
>>> from scipy.integrate import quad
>>> quad(lambda x: x**2, 0, 1) # Integrate f(x) = x**2 over [0, 1]
(0.33333333333333331, 3.7007434154171879e-15) # Integral, estimate of error
Uses Clenshaw-Curtis quadrature
- Based on expansion in terms of Chebychev polynomials
- Accuracy similar to Gaussian quadrature
- But often runs a bit faster
Other routines include quadrature and romberg (Gaussian quadrature and Romberg integration respectively). For multiple (double and triple) integrals, see dblquad and tplquad.
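For instance, a double integral over the unit square can be computed with dblquad as in this small sketch (not from the notes)
from scipy.integrate import dblquad

# Integrate f(x, y) = x * y over [0, 1] x [0, 1]; the exact answer is 1/4.
# Note that the integrand takes the inner variable (y) as its first argument.
value, error = dblquad(lambda y, x: x * y, 0, 1, lambda x: 0, lambda x: 1)
print(value)   # should be close to 0.25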
Linear Algebra
We saw that NumPy provides a module for linear algebra called linalg. SciPy also provides a module for linear algebra with the same name. Personally I just use numpy.linalg.
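For example, solving a linear system with numpy.linalg looks like this (a minimal sketch, not from the notes)
import numpy as np

# Solve the linear system A x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)
print(x)   # should satisfy np.allclose(np.dot(A, x), b)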